Merge branch 'audio' into io

0550d092 · Hui Zhang · eead92af · 56eb1f0e
638 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -39,6 +39,9 @@ tools/env.sh
 tools/openfst-1.8.1/
 tools/libsndfile/
 tools/python-soundfile/
+tools/onnx
+tools/onnxruntime
+tools/Paddle2ONNX

 speechx/fc_patch/


--- a/.mergify.yml
+++ b/.mergify.yml
@@ -52,7 +52,7 @@ pull_request_rules:
        add: ["T2S"]
  - name: "auto add label=Audio"
    conditions:
-      - files~=^paddleaudio/
+      - files~=^paddlespeech/audio/
    actions:
      label:
        add: ["Audio"]
@@ -100,7 +100,7 @@ pull_request_rules:
        add: ["README"]
  - name: "auto add label=Documentation"
    conditions:
-      - files~=^(docs/|CHANGELOG.md|paddleaudio/CHANGELOG.md)
+      - files~=^(docs/|CHANGELOG.md)
    actions:
      label:
        add: ["Documentation"]
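
Reviewer note on the `.mergify.yml` hunks above: the `Audio` label rule now keys on `paddlespeech/audio/` instead of the old top-level `paddleaudio/` directory. The sketch below illustrates what that change means for path matching. It assumes that Mergify's `~=` operator behaves like an unanchored regular-expression search, so the leading `^` is what pins the match to the start of the path. The helper name `audio_label_applies` and the file paths are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the old and new `files~=` conditions in the hunk above.
OLD_RULE = re.compile(r"^paddleaudio/")
NEW_RULE = re.compile(r"^paddlespeech/audio/")


def audio_label_applies(pattern, changed_files):
    """Return True if any changed path matches the rule (hypothetical helper).

    Assumption: `~=` is a regex search, so anchoring comes only from `^`.
    """
    return any(pattern.search(path) for path in changed_files)


# Hypothetical changed-file lists for a pull request.
moved_module = ["paddlespeech/audio/__init__.py"]
nested_copy = ["docs/paddlespeech/audio/notes.md"]

print(audio_label_applies(OLD_RULE, moved_module))
print(audio_label_applies(NEW_RULE, moved_module))
print(audio_label_applies(NEW_RULE, nested_copy))
```

Under these assumptions only the new pattern matches files in the relocated package, and a path that merely contains `paddlespeech/audio/` further down the tree does not trigger the label.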

--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -51,12 +51,12 @@ repos:
        language: system
        files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|cuh|proto)$
        exclude: (?=speechx/speechx/kaldi|speechx/patch|speechx/tools/fstbin|speechx/tools/lmbin).*(\.cpp|\.cc|\.h|\.py)$
-    -   id: copyright_checker
-        name: copyright_checker
-        entry: python .pre-commit-hooks/copyright-check.hook
-        language: system
-        files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
-        exclude: (?=third_party|pypinyin|speechx/speechx/kaldi|speechx/patch|speechx/tools/fstbin|speechx/tools/lmbin).*(\.cpp|\.cc|\.h|\.py)$
+    #-   id: copyright_checker
+    #    name: copyright_checker
+    #    entry: python .pre-commit-hooks/copyright-check.hook
+    #    language: system
+    #    files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
+    #    exclude: (?=third_party|pypinyin|speechx/speechx/kaldi|speechx/patch|speechx/tools/fstbin|speechx/tools/lmbin).*(\.cpp|\.cc|\.h|\.py)$
 -   repo: https://github.com/asottile/reorder_python_imports
    rev: v2.4.0
    hooks:

--- a/.pre-commit-hooks/copyright-check.hook
+++ b/.pre-commit-hooks/copyright-check.hook
@@ -19,7 +19,7 @@ import subprocess
 import platform

 COPYRIGHT = '''
-Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

--- a/README.md
+++ b/README.md
 ([简体中文](./README_cn.md)|English)
-
 <p align="center">
  <img src="./docs/images/PaddleSpeech_logo.png" />
 </p>
-<div align="center">  
-
-  <h3>
-  <a href="#quick-start"> Quick Start </a>
-  | <a href="#quick-start-server"> Quick Start Server </a>
-  | <a href="#documents"> Documents </a>
-  | <a href="#model-list"> Models List </a>
-</div>
-
------------------------------------------------------------------------------------
-

 <p align="center">
    <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
@@ -28,7 +16,20 @@
    <a href="=https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a>
    <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
 </p>
+<div align="center">  
+<h4>
+    <a href="#quick-start"> Quick Start </a>
+  | <a href="#quick-start-server"> Quick Start Server </a>
+  | <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
+  | <a href="#documents"> Documents </a>
+  | <a href="#model-list"> Models List </a>
+  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
+  | <a href="https://arxiv.org/abs/2205.12007"> Paper </a>
+  | <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee </a>
+</h4>
+</div>

+------------------------------------------------------------------------------------

 **PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models.

@@ -142,53 +143,35 @@ For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech sample

 </div>

-### ⭐ Examples
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
-  
-<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
-
- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
-
-<div align="center">
-<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png"  width = "500px"  />
-</div>
-
-### 🔥 Hot Activities
-
- 2021.12.21~12.24
-
-  4 Days Live Courses: Depth interpretation of PaddleSpeech!
-
-  **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**

 ### Features

 Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
- 📦  **Ease of Use**: low barriers to install, and [CLI](#quick-start) is available to quick-start your journey.
+- 📦  **Ease of Use**: low barriers to install, [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) is available to quick-start your journey.
 - 🏆  **Align to the State-of-the-Art**: we provide high-speed and ultra-lightweight models, and also cutting-edge technology. 
+- 🏆  **Streaming ASR and TTS System**: we provide production ready streaming asr and streaming tts system.
 - 💯  **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
- **Varieties of Functions that Vitalize both Industrial and Academia**:
-  - 🛎️  *Implementation of critical audio tasks*: this toolkit contains audio functions like  Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
+- 📦  **Varieties of Functions that Vitalize both Industrial and Academia**:
+  - 🛎️  *Implementation of critical audio tasks*: this toolkit contains audio functions like  Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verfication, KeyWord Spotting, Audio Classification, and Speech Translation, etc.
  - 🔬  *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
  - 🧩  *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md)、[PP-TTS](./docs/source/tts/PPTTS.md)、[PP-VPR](docs/source/vpr/PPVPR.md)
+- 👏🏻  2022.05.06: `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp`.
+- 👏🏻  2022.05.06: `Server` is available for `Speaker Verification`, and `Punctuation Restoration`.
+- 👏🏻  2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻  2022.03.28: `Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻  2022.03.28: `CLI` is available for `Speaker Verification`.
+- 🤗  2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+- 👏🏻  2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.

-<!---
-2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 👏🏻  2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
- 👏🏻  2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
- 🤗  2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻  2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

 ### Community
- Scan the QR code below with your Wechat (reply【语音】after your friend's application is approved), you can access to official technical exchange group. Look forward to your participation.
+- Scan the QR code below with your Wechat, you can access to official technical exchange group and get the bonus ( more than 20GB learning materials, such as papers, codes and videos ) and the live link of the lessons. Look forward to your participation.

 <div align="center">
-<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg"  width = "300"  />
+<img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg"  width = "200"  />
 </div>

 ## Installation
@@ -196,6 +179,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*.
 Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).

+
 <a name="quickstart"></a>
 ## Quick Start

@@ -257,16 +241,19 @@ If you want to try more functions like training and tuning, please have a look a
 Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).

 **Start server**     
+
 ```shell
 paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
 ```

 **Access Speech Recognition Services**     
+
 ```shell
 paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
 ```

 **Access Text to Speech Services**     
+
 ```shell
 paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
 ```
@@ -280,6 +267,37 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
 For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)


+<a name="quickstartstreamingserver"></a>
+## Quick Start Streaming Server
+
+Developers can have a try of  [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md) server.
+
+**Start Streaming Speech Recognition Server**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**Access Streaming Speech Recognition Services**     
+
+```
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**Start Streaming Text to Speech  Server**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**Access Streaming Text to Speech Services**     
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+For more information please see:  [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md) 
+
 <a name="ModelList"></a>

 ## Model List
@@ -296,7 +314,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
      <th>Speech-to-Text Module Type</th>
      <th>Dataset</th>
      <th>Model Type</th>
-      <th>Link</th>
+      <th>Example</th>
    </tr>
  </thead>
  <tbody>
@@ -371,7 +389,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
      <th> Text-to-Speech Module Type </th>
      <th> Model Type </th>
      <th> Dataset </th>
-      <th> Link </th>
+      <th> Example </th>
    </tr>
  </thead>
  <tbody>
@@ -489,7 +507,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
      <th> Task </th>
      <th> Dataset </th>
      <th> Model Type </th>
-      <th> Link </th>
+      <th> Example </th>
    </tr>
  </thead>
  <tbody>
@@ -514,7 +532,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
      <th> Task </th>
      <th> Dataset </th>
      <th> Model Type </th>
-      <th> Link </th>
+      <th> Example </th>
    </tr>
  </thead>
  <tbody>
@@ -539,7 +557,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
      <th> Task </th>
      <th> Dataset </th>
      <th> Model Type </th>
-      <th> Link </th>
+      <th> Example </th>
    </tr>
  </thead>
  <tbody>
@@ -589,6 +607,21 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht

 The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components.

+
+## ⭐ Examples
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
+  
+<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
+
+- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png"  width = "500px"  />
+</div>
+
+
 ## Citation

 To cite PaddleSpeech for research, please use the following format.
@@ -655,7 +688,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P

 ## Acknowledgement

-
 - Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
 - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
 - Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.

--- a/README_cn.md
+++ b/README_cn.md
@@ -2,34 +2,36 @@
 <p align="center">
  <img src="./docs/images/PaddleSpeech_logo.png" />
 </p>
-<div align="center">  

-  <h3>
-  <a href="#quick-start"> 快速开始 </a>
-  | <a href="#quick-start-server"> 快速使用服务 </a>
-  | <a href="#documents"> 教程文档 </a>
-  | <a href="#model-list"> 模型列表 </a>
-</div>

------------------------------------------------------------------------------------
 <p align="center">
    <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
-    <a href="support os"><img src="https://img.shields.io/badge/os-linux-yellow.svg"></a>
+    <a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a>
+    <a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
    <a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a>
    <a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a>
    <a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a>
    <a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a>
+    <a href="=https://pypi.org/project/paddlespeech/"><img src="https://img.shields.io/pypi/dm/PaddleSpeech"></a>
+    <a href="=https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a>
    <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
 </p>
+<div align="center">  
+<h4>
+    <a href="#快速开始"> 快速开始 </a>
+  | <a href="#快速使用服务"> 快速使用服务 </a>
+  | <a href="#快速使用流式服务"> 快速使用流式服务 </a>
+  | <a href="#教程文档"> 教程文档 </a>
+  | <a href="#模型列表"> 模型列表 </a>
+  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio 课程 </a>
+  | <a href="https://arxiv.org/abs/2205.12007"> 论文 </a>
+  | <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee 
+</h4>
+</div>
+

-<!---
-from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md
-1.What is this repo or project? (You can reuse the repo description you used earlier because this section doesn’t have to be long.)
-2.How does it work?
-3.Who will use this repo or project?
-4.What is the goal of this project?
-->
+------------------------------------------------------------------------------------

 **PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库，用于语音和音频中的各种关键任务的开发，包含大量基于深度学习前沿和有影响力的模型，一些典型的应用示例如下：
 ##### 语音识别
@@ -57,7 +59,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
      </td>
      <td>我认为跑步最重要的就是给我带来了身体健康。</td>
    </tr>
-    
  </tbody>
 </table>

@@ -143,53 +144,37 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme

 </div>

-### ⭐ 应用案例
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
-  
-<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
-
- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
-
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**

-<div align="center">
-<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png"  width = "500px"  />
-</div>
-
-### 🔥 热门活动
-
- 2021.12.21~12.24
-
-  4 日直播课: 深度解读 PaddleSpeech 语音技术!
-
-  **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
 ### 特性

 本项目采用了易用、高效、灵活以及可扩展的实现，旨在为工业应用、学术研究提供更好的支持，实现的功能包含训练、推断以及测试模块，以及部署过程，主要包括
 - 📦 **易用性**: 安装门槛低，可使用 [CLI](#quick-start) 快速开始。
 - 🏆 **对标 SoTA**: 提供了高速、轻量级模型，且借鉴了最前沿的技术。
+- 🏆 **流式ASR和TTS系统**：工业级的端到端流式识别、流式合成系统。
 - 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换（G2P）。此外，我们使用自定义语言规则来适应中文语境。
 - **多种工业界以及学术界主流功能支持**:
-  - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。
+  - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成、声纹识别、KWS等任务的实现。
  - 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块，并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC，详情请见 [模型列表](#model-list)。
  - 🧩 级联模型应用: 作为传统语音任务的扩展，我们结合了自然语言处理、计算机视觉等任务，实现更接近实际需求的产业级应用。

+
 ### 近期更新
+- 👑 2022.05.13: PaddleSpeech 发布 [PP-ASR](./docs/source/asr/PPASR_cn.md) 流式语音识别系统、[PP-TTS](./docs/source/tts/PPTTS_cn.md) 流式语音合成系统、[PP-VPR](docs/source/vpr/PPVPR_cn.md) 全链路声纹识别系统
+- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别（标点恢复、时间戳），和语音合成。
+- 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别，标点恢复。
+- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译（英译中）、语音合成，声纹验证。
+- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!

-<!---
-2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。
- 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。
- 🤗  2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI 上线！覆盖了声音分类、语音识别、语音翻译（英译中）以及语音合成。

-### 技术交流群
-微信扫描二维码（好友申请通过后回复【语音】）加入官方交流群，获得更高效的问题答疑，与各行各业开发者充分交流，期待您的加入。
+ ### 🔥 加入技术交流群获取入群福利
+
+ - 3 日直播课链接: 深度解读 PP-TTS、PP-ASR、PP-VPR 三项核心语音系统关键技术
+ - 20G 学习大礼包：视频课程、前沿论文与学习资料
+  
+微信扫描二维码关注公众号，点击“马上报名”填写问卷加入官方交流群，获得更高效的问题答疑，与各行各业开发者充分交流，期待您的加入。

 <div align="center">
-<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg"  width = "300"  />
+<img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg"  width = "200"  />
 </div>

 ## 安装
@@ -197,6 +182,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 我们强烈建议用户在 **Linux** 环境下，*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
 目前为止，**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能，**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节，可以参考[安装文档](./docs/source/install_cn.md)。

+<a name="快速开始"></a>
 ## 快速开始

 安装完成后，开发者可以通过命令行快速开始，改变 `--input` 可以尝试用自己的音频或文本测试。
@@ -243,7 +229,7 @@ paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
 更多命令行命令请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
 > Note: 如果需要训练或者微调，请查看[语音识别](./docs/source/asr/quick_start.md)， [语音合成](./docs/source/tts/quick_start.md)。

-
+<a name="快速使用服务"></a>
 ## 快速使用服务
 安装完成后，开发者可以通过命令行快速使用服务。

@@ -269,7 +255,38 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav

 更多服务相关的命令行使用信息，请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)

+<a name="快速使用流式服务"></a>
+## 快速使用流式服务
+
+开发者可以尝试 [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md) 服务.
+
+**启动流式 ASR 服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**访问流式 ASR 服务**     
+
+```
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**启动流式 TTS 服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**访问流式 TTS 服务**     
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+更多信息参看： [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md) 

+<a name="模型列表"></a>
 ## 模型列表
 PaddleSpeech 支持很多主流的模型，并提供了预训练模型，详情请见[模型列表](./docs/source/released_model.md)。

@@ -282,8 +299,8 @@ PaddleSpeech 的 **语音转文本** 包含语音识别声学模型、语音识
    <tr>
      <th>语音转文本模块类型</th>
      <th>数据集</th>
-      <th>模型种类</th>
-      <th>链接</th>
+      <th>模型类型</th>
+      <th>脚本</th>
    </tr>
  </thead>
  <tbody>
@@ -356,9 +373,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  <thead>
    <tr>
      <th> 语音合成模块类型 </th>
-      <th> 模型种类 </th>
+      <th> 模型类型 </th>
      <th> 数据集  </th>
-      <th> 链接  </th>
+      <th> 脚本  </th>
    </tr>
  </thead>
  <tbody>
@@ -474,8 +491,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
    <tr>
      <th> 任务 </th>
      <th> 数据集 </th>
-      <th> 模型种类 </th>
-      <th> 链接</th>
+      <th> 模型类型 </th>
+      <th> 脚本</th>
    </tr>
  </thead>
  <tbody>
@@ -498,10 +515,10 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
 <table style="width:100%">
  <thead>
    <tr>
-      <th> Task </th>
-      <th> Dataset </th>
-      <th> Model Type </th>
-      <th> Link </th>
+      <th> 任务 </th>
+      <th> 数据集 </th>
+      <th> 模型类型 </th>
+      <th> 脚本 </th>
    </tr>
  </thead>
  <tbody>
@@ -525,8 +542,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
    <tr>
      <th> 任务 </th>
      <th> 数据集 </th>
-      <th> 模型种类 </th>
-      <th> 链接 </th>
+      <th> 模型类型 </th>
+      <th> 脚本 </th>
    </tr>
  </thead>
  <tbody>
@@ -541,6 +558,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  </tbody>
 </table>

+<a name="教程文档"></a>
 ## 教程文档

 对于 PaddleSpeech 的所关注的任务，以下指南有助于帮助开发者快速入门，了解语音相关核心思想。
@@ -582,6 +600,21 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声

 语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet)，现在与此仓库合并。如果您对该任务的学术研究感兴趣，请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外，[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。

+## ⭐ 应用案例
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
+  
+<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
+
+- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png"  width = "500px"  />
+</div>
+
+
 ## 引用

 要引用 PaddleSpeech 进行研究，请使用以下格式进行引用。
@@ -607,7 +640,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
 <a name="欢迎贡献"></a>
 ## 参与 PaddleSpeech 的开发

-热烈欢迎您在[Discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) 中提交问题，并在[Issues](https://github.com/PaddlePaddle/PaddleSpeech/issues) 中指出发现的 bug。此外，我们非常希望您参与到 PaddleSpeech 的开发中！
+热烈欢迎您在 [Discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) 中提交问题，并在 [Issues](https://github.com/PaddlePaddle/PaddleSpeech/issues) 中指出发现的 bug。此外，我们非常希望您参与到 PaddleSpeech 的开发中！

 ### 贡献者
 <p align="center">
@@ -658,6 +691,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
 - 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。

  
+
 此外，PaddleSpeech 依赖于许多开源存储库。有关更多信息，请参阅 [references](./docs/source/reference.md)。

 ## License

--- a/audio/.gitignore
+++ b/audio/.gitignore
-.eggs
-*.wav
--- a/audio/CHANGELOG.md
+++ b/audio/CHANGELOG.md
-# Changelog
-
-Date: 2022-3-15, Author: Xiaojie Chen.
-  - kaldi and librosa mfcc, fbank, spectrogram.
-  - unit test and benchmark.
-
-Date: 2022-2-25, Author: Hui Zhang.
-  - Refactor architecture.
-  - dtw distance and mcd style dtw.
--- a/audio/README.md
+++ b/audio/README.md
-# PaddleAudio
-
-PaddleAudio is an audio library for PaddlePaddle.
-
-## Install
-
-`pip install .`
--- a/audio/docs/Makefile
+++ b/audio/docs/Makefile
-# Minimal makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line.
-SPHINXOPTS    =
-SPHINXBUILD   = sphinx-build
-SOURCEDIR     = source
-BUILDDIR      = build
-
-# Put it first so that "make" without argument is like "make help".
-help:
-	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-
-.PHONY: help Makefile
-
-# Catch-all target: route all unknown targets to Sphinx using the new
-# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
-%: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
--- a/audio/docs/README.md
+++ b/audio/docs/README.md
-# Build docs for PaddleAudio
-
-Execute the following steps in **current directory**.
-
-## 1. Install
-
-`pip install Sphinx sphinx_rtd_theme`
-
-
-## 2. Generate API docs
-
-Generate API docs from doc string.
-
-`sphinx-apidoc -fMeT -o source ../paddleaudio ../paddleaudio/utils --templatedir source/_templates`
-
-
-## 3. Build
-
-`sphinx-build source _html`
-
-
-## 4. Preview
-
-Open `_html/index.html` for page preview.
--- a/audio/docs/images/paddle.png
+++ b/audio/docs/images/paddle.png
--- a/audio/docs/make.bat
+++ b/audio/docs/make.bat
-@ECHO OFF
-
-pushd %~dp0
-
-REM Command file for Sphinx documentation
-
-if "%SPHINXBUILD%" == "" (
-	set SPHINXBUILD=sphinx-build
-)
-set SOURCEDIR=source
-set BUILDDIR=build
-
-if "%1" == "" goto help
-
-%SPHINXBUILD% >NUL 2>NUL
-if errorlevel 9009 (
-	echo.
-	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
-	echo.installed, then set the SPHINXBUILD environment variable to point
-	echo.to the full path of the 'sphinx-build' executable. Alternatively you
-	echo.may add the Sphinx directory to PATH.
-	echo.
-	echo.If you don't have Sphinx installed, grab it from
-	echo.http://sphinx-doc.org/
-	exit /b 1
-)
-
-%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
-goto end
-
-:help
-%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
-
-:end
-popd
--- a/audio/paddleaudio/metric/dtw.py
+++ b/audio/paddleaudio/metric/dtw.py
-# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import numpy as np
-from dtaidistance import dtw_ndim
-
-__all__ = [
-    'dtw_distance',
-]
-
-
-def dtw_distance(xs: np.ndarray, ys: np.ndarray) -> float:
-    """Dynamic Time Warping.
-    This function keeps a compact matrix, not the full warping paths matrix.
-    Uses dynamic programming to compute:
-
-    Examples:
-        .. code-block:: python
-
-            wps[i, j] = (s1[i]-s2[j])**2 + min(
-                            wps[i-1, j  ] + penalty,  // vertical   / insertion / expansion
-                            wps[i  , j-1] + penalty,  // horizontal / deletion  / compression
-                            wps[i-1, j-1])            // diagonal   / match
-
-            dtw = sqrt(wps[-1, -1])
-
-    Args:
-        xs (np.ndarray): ref sequence, [T,D]
-        ys (np.ndarray): hyp sequence, [T,D]
-
-    Returns:
-        float: dtw distance
-    """
-    return dtw_ndim.distance(xs, ys)
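
Review note: the docstring of the removed `dtw_distance` states the recurrence but delegates the work to `dtaidistance`. The sketch below is a plain-Python rendering of that recurrence, assuming squared Euclidean distance between frames and a `penalty` of zero by default. The name `dtw_distance_sketch` is hypothetical, and unlike the library call it keeps the full matrix instead of a compact one.

```python
import math


def dtw_distance_sketch(xs, ys, penalty=0.0):
    """DTW distance between two sequences of equal-dimension frames."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    # wps[i][j]: best accumulated squared cost aligning xs[:i] with ys[:j].
    wps = [[inf] * (m + 1) for _ in range(n + 1)]
    wps[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two frames.
            cost = sum((a - b) ** 2 for a, b in zip(xs[i - 1], ys[j - 1]))
            wps[i][j] = cost + min(
                wps[i - 1][j] + penalty,  # vertical / insertion
                wps[i][j - 1] + penalty,  # horizontal / deletion
                wps[i - 1][j - 1])        # diagonal / match
    return math.sqrt(wps[n][m])


print(dtw_distance_sketch([[0.0], [0.0]], [[3.0], [4.0]]))  # 5.0
```

In the printed case the cheapest path accumulates 9 + 16 = 25, so the distance is 5.0. A sequence that only repeats a frame of the other one warps onto it at zero cost.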
--- a/audio/paddleaudio/utils/env.py
+++ b/audio/paddleaudio/utils/env.py
-# Copyright (c) 2021  PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License"
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-'''
-This module is used to store environmental variables in PaddleAudio.
-PPAUDIO_HOME     -->  the root directory for storing PaddleAudio related data. Default to ~/.paddleaudio. Users can change the
-├                            default value through the PPAUDIO_HOME environment variable.
-├─ MODEL_HOME    -->  Store model files.
-└─ DATA_HOME     -->  Store automatically downloaded datasets.
-'''
-import os
-
-__all__ = [
-    'USER_HOME',
-    'PPAUDIO_HOME',
-    'MODEL_HOME',
-    'DATA_HOME',
-]
-
-
-def _get_user_home():
-    return os.path.expanduser('~')
-
-
-def _get_ppaudio_home():
-    if 'PPAUDIO_HOME' in os.environ:
-        home_path = os.environ['PPAUDIO_HOME']
-        if os.path.exists(home_path):
-            if os.path.isdir(home_path):
-                return home_path
-            else:
-                raise RuntimeError(
-                    'The environment variable PPAUDIO_HOME {} is not a directory.'.
-                    format(home_path))
-        else:
-            return home_path
-    return os.path.join(_get_user_home(), '.paddleaudio')
-
-
-def _get_sub_home(directory):
-    home = os.path.join(_get_ppaudio_home(), directory)
-    if not os.path.exists(home):
-        os.makedirs(home)
-    return home
-
-
-USER_HOME = _get_user_home()
-PPAUDIO_HOME = _get_ppaudio_home()
-MODEL_HOME = _get_sub_home('models')
-DATA_HOME = _get_sub_home('datasets')
--- a/audio/setup.py
+++ b/audio/setup.py
-# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import glob
-import os
-import subprocess
-
-import pybind11
-import setuptools
-from setuptools import Extension
-from setuptools.command.build_ext import build_ext
-from setuptools.command.test import test
-
-# set the version here
-VERSION = '1.0.0a'
-
-
-# Inspired by the example at https://pytest.org/latest/goodpractises.html
-class TestCommand(test):
-    def finalize_options(self):
-        test.finalize_options(self)
-        self.test_args = []
-        self.test_suite = True
-
-    def run(self):
-        self.run_benchmark()
-        super(TestCommand, self).run()
-
-    def run_tests(self):
-        # Run nose ensuring that argv simulates running nosetests directly
-        import nose
-        nose.run_exit(argv=['nosetests', '-w', 'tests'])
-
-    def run_benchmark(self):
-        for benchmark_item in glob.glob('tests/benchmark/*py'):
-            os.system(f'pytest {benchmark_item}')
-
-
-class ExtBuildCommand(build_ext):
-    def run(self):
-        try:
-            subprocess.check_output(["cmake", "--version"])
-        except OSError:
-            raise RuntimeError("CMake is not available.") from None
-        super().run()
-
-    def build_extension(self, ext):
-        extdir = os.path.abspath(
-            os.path.dirname(self.get_ext_fullpath(ext.name)))
-        cfg = "Debug" if self.debug else "Release"
-        cmake_args = [
-            f"-DCMAKE_BUILD_TYPE={cfg}",
-            f"-Dpybind11_DIR={pybind11.get_cmake_dir()}",
-            f"-DCMAKE_INSTALL_PREFIX={extdir}",
-            "-DCMAKE_VERBOSE_MAKEFILE=ON",
-            "-DBUILD_SOX:BOOL=ON",
-        ]
-        build_args = ["--target", "install"]
-
-        # Set CMAKE_BUILD_PARALLEL_LEVEL to control the parallel build level
-        # across all generators.
-        if "CMAKE_BUILD_PARALLEL_LEVEL" not in os.environ:
-            if hasattr(self, "parallel") and self.parallel:
-                build_args += ["-j{}".format(self.parallel)]
-
-        if not os.path.exists(self.build_temp):
-            os.makedirs(self.build_temp)
-
-        subprocess.check_call(
-            ["cmake", os.path.abspath(os.path.dirname(__file__))] + cmake_args,
-            cwd=self.build_temp)
-        subprocess.check_call(
-            ["cmake", "--build", "."] + build_args, cwd=self.build_temp)
-
-    def get_ext_filename(self, fullname):
-        ext_filename = super().get_ext_filename(fullname)
-        ext_filename_parts = ext_filename.split(".")
-        without_abi = ext_filename_parts[:-2] + ext_filename_parts[-1:]
-        ext_filename = ".".join(without_abi)
-        return ext_filename
-
-
-def write_version_py(filename='paddleaudio/__init__.py'):
-    with open(filename, "a") as f:
-        f.write(f"__version__ = '{VERSION}'")
-
-
-def remove_version_py(filename='paddleaudio/__init__.py'):
-    with open(filename, "r") as f:
-        lines = f.readlines()
-    with open(filename, "w") as f:
-        for line in lines:
-            if "__version__" not in line:
-                f.write(line)
-
-
-def get_ext_modules():
-    modules = [
-        Extension(name="paddleaudio._paddleaudio", sources=[]),
-    ]
-
-    return modules
-
-
-remove_version_py()
-write_version_py()
-
-setuptools.setup(
-    name="paddleaudio",
-    version=VERSION,
-    author="",
-    author_email="",
-    description="PaddleAudio, in development",
-    long_description="",
-    long_description_content_type="text/markdown",
-    url="",
-    packages=setuptools.find_packages(include=['paddleaudio*']),
-    classifiers=[
-        "Programming Language :: Python :: 3",
-        "License :: OSI Approved :: MIT License",
-        "Operating System :: OS Independent",
-    ],
-    python_requires='>=3.6',
-    install_requires=[
-        'numpy >= 1.15.0', 'scipy >= 1.0.0', 'resampy >= 0.2.2',
-        'soundfile >= 0.9.0', 'colorlog', 'dtaidistance == 2.3.1', 'pathos'
-    ],
-    extras_require={
-        'test': [
-            'nose', 'librosa==0.8.1', 'soundfile==0.10.3.post1',
-            'torchaudio==0.10.2', 'pytest-benchmark'
-        ],
-    },
-    ext_modules=get_ext_modules(),
-    cmdclass={
-        "build_ext": ExtBuildCommand,
-        'test': TestCommand,
-    }, )
-
-remove_version_py()
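
Review note: `get_ext_filename` in the removed `setup.py` drops the ABI tag from the built extension's file name. A standalone sketch of the same string manipulation follows. The helper name `strip_abi_tag` and the sample file name are illustrative assumptions, not taken from the build.

```python
def strip_abi_tag(ext_filename):
    """Drop the second-to-last dot-separated segment (the ABI tag)."""
    parts = ext_filename.split(".")
    # Keep everything before the ABI tag, then the final extension.
    return ".".join(parts[:-2] + parts[-1:])


# Hypothetical file name as a CPython build might produce it.
print(strip_abi_tag("paddleaudio/_paddleaudio.cpython-38-x86_64-linux-gnu.so"))
# paddleaudio/_paddleaudio.so
```

The logic assumes the name always carries exactly one ABI segment. A name without one, such as `mod.so`, would lose its stem and collapse to `so`.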
--- a/audio/tests/.gitkeep
+++ b/audio/tests/.gitkeep
--- a/demos/README.md
+++ b/demos/README.md
@@ -2,14 +2,14 @@

 ([简体中文](./README_cn.md)|English)

-The directory containes many speech applications in multi scenarios.
+This directory contains many speech applications in multiple scenarios.

 * audio searching - mass audio similarity retrieval
 * audio tagging - multi-label tagging of an audio file
-* automatic_video_subtitiles - generate subtitles from a video
+* automatic_video_subtitles - generate subtitles from a video
 * metaverse - 2D AR with TTS  
 * punctuation_restoration - restore punctuation from raw text
-* speech recogintion - recognize text of an audio file 
+* speech recognition - recognize text of an audio file 
 * speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * streaming asr server - receive audio stream from websocket, and recognize to transcript.
 * speech translation - end to end speech translation  

--- a/demos/audio_content_search/README.md
+++ b/demos/audio_content_search/README.md
+([简体中文](./README_cn.md)|English)
+# ACS (Audio Content Search)
+
+## Introduction
+ACS, or Audio Content Search, refers to the problem of obtaining keyword timestamps from automatically transcribed speech (speech-to-text).
+
+This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.
+The search words currently used in the demo are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+You can choose either the medium or the hard way to install paddlespeech.
+
+The dependencies are listed in requirements.txt; install them as follows:
+
+```
+pip install -r requirements.txt
+```
+
+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.
+
+A sample file for this demo can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+
+### 3. Usage
+- Command Line(Recommended)
+  ```bash
+  # Chinese
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+  
+  Usage:
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  Arguments:
+  - `input`(required): Audio file to recognize.
+  - `server_ip`: the server ip.
+  - `port`: the server port.
+  - `lang`: the language type of the model. Default: `zh`.
+  - `sample_rate`: Sample rate of the model. Default: `16000`.
+  - `audio_format`: The audio format.
+
+  Output:
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+  acs_executor = ACSClientExecutor()
+  res = acs_executor(
+      input='./zh.wav',
+      server_ip="127.0.0.1",
+      port=8490,)
+  print(res)
+  ```
+
+  Output:
+  ```bash
+  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
+  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
+  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  ```
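
Review note: the README shows the raw result dictionary but not how to consume it. The sketch below groups the hits by keyword, using the result printed in the log above. Reading `bg` and `ed` as begin and end times in seconds is an assumption based on the field names, and `group_hits` is a hypothetical helper, not part of `PaddleSpeech`.

```python
# Result copied from the sample output of the demo.
result = {
    'transcription': '我认为跑步最重要的就是给我带来了身体健康',
    'acs': [
        {'w': '我', 'bg': 0, 'ed': 1.6800000000000002},
        {'w': '我', 'bg': 2.1, 'ed': 4.28},
        {'w': '康', 'bg': 3.2, 'ed': 4.92},
    ],
}


def group_hits(res):
    """Map each keyword to its list of (begin, end) spans."""
    hits = {}
    for item in res['acs']:
        span = (round(item['bg'], 2), round(item['ed'], 2))
        hits.setdefault(item['w'], []).append(span)
    return hits


for word, spans in group_hits(result).items():
    print(word, spans)
```

Rounding to two decimals only hides floating-point noise such as `1.6800000000000002`; it does not change the reported spans.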
--- a/demos/audio_content_search/README_cn.md
+++ b/demos/audio_content_search/README_cn.md
+(简体中文|[English](./README.md))
+
+# 语音内容搜索
+## 介绍
+语音内容搜索是一项用计算机程序获取转录语音内容关键词时间戳的技术。
+
+这个 demo 是一个从给定音频文件获取其文本中关键词时间戳的实现，它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+
+当前示例中检索词是
+```
+我
+康
+```
+## 使用方法
+### 1. 安装
+请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
+
+你可以从 medium，hard 两种方式中选择一种方式安装。
+依赖参见 requirements.txt, 安装依赖
+
+```
+pip install -r requirements.txt
+```
+
+### 2. 准备输入
+这个 demo 的输入应该是一个 WAV 文件（`.wav`），并且采样率必须与模型的采样率相同。
+
+可以下载此 demo 的示例音频：
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+### 3. 使用方法
+- 命令行 (推荐使用)
+  ```bash
+  # 中文
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+  
+  使用方法：
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  参数：
+  - `input`(必须输入)：用于识别的音频文件。
+  - `server_ip`: 服务的ip。
+  - `port`：服务的端口。
+  - `lang`：模型语言，默认值：`zh`。
+  - `sample_rate`：音频采样率，默认值：`16000`。
+  - `audio_format`: 音频的格式。
+
+  输出：
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+  acs_executor = ACSClientExecutor()
+  res = acs_executor(
+      input='./zh.wav',
+      server_ip="127.0.0.1",
+      port=8490,)
+  print(res)
+  ```
+
+  输出：
+  ```bash
+  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
+  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
+  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  ```
--- a/examples/other/1xt2x/src_deepspeech2x/bin/test.py
+++ b/examples/other/1xt2x/src_deepspeech2x/bin/test.py
@@ -11,49 +11,39 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""Evaluation for DeepSpeech2 model."""
-from src_deepspeech2x.test_model import DeepSpeech2Tester as Tester
-from yacs.config import CfgNode
+import argparse

-from paddlespeech.s2t.training.cli import default_argument_parser
-from paddlespeech.s2t.utils.utility import print_arguments
+from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_handler import ASRHttpHandler


-def main_sp(config, args):
-    exp = Tester(config, args)
-    exp.setup()
-    exp.run_test()
-
-
-def main(config, args):
-    main_sp(config, args)
+def main(args):
+    logger.info("asr http client start")
+    audio_format = "wav"
+    sample_rate = 16000
+    lang = "zh"
+    handler = ASRHttpHandler(
+        server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
+    res = handler.run(args.wavfile, audio_format, sample_rate, lang)
+    # res = res['result']
+    logger.info(f"the final result: {res}")


 if __name__ == "__main__":
-    parser = default_argument_parser()
+    parser = argparse.ArgumentParser(description="audio content search client")
    parser.add_argument(
-        "--model_type", type=str, default='offline', help='offline/online')
-    # save asr result to
+        '--server_ip', type=str, default='127.0.0.1', help='server ip')
+    parser.add_argument('--port', type=int, default=8090, help='server port')
    parser.add_argument(
-        "--result_file", type=str, help="path of save the asr result")
+        "--wavfile",
+        action="store",
+        help="wav file path ",
+        default="./16_audio.wav")
+    parser.add_argument(
+        '--endpoint',
+        type=str,
+        default='/paddlespeech/asr/search',
+        help='server endpoint')
    args = parser.parse_args()
-    print_arguments(args, globals())
-    print("model_type:{}".format(args.model_type))
-
-    # https://yaml.org/type/float.html
-    config = CfgNode(new_allowed=True)
-    if args.config:
-        config.merge_from_file(args.config)
-    if args.decode_cfg:
-        decode_confs = CfgNode(new_allowed=True)
-        decode_confs.merge_from_file(args.decode_cfg)
-        config.decode = decode_confs
-    if args.opts:
-        config.merge_from_list(args.opts)
-    config.freeze()
-    print(config)
-    if args.dump_config:
-        with open(args.dump_config, 'w') as f:
-            print(config, file=f)

-    main(config, args)
+    main(args)
--- a/demos/audio_content_search/conf/acs_application.yaml
+++ b/demos/audio_content_search/conf/acs_application.yaml
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8490
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['acs_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports the offline engine type.
+protocol: 'http'
+engine_list: ['acs_python']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ACS #########################################
+################### acs task: engine_type: python ###############################
+acs_python:
+    task: acs
+    asr_protocol: 'websocket' # 'websocket'
+    offset: 1.0 # second
+    asr_server_ip: 127.0.0.1
+    asr_server_port: 8390
+    lang: 'zh'
+    word_list: "./conf/words.txt"
+    sample_rate: 16000
+    device: 'cpu' # set 'gpu:id' or 'cpu'
+    ping_timeout: 100 # seconds
+
+
+
+
--- a/demos/audio_content_search/conf/words.txt
+++ b/demos/audio_content_search/conf/words.txt
+我
+康
\ No newline at end of file
--- a/demos/audio_content_search/conf/ws_conformer_application.yaml
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_multicn'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 'attention_rescoring' 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
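
Review note: the `chunk_buffer_conf` values are easier to check when converted to samples and bytes. The sketch below assumes a chunk spans `window_n` overlapping frames (each `window_ms` long, advanced by `shift_ms`) and that consecutive chunks advance by `shift_n` frames. That formula is an assumption about how the server combines these fields, and `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(window_n, shift_n, window_ms, shift_ms, sample_rate,
                sample_width):
    """Convert frame-based chunk settings to milliseconds, samples, bytes."""
    # window_n overlapping frames cover (window_n - 1) shifts plus one window.
    window_dur_ms = (window_n - 1) * shift_ms + window_ms
    # Consecutive chunks advance by shift_n frame shifts.
    shift_dur_ms = shift_n * shift_ms
    window_samples = window_dur_ms * sample_rate // 1000
    shift_samples = shift_dur_ms * sample_rate // 1000
    return {
        "window_ms": window_dur_ms,
        "window_samples": window_samples,
        "window_bytes": window_samples * sample_width,
        "shift_ms": shift_dur_ms,
        "shift_samples": shift_samples,
        "shift_bytes": shift_samples * sample_width,
    }


# Values from chunk_buffer_conf in ws_conformer_application.yaml.
print(chunk_sizes(7, 4, 25, 10, 16000, 2))
```

Under these assumptions each chunk covers 85 ms (1360 samples, 2720 bytes) and advances by 40 ms (640 samples, 1280 bytes).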
--- a/demos/streaming_asr_server/conf/ws_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_application.yaml
@@ -4,11 +4,11 @@
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
-port: 8090
+port: 8390

 # The task format in the engin_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
 # websocket only support online engine type.
 protocol: 'websocket'
 engine_list: ['asr_online']
@@ -21,7 +21,7 @@ engine_list: ['asr_online']
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
-    model_type: 'deepspeech2online_aishell'
+    model_type: 'conformer_online_wenetspeech'
    am_model: # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
@@ -29,7 +29,8 @@ asr_online:
    cfg_path: 
    decode_method: 
    force_yes: True
-
+    device: 'cpu' # cpu or gpu:id
+    decode_method: "attention_rescoring"
    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
@@ -37,11 +38,9 @@ asr_online:
        summary: True  # False -> do not show predictor config

    chunk_buffer_conf:
-        frame_duration_ms: 80
-        shift_ms: 40
-        sample_rate: 16000
-        sample_width: 2
        window_n: 7     # frame
        shift_n: 4      # frame
-        window_ms: 20   # ms
+        window_ms: 25   # ms
        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
--- a/demos/audio_content_search/requirements.txt
+++ b/demos/audio_content_search/requirements.txt
+websocket-client
\ No newline at end of file
--- a/demos/audio_content_search/run.sh
+++ b/demos/audio_content_search/run.sh
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# we need the streaming asr server
+nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log  2>&1  &
+
+# start the acs server
+nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
+
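
Review note: `run.sh` starts both servers in the background and returns immediately, so a client launched right afterwards may hit a port that is not listening yet. A minimal readiness check using only the standard library could look like the sketch below. `wait_for_port` is a hypothetical helper, and the ports mentioned after the block are the ones set in the two YAML files of this demo.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is accepting clients.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet; retry until the deadline passes.
            time.sleep(interval)
    return False
```

For this demo one would call `wait_for_port("127.0.0.1", 8390)` for the streaming ASR server and `wait_for_port("127.0.0.1", 8490)` for the ACS server before running the client. An open port only shows that the process is listening, not that the model has finished loading.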
--- a/demos/audio_content_search/streaming_asr_server.py
+++ b/demos/audio_content_search/streaming_asr_server.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        prog='paddlespeech_server.start', add_help=True)
+    parser.add_argument(
+        "--config_file",
+        action="store",
+        help="yaml file of the app",
+        default=None,
+        required=True)
+
+    parser.add_argument(
+        "--log_file",
+        action="store",
+        help="log file",
+        default="./log/paddlespeech.log")
+    logger.info("start to parse the args")
+    args = parser.parse_args()
+
+    logger.info("start to launch the streaming asr server")
+    streaming_asr_server = ServerExecutor()
+    streaming_asr_server(config_file=args.config_file, log_file=args.log_file)
--- a/demos/audio_searching/README.md
+++ b/demos/audio_searching/README.md
@@ -89,7 +89,7 @@ Then to start the system server, and it provides HTTP backend services.
  Then start the server with Fastapi.

  ```bash
-  export PYTHONPATH=$PYTHONPATH:./src:../../paddleaudio
+  export PYTHONPATH=$PYTHONPATH:./src
  python src/audio_search.py
  ```


--- a/demos/audio_searching/README_cn.md
+++ b/demos/audio_searching/README_cn.md
@@ -91,7 +91,7 @@ ffce340b3790  minio/minio:RELEASE.2020-12-03T00-03-10Z  "/usr/bin/docker-ent…"
  启动用 Fastapi 构建的服务

  ```bash
-  export PYTHONPATH=$PYTHONPATH:./src:../../paddleaudio
+  export PYTHONPATH=$PYTHONPATH:./src
  python src/audio_search.py
  ```


--- a/demos/audio_searching/src/encode.py
+++ b/demos/audio_searching/src/encode.py
@@ -14,7 +14,7 @@
 import numpy as np
 from logs import LOGGER

-from paddlespeech.cli import VectorExecutor
+from paddlespeech.cli.vector import VectorExecutor

 vector_executor = VectorExecutor()


--- a/demos/audio_tagging/README.md
+++ b/demos/audio_tagging/README.md
@@ -57,7 +57,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespe
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import CLSExecutor
+  from paddlespeech.cli.cls import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(

--- a/demos/audio_tagging/README_cn.md
+++ b/demos/audio_tagging/README_cn.md
@@ -57,7 +57,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespe
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import CLSExecutor
+  from paddlespeech.cli.cls import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(

--- a/demos/automatic_video_subtitiles/README.md
+++ b/demos/automatic_video_subtitiles/README.md
@@ -28,7 +28,8 @@ ffmpeg -i subtitle_demo1.mp4 -ac 1 -ar 16000 -vn input.wav
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import ASRExecutor, TextExecutor
+  from paddlespeech.cli.asr import ASRExecutor
+  from paddlespeech.cli.text import TextExecutor

  asr_executor = ASRExecutor()
  text_executor = TextExecutor()

--- a/demos/automatic_video_subtitiles/README_cn.md
+++ b/demos/automatic_video_subtitiles/README_cn.md
@@ -23,7 +23,8 @@ ffmpeg -i subtitle_demo1.mp4 -ac 1 -ar 16000 -vn input.wav
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import ASRExecutor, TextExecutor
+  from paddlespeech.cli.asr import ASRExecutor
+  from paddlespeech.cli.text import TextExecutor

  asr_executor = ASRExecutor()
  text_executor = TextExecutor()

--- a/demos/automatic_video_subtitiles/recognize.py
+++ b/demos/automatic_video_subtitiles/recognize.py
@@ -16,8 +16,8 @@ import os

 import paddle

-from paddlespeech.cli import ASRExecutor
-from paddlespeech.cli import TextExecutor
+from paddlespeech.cli.asr import ASRExecutor
+from paddlespeech.cli.text import TextExecutor

 # yapf: disable
 parser = argparse.ArgumentParser(__doc__)

--- a/demos/custom_streaming_asr/README.md
+++ b/demos/custom_streaming_asr/README.md
+([简体中文](./README_cn.md)|English)
+
+# Customized Automatic Speech Recognition
+
+## Introduction
+
+In some cases, we need to recognize specific rare words with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve this problem.
+
+This demo is customized for taxi expense claims, which require recognizing rare addresses.
+
+The scripts are in https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr
+
+* G with slot: 打车到 "address_slot"。  
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* This is the address slot WFST; you can add the addresses you want to recognize.  
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.  
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)  
+
+## Usage
+### 1. Installation
+Install the paddle:2.2.2 docker image.
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
+```
+
+### 2. Demo
+* Run websocket_server.sh. This script downloads the resources and libs, and launches the service.
+```
+cd /paddle
+bash websocket_server.sh
+```
+This script runs in two steps:  
+1. Download resource.tar.gz; after extraction, the following directories will be found in the resource directory.  
+model: acoustic model  
+graph: the decoder graph (TLG.fst)  
+lib: some libs  
+bin: binary  
+data: audio and wav.scp  
+
+2. websocket_server_main launches the service.  
+Some parameters:  
+port: the service port  
+graph_path: the decoder graph path  
+model_path: acoustic model path  
+For the other parameters, please refer to these files:  
+PaddleSpeech/speechx/speechx/decoder/param.h  
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc  
+
+* In another terminal, run websocket_client.sh; the client will send data and get the results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client; wav_scp is the set of wav files to send, and port is the server's service port.
+
+* Result:
+In the client log, you will see messages like the following:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
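
Review note: the README describes `G = fstreplace(G_with_slot, address_slot)` only through images. The toy sketch below illustrates the idea at the level of accepted strings: every sentence template containing the slot is expanded with each address. It is not a WFST implementation and does not call OpenFst. `expand_slot`, the `<address_slot>` marker, and the two addresses are hypothetical.

```python
def expand_slot(templates, slot_name, fillers):
    """Expand every template containing slot_name with each filler."""
    expanded = []
    for template in templates:
        if slot_name in template:
            expanded.extend(template.replace(slot_name, f) for f in fillers)
        else:
            # Templates without the slot are kept unchanged.
            expanded.append(template)
    return expanded


templates = ["打车到<address_slot>"]   # sentences accepted by G with slot
addresses = ["place_A", "place_B"]    # hypothetical address slot entries
print(expand_slot(templates, "<address_slot>", addresses))
# ['打车到place_A', '打车到place_B']
```

The practical benefit mirrored here is that adding an address only touches the slot list; the sentence templates stay unchanged.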
--- a/demos/custom_streaming_asr/README_cn.md
+++ b/demos/custom_streaming_asr/README_cn.md
+(简体中文|[English](./README.md))
+
+# 定制化语音识别演示
+## 介绍
+在一些场景中，识别系统需要高精度的识别一些稀有词，例如导航软件中地名识别。而通过定制化识别可以满足这一需求。  
+
+这个 demo 是打车报销单的场景识别，需要识别一些稀有的地名，可以通过如下操作实现。
+
+相关脚本:https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr
+
+* G with slot: 打车到 "address_slot"。  
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* 这是 address slot wfst, 可以添加一些需要识别的地名.  
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* 通过 replace 操作, G = fstreplace(G_with_slot, address_slot), 最终可以得到定制化的解码图。  
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)  
+
+## 使用方法
+### 1. 配置环境
+安装paddle:2.2.2 docker镜像。
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
+```
+
+### 2. 演示
+* 运行如下命令，完成相关资源和库的下载和服务启动。
+```
+cd /paddle
+bash websocket_server.sh
+```
+上面脚本完成了如下两个功能：
+1. 完成 resource.tar.gz 下载，解压后,会在 resource 中发现如下目录：  
+model: 声学模型  
+graph: 解码构图  
+lib: 相关库  
+bin: 运行程序  
+data: 语音数据  
+
+2. 通过 websocket_server_main 来启动服务。
+这里简单的介绍几个参数:  
+port 是服务端口，  
+graph_path 用来指定解码图文件，  
+其他参数说明可参见代码：  
+PaddleSpeech/speechx/speechx/decoder/param.h  
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc  
+
+* 在另一个终端中， 通过 client 发送数据，得到结果。运行如下命令：
+```
+bash websocket_client.sh
+```
+通过 websocket_client_main 来启动 client 服务，其中 wav_scp 是发送的语音句子集合，port 为服务端口。
+
+* 结果：
+client 的 log 中可以看到如下类似的结果
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
--- a/demos/custom_streaming_asr/path.sh
+++ b/demos/custom_streaming_asr/path.sh
+export LD_LIBRARY_PATH=$PWD/resource/lib
+export PATH=$PATH:$PWD/resource/bin
--- a/demos/custom_streaming_asr/setup_docker.sh
+++ b/demos/custom_streaming_asr/setup_docker.sh
+sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
--- a/demos/custom_streaming_asr/websocket_client.sh
+++ b/demos/custom_streaming_asr/websocket_client.sh
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+# input
+data=$PWD/data
+
+# output
+wav_scp=wav.scp
+
+export GLOG_logtostderr=1
+
+# websocket client
+websocket_client_main \
+    --wav_rspecifier=scp:$data/$wav_scp \
+    --streaming_chunk=0.36 \
+    --port=8881
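Reviewer note: the client script passes `--streaming_chunk=0.36`, which suggests that the audio is sent in fixed-duration pieces. The sketch below illustrates that kind of chunking in Python. It assumes the value is in seconds and that the audio is sampled at 16 kHz; neither is stated in this diff. `split_into_chunks` is a hypothetical helper and not part of speechx.

```python
from typing import List, Sequence


def split_into_chunks(samples: Sequence[int], sample_rate: int,
                      chunk_seconds: float) -> List[Sequence[int]]:
    """Split samples into consecutive chunks of chunk_seconds each.

    The final chunk may be shorter than the others.
    """
    chunk_size = int(round(sample_rate * chunk_seconds))
    if chunk_size <= 0:
        raise ValueError("chunk size must be at least one sample")
    return [samples[i:i + chunk_size]
            for i in range(0, len(samples), chunk_size)]


if __name__ == "__main__":
    # 70208 samples matches the "wav len (sample)" line in the README log;
    # the 16 kHz sample rate is an assumption.
    audio = [0] * 70208
    chunks = split_into_chunks(audio, sample_rate=16000, chunk_seconds=0.36)
    print(len(chunks), len(chunks[0]), len(chunks[-1]))
```

A real client would send each chunk over the websocket connection in order and signal the end of the utterance after the last one; how speechx does that is defined in websocket_client_main, not here.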
--- a/demos/custom_streaming_asr/websocket_server.sh
+++ b/demos/custom_streaming_asr/websocket_server.sh
+#!/bin/bash
+set +x
+set -e
+
+export GLOG_logtostderr=1
+
+. path.sh
+#test websocket server 
+
+model_dir=./resource/model
+graph_dir=./resource/graph
+cmvn=./data/cmvn.ark
+
+
+#paddle_asr_online/resource.tar.gz
+if [ ! -f $cmvn ]; then
+    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
+    tar xzfv resource.tar.gz
+    ln -s ./resource/data .
+fi
+
+websocket_server_main \
+    --cmvn_file=$cmvn \
+    --streaming_chunk=0.1 \
+    --use_fbank=true \
+    --model_path=$model_dir/avg_10.jit.pdmodel \
+    --param_path=$model_dir/avg_10.jit.pdiparams \
+    --model_cache_shapes="5-1-2048,5-1-2048" \
+    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+    --word_symbol_table=$graph_dir/words.txt \
+    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
+    --port=8881 \
+    --acoustic_scale=12 
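Reviewer note: `--model_cache_shapes="5-1-2048,5-1-2048"` reads like a comma-separated list of tensor shapes whose dimensions are joined with dashes. That interpretation is an assumption drawn from the flag value alone; the authoritative parsing lives in the speechx sources. Under that assumption, a minimal Python sketch of decoding the string (with the hypothetical helper `parse_cache_shapes`) could look like this:

```python
from typing import List, Tuple


def parse_cache_shapes(spec: str) -> List[Tuple[int, ...]]:
    """Parse a string such as "5-1-2048,5-1-2048" into shape tuples.

    Assumed format: shapes separated by commas, dimensions by dashes.
    """
    shapes = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        shapes.append(tuple(int(dim) for dim in item.split("-")))
    return shapes


if __name__ == "__main__":
    print(parse_cache_shapes("5-1-2048,5-1-2048"))
```

Checking the parsed shapes before launching the server is a cheap way to catch a mistyped flag value.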
--- a/demos/punctuation_restoration/README.md
+++ b/demos/punctuation_restoration/README.md
@@ -42,7 +42,7 @@ The input of this demo should be a text of the specific language that can be pas
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import TextExecutor
+  from paddlespeech.cli.text import TextExecutor

  text_executor = TextExecutor()
  result = text_executor(

--- a/demos/punctuation_restoration/README_cn.md
+++ b/demos/punctuation_restoration/README_cn.md
@@ -44,7 +44,7 @@
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import TextExecutor
+  from paddlespeech.cli.text import TextExecutor

  text_executor = TextExecutor()
  result = text_executor(

--- a/demos/speaker_verification/README.md
+++ b/demos/speaker_verification/README.md
@@ -14,7 +14,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
 You can choose one way from easy, medium and hard to install paddlespeech.

 ### 2. Prepare Input File
-The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
+The input of this CLI demo should be a WAV file (`.wav`), and its sample rate must match the model's.

 Here are sample files for this demo that can be downloaded:
 ```bash
@@ -53,51 +53,50 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
  Output:

  ```bash
-    demo [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
-    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
-    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
-    -9.723131     0.6619743   -6.976803    10.213478     7.494748
-    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
-    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
-    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
-    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
-    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
-    -3.539873     3.814236     5.1420674    2.162061     4.096431
-    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
-    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
-    11.567354     3.69788     11.258265     7.442363     9.183411
-    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
-    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
-    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
-    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
-    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
-    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
-    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
-    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
-    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
-    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
-    8.451689    -7.925461     4.6242585    4.4289427   18.692003
-    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
-    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
-    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
-    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
-    0.66607     15.443222     4.740594    -3.4725387   11.592567
-    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
-    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
-    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
-    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
-    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
-    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
-    4.532936     2.7264361   10.145339    -6.521951     2.897153
-    -3.3925855    5.079156     7.759716     4.677565     5.8457737
-    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
-    -3.7760346  -11.118123  ]
+    demo [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
+    -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
+    -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
+  -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
+    3.7805123    3.0597172    3.429692     8.97601     13.174125
+    -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
+    8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
+  -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
+    3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
+    0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
+    -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
+    16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
+    11.490801     4.2380238    9.550931     8.375046     7.5089145
+    -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
+    6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
+    -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
+    -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
+    -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
+    -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
+  -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
+    1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
+    -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
+    -0.42654222   8.341269     1.356552     7.0966883  -13.102829
+    8.016734    -7.1159344    1.8699781    0.208721    14.699384
+    -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
+    -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
+    11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
+    -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
+    -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
+    -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
+    0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
+    -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
+    -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
+    0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
+    5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
+    2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
+    -2.003628     2.4434285    9.973139     5.03668      2.0051203
+    2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
+    -4.070415    -6.831437  ]
  ```

 - Python API
  ```python
-  import paddle
-  from paddlespeech.cli import VectorExecutor
+  from paddlespeech.cli.vector import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
@@ -128,88 +127,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
  ```bash
  # Vector Result:
   Audio embedding Result:
-    [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
-    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
-    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
-    -9.723131     0.6619743   -6.976803    10.213478     7.494748
-    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
-    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
-    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
-    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
-    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
-    -3.539873     3.814236     5.1420674    2.162061     4.096431
-    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
-    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
-    11.567354     3.69788     11.258265     7.442363     9.183411
-    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
-    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
-    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
-    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
-    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
-    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
-    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
-    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
-    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
-    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
-    8.451689    -7.925461     4.6242585    4.4289427   18.692003
-    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
-    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
-    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
-    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
-    0.66607     15.443222     4.740594    -3.4725387   11.592567
-    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
-    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
-    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
-    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
-    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
-    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
-    4.532936     2.7264361   10.145339    -6.521951     2.897153
-    -3.3925855    5.079156     7.759716     4.677565     5.8457737
-    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
-    -3.7760346  -11.118123  ]
+    [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
+      -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
+      -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
+    -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
+      3.7805123    3.0597172    3.429692     8.97601     13.174125
+      -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
+      8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
+    -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
+      3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
+      0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
+      -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
+      16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
+      11.490801     4.2380238    9.550931     8.375046     7.5089145
+      -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
+      6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
+      -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
+      -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
+      -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
+      -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
+    -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
+      1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
+      -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
+      -0.42654222   8.341269     1.356552     7.0966883  -13.102829
+      8.016734    -7.1159344    1.8699781    0.208721    14.699384
+      -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
+      -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
+      11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
+      -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
+      -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
+      -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
+      0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
+      -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
+      -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
+      0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
+      5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
+      2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
+      -2.003628     2.4434285    9.973139     5.03668      2.0051203
+      2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
+      -4.070415    -6.831437  ]
    # get the test embedding
    Test embedding Result:
-    [ -1.902964     2.0690894   -8.034194     3.5472693    0.18089125
-      6.9085927    1.4097427   -1.9487704  -10.021278    -0.20755845
-      -8.04332      4.344489     2.3200977  -14.306299     5.184692
-    -11.55602     -3.8497238    0.6444722    1.2833948    2.6766639
-      0.5878921    0.7946299    1.7207596    2.5791872   14.998469
-      -1.3385371   15.031221    -0.8006958    1.99287     -9.52007
-      2.435466     4.003221    -4.33817     -4.898601    -5.304714
-    -18.033886    10.790787   -12.784645    -5.641755     2.9761686
-    -10.566622     1.4839455    6.152458    -5.7195854    2.8603241
-      6.112133     8.489869     5.5958056    1.2836679   -1.2293907
-      0.89927405   7.0288725   -2.854029    -0.9782962    5.8255906
-      14.905906    -5.025907     0.7866458   -4.2444224  -16.354029
-      10.521315     0.9604709   -3.3257897    7.144871   -13.592733
-      -8.568869    -1.7953678    0.26313916  10.916714    -6.9374123
-      1.857403    -6.2746415    2.8154466   -7.2338667   -2.293357
-      -0.05452765   5.4287076    5.0849075   -6.690375    -1.6183422
-      3.654291     0.94352573  -9.200294    -5.4749465   -3.5235846
-      1.3420814    4.240421    -2.772944    -2.8451524   16.311104
-      4.2969875   -1.762936   -12.5758915    8.595198    -0.8835239
-      -1.5708797    1.568961     1.1413603    3.5032008   -0.45251232
-      -6.786333    16.89443      5.3366146   -8.789056     0.6355629
-      3.2579517   -3.328322     7.5969577    0.66025066  -6.550468
-      -9.148656     2.020372    -0.4615173    1.1965656   -3.8764873
-      11.6562195   -6.0750933   12.182899     3.2218833    0.81969476
-      5.570001    -3.8459578   -7.205299     7.9262037   -7.6611166
-      -5.249467    -2.2671914    7.2658715  -13.298164     4.821147
-      -2.7263982   11.691089    -3.8918593   -2.838112    -1.0336838
-      -3.8034165    2.8536487   -5.60398     -1.1972581    1.3455094
-      -3.4903061    2.2408795    5.5010734   -3.970756    11.99696
-      -7.8858757    0.43160373  -5.5059714    4.3426995   16.322706
-      11.635366     0.72157705  -9.245714    -3.91465     -4.449838
-      -1.5716927    7.713747    -2.2430465   -6.198303   -13.481864
-      2.8156567   -5.7812386    5.1456156    2.7289324  -14.505571
-      13.270688     3.448231    -7.0659585    4.5886116   -4.466099
-      -0.296428   -11.463529    -2.6076477   14.110243    -6.9725137
-      -1.9962958    2.7119343   19.391657     0.01961198  14.607133
-      -1.6695905   -4.391516     1.3131028   -6.670972    -5.888604
-      12.0612335    5.9285784    3.3715196    1.492534    10.723728
-      -0.95514804 -12.085431  ]
+    [  2.5247195    5.119042    -4.335273     4.4583654    5.047907
+      3.5059214    1.6159848    0.49364898 -11.6899185   -3.1014526
+      -5.6589785   -0.42684984   2.674276   -11.937654     6.2248464
+    -10.776924    -5.694543     1.112041     1.5709964    1.0961034
+      1.3976512    2.324352     1.339981     5.279319    13.734659
+      -2.5753925   13.651442    -2.2357535    5.1575427   -3.251567
+      1.4023279    6.1191974   -6.0845175   -1.3646189   -2.6789894
+    -15.220778     9.779349    -9.411551    -6.388947     6.8313975
+      -9.245996     0.31196198   2.5509644   -4.413065     6.1649427
+      6.793837     2.6328635    8.620976     3.4832475    0.52491665
+      2.9115407    5.8392377    0.6702376   -3.2726715    2.6694255
+      16.91701     -5.5811176    0.23362345  -4.5573606  -11.801059
+      14.728292    -0.5198082   -3.999922     7.0927105   -7.0459595
+      -5.4389      -0.46420583  -5.1085467   10.376568    -8.889225
+      -0.37705845  -1.659806     2.6731026   -7.1909504    1.4608804
+      -2.163136    -0.17949677   4.0241547    0.11319201   0.601279
+      2.039692     3.1910992  -11.649526    -8.121584    -4.8707457
+      0.3851982    1.4231744   -2.3321972    0.99332285  14.121717
+      5.899413     0.7384519  -17.760096    10.555021     4.1366534
+      -0.3391071   -0.20792882   3.208204     0.8847948   -8.721497
+      -6.432868    13.006379     4.8956      -9.155822    -1.9441519
+      5.7815638   -2.066733    10.425042    -0.8802383   -2.4314315
+      -9.869258     0.35095334  -5.3549943    2.1076174   -8.290468
+      8.4433365   -4.689333     9.334139    -2.172678    -3.0250976
+      8.394216    -3.2110903   -7.93868      2.3960824   -2.3213403
+      -1.4963245   -3.476059     4.132903   -10.893354     4.362673
+      -0.45456508  10.258634    -1.1655927   -6.7799754    0.22885278
+      -4.399287     2.333433    -4.84745     -4.2752337   -1.3577863
+      -1.0685898    9.505196     7.3062205    0.08708266  12.927811
+      -9.57974      1.3936648   -1.9444873    5.776769    15.251903
+      10.6118355   -1.4903594   -9.535318    -3.6553776   -1.6699586
+      -0.5933151    7.600357    -4.8815503   -8.698617   -15.855757
+      0.25632986  -7.2235737    0.9506656    0.7128582   -9.051738
+      8.74869     -1.6426028   -6.5762258    2.506905    -6.7431564
+      5.129912   -12.189555    -3.6435068   12.068113    -6.0059533
+      -2.3535995    2.9014351   22.3082      -1.5563312   13.193291
+      2.7583609   -7.468798     1.3407065   -4.599617    -6.2345777
+      10.7689295    7.137627     5.099476     0.3473359    9.647881
+      -2.0484571   -5.8549366 ]
    # get the score between enroll and test
-    Eembeddings Score: 0.4292638301849365
+    Eembeddings Score: 0.45332613587379456
  ```

 ### 4. Pretrained Models

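Reviewer note: the README output above reports a single score between the enroll and test embeddings. The diff does not say how that score is computed. A common choice for comparing speaker embeddings is cosine similarity, so the sketch below shows that computation in plain Python as an assumption, not as the PaddleSpeech implementation. The vectors in the usage block are toy values for illustration, not the embeddings printed in the README.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors, chosen only to illustrate the call.
    enroll = [1.0, 2.0, 3.0]
    test = [2.0, 4.0, 6.5]
    print(cosine_similarity(enroll, test))
```

The result lies in the range -1 to 1, with values near 1 indicating that the two embeddings point in nearly the same direction.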
--- a/demos/speaker_verification/README_cn.md
+++ b/demos/speaker_verification/README_cn.md
@@ -4,16 +4,16 @@
 ## Introduction
 Speaker verification is a technique for automatically extracting speaker features with a computer program.

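-This demo is an extraction of speaker features from a given audio file; it can be done with a single `PaddleSpeech` command or a few lines of Python code.
+This demo extracts speaker features from a given audio file; it can be done with a single `PaddleSpeech` command or a few lines of Python code.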

 ## Usage
 ### 1. Installation
 See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).

-You can choose one of the three ways to install: easy, medium, or hard.
+You can choose one of three installation methods: easy, medium, or hard.

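 ### 2. Prepare Input File
-The input of this demo should be a WAV file (`.wav`), and its sample rate must match the model's.
+The input of the speaker verification CLI demo should be a WAV file (`.wav`), and its sample rate must match the model's.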

 Sample audio files for this demo can be downloaded:
 ```bash
@@ -51,51 +51,51 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav

  Output:
  ```bash
-  demo  [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
-    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
-    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
-    -9.723131     0.6619743   -6.976803    10.213478     7.494748
-    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
-    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
-    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
-    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
-    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
-    -3.539873     3.814236     5.1420674    2.162061     4.096431
-    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
-    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
-    11.567354     3.69788     11.258265     7.442363     9.183411
-    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
-    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
-    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
-    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
-    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
-    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
-    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
-    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
-    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
-    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
-    8.451689    -7.925461     4.6242585    4.4289427   18.692003
-    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
-    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
-    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
-    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
-    0.66607     15.443222     4.740594    -3.4725387   11.592567
-    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
-    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
-    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
-    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
-    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
-    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
-    4.532936     2.7264361   10.145339    -6.521951     2.897153
-    -3.3925855    5.079156     7.759716     4.677565     5.8457737
-    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
-    -3.7760346  -11.118123  ]
+    [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
+    -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
+    -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
+  -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
+    3.7805123    3.0597172    3.429692     8.97601     13.174125
+    -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
+    8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
+  -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
+    3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
+    0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
+    -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
+    16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
+    11.490801     4.2380238    9.550931     8.375046     7.5089145
+    -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
+    6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
+    -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
+    -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
+    -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
+    -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
+  -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
+    1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
+    -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
+    -0.42654222   8.341269     1.356552     7.0966883  -13.102829
+    8.016734    -7.1159344    1.8699781    0.208721    14.699384
+    -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
+    -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
+    11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
+    -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
+    -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
+    -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
+    0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
+    -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
+    -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
+    0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
+    5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
+    2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
+    -2.003628     2.4434285    9.973139     5.03668      2.0051203
+    2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
+    -4.070415    -6.831437  ]
  ```

 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import VectorExecutor
+  from paddlespeech.cli.vector import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
@@ -125,88 +125,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
  ```bash
  # Vector Result:
   Audio embedding Result:
-    [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
-    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
-    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
-    -9.723131     0.6619743   -6.976803    10.213478     7.494748
-    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
-    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
-    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
-    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
-    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
-    -3.539873     3.814236     5.1420674    2.162061     4.096431
-    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
-    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
-    11.567354     3.69788     11.258265     7.442363     9.183411
-    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
-    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
-    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
-    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
-    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
-    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
-    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
-    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
-    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
-    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
-    8.451689    -7.925461     4.6242585    4.4289427   18.692003
-    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
-    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
-    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
-    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
-    0.66607     15.443222     4.740594    -3.4725387   11.592567
-    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
-    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
-    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
-    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
-    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
-    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
-    4.532936     2.7264361   10.145339    -6.521951     2.897153
-    -3.3925855    5.079156     7.759716     4.677565     5.8457737
-    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
-    -3.7760346  -11.118123  ]
+    [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
+      -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
+      -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
+    -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
+      3.7805123    3.0597172    3.429692     8.97601     13.174125
+      -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
+      8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
+    -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
+      3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
+      0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
+      -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
+      16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
+      11.490801     4.2380238    9.550931     8.375046     7.5089145
+      -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
+      6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
+      -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
+      -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
+      -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
+      -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
+    -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
+      1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
+      -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
+      -0.42654222   8.341269     1.356552     7.0966883  -13.102829
+      8.016734    -7.1159344    1.8699781    0.208721    14.699384
+      -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
+      -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
+      11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
+      -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
+      -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
+      -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
+      0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
+      -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
+      -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
+      0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
+      5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
+      2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
+      -2.003628     2.4434285    9.973139     5.03668      2.0051203
+      2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
+      -4.070415    -6.831437  ]
    # get the test embedding
    Test embedding Result:
-    [ -1.902964     2.0690894   -8.034194     3.5472693    0.18089125
-      6.9085927    1.4097427   -1.9487704  -10.021278    -0.20755845
-      -8.04332      4.344489     2.3200977  -14.306299     5.184692
-    -11.55602     -3.8497238    0.6444722    1.2833948    2.6766639
-      0.5878921    0.7946299    1.7207596    2.5791872   14.998469
-      -1.3385371   15.031221    -0.8006958    1.99287     -9.52007
-      2.435466     4.003221    -4.33817     -4.898601    -5.304714
-    -18.033886    10.790787   -12.784645    -5.641755     2.9761686
-    -10.566622     1.4839455    6.152458    -5.7195854    2.8603241
-      6.112133     8.489869     5.5958056    1.2836679   -1.2293907
-      0.89927405   7.0288725   -2.854029    -0.9782962    5.8255906
-      14.905906    -5.025907     0.7866458   -4.2444224  -16.354029
-      10.521315     0.9604709   -3.3257897    7.144871   -13.592733
-      -8.568869    -1.7953678    0.26313916  10.916714    -6.9374123
-      1.857403    -6.2746415    2.8154466   -7.2338667   -2.293357
-      -0.05452765   5.4287076    5.0849075   -6.690375    -1.6183422
-      3.654291     0.94352573  -9.200294    -5.4749465   -3.5235846
-      1.3420814    4.240421    -2.772944    -2.8451524   16.311104
-      4.2969875   -1.762936   -12.5758915    8.595198    -0.8835239
-      -1.5708797    1.568961     1.1413603    3.5032008   -0.45251232
-      -6.786333    16.89443      5.3366146   -8.789056     0.6355629
-      3.2579517   -3.328322     7.5969577    0.66025066  -6.550468
-      -9.148656     2.020372    -0.4615173    1.1965656   -3.8764873
-      11.6562195   -6.0750933   12.182899     3.2218833    0.81969476
-      5.570001    -3.8459578   -7.205299     7.9262037   -7.6611166
-      -5.249467    -2.2671914    7.2658715  -13.298164     4.821147
-      -2.7263982   11.691089    -3.8918593   -2.838112    -1.0336838
-      -3.8034165    2.8536487   -5.60398     -1.1972581    1.3455094
-      -3.4903061    2.2408795    5.5010734   -3.970756    11.99696
-      -7.8858757    0.43160373  -5.5059714    4.3426995   16.322706
-      11.635366     0.72157705  -9.245714    -3.91465     -4.449838
-      -1.5716927    7.713747    -2.2430465   -6.198303   -13.481864
-      2.8156567   -5.7812386    5.1456156    2.7289324  -14.505571
-      13.270688     3.448231    -7.0659585    4.5886116   -4.466099
-      -0.296428   -11.463529    -2.6076477   14.110243    -6.9725137
-      -1.9962958    2.7119343   19.391657     0.01961198  14.607133
-      -1.6695905   -4.391516     1.3131028   -6.670972    -5.888604
-      12.0612335    5.9285784    3.3715196    1.492534    10.723728
-      -0.95514804 -12.085431  ]
+    [  2.5247195    5.119042    -4.335273     4.4583654    5.047907
+      3.5059214    1.6159848    0.49364898 -11.6899185   -3.1014526
+      -5.6589785   -0.42684984   2.674276   -11.937654     6.2248464
+    -10.776924    -5.694543     1.112041     1.5709964    1.0961034
+      1.3976512    2.324352     1.339981     5.279319    13.734659
+      -2.5753925   13.651442    -2.2357535    5.1575427   -3.251567
+      1.4023279    6.1191974   -6.0845175   -1.3646189   -2.6789894
+    -15.220778     9.779349    -9.411551    -6.388947     6.8313975
+      -9.245996     0.31196198   2.5509644   -4.413065     6.1649427
+      6.793837     2.6328635    8.620976     3.4832475    0.52491665
+      2.9115407    5.8392377    0.6702376   -3.2726715    2.6694255
+      16.91701     -5.5811176    0.23362345  -4.5573606  -11.801059
+      14.728292    -0.5198082   -3.999922     7.0927105   -7.0459595
+      -5.4389      -0.46420583  -5.1085467   10.376568    -8.889225
+      -0.37705845  -1.659806     2.6731026   -7.1909504    1.4608804
+      -2.163136    -0.17949677   4.0241547    0.11319201   0.601279
+      2.039692     3.1910992  -11.649526    -8.121584    -4.8707457
+      0.3851982    1.4231744   -2.3321972    0.99332285  14.121717
+      5.899413     0.7384519  -17.760096    10.555021     4.1366534
+      -0.3391071   -0.20792882   3.208204     0.8847948   -8.721497
+      -6.432868    13.006379     4.8956      -9.155822    -1.9441519
+      5.7815638   -2.066733    10.425042    -0.8802383   -2.4314315
+      -9.869258     0.35095334  -5.3549943    2.1076174   -8.290468
+      8.4433365   -4.689333     9.334139    -2.172678    -3.0250976
+      8.394216    -3.2110903   -7.93868      2.3960824   -2.3213403
+      -1.4963245   -3.476059     4.132903   -10.893354     4.362673
+      -0.45456508  10.258634    -1.1655927   -6.7799754    0.22885278
+      -4.399287     2.333433    -4.84745     -4.2752337   -1.3577863
+      -1.0685898    9.505196     7.3062205    0.08708266  12.927811
+      -9.57974      1.3936648   -1.9444873    5.776769    15.251903
+      10.6118355   -1.4903594   -9.535318    -3.6553776   -1.6699586
+      -0.5933151    7.600357    -4.8815503   -8.698617   -15.855757
+      0.25632986  -7.2235737    0.9506656    0.7128582   -9.051738
+      8.74869     -1.6426028   -6.5762258    2.506905    -6.7431564
+      5.129912   -12.189555    -3.6435068   12.068113    -6.0059533
+      -2.3535995    2.9014351   22.3082      -1.5563312   13.193291
+      2.7583609   -7.468798     1.3407065   -4.599617    -6.2345777
+      10.7689295    7.137627     5.099476     0.3473359    9.647881
+      -2.0484571   -5.8549366 ]
    # get the score between enroll and test
-    Eembeddings Score: 0.4292638301849365
+    Eembeddings Score: 0.45332613587379456
  ```
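
The score above compares the enroll and test embeddings. The excerpt does not name the metric, so the following is only a minimal sketch that assumes a cosine similarity; `cosine_score` is a hypothetical helper, not part of PaddleSpeech.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two equal-length embedding vectors.

    Assumption: the demo's score is a cosine similarity; the README
    excerpt does not state which metric is used.
    """
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_enroll == 0.0 or norm_test == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_enroll * norm_test)


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for the real embeddings.
    print(cosine_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

Under this assumption the score lies in the range -1 to 1, and values closer to 1 mean the two embeddings point in more similar directions.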

 ### 4. Pretrained Models

--- a/demos/speech_recognition/README.md
+++ b/demos/speech_recognition/README.md
@@ -24,13 +24,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Command Line(Recommended)
  ```bash
  # Chinese
-  paddlespeech asr --input ./zh.wav
+  paddlespeech asr --input ./zh.wav -v
  # English
-  paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+  paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
  # Chinese ASR + Punctuation Restoration
-  paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+  paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
  ```
-  (It doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.)
+  (To hide the log output, omit `-v`. It also does not matter if the package `paddlespeech-ctcdecoders` is not found; this package is optional.)
  
  Usage:
  ```bash
@@ -45,6 +45,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
  - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
  - `yes`: Takes no value. When set, the program's requests are accepted by default, including converting the audio sample rate. Default: `False`.
  - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
+  - `verbose`: Show the log information.

  Output:
  ```bash
@@ -57,7 +58,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import ASRExecutor
+  from paddlespeech.cli.asr import ASRExecutor

  asr_executor = ASRExecutor()
  text = asr_executor(
@@ -84,8 +85,12 @@ Here is a list of pretrained models released by PaddleSpeech that can be used by

 | Model | Language | Sample Rate
 | :--- | :---: | :---: |
-| conformer_wenetspeech| zh| 16k
-| transformer_librispeech| en| 16k
+| conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
+| transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
 | deepspeech2offline_aishell| zh| 16k
 | deepspeech2online_aishell | zh | 16k
-|deepspeech2offline_librispeech|en| 16k
+| deepspeech2offline_librispeech | en | 16k
--- a/demos/speech_recognition/README_cn.md
+++ b/demos/speech_recognition/README_cn.md
@@ -22,13 +22,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Command Line (Recommended)
  ```bash
  # Chinese
-  paddlespeech asr --input ./zh.wav
+  paddlespeech asr --input ./zh.wav -v
  # English
-  paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+  paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
  # Chinese ASR + Punctuation Restoration
-  paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+  paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
  ```
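-  (It does not matter if an error reports that the Python package `paddlespeech-ctcdecoders` is not found; this package is optional.)
+  (To hide the log output, omit `-v`. It also does not matter if an error reports that the Python package `paddlespeech-ctcdecoders` is not found; this package is optional.)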
  
  Usage:
  ```bash
@@ -43,6 +43,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
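  - `ckpt_path`: Model checkpoint. If not set, the pretrained model is downloaded and used. Default: `None`.
  - `yes`: Takes no value. When set, all of the program's requests are accepted by default, including automatically converting the sample rate of the input audio. Default: `False`.
  - `device`: Device on which to run inference. Default: the default device of paddlepaddle in the current environment.
+  - `verbose`: If set, show the logger information.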

  Output:
  ```bash
@@ -55,7 +56,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import ASRExecutor
+  from paddlespeech.cli.asr import ASRExecutor

  asr_executor = ASRExecutor()
  text = asr_executor(
@@ -82,7 +83,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 | Model | Language | Sample Rate
 | :--- | :---: | :---: |
 | conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
 | transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
 | deepspeech2offline_aishell| zh| 16k
 | deepspeech2online_aishell | zh | 16k
 | deepspeech2offline_librispeech | en | 16k
--- a/demos/speech_server/README.md
+++ b/demos/speech_server/README.md
@@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
 ### 1. Installation
 see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

-It is recommended to use **paddlepaddle 2.2.1** or above.
+It is recommended to use **paddlepaddle 2.2.2** or above.
 You can install paddlespeech using either the medium or the hard method.

 ### 2. Prepare config File
@@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
 Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
 At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech) and cls (audio classification).
 Currently the engine type supports two forms: python and inference (Paddle Inference).
+**Note:** If the service starts normally inside a container but clients cannot reach its IP, try replacing the `host` address in the configuration file with the local IP address.
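
The `<speech task>_<engine type>` naming of `engine_list` entries can be split as sketched below. `parse_engine` is a hypothetical helper written for illustration, and splitting on the first underscore is an assumption that holds for names such as `asr_python`.

```python
def parse_engine(entry):
    """Split an engine_list entry such as 'asr_python' into (task, engine_type).

    Hypothetical helper: splits on the first underscore, following the
    '<speech task>_<engine type>' format described in the README.
    """
    task, separator, engine_type = entry.partition("_")
    if not separator or not task or not engine_type:
        raise ValueError(f"expected '<speech task>_<engine type>', got {entry!r}")
    return task, engine_type


if __name__ == "__main__":
    for name in ["asr_python", "tts_inference", "cls_python"]:
        print(name, "->", parse_engine(name))
```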


 The input of the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the model's.
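
Before sending audio to the service, the sample-rate requirement can be checked locally. This is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper names are hypothetical, and the 16000 Hz default is taken from the 16k models listed in these READMEs.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_rate=16000):
    """True if the file's sample rate equals the rate the model expects.

    model_rate defaults to 16000 as an assumption based on the 16k models
    listed in the README; pass the rate of the model actually deployed.
    """
    return wav_sample_rate(path) == model_rate
```

Calling `matches_model_rate("./zh.wav")` on a downloaded sample tells you whether resampling is needed before using the client.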
@@ -83,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 4. ASR Client Usage
 **Note:** The response time will be slightly longer when using the client for the first time
 - Command Line (Recommended)
+
+   If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
   ```
   paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
   ```
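
To tell whether `127.0.0.1` (or another address) is reachable before running the client, a plain TCP connection attempt is usually enough. The sketch below uses only the standard `socket` module; `can_connect` is a hypothetical helper and says nothing about whether the service behind the port is healthy.

```python
import socket


def can_connect(server_ip, port, timeout=2.0):
    """Return True if a TCP connection to (server_ip, port) can be opened."""
    try:
        with socket.create_connection((server_ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8090 is the port used in the client commands of this README.
    print(can_connect("127.0.0.1", 8090))
```

If this returns `False` for `127.0.0.1`, retry with the actual service IP address and pass that address to `--server_ip`.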
@@ -131,6 +135,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 5. TTS Client Usage
 **Note:** The response time will be slightly longer when using the client for the first time
 - Command Line (Recommended)
+
+   If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
   ```bash
   paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
   ```
@@ -191,6 +198,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 6. CLS Client Usage
 **Note:** The response time will be slightly longer when using the client for the first time
 - Command Line (Recommended)
+
+   If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
   ```
   paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
   ```
@@ -235,6 +245,173 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
  ```


+### 7. Speaker Verification Client Usage
+
+#### 7.1 Extract speaker embedding
+**Note:** The response time will be slightly longer when using the client for the first time
+- Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+  ``` bash
+  paddlespeech_client vector --task spk  --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+  ```
+
+  Usage:
+
+  ``` bash
+  paddlespeech_client vector --help
+  ```
+
+  Arguments:
+    * server_ip: server ip. Default: 127.0.0.1
+    * port: server port. Default: 8090
+    * input(required): Input audio file.
+    * task: the vector task, either 'spk' or 'score'. Default: 'spk'.
+    * enroll: enrollment audio
+    * test: test audio
+
+  Output:
+
+  ```bash
+    [2022-05-25 12:25:36,165] [    INFO] - vector http client start
+    [2022-05-25 12:25:36,165] [    INFO] - the input audio: 85236145389.wav
+    [2022-05-25 12:25:36,165] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector
+    [2022-05-25 12:25:36,166] [    INFO] - http://127.0.0.1:8790/paddlespeech/vector
+    [2022-05-25 12:25:36,324] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 
9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
+    [2022-05-25 12:25:36,324] [    INFO] - Response time 0.159053 s.
+  ```
+
+* Python API
+
+  ``` python
+  from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+  vectorclient_executor = VectorClientExecutor()
+  res = vectorclient_executor(
+      input="85236145389.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      task="spk")
+  print(res)
+  ```
+
+  Output:
+
+  ``` bash
+    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, 
-11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
+  ```
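
The response printed above is a nested dictionary. A minimal sketch of pulling the embedding out of it follows; `extract_embedding` is a hypothetical helper, and the key names (`success`, `result`, `vec`) are taken from the sample output only.

```python
def extract_embedding(response):
    """Return the embedding list from a vector-service response dict.

    The expected layout mirrors the sample output in this README:
    {'success': True, ..., 'result': {'vec': [...]}}.
    """
    if not response.get("success"):
        raise RuntimeError(f"request failed: {response.get('message')}")
    return response["result"]["vec"]


if __name__ == "__main__":
    # Shortened stand-in for a real response.
    sample = {
        "success": True,
        "code": 200,
        "message": {"description": "success"},
        "result": {"vec": [-1.3251205682754517, 7.860682487487793]},
    }
    print(len(extract_embedding(sample)))
```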
+
+#### 7.2 Get the score between speaker audio embeddings
+
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+  ``` bash
+  paddlespeech_client vector --task score  --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+  ```
+
+  Usage:
+
+  ``` bash
+  paddlespeech_client vector --help
+  ```
+
+  Arguments:
+    * server_ip: server ip. Default: 127.0.0.1
+    * port: server port. Default: 8090
+    * input: Input audio file for the 'spk' task; not needed when the task is 'score'.
+    * task: the vector task, either 'spk' or 'score'. It must be 'score' to get the score.
+    * enroll: enrollment audio
+    * test: test audio
+  
+  Output:
+
+  ``` bash
+    [2022-05-25 12:33:24,527] [    INFO] - vector score http client start
+    [2022-05-25 12:33:24,527] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+    [2022-05-25 12:33:24,528] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
+    [2022-05-25 12:33:24,695] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+    [2022-05-25 12:33:24,696] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+    [2022-05-25 12:33:24,696] [    INFO] - Response time 0.168271 s.
+  ```
+
+* Python API
+
+  ``` python 
+  from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+  vectorclient_executor = VectorClientExecutor()
+  res = vectorclient_executor(
+      input=None,
+      enroll_audio="85236145389.wav",
+      test_audio="123456789.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      task="score")
+  print(res)
+  ```
+
+  Output:
+
+  ``` bash
+  [2022-05-25 12:30:14,143] [    INFO] - vector score http client start
+  [2022-05-25 12:30:14,143] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+  [2022-05-25 12:30:14,143] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
+  [2022-05-25 12:30:14,363] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+  {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+  ```
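
A score response like the one above can be turned into an accept or reject decision by comparing it with a threshold. This is only a sketch: `same_speaker` is a hypothetical helper, and the threshold is an assumption that has to be tuned on your own data, since the README does not recommend a value.

```python
def same_speaker(response, threshold):
    """Decide whether enroll and test audio match, given a score response.

    The response layout follows the sample output in this README:
    {'success': True, ..., 'result': {'score': <float>}}.
    The threshold is application-specific and not given by the README.
    """
    if not response.get("success"):
        raise RuntimeError(f"request failed: {response.get('message')}")
    return response["result"]["score"] >= threshold


if __name__ == "__main__":
    sample = {
        "success": True,
        "code": 200,
        "message": {"description": "success"},
        "result": {"score": 0.45332613587379456},
    }
    # 0.5 is an arbitrary illustrative threshold.
    print(same_speaker(sample, threshold=0.5))
```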
+
+### 8. Punctuation prediction
+  
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+   ``` bash
+   paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+   ```
+
+  Usage:
+  
+  ```bash
+  paddlespeech_client text --help
+  ```
+  Arguments:
+  - `server_ip`: server ip. Default: 127.0.0.1
+  - `port`: server port. Default: 8090
+  - `input`(required): Input text to get punctuation.
+
+  Output:
+  ```bash
+    [2022-05-09 18:19:04,397] [    INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+    [2022-05-09 18:19:04,397] [    INFO] - Response time 0.092407 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+  textclient_executor = TextClientExecutor()
+  res = textclient_executor(
+      input="我认为跑步最重要的就是给我带来了身体健康",
+      server_ip="127.0.0.1",
+      port=8090,)
+  print(res)
+
+  ```
+
+  Output:
+  ```bash
+  我认为跑步最重要的就是给我带来了身体健康。
+  ```
+
+
 ## Models supported by the service
 ### ASR model
 Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for inference with Paddle Inference.
@@ -244,3 +421,9 @@ Get all models supported by the TTS service via `paddlespeech_server stats --tas

 ### CLS model
 Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for inference with Paddle Inference.
+
+### Vector model
+Get all models supported by the Vector service via `paddlespeech_server stats --task vector`, where static models can be used for inference with Paddle Inference.
+
+### Text model
+Get all models supported by the Text service via `paddlespeech_server stats --task text`, where static models can be used for inference with Paddle Inference.
--- a/demos/speech_server/README_cn.md
+++ b/demos/speech_server/README_cn.md
-([简体中文](./README_cn.md)|English)
+(简体中文|[English](./README.md))

 # Speech Service

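 ## Introduction
-This demo is an implementation of starting the speech service and accessing it. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.
+This demo is an implementation of starting the offline speech service and accessing it. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.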


 ## Usage
 ### 1. Installation
 See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

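-It is recommended to use **paddlepaddle 2.2.1** or above.
-You can choose one of the three methods, medium and hard, to install PaddleSpeech.
+It is recommended to use **paddlepaddle 2.2.2** or above.
+You can choose one of the two methods, medium and hard, to install PaddleSpeech.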


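 ### 2. Prepare Config File
 The configuration file can be found in `conf/application.yaml`.
 Among them, `engine_list` indicates the speech engines that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
-At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech) and cls (audio classification).
+At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech), cls (audio classification), vector (speaker verification) and text (text processing).
 Currently the engine type supports two forms: python and inference (Paddle Inference).
+**Note:** If the service starts normally inside a container but clients cannot reach its IP, try replacing the `host` address in the configuration file with the local IP address.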


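-The input of this ASR client should be a WAV file (`.wav`), and its sample rate must be the same as the model's sample rate.
+The input of the ASR client is a WAV file (`.wav`), and its sample rate must be the same as the model's sample rate.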

-Sample audio for this ASR client can be downloaded:
+Sample audio for this ASR client can be downloaded:
 ```bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
 ```
@@ -83,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 4. ASR Client Usage
 **Note:** The response time will be slightly longer when using the client for the first time.
 - Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
  ```
  paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav

@@ -95,7 +99,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
  ```

  Arguments:
-    - `server_ip`: server ip. Default: 127.0.0.1
+  - `server_ip`: server ip. Default: 127.0.0.1
  - `port`: server port. Default: 8090
  - `input`(required): Audio file to be recognized.
  - `sample_rate`: Audio sample rate. Default: 16000.
@@ -135,6 +139,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 **Note:** The response time will be slightly longer when using the client for the first time.
 - Command Line (Recommended)
  
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
  ```bash
  paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
  ```
@@ -192,9 +198,14 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

  ```

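-  ### 6. CLS Client Usage
-  **Note:** The response time will be slightly longer when using the client for the first time.
-  - Command Line (Recommended)
+### 6. CLS Client Usage
+
+**Note:** The response time will be slightly longer when using the client for the first time.
+
+- Command Line (Recommended)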
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
  ```
  paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
  ```
@@ -205,7 +216,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
  paddlespeech_client cls --help
  ```
  Arguments:
-  - `server_ip`: server ip. Default: 127.0.0.1
+  - `server_ip`: server ip. Default: 127.0.0.1
  - `port`: server port. Default: 8090
  - `input`(required): Audio file to be classified.
  - `topk`: Top-k value for the classification results.
@@ -239,13 +250,181 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

  ```

+### 7. Speaker Verification Client Usage
+
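+#### 7.1 Extract speaker embedding
+Note: The response time will be slightly longer when using the client for the first time.
+* Command Line (Recommended)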
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+  ``` bash
+  paddlespeech_client vector --task spk  --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+  ```
+
+  Usage:
+
+  ``` bash
+  paddlespeech_client vector --help
+  ```
+  Arguments:
+  * server_ip: server ip. Default: 127.0.0.1
+  * port: server port. Default: 8090
+  * input(required): Audio file to be recognized.
+  * task: the vector task, either spk or score. Default: spk.
+  * enroll: enrollment audio.
+  * test: test audio.
+  Output:
+
+  ``` bash
+    [2022-05-25 12:25:36,165] [    INFO] - vector http client start
+    [2022-05-25 12:25:36,165] [    INFO] - the input audio: 85236145389.wav
+    [2022-05-25 12:25:36,165] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector
+    [2022-05-25 12:25:36,166] [    INFO] - http://127.0.0.1:8790/paddlespeech/vector
+    [2022-05-25 12:25:36,324] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 
9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
+    [2022-05-25 12:25:36,324] [    INFO] - Response time 0.159053 s.
+  ```
+
+* Python API
+
+  ``` python
+  from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+  vectorclient_executor = VectorClientExecutor()
+  res = vectorclient_executor(
+      input="85236145389.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      task="spk")
+  print(res)
+  ```
+
+  Output:
+
+  ``` bash
+    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, 
-11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
+  ```
+
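+#### 7.2 Audio Voiceprint Scoring
+
+Note: The response time will be slightly longer the first time the client is used.
+* Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.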
+
+  ``` bash
+  paddlespeech_client vector --task score  --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+  ```
+
+  Usage:
+
+  ``` bash
+  paddlespeech_client vector --help
+  ```
+
+  Arguments:
+  * server_ip: server IP address. Default: 127.0.0.1.
+  * port: server port. Default: 8090.
+  * input(required): audio file to be processed.
+  * task: the vector task, either spk or score. Default: spk.
+  * enroll: enrollment audio.
+  * test: test audio.
+
+  Output:
+
+  ``` bash
+    [2022-05-25 12:33:24,527] [    INFO] - vector score http client start
+    [2022-05-25 12:33:24,527] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+    [2022-05-25 12:33:24,528] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
+    [2022-05-25 12:33:24,695] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+    [2022-05-25 12:33:24,696] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+    [2022-05-25 12:33:24,696] [    INFO] - Response time 0.168271 s.
+  ```
+
+* Python API
+
+  ``` python 
+  from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+  vectorclient_executor = VectorClientExecutor()
+  res = vectorclient_executor(
+      input=None,
+      enroll_audio="85236145389.wav",
+      test_audio="123456789.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      task="score")
+  print(res)
+  ```
+
+  Output:
+
+  ``` bash
+  [2022-05-25 12:30:14,143] [    INFO] - vector score http client start
+  [2022-05-25 12:30:14,143] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+  [2022-05-25 12:30:14,143] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
+  [2022-05-25 12:30:14,363] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+  {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
+  ```
+
+
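+### 8. Punctuation Prediction
+  
+  **Note:** The response time will be slightly longer the first time the client is used.
+- Command Line (Recommended)
+
+  If `127.0.0.1` is not accessible, you need to use the actual service IP address.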
+
+  ``` bash
+  paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+  ```
+
+  Usage:
+  
+  ```bash
+  paddlespeech_client text --help
+  ```
+  Arguments:
+  - `server_ip`: server IP address. Default: 127.0.0.1.
+  - `port`: server port. Default: 8090.
+  - `input`(required): text to add punctuation to.
+
+  Output:
+  ```bash
+    [2022-05-09 18:19:04,397] [    INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+    [2022-05-09 18:19:04,397] [    INFO] - Response time 0.092407 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+  textclient_executor = TextClientExecutor()
+  res = textclient_executor(
+      input="我认为跑步最重要的就是给我带来了身体健康",
+      server_ip="127.0.0.1",
+      port=8090,)
+  print(res)
+
+  ```
+
+  Output:
+  ```bash
+  我认为跑步最重要的就是给我带来了身体健康。
+  ```

 ## Models Supported by the Service
-### ASR支持的模型
-通过 `paddlespeech_server stats --task asr` 获取ASR服务支持的所有模型，其中静态模型可用于 paddle inference 推理。 
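+### Models Supported by ASR
+Run `paddlespeech_server stats --task asr` to list all models supported by the ASR service; the static models among them can be used for paddle inference.
+
+### Models Supported by TTS
+Run `paddlespeech_server stats --task tts` to list all models supported by the TTS service; the static models among them can be used for paddle inference.
+
+### Models Supported by CLS
+Run `paddlespeech_server stats --task cls` to list all models supported by the CLS service; the static models among them can be used for paddle inference.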

-### TTS支持的模型
-通过 `paddlespeech_server stats --task tts` 获取TTS服务支持的所有模型，其中静态模型可用于 paddle inference 推理。
+### Models Supported by Vector
+Run `paddlespeech_server stats --task vector` to list all models supported by the Vector service.

-### CLS支持的模型
-通过 `paddlespeech_server stats --task cls` 获取CLS服务支持的所有模型，其中静态模型可用于 paddle inference 推理。
+### Models Supported by Text
+Run `paddlespeech_server stats --task text` to list all models supported by the Text service.
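
The `spk` and `score` responses shown above share one envelope: `success`, `code`, `message` and `result`. The following is a minimal sketch of how a caller might unpack that envelope and compare two embeddings locally. The helper names `extract_result` and `cosine_similarity` are hypothetical, and whether the server's `score` is a cosine similarity is an assumption that the README does not state.

```python
import math


def extract_result(response, key):
    """Return response['result'][key]; raise if the call did not succeed."""
    if not response.get("success") or response.get("code") != 200:
        raise RuntimeError(f"request failed: {response.get('message')}")
    return response["result"][key]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Response copied from the `score` task output in the README.
score_response = {
    'success': True,
    'code': 200,
    'message': {'description': 'success'},
    'result': {'score': 0.45332613587379456},
}
score = extract_result(score_response, "score")
print(score)
```

For the `spk` task the same helper would be called with the key `"vec"`, and two such vectors could then be passed to `cosine_similarity`.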
--- a/demos/speech_server/asr_client.sh
+++ b/demos/speech_server/asr_client.sh
 #!/bin/bash

 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
--- a/demos/speech_server/cls_client.sh
+++ b/demos/speech_server/cls_client.sh
 #!/bin/bash

 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
--- a/demos/speech_server/conf/application.yaml
+++ b/demos/speech_server/conf/application.yaml
-# This is the parameter configuration file for PaddleSpeech Serving.
+# This is the parameter configuration file for PaddleSpeech Offline Serving.

 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
 port: 8090

 # The task format in the engin_list is: <speech task>_<engine type>
-# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference']
-
-engine_list: ['asr_python', 'tts_python', 'cls_python']
+# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
+protocol: 'http'
+engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']


 #################################################################################
@@ -135,3 +135,26 @@ cls_inference:
        glog_info: False  # True -> print glog
        summary: True  # False -> do not show predictor config

+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+    task: punc
+    model_type: 'ernie_linear_p3_wudao'
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: # [optional]
+    ckpt_path: # [optional]
+    vocab_file: # [optional]
+    device:  # set 'gpu:id' or 'cpu'
+
+
+################################### Vector ######################################
+################### Vector task: spk; engine_type: python #######################
+vector_python:
+    task: spk
+    model_type: 'ecapatdnn_voxceleb12'
+    sample_rate: 16000
+    cfg_path: # [optional]
+    ckpt_path: # [optional]
+    device:  # set 'gpu:id' or 'cpu'
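
The config comment gives the `engine_list` entry format as `<speech task>_<engine type>`, and `main.py` derives the API list with `engine.split("_")[0]`. The sketch below shows that split with some validation added. The `KNOWN_TASKS` set and the function name are assumptions for illustration, taken from the task names that appear in these configs.

```python
# Task prefixes seen in the configs of this diff (assumed to be the full set).
KNOWN_TASKS = {"asr", "tts", "cls", "text", "vector"}


def tasks_from_engine_list(engine_list):
    """Split each '<speech task>_<engine type>' entry and return the unique tasks in order."""
    tasks = []
    for engine in engine_list:
        task, _, engine_type = engine.partition("_")
        if task not in KNOWN_TASKS or not engine_type:
            raise ValueError(f"malformed engine entry: {engine!r}")
        if task not in tasks:
            tasks.append(task)
    return tasks


engine_list = ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']
print(tasks_from_engine_list(engine_list))
# → ['asr', 'tts', 'cls', 'text', 'vector']
```

Splitting on the first underscore only means an entry such as `asr_online-onnx` still yields the task `asr` with engine type `online-onnx`.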
--- a/paddlespeech/server/bin/main.py
+++ b/paddlespeech/server/bin/main.py
@@ -12,66 +12,59 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import argparse
+import warnings

 import uvicorn
 from fastapi import FastAPI
+from starlette.middleware.cors import CORSMiddleware

 from paddlespeech.server.engine.engine_pool import init_engine_pool
 from paddlespeech.server.restful.api import setup_router as setup_http_router
 from paddlespeech.server.utils.config import get_config
 from paddlespeech.server.ws.api import setup_router as setup_ws_router
+warnings.filterwarnings("ignore")
+import sys

 app = FastAPI(
    title="PaddleSpeech Serving API", description="Api", version="0.0.1")
-
-
-def init(config):
-    """system initialization
-
-    Args:
-        config (CfgNode): config object
-
-    Returns:
-        bool: 
-    """
-    # init api
-    api_list = list(engine.split("_")[0] for engine in config.engine_list)
-    if config.protocol == "websocket":
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"])
+
+# change yaml file here
+config_file = "./conf/application.yaml"
+config = get_config(config_file)
+
+# init engine
+if not init_engine_pool(config):
+    print("Failed to init engine.")
+    sys.exit(-1)
+
+# get api_router
+api_list = list(engine.split("_")[0] for engine in config.engine_list)
+if config.protocol == "websocket":
    api_router = setup_ws_router(api_list)
-    elif config.protocol == "http":
+elif config.protocol == "http":
    api_router = setup_http_router(api_list)
-    else:
+else:
    raise Exception("unsupported protocol")
-    app.include_router(api_router)
-
-    if not init_engine_pool(config):
-        return False
-
-    return True
-
-
-def main(args):
-    """main function"""
-
-    config = get_config(args.config_file)
-
-    if init(config):
-        uvicorn.run(app, host=config.host, port=config.port, debug=True)
+    sys.exit(-1)

+# app needs to operate outside the main function 
+app.include_router(api_router)

 if __name__ == "__main__":
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        "--config_file",
-        action="store",
-        help="yaml file of the app",
-        default="./conf/application.yaml")
-
+    parser = argparse.ArgumentParser(add_help=True)
    parser.add_argument(
-        "--log_file",
-        action="store",
-        help="log file",
-        default="./log/paddlespeech.log")
+        "--workers", type=int, help="workers of server", default=1)
    args = parser.parse_args()

-    main(args)
+    uvicorn.run(
+        "start_multi_progress_server:app",
+        host=config.host,
+        port=config.port,
+        debug=True,
+        workers=args.workers)
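
The module-level code above chooses a router from `config.protocol` and raises on anything else. Any statement placed after `raise` in the same branch never runs. A table-driven version of that dispatch might look like the sketch below; `select_router` and the stand-in setup functions are hypothetical and only mimic the shape of `setup_http_router` and `setup_ws_router`.

```python
def select_router(protocol, api_list, setup_functions):
    """Pick the router factory registered for `protocol` and call it with `api_list`."""
    try:
        setup = setup_functions[protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol!r}") from None
    return setup(api_list)


# Stand-ins for the real router factories (hypothetical return values).
setup_functions = {
    "http": lambda apis: ("http-router", tuple(apis)),
    "websocket": lambda apis: ("ws-router", tuple(apis)),
}

print(select_router("http", ["asr", "tts"], setup_functions))
```

Raising a specific exception type keeps the failure explicit without needing a separate `sys.exit` call.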
--- a/demos/speech_server/tts_client.sh
+++ b/demos/speech_server/tts_client.sh
 #!/bin/bash

+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
--- a/demos/speech_translation/README.md
+++ b/demos/speech_translation/README.md
@@ -47,7 +47,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import STExecutor
+  from paddlespeech.cli.st import STExecutor

  st_executor = STExecutor()
  text = st_executor(

--- a/demos/speech_translation/README_cn.md
+++ b/demos/speech_translation/README_cn.md
@@ -47,7 +47,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
  ```python
  import paddle
-  from paddlespeech.cli import STExecutor
+  from paddlespeech.cli.st import STExecutor
  
  st_executor = STExecutor()
  text = st_executor(

--- a/speechx/examples/ngram/.gitignore
+++ b/speechx/examples/ngram/.gitignore
-data
 exp
+
--- a/demos/streaming_asr_server/README.md
+++ b/demos/streaming_asr_server/README.md
--- a/demos/streaming_asr_server/README_cn.md
+++ b/demos/streaming_asr_server/README_cn.md
--- a/paddlespeech/server/conf/ws_application.yaml
+++ b/paddlespeech/server/conf/ws_application.yaml
@@ -7,8 +7,8 @@ host: 0.0.0.0
 port: 8090

 # The task format in the engin_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
 # websocket only support online engine type.
 protocol: 'websocket'
 engine_list: ['asr_online']
@@ -21,7 +21,7 @@ engine_list: ['asr_online']
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
-    model_type: 'deepspeech2online_aishell'
+    model_type: 'conformer_online_wenetspeech'
    am_model: # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
@@ -29,6 +29,9 @@ asr_online:
    cfg_path: 
    decode_method: 
    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    decode_method: "attention_rescoring"
+    continuous_decoding: True # enable continuous decoding when an endpoint is detected

    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
@@ -37,11 +40,9 @@ asr_online:
        summary: True  # False -> do not show predictor config

    chunk_buffer_conf:
-        frame_duration_ms: 80
-        shift_ms: 40
-        sample_rate: 16000
-        sample_width: 2
        window_n: 7     # frame
        shift_n: 4      # frame
-        window_ms: 20   # ms
+        window_ms: 25   # ms
        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
--- a/demos/streaming_asr_server/conf/punc_application.yaml
+++ b/demos/streaming_asr_server/conf/punc_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8190
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['text_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports the offline engine type.
+protocol: 'http'
+engine_list: ['text_python']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+    task: punc
+    model_type: 'ernie_linear_p3_wudao'
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: # [optional]
+    ckpt_path: # [optional]
+    vocab_file: # [optional]
+    device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
--- a/demos/streaming_asr_server/conf/ws_conformer_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
@@ -4,11 +4,11 @@
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
-port: 8090
+port: 8091

 # The task format in the engin_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
 # websocket only support online engine type.
 protocol: 'websocket'
 engine_list: ['asr_online']
@@ -28,8 +28,12 @@ asr_online:
    sample_rate: 16000
    cfg_path: 
    decode_method: 
+    num_decoding_left_chunks: -1
    force_yes: True
-    device: # cpu or gpu:id
+    device: 'cpu' # cpu or gpu:id
+    decode_method: "attention_rescoring"
+    continuous_decoding: True # enable continuous decoding when an endpoint is detected
+
    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True

--- a/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_wenetspeech'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    decode_method: "attention_rescoring"
+    continuous_decoding: True # enable continuous decoding when an endpoint is detected
+    num_decoding_left_chunks: -1
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
--- a/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application_faster.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application_faster.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_wenetspeech'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    decode_method: "attention_rescoring"
+    continuous_decoding: True # enable continuous decoding when an endpoint is detected
+    num_decoding_left_chunks: 16
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
--- a/demos/streaming_asr_server/conf/ws_ds2_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_ds2_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online-inference', 'asr_online-onnx']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online-onnx']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online-inference #######################
+asr_online-inference:
+    model_type: 'deepspeech2online_wenetspeech'
+    am_model:    # the pdmodel file of am static model [optional]
+    am_params:   # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    num_decoding_left_chunks: 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        frame_duration_ms: 85
+        shift_ms: 40
+        sample_rate: 16000
+        sample_width: 2
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+
+
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online-onnx #######################
+asr_online-onnx:
+    model_type: 'deepspeech2online_wenetspeech'
+    am_model:  # the pdmodel file of onnx am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    num_decoding_left_chunks: 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+
+    # https://onnxruntime.ai/docs/api/python/api_summary.html#inferencesession
+    am_predictor_conf:
+        device: 'cpu' # set 'gpu:id' or 'cpu'
+        graph_optimization_level: 0 
+        intra_op_num_threads: 0 # Sets the number of threads used to parallelize the execution within nodes.
+        inter_op_num_threads: 0 # Sets the number of threads used to parallelize the execution of the graph (across nodes).
+        log_severity_level: 2   # Log severity level. Applies to session load, initialization, etc. 0:Verbose, 1:Info, 2:Warning, 3:Error, 4:Fatal. Default is 2.
+        log_verbosity_level: 0  # VLOG level if DEBUG build and session_log_severity_level is 0. Applies to session load, initialization, etc. Default is 0.
+
+    chunk_buffer_conf:
+        frame_duration_ms: 80
+        shift_ms: 40
+        sample_rate: 16000
+        sample_width: 2
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
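
The `chunk_buffer_conf` values can be related to each other with the usual framing arithmetic: a chunk of `window_n` frames spans `(window_n - 1) * shift_ms + window_ms` milliseconds, and consecutive chunks advance by `shift_n * shift_ms`. With the values above this gives 85 ms and 40 ms, which agrees with `frame_duration_ms: 85` and the first `shift_ms: 40` in the `asr_online-inference` block. The sketch below works this out. Whether the server computes its buffer sizes exactly this way is an assumption, and the function name is hypothetical.

```python
def chunk_sizes(window_n, shift_n, window_ms, shift_ms, sample_rate, sample_width):
    """Derive chunk and hop sizes from frame-level settings (assumed framing formula)."""
    chunk_ms = (window_n - 1) * shift_ms + window_ms  # span of window_n overlapping frames
    hop_ms = shift_n * shift_ms                        # advance between consecutive chunks
    chunk_samples = sample_rate * chunk_ms // 1000
    hop_samples = sample_rate * hop_ms // 1000
    return {
        "chunk_ms": chunk_ms,
        "hop_ms": hop_ms,
        "chunk_samples": chunk_samples,
        "hop_samples": hop_samples,
        "chunk_bytes": chunk_samples * sample_width,
        "hop_bytes": hop_samples * sample_width,
    }


sizes = chunk_sizes(window_n=7, shift_n=4, window_ms=25, shift_ms=10,
                    sample_rate=16000, sample_width=2)
print(sizes)
# → {'chunk_ms': 85, 'hop_ms': 40, 'chunk_samples': 1360, 'hop_samples': 640,
#    'chunk_bytes': 2720, 'hop_bytes': 1280}
```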
--- a/demos/streaming_asr_server/local/rtf_from_log.py
+++ b/demos/streaming_asr_server/local/rtf_from_log.py
+#!/usr/bin/env python3
+import argparse
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(prog=__doc__)
+    parser.add_argument(
+        '--logfile', type=str, required=True, help='ws client log file')
+
+    args = parser.parse_args()
+
+    rtfs = []
+    with open(args.logfile, 'r') as f:
+        for line in f:
+            if 'RTF=' in line:
+                # e.g. audio duration: 6.126, elapsed time: 3.471978187561035, RTF=0.5667610492264177
+                line = line.strip()
+                beg = line.index("audio")
+                line = line[beg:]
+
+                items = line.split(',')
+                vals = []
+                for elem in items:
+                    if "RTF=" in elem:
+                        continue
+                    _, val = elem.split(":")
+                    vals.append(float(val))  # parse as a number; avoids eval on log text
+                keys = ['T', 'P']
+                meta = dict(zip(keys, vals))
+
+                rtfs.append(meta)
+
+    T = 0.0
+    P = 0.0
+    n = 0
+    for m in rtfs:
+        n += 1
+        T += m['T']
+        P += m['P']
+
+    print(f"RTF: {P/T}, utts: {n}")
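
The script above reports an aggregate real-time factor as total elapsed time divided by total audio duration, rather than averaging the per-utterance RTF values. A compact sketch of the same computation using a regular expression follows. The log line format is taken from the comment in the script, while the function name and the pattern are assumptions.

```python
import re

# Matches lines such as:
# audio duration: 6.126, elapsed time: 3.471978187561035, RTF=0.5667610492264177
LINE_RE = re.compile(
    r"audio duration:\s*([0-9.]+),\s*elapsed time:\s*([0-9.]+),\s*RTF=")


def aggregate_rtf(lines):
    """Return (total_elapsed / total_audio, utterance_count) over matching log lines."""
    total_audio = 0.0
    total_elapsed = 0.0
    count = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that carry no RTF record
        total_audio += float(match.group(1))
        total_elapsed += float(match.group(2))
        count += 1
    if count == 0:
        raise ValueError("no RTF records found")
    return total_elapsed / total_audio, count


log_lines = [
    "[INFO] audio duration: 6.126, elapsed time: 3.471978187561035, RTF=0.5667610492264177",
    "[INFO] some unrelated message",
]
rtf, utts = aggregate_rtf(log_lines)
print(f"RTF: {rtf}, utts: {utts}")
```

Weighting by duration in this way keeps short utterances from dominating the result.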
--- a/demos/streaming_asr_server/local/test.sh
+++ b/demos/streaming_asr_server/local/test.sh
+#!/bin/bash 
+
+if [ $# != 1 ];then
+    echo "usage: $0 wav_scp"
+    exit 1
+fi
+
+scp=$1
+
+# calc RTF
+# wav_scp can generate from `speechx/examples/ds2_ol/aishell`
+
+exp=exp
+mkdir -p $exp
+
+python3 local/websocket_client.py --server_ip 127.0.0.1 --port 8090 --wavscp $scp &> $exp/log.rsl
+
+python3 local/rtf_from_log.py --logfile $exp/log.rsl
+
+
+ 
\ No newline at end of file
--- a/demos/streaming_asr_server/websocket_client.py
+++ b/demos/streaming_asr_server/websocket_client.py
+#!/usr/bin/python
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -11,8 +12,9 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-#!/usr/bin/python
-# -*- coding: UTF-8 -*-
+# calc avg RTF(NOT Accurate): grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR,  "RTF", sum/NR}'
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --wavfile ./zh.wav
 import argparse
 import asyncio
 import codecs
@@ -28,6 +30,7 @@ def main(args):
    handler = ASRWsAudioHandler(
        args.server_ip,
        args.port,
+        endpoint=args.endpoint,
        punc_server_ip=args.punc_server_ip,
        punc_server_port=args.punc_server_port)
    loop = asyncio.get_event_loop()
@@ -69,7 +72,11 @@ if __name__ == "__main__":
        default=8091,
        dest="punc_server_port",
        help='Punctuation server port')
-
+    parser.add_argument(
+        "--endpoint",
+        type=str,
+        default="/paddlespeech/asr/streaming",
+        help="ASR websocket endpoint")
    parser.add_argument(
        "--wavfile",
        action="store",

--- a/demos/streaming_asr_server/punc_server.py
+++ b/demos/streaming_asr_server/punc_server.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        prog='paddlespeech_server.start', add_help=True)
+    parser.add_argument(
+        "--config_file",
+        action="store",
+        help="yaml file of the app",
+        default=None,
+        required=True)
+
+    parser.add_argument(
+        "--log_file",
+        action="store",
+        help="log file",
+        default="./log/paddlespeech.log")
+    logger.info("start to parse the args")
+    args = parser.parse_args()
+
+    logger.info("start to launch the punctuation server")
+    punc_server = ServerExecutor()
+    punc_server(config_file=args.config_file, log_file=args.log_file)
--- a/demos/streaming_asr_server/server.sh
+++ b/demos/streaming_asr_server/server.sh
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+
+# nohup python3 punc_server.py --config_file conf/punc_application.yaml > punc.log 2>&1 &
+paddlespeech_server start --config_file conf/punc_application.yaml &> punc.log &
+
+# nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_wenetspeech_application.yaml > streaming_asr.log 2>&1 &
+paddlespeech_server start --config_file conf/ws_conformer_wenetspeech_application.yaml &> streaming_asr.log  &
+
--- a/demos/streaming_asr_server/streaming_asr_server.py
+++ b/demos/streaming_asr_server/streaming_asr_server.py
--- a/demos/streaming_asr_server/test.sh
+++ b/demos/streaming_asr_server/test.sh
 # download the test wav
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav 

-# read the wav and pass it to service
-python3 websocket_client.py --wavfile ./zh.wav
+# read the wav and pass it to the streaming ASR service only
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+
+# read the wav and call both the streaming ASR and punctuation services
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
+
--- a/demos/streaming_asr_server/web/templates/index.html
+++ b/demos/streaming_asr_server/web/templates/index.html
@@ -93,6 +93,7 @@

    function parseResult(data) {
      var data = JSON.parse(data)
+      console.log('result json:', data)
      var result = data.result
      console.log(result)
      $("#resultPanel").html(result)

--- a/demos/streaming_tts_server/README.md
+++ b/demos/streaming_tts_server/README.md
--- a/demos/streaming_tts_server/README_cn.md
+++ b/demos/streaming_tts_server/README_cn.md
--- a/demos/streaming_tts_server/conf/tts_online_application.yaml
+++ b/demos/streaming_tts_server/conf/tts_online_application.yaml
--- a/demos/streaming_tts_server/test_client.sh
+++ b/demos/streaming_tts_server/test_client.sh
 #!/bin/bash

 # http client test
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav

 # websocket client test
-#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
--- a/demos/text_to_speech/README.md
+++ b/demos/text_to_speech/README.md
--- a/demos/text_to_speech/README_cn.md
+++ b/demos/text_to_speech/README_cn.md
--- a/docker/ubuntu18-cpu/Dockerfile
+++ b/docker/ubuntu18-cpu/Dockerfile
--- a/docs/paddlespeech.pdf
+++ b/docs/paddlespeech.pdf
--- a/docs/source/asr/PPASR.md
+++ b/docs/source/asr/PPASR.md
--- a/docs/source/asr/PPASR_cn.md
+++ b/docs/source/asr/PPASR_cn.md
--- a/audio/docs/source/_static/custom.css
+++ b/audio/docs/source/_static/custom.css
--- a/audio/docs/source/_templates/module.rst_t
+++ b/audio/docs/source/_templates/module.rst_t
--- a/audio/docs/source/_templates/package.rst_t
+++ b/audio/docs/source/_templates/package.rst_t
--- a/audio/docs/source/_templates/toc.rst_t
+++ b/audio/docs/source/_templates/toc.rst_t
--- a/audio/docs/source/conf.py
+++ b/audio/docs/source/conf.py
--- a/audio/docs/source/index.rst
+++ b/audio/docs/source/index.rst
--- a/docs/source/cls/custom_dataset.md
+++ b/docs/source/cls/custom_dataset.md
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
--- a/docs/source/install.md
+++ b/docs/source/install.md
--- a/docs/source/install_cn.md
+++ b/docs/source/install_cn.md
--- a/docs/source/reference.md
+++ b/docs/source/reference.md
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
--- a/docs/source/streaming_asr_demo_video.rst
+++ b/docs/source/streaming_asr_demo_video.rst
--- a/docs/source/streaming_tts_demo_video.rst
+++ b/docs/source/streaming_tts_demo_video.rst
--- a/docs/source/tts/PPTTS.md
+++ b/docs/source/tts/PPTTS.md
--- a/docs/source/tts/PPTTS_cn.md
+++ b/docs/source/tts/PPTTS_cn.md
--- a/docs/source/vpr/PPVPR.md
+++ b/docs/source/vpr/PPVPR.md
--- a/docs/source/vpr/PPVPR_cn.md
+++ b/docs/source/vpr/PPVPR_cn.md
--- a/examples/aishell/asr0/RESULTS.md
+++ b/examples/aishell/asr0/RESULTS.md
--- a/examples/aishell/asr0/conf/augmentation.json
+++ b/examples/aishell/asr0/conf/augmentation.json
--- a/examples/aishell/asr0/conf/deepspeech2.yaml
+++ b/examples/aishell/asr0/conf/deepspeech2.yaml
--- a/examples/aishell/asr0/conf/deepspeech2_online.yaml
+++ b/examples/aishell/asr0/conf/deepspeech2_online.yaml
--- a/examples/aishell/asr0/conf/preprocess.yaml
+++ b/examples/aishell/asr0/conf/preprocess.yaml
--- a/examples/aishell/asr0/conf/tuning/decode.yaml
+++ b/examples/aishell/asr0/conf/tuning/decode.yaml
--- a/examples/aishell/asr0/local/data.sh
+++ b/examples/aishell/asr0/local/data.sh
--- a/examples/aishell/asr0/local/export.sh
+++ b/examples/aishell/asr0/local/export.sh
--- a/examples/aishell/asr0/local/test.sh
+++ b/examples/aishell/asr0/local/test.sh
--- a/examples/aishell/asr0/local/test_export.sh
+++ b/examples/aishell/asr0/local/test_export.sh
--- a/examples/aishell/asr0/local/test_wav.sh
+++ b/examples/aishell/asr0/local/test_wav.sh
--- a/examples/aishell/asr0/local/train.sh
+++ b/examples/aishell/asr0/local/train.sh
--- a/examples/aishell/asr0/run.sh
+++ b/examples/aishell/asr0/run.sh
--- a/examples/aishell/asr1/RESULTS.md
+++ b/examples/aishell/asr1/RESULTS.md
--- a/examples/aishell/asr1/conf/chunk_conformer.yaml
+++ b/examples/aishell/asr1/conf/chunk_conformer.yaml
--- a/examples/aishell/asr1/conf/conformer.yaml
+++ b/examples/aishell/asr1/conf/conformer.yaml
--- a/examples/aishell/asr1/conf/transformer.yaml
+++ b/examples/aishell/asr1/conf/transformer.yaml
--- a/examples/aishell/asr1/local/train.sh
+++ b/examples/aishell/asr1/local/train.sh
--- a/examples/aishell/asr1/run.sh
+++ b/examples/aishell/asr1/run.sh
--- a/examples/aishell3/tts3/README.md
+++ b/examples/aishell3/tts3/README.md
--- a/examples/aishell3/vc0/README.md
+++ b/examples/aishell3/vc0/README.md
--- a/examples/aishell3/vc1/README.md
+++ b/examples/aishell3/vc1/README.md
--- a/examples/aishell3/voc1/README.md
+++ b/examples/aishell3/voc1/README.md
--- a/examples/aishell3/voc5/README.md
+++ b/examples/aishell3/voc5/README.md
--- a/examples/ami/README.md
+++ b/examples/ami/README.md
--- a/examples/ami/sd0/README.md
+++ b/examples/ami/sd0/README.md
--- a/examples/ami/sd0/run.sh
+++ b/examples/ami/sd0/run.sh
--- a/examples/callcenter/asr1/local/train.sh
+++ b/examples/callcenter/asr1/local/train.sh
--- a/examples/callcenter/asr1/run.sh
+++ b/examples/callcenter/asr1/run.sh
--- a/examples/csmsc/tts0/README.md
+++ b/examples/csmsc/tts0/README.md
--- a/examples/csmsc/tts2/README.md
+++ b/examples/csmsc/tts2/README.md
--- a/examples/csmsc/tts3/README.md
+++ b/examples/csmsc/tts3/README.md
--- a/examples/csmsc/tts3/README_cn.md
+++ b/examples/csmsc/tts3/README_cn.md
--- a/examples/csmsc/tts3/conf/default.yaml
+++ b/examples/csmsc/tts3/conf/default.yaml
--- a/examples/csmsc/vits/README.md
+++ b/examples/csmsc/vits/README.md
--- a/examples/csmsc/vits/conf/default.yaml
+++ b/examples/csmsc/vits/conf/default.yaml
--- a/examples/csmsc/vits/local/preprocess.sh
+++ b/examples/csmsc/vits/local/preprocess.sh
--- a/examples/csmsc/vits/local/synthesize.sh
+++ b/examples/csmsc/vits/local/synthesize.sh
--- a/examples/csmsc/vits/local/synthesize_e2e.sh
+++ b/examples/csmsc/vits/local/synthesize_e2e.sh
--- a/examples/csmsc/vits/local/train.sh
+++ b/examples/csmsc/vits/local/train.sh
--- a/examples/other/1xt2x/aishell/path.sh
+++ b/examples/other/1xt2x/aishell/path.sh
--- a/examples/csmsc/vits/run.sh
+++ b/examples/csmsc/vits/run.sh
--- a/examples/csmsc/voc1/README.md
+++ b/examples/csmsc/voc1/README.md
--- a/examples/csmsc/voc3/README.md
+++ b/examples/csmsc/voc3/README.md
--- a/examples/csmsc/voc4/README.md
+++ b/examples/csmsc/voc4/README.md
--- a/examples/csmsc/voc5/README.md
+++ b/examples/csmsc/voc5/README.md
--- a/examples/csmsc/voc6/README.md
+++ b/examples/csmsc/voc6/README.md
--- a/examples/esc50/cls0/conf/panns.yaml
+++ b/examples/esc50/cls0/conf/panns.yaml
--- a/examples/hey_snips/kws0/conf/mdtc.yaml
+++ b/examples/hey_snips/kws0/conf/mdtc.yaml
--- a/examples/librispeech/asr0/RESULTS.md
+++ b/examples/librispeech/asr0/RESULTS.md
--- a/examples/librispeech/asr0/conf/augmentation.json
+++ b/examples/librispeech/asr0/conf/augmentation.json
--- a/examples/librispeech/asr0/conf/deepspeech2.yaml
+++ b/examples/librispeech/asr0/conf/deepspeech2.yaml
--- a/examples/librispeech/asr0/conf/deepspeech2_online.yaml
+++ b/examples/librispeech/asr0/conf/deepspeech2_online.yaml
--- a/examples/librispeech/asr0/conf/preprocess.yaml
+++ b/examples/librispeech/asr0/conf/preprocess.yaml
--- a/examples/librispeech/asr0/local/data.sh
+++ b/examples/librispeech/asr0/local/data.sh
--- a/examples/librispeech/asr0/local/export.sh
+++ b/examples/librispeech/asr0/local/export.sh
--- a/examples/librispeech/asr0/local/test.sh
+++ b/examples/librispeech/asr0/local/test.sh
--- a/examples/librispeech/asr0/local/test_wav.sh
+++ b/examples/librispeech/asr0/local/test_wav.sh
--- a/examples/librispeech/asr0/local/train.sh
+++ b/examples/librispeech/asr0/local/train.sh
--- a/examples/librispeech/asr0/run.sh
+++ b/examples/librispeech/asr0/run.sh
--- a/examples/librispeech/asr1/RESULTS.md
+++ b/examples/librispeech/asr1/RESULTS.md
--- a/examples/librispeech/asr1/local/test.sh
+++ b/examples/librispeech/asr1/local/test.sh
--- a/examples/librispeech/asr1/local/train.sh
+++ b/examples/librispeech/asr1/local/train.sh
--- a/examples/librispeech/asr1/run.sh
+++ b/examples/librispeech/asr1/run.sh
--- a/examples/librispeech/asr2/local/train.sh
+++ b/examples/librispeech/asr2/local/train.sh
--- a/examples/librispeech/asr2/run.sh
+++ b/examples/librispeech/asr2/run.sh
--- a/examples/ljspeech/tts0/README.md
+++ b/examples/ljspeech/tts0/README.md
--- a/examples/ljspeech/tts1/README.md
+++ b/examples/ljspeech/tts1/README.md
--- a/examples/ljspeech/tts3/README.md
+++ b/examples/ljspeech/tts3/README.md
--- a/examples/ljspeech/voc0/README.md
+++ b/examples/ljspeech/voc0/README.md
--- a/examples/ljspeech/voc1/README.md
+++ b/examples/ljspeech/voc1/README.md
--- a/examples/ljspeech/voc5/README.md
+++ b/examples/ljspeech/voc5/README.md
--- a/examples/mustc/st1/local/train.sh
+++ b/examples/mustc/st1/local/train.sh
--- a/examples/mustc/st1/run.sh
+++ b/examples/mustc/st1/run.sh
--- a/examples/other/1xt2x/.gitignore
+++ b/examples/other/1xt2x/.gitignore
--- a/examples/other/1xt2x/README.md
+++ b/examples/other/1xt2x/README.md
--- a/examples/other/1xt2x/aishell/.gitignore
+++ b/examples/other/1xt2x/aishell/.gitignore
--- a/examples/other/1xt2x/aishell/conf/augmentation.json
+++ b/examples/other/1xt2x/aishell/conf/augmentation.json
--- a/examples/other/1xt2x/aishell/conf/deepspeech2.yaml
+++ b/examples/other/1xt2x/aishell/conf/deepspeech2.yaml
--- a/examples/other/1xt2x/aishell/conf/tuning/decode.yaml
+++ b/examples/other/1xt2x/aishell/conf/tuning/decode.yaml
--- a/examples/other/1xt2x/aishell/local/data.sh
+++ b/examples/other/1xt2x/aishell/local/data.sh
--- a/examples/other/1xt2x/aishell/local/download_lm_ch.sh
+++ b/examples/other/1xt2x/aishell/local/download_lm_ch.sh
--- a/examples/other/1xt2x/aishell/local/download_model.sh
+++ b/examples/other/1xt2x/aishell/local/download_model.sh
--- a/examples/other/1xt2x/aishell/local/test.sh
+++ b/examples/other/1xt2x/aishell/local/test.sh
--- a/examples/other/1xt2x/aishell/run.sh
+++ b/examples/other/1xt2x/aishell/run.sh
--- a/examples/other/1xt2x/baidu_en8k/.gitignore
+++ b/examples/other/1xt2x/baidu_en8k/.gitignore
--- a/examples/other/1xt2x/baidu_en8k/conf/augmentation.json
+++ b/examples/other/1xt2x/baidu_en8k/conf/augmentation.json
--- a/examples/other/1xt2x/baidu_en8k/conf/deepspeech2.yaml
+++ b/examples/other/1xt2x/baidu_en8k/conf/deepspeech2.yaml
--- a/examples/other/1xt2x/baidu_en8k/conf/tuning/decode.yaml
+++ b/examples/other/1xt2x/baidu_en8k/conf/tuning/decode.yaml
--- a/examples/other/1xt2x/baidu_en8k/local/data.sh
+++ b/examples/other/1xt2x/baidu_en8k/local/data.sh
--- a/examples/other/1xt2x/baidu_en8k/local/download_lm_en.sh
+++ b/examples/other/1xt2x/baidu_en8k/local/download_lm_en.sh
--- a/examples/other/1xt2x/baidu_en8k/local/download_model.sh
+++ b/examples/other/1xt2x/baidu_en8k/local/download_model.sh
--- a/examples/other/1xt2x/baidu_en8k/local/test.sh
+++ b/examples/other/1xt2x/baidu_en8k/local/test.sh
--- a/examples/other/1xt2x/baidu_en8k/path.sh
+++ b/examples/other/1xt2x/baidu_en8k/path.sh
--- a/examples/other/1xt2x/baidu_en8k/run.sh
+++ b/examples/other/1xt2x/baidu_en8k/run.sh
--- a/examples/other/1xt2x/librispeech/.gitignore
+++ b/examples/other/1xt2x/librispeech/.gitignore
--- a/examples/other/1xt2x/librispeech/conf/augmentation.json
+++ b/examples/other/1xt2x/librispeech/conf/augmentation.json
--- a/examples/other/1xt2x/librispeech/conf/deepspeech2.yaml
+++ b/examples/other/1xt2x/librispeech/conf/deepspeech2.yaml
--- a/examples/other/1xt2x/librispeech/conf/tuning/decode.yaml
+++ b/examples/other/1xt2x/librispeech/conf/tuning/decode.yaml
--- a/examples/other/1xt2x/librispeech/local/data.sh
+++ b/examples/other/1xt2x/librispeech/local/data.sh
--- a/examples/other/1xt2x/librispeech/local/download_lm_en.sh
+++ b/examples/other/1xt2x/librispeech/local/download_lm_en.sh
--- a/examples/other/1xt2x/librispeech/local/download_model.sh
+++ b/examples/other/1xt2x/librispeech/local/download_model.sh
--- a/examples/other/1xt2x/librispeech/local/test.sh
+++ b/examples/other/1xt2x/librispeech/local/test.sh
--- a/examples/other/1xt2x/librispeech/run.sh
+++ b/examples/other/1xt2x/librispeech/run.sh
--- a/examples/other/1xt2x/src_deepspeech2x/__init__.py
+++ b/examples/other/1xt2x/src_deepspeech2x/__init__.py
--- a/examples/other/1xt2x/src_deepspeech2x/models/ds2/__init__.py
+++ b/examples/other/1xt2x/src_deepspeech2x/models/ds2/__init__.py
--- a/examples/other/1xt2x/src_deepspeech2x/models/ds2/deepspeech2.py
+++ b/examples/other/1xt2x/src_deepspeech2x/models/ds2/deepspeech2.py
--- a/examples/other/1xt2x/src_deepspeech2x/models/ds2/rnn.py
+++ b/examples/other/1xt2x/src_deepspeech2x/models/ds2/rnn.py
--- a/examples/other/1xt2x/src_deepspeech2x/test_model.py
+++ b/examples/other/1xt2x/src_deepspeech2x/test_model.py
--- a/examples/other/mfa/local/reorganize_aishell3.py
+++ b/examples/other/mfa/local/reorganize_aishell3.py
--- a/examples/other/mfa/local/reorganize_baker.py
+++ b/examples/other/mfa/local/reorganize_baker.py
--- a/examples/other/mfa/run.sh
+++ b/examples/other/mfa/run.sh
--- a/examples/ted_en_zh/st0/local/train.sh
+++ b/examples/ted_en_zh/st0/local/train.sh
--- a/examples/ted_en_zh/st0/run.sh
+++ b/examples/ted_en_zh/st0/run.sh
--- a/examples/ted_en_zh/st1/local/train.sh
+++ b/examples/ted_en_zh/st1/local/train.sh
--- a/examples/ted_en_zh/st1/run.sh
+++ b/examples/ted_en_zh/st1/run.sh
--- a/examples/timit/asr1/local/train.sh
+++ b/examples/timit/asr1/local/train.sh
--- a/examples/tiny/asr0/conf/augmentation.json
+++ b/examples/tiny/asr0/conf/augmentation.json
--- a/examples/tiny/asr0/conf/deepspeech2.yaml
+++ b/examples/tiny/asr0/conf/deepspeech2.yaml
--- a/examples/tiny/asr0/conf/deepspeech2_online.yaml
+++ b/examples/tiny/asr0/conf/deepspeech2_online.yaml
--- a/examples/tiny/asr0/conf/preprocess.yaml
+++ b/examples/tiny/asr0/conf/preprocess.yaml
--- a/examples/tiny/asr0/local/export.sh
+++ b/examples/tiny/asr0/local/export.sh
--- a/examples/tiny/asr0/local/test.sh
+++ b/examples/tiny/asr0/local/test.sh
--- a/examples/tiny/asr0/local/train.sh
+++ b/examples/tiny/asr0/local/train.sh
--- a/examples/tiny/asr0/run.sh
+++ b/examples/tiny/asr0/run.sh
--- a/examples/tiny/asr1/local/train.sh
+++ b/examples/tiny/asr1/local/train.sh
--- a/examples/tiny/asr1/run.sh
+++ b/examples/tiny/asr1/run.sh
--- a/examples/vctk/tts3/README.md
+++ b/examples/vctk/tts3/README.md
--- a/examples/vctk/voc1/README.md
+++ b/examples/vctk/voc1/README.md
--- a/examples/vctk/voc5/README.md
+++ b/examples/vctk/voc5/README.md
--- a/examples/voxceleb/sv0/README.md
+++ b/examples/voxceleb/sv0/README.md
--- a/examples/voxceleb/sv0/RESULT.md
+++ b/examples/voxceleb/sv0/RESULT.md
--- a/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
+++ b/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
--- a/examples/voxceleb/sv0/conf/ecapa_tdnn_small.yaml
+++ b/examples/voxceleb/sv0/conf/ecapa_tdnn_small.yaml
--- a/examples/voxceleb/sv0/local/data_prepare.py
+++ b/examples/voxceleb/sv0/local/data_prepare.py
--- a/examples/voxceleb/sv0/local/make_rirs_noise_csv_dataset_from_json.py
+++ b/examples/voxceleb/sv0/local/make_rirs_noise_csv_dataset_from_json.py
--- a/examples/voxceleb/sv0/local/make_vox_csv_dataset_from_json.py
+++ b/examples/voxceleb/sv0/local/make_vox_csv_dataset_from_json.py
--- a/examples/voxceleb/sv0/local/test.sh
+++ b/examples/voxceleb/sv0/local/test.sh
--- a/examples/voxceleb/sv0/local/train.sh
+++ b/examples/voxceleb/sv0/local/train.sh
--- a/examples/wenetspeech/asr0/RESULTS.md
+++ b/examples/wenetspeech/asr0/RESULTS.md
--- a/examples/wenetspeech/asr1/RESULTS.md
+++ b/examples/wenetspeech/asr1/RESULTS.md
--- a/examples/wenetspeech/asr1/local/extract_meta.py
+++ b/examples/wenetspeech/asr1/local/extract_meta.py
--- a/audio/paddleaudio/__init__.py
+++ b/audio/paddleaudio/__init__.py
--- a/audio/paddleaudio/backends/__init__.py
+++ b/audio/paddleaudio/backends/__init__.py
--- a/audio/paddleaudio/backends/soundfile_backend.py
+++ b/audio/paddleaudio/backends/soundfile_backend.py
--- a/audio/paddleaudio/backends/sox_backend.py
+++ b/audio/paddleaudio/backends/sox_backend.py
--- a/audio/paddleaudio/compliance/__init__.py
+++ b/audio/paddleaudio/compliance/__init__.py
--- a/audio/paddleaudio/compliance/kaldi.py
+++ b/audio/paddleaudio/compliance/kaldi.py
--- a/audio/paddleaudio/compliance/librosa.py
+++ b/audio/paddleaudio/compliance/librosa.py
--- a/audio/paddleaudio/datasets/__init__.py
+++ b/audio/paddleaudio/datasets/__init__.py
--- a/audio/paddleaudio/datasets/dataset.py
+++ b/audio/paddleaudio/datasets/dataset.py
--- a/audio/paddleaudio/datasets/esc50.py
+++ b/audio/paddleaudio/datasets/esc50.py
--- a/audio/paddleaudio/datasets/gtzan.py
+++ b/audio/paddleaudio/datasets/gtzan.py
--- a/audio/paddleaudio/datasets/hey_snips.py
+++ b/audio/paddleaudio/datasets/hey_snips.py
--- a/audio/paddleaudio/datasets/rirs_noises.py
+++ b/audio/paddleaudio/datasets/rirs_noises.py
--- a/audio/paddleaudio/datasets/tess.py
+++ b/audio/paddleaudio/datasets/tess.py
--- a/audio/paddleaudio/datasets/urban_sound.py
+++ b/audio/paddleaudio/datasets/urban_sound.py
--- a/audio/paddleaudio/datasets/voxceleb.py
+++ b/audio/paddleaudio/datasets/voxceleb.py
--- a/audio/paddleaudio/features/__init__.py
+++ b/audio/paddleaudio/features/__init__.py
--- a/audio/paddleaudio/features/layers.py
+++ b/audio/paddleaudio/features/layers.py
--- a/audio/paddleaudio/functional/__init__.py
+++ b/audio/paddleaudio/functional/__init__.py
--- a/audio/paddleaudio/functional/functional.py
+++ b/audio/paddleaudio/functional/functional.py
--- a/audio/paddleaudio/functional/window.py
+++ b/audio/paddleaudio/functional/window.py
--- a/audio/paddleaudio/io/__init__.py
+++ b/audio/paddleaudio/io/__init__.py
--- a/audio/paddleaudio/metric/__init__.py
+++ b/audio/paddleaudio/metric/__init__.py
--- a/audio/paddleaudio/metric/eer.py
+++ b/audio/paddleaudio/metric/eer.py
--- a/audio/paddleaudio/sox_effects/__init__.py
+++ b/audio/paddleaudio/sox_effects/__init__.py
--- a/audio/paddleaudio/utils/__init__.py
+++ b/audio/paddleaudio/utils/__init__.py
--- a/audio/paddleaudio/utils/download.py
+++ b/audio/paddleaudio/utils/download.py
--- a/audio/paddleaudio/utils/error.py
+++ b/audio/paddleaudio/utils/error.py
--- a/audio/paddleaudio/utils/log.py
+++ b/audio/paddleaudio/utils/log.py
--- a/audio/paddleaudio/utils/numeric.py
+++ b/audio/paddleaudio/utils/numeric.py
--- a/audio/paddleaudio/utils/time.py
+++ b/audio/paddleaudio/utils/time.py
--- a/paddlespeech/cli/__init__.py
+++ b/paddlespeech/cli/__init__.py
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
--- a/paddlespeech/cli/asr/pretrained_models.py
+++ b/paddlespeech/cli/asr/pretrained_models.py
--- a/paddlespeech/cli/base_commands.py
+++ b/paddlespeech/cli/base_commands.py
--- a/paddlespeech/cli/cls/infer.py
+++ b/paddlespeech/cli/cls/infer.py
--- a/paddlespeech/cli/cls/pretrained_models.py
+++ b/paddlespeech/cli/cls/pretrained_models.py
--- a/paddlespeech/cli/download.py
+++ b/paddlespeech/cli/download.py
--- a/paddlespeech/cli/entry.py
+++ b/paddlespeech/cli/entry.py
--- a/paddlespeech/cli/executor.py
+++ b/paddlespeech/cli/executor.py
--- a/paddlespeech/cli/st/infer.py
+++ b/paddlespeech/cli/st/infer.py
--- a/paddlespeech/cli/stats/__init__.py
+++ b/paddlespeech/cli/stats/__init__.py
--- a/paddlespeech/cli/stats/infer.py
+++ b/paddlespeech/cli/stats/infer.py
--- a/paddlespeech/cli/text/infer.py
+++ b/paddlespeech/cli/text/infer.py
--- a/paddlespeech/cli/tts/infer.py
+++ b/paddlespeech/cli/tts/infer.py
--- a/paddlespeech/cli/tts/pretrained_models.py
+++ b/paddlespeech/cli/tts/pretrained_models.py
--- a/paddlespeech/cli/utils.py
+++ b/paddlespeech/cli/utils.py
--- a/paddlespeech/cli/vector/infer.py
+++ b/paddlespeech/cli/vector/infer.py
--- a/paddlespeech/cli/vector/pretrained_models.py
+++ b/paddlespeech/cli/vector/pretrained_models.py
--- a/paddlespeech/cls/exps/panns/deploy/predict.py
+++ b/paddlespeech/cls/exps/panns/deploy/predict.py
--- a/paddlespeech/cls/exps/panns/export_model.py
+++ b/paddlespeech/cls/exps/panns/export_model.py
--- a/paddlespeech/cls/exps/panns/predict.py
+++ b/paddlespeech/cls/exps/panns/predict.py
--- a/paddlespeech/cls/exps/panns/train.py
+++ b/paddlespeech/cls/exps/panns/train.py
--- a/paddlespeech/cls/models/panns/panns.py
+++ b/paddlespeech/cls/models/panns/panns.py
--- a/paddlespeech/kws/exps/mdtc/compute_det.py
+++ b/paddlespeech/kws/exps/mdtc/compute_det.py
--- a/paddlespeech/kws/exps/mdtc/plot_det_curve.py
+++ b/paddlespeech/kws/exps/mdtc/plot_det_curve.py
--- a/paddlespeech/kws/exps/mdtc/score.py
+++ b/paddlespeech/kws/exps/mdtc/score.py
--- a/paddlespeech/kws/exps/mdtc/train.py
+++ b/paddlespeech/kws/exps/mdtc/train.py
--- a/paddlespeech/kws/models/loss.py
+++ b/paddlespeech/kws/models/loss.py
--- a/paddlespeech/kws/models/mdtc.py
+++ b/paddlespeech/kws/models/mdtc.py
--- a/paddlespeech/resource/__init__.py
+++ b/paddlespeech/resource/__init__.py
--- a/paddlespeech/resource/model_alias.py
+++ b/paddlespeech/resource/model_alias.py
--- a/paddlespeech/resource/pretrained_models.py
+++ b/paddlespeech/resource/pretrained_models.py
--- a/paddlespeech/resource/resource.py
+++ b/paddlespeech/resource/resource.py
--- a/paddlespeech/s2t/__init__.py
+++ b/paddlespeech/s2t/__init__.py
--- a/paddlespeech/s2t/decoders/beam_search/beam_search.py
+++ b/paddlespeech/s2t/decoders/beam_search/beam_search.py
--- a/paddlespeech/s2t/decoders/scorers/ctc.py
+++ b/paddlespeech/s2t/decoders/scorers/ctc.py
--- a/paddlespeech/s2t/decoders/scorers/ctc_prefix_score.py
+++ b/paddlespeech/s2t/decoders/scorers/ctc_prefix_score.py
--- a/paddlespeech/s2t/exps/deepspeech2/bin/export.py
+++ b/paddlespeech/s2t/exps/deepspeech2/bin/export.py
--- a/paddlespeech/s2t/exps/deepspeech2/bin/test.py
+++ b/paddlespeech/s2t/exps/deepspeech2/bin/test.py
--- a/paddlespeech/s2t/exps/deepspeech2/bin/test_export.py
+++ b/paddlespeech/s2t/exps/deepspeech2/bin/test_export.py
--- a/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py
+++ b/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py
--- a/paddlespeech/s2t/exps/deepspeech2/bin/train.py
+++ b/paddlespeech/s2t/exps/deepspeech2/bin/train.py
--- a/paddlespeech/s2t/exps/deepspeech2/model.py
+++ b/paddlespeech/s2t/exps/deepspeech2/model.py
--- a/paddlespeech/s2t/exps/u2/bin/train.py
+++ b/paddlespeech/s2t/exps/u2/bin/train.py
--- a/paddlespeech/s2t/exps/u2_kaldi/bin/train.py
+++ b/paddlespeech/s2t/exps/u2_kaldi/bin/train.py
--- a/paddlespeech/s2t/exps/u2_st/bin/train.py
+++ b/paddlespeech/s2t/exps/u2_st/bin/train.py
--- a/paddlespeech/s2t/frontend/featurizer/audio_featurizer.py
+++ b/paddlespeech/s2t/frontend/featurizer/audio_featurizer.py
--- a/paddlespeech/s2t/io/dataset.py
+++ b/paddlespeech/s2t/io/dataset.py
--- a/paddlespeech/s2t/models/ds2/__init__.py
+++ b/paddlespeech/s2t/models/ds2/__init__.py
--- a/paddlespeech/s2t/models/ds2/conv.py
+++ b/paddlespeech/s2t/models/ds2/conv.py
--- a/paddlespeech/s2t/models/ds2/deepspeech2.py
+++ b/paddlespeech/s2t/models/ds2/deepspeech2.py
--- a/paddlespeech/s2t/models/ds2/rnn.py
+++ b/paddlespeech/s2t/models/ds2/rnn.py
--- a/paddlespeech/s2t/models/ds2_online/__init__.py
+++ b/paddlespeech/s2t/models/ds2_online/__init__.py
--- a/paddlespeech/s2t/models/ds2_online/conv.py
+++ b/paddlespeech/s2t/models/ds2_online/conv.py
--- a/paddlespeech/s2t/models/ds2_online/deepspeech2.py
+++ b/paddlespeech/s2t/models/ds2_online/deepspeech2.py
--- a/paddlespeech/s2t/models/lm/transformer.py
+++ b/paddlespeech/s2t/models/lm/transformer.py
--- a/paddlespeech/s2t/models/u2/u2.py
+++ b/paddlespeech/s2t/models/u2/u2.py
--- a/paddlespeech/s2t/models/u2/updater.py
+++ b/paddlespeech/s2t/models/u2/updater.py
--- a/paddlespeech/s2t/modules/ctc.py
+++ b/paddlespeech/s2t/modules/ctc.py
--- a/paddlespeech/s2t/modules/decoder.py
+++ b/paddlespeech/s2t/modules/decoder.py
--- a/paddlespeech/s2t/modules/embedding.py
+++ b/paddlespeech/s2t/modules/embedding.py
--- a/paddlespeech/s2t/modules/encoder.py
+++ b/paddlespeech/s2t/modules/encoder.py
--- a/paddlespeech/s2t/training/trainer.py
+++ b/paddlespeech/s2t/training/trainer.py
--- a/paddlespeech/s2t/transform/perturb.py
+++ b/paddlespeech/s2t/transform/perturb.py
--- a/paddlespeech/s2t/transform/spectrogram.py
+++ b/paddlespeech/s2t/transform/spectrogram.py
--- a/paddlespeech/s2t/utils/ctc_utils.py
+++ b/paddlespeech/s2t/utils/ctc_utils.py
--- a/paddlespeech/s2t/utils/tensor_utils.py
+++ b/paddlespeech/s2t/utils/tensor_utils.py
--- a/paddlespeech/s2t/utils/text_grid.py
+++ b/paddlespeech/s2t/utils/text_grid.py
--- a/paddlespeech/server/README.md
+++ b/paddlespeech/server/README.md
--- a/paddlespeech/server/README_cn.md
+++ b/paddlespeech/server/README_cn.md
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
--- a/paddlespeech/server/bin/paddlespeech_server.py
+++ b/paddlespeech/server/bin/paddlespeech_server.py
--- a/paddlespeech/server/conf/application.yaml
+++ b/paddlespeech/server/conf/application.yaml
--- a/paddlespeech/server/conf/tts_online_application.yaml
+++ b/paddlespeech/server/conf/tts_online_application.yaml
--- a/paddlespeech/server/conf/vector_application.yaml
+++ b/paddlespeech/server/conf/vector_application.yaml
--- a/paddlespeech/server/conf/ws_conformer_application.yaml
+++ b/paddlespeech/server/conf/ws_conformer_application.yaml
--- a/paddlespeech/server/conf/ws_conformer_wenetspeech_application_faster.yaml
+++ b/paddlespeech/server/conf/ws_conformer_wenetspeech_application_faster.yaml
--- a/paddlespeech/server/conf/ws_ds2_application.yaml
+++ b/paddlespeech/server/conf/ws_ds2_application.yaml
--- a/paddlespeech/server/download.py
+++ b/paddlespeech/server/download.py
--- a/audio/tests/backends/__init__.py
+++ b/audio/tests/backends/__init__.py
--- a/audio/tests/backends/soundfile/__init__.py
+++ b/audio/tests/backends/soundfile/__init__.py
--- a/paddlespeech/server/engine/acs/python/acs_engine.py
+++ b/paddlespeech/server/engine/acs/python/acs_engine.py
--- a/paddlespeech/server/engine/asr/online/ctc_endpoint.py
+++ b/paddlespeech/server/engine/asr/online/ctc_endpoint.py
--- a/paddlespeech/server/engine/asr/online/ctc_search.py
+++ b/paddlespeech/server/engine/asr/online/ctc_search.py
--- a/audio/tests/features/__init__.py
+++ b/audio/tests/features/__init__.py
--- a/paddlespeech/server/engine/asr/online/onnx/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/onnx/asr_engine.py
--- a/paddlespeech/server/engine/asr/online/paddleinference/__init__.py
+++ b/paddlespeech/server/engine/asr/online/paddleinference/__init__.py
--- a/paddlespeech/server/engine/asr/online/paddleinference/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/paddleinference/asr_engine.py
--- a/paddlespeech/server/engine/asr/online/python/__init__.py
+++ b/paddlespeech/server/engine/asr/online/python/__init__.py
--- a/paddlespeech/server/engine/asr/online/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/asr_engine.py
--- a/paddlespeech/server/engine/asr/paddleinference/asr_engine.py
+++ b/paddlespeech/server/engine/asr/paddleinference/asr_engine.py
--- a/paddlespeech/server/engine/asr/python/asr_engine.py
+++ b/paddlespeech/server/engine/asr/python/asr_engine.py
--- a/paddlespeech/server/engine/cls/paddleinference/cls_engine.py
+++ b/paddlespeech/server/engine/cls/paddleinference/cls_engine.py
--- a/paddlespeech/server/engine/cls/python/cls_engine.py
+++ b/paddlespeech/server/engine/cls/python/cls_engine.py
--- a/paddlespeech/server/engine/engine_factory.py
+++ b/paddlespeech/server/engine/engine_factory.py
--- a/paddlespeech/server/engine/engine_pool.py
+++ b/paddlespeech/server/engine/engine_pool.py
--- a/paddlespeech/server/engine/engine_warmup.py
+++ b/paddlespeech/server/engine/engine_warmup.py
--- a/paddlespeech/server/engine/tts/online/onnx/tts_engine.py
+++ b/paddlespeech/server/engine/tts/online/onnx/tts_engine.py
--- a/paddlespeech/server/engine/tts/online/python/tts_engine.py
+++ b/paddlespeech/server/engine/tts/online/python/tts_engine.py
--- a/paddlespeech/server/engine/tts/paddleinference/tts_engine.py
+++ b/paddlespeech/server/engine/tts/paddleinference/tts_engine.py
--- a/paddlespeech/server/engine/tts/python/tts_engine.py
+++ b/paddlespeech/server/engine/tts/python/tts_engine.py
--- a/paddlespeech/server/engine/vector/__init__.py
+++ b/paddlespeech/server/engine/vector/__init__.py
--- a/paddlespeech/server/engine/vector/python/__init__.py
+++ b/paddlespeech/server/engine/vector/python/__init__.py
--- a/paddlespeech/server/engine/vector/python/vector_engine.py
+++ b/paddlespeech/server/engine/vector/python/vector_engine.py
--- a/paddlespeech/server/restful/acs_api.py
+++ b/paddlespeech/server/restful/acs_api.py
--- a/paddlespeech/server/restful/api.py
+++ b/paddlespeech/server/restful/api.py
--- a/paddlespeech/server/restful/asr_api.py
+++ b/paddlespeech/server/restful/asr_api.py
--- a/paddlespeech/server/restful/cls_api.py
+++ b/paddlespeech/server/restful/cls_api.py
--- a/paddlespeech/server/restful/request.py
+++ b/paddlespeech/server/restful/request.py
--- a/paddlespeech/server/restful/response.py
+++ b/paddlespeech/server/restful/response.py
--- a/paddlespeech/server/restful/tts_api.py
+++ b/paddlespeech/server/restful/tts_api.py
--- a/paddlespeech/server/restful/vector_api.py
+++ b/paddlespeech/server/restful/vector_api.py
--- a/paddlespeech/server/tests/tts/offline/http_client.py
+++ b/paddlespeech/server/tests/tts/offline/http_client.py
--- a/paddlespeech/server/tests/tts/online/http_client.py
+++ b/paddlespeech/server/tests/tts/online/http_client.py
--- a/paddlespeech/server/tests/tts/online/ws_client.py
+++ b/paddlespeech/server/tests/tts/online/ws_client.py
--- a/paddlespeech/server/util.py
+++ b/paddlespeech/server/util.py
--- a/paddlespeech/server/utils/audio_handler.py
+++ b/paddlespeech/server/utils/audio_handler.py
--- a/paddlespeech/server/utils/audio_process.py
+++ b/paddlespeech/server/utils/audio_process.py
--- a/paddlespeech/server/utils/buffer.py
+++ b/paddlespeech/server/utils/buffer.py
--- a/paddlespeech/server/utils/onnx_infer.py
+++ b/paddlespeech/server/utils/onnx_infer.py
--- a/paddlespeech/server/utils/util.py
+++ b/paddlespeech/server/utils/util.py
--- a/paddlespeech/server/ws/api.py
+++ b/paddlespeech/server/ws/api.py
--- a/paddlespeech/server/ws/asr_socket.py
+++ b/paddlespeech/server/ws/asr_socket.py
--- a/paddlespeech/server/ws/tts_api.py
+++ b/paddlespeech/server/ws/tts_api.py
--- a/paddlespeech/t2s/datasets/am_batch_fn.py
+++ b/paddlespeech/t2s/datasets/am_batch_fn.py
--- a/paddlespeech/t2s/datasets/batch.py
+++ b/paddlespeech/t2s/datasets/batch.py
--- a/paddlespeech/t2s/datasets/get_feats.py
+++ b/paddlespeech/t2s/datasets/get_feats.py
--- a/paddlespeech/t2s/exps/fastspeech2/preprocess.py
+++ b/paddlespeech/t2s/exps/fastspeech2/preprocess.py
--- a/paddlespeech/t2s/exps/gan_vocoder/hifigan/train.py
+++ b/paddlespeech/t2s/exps/gan_vocoder/hifigan/train.py
--- a/paddlespeech/t2s/exps/gan_vocoder/multi_band_melgan/train.py
+++ b/paddlespeech/t2s/exps/gan_vocoder/multi_band_melgan/train.py
--- a/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/train.py
+++ b/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/train.py
--- a/paddlespeech/t2s/exps/gan_vocoder/preprocess.py
+++ b/paddlespeech/t2s/exps/gan_vocoder/preprocess.py
--- a/paddlespeech/t2s/exps/gan_vocoder/style_melgan/train.py
+++ b/paddlespeech/t2s/exps/gan_vocoder/style_melgan/train.py
--- a/paddlespeech/t2s/exps/inference_streaming.py
+++ b/paddlespeech/t2s/exps/inference_streaming.py
--- a/paddlespeech/t2s/exps/ort_predict_streaming.py
+++ b/paddlespeech/t2s/exps/ort_predict_streaming.py
--- a/paddlespeech/t2s/exps/speedyspeech/preprocess.py
+++ b/paddlespeech/t2s/exps/speedyspeech/preprocess.py
--- a/paddlespeech/t2s/exps/speedyspeech/synthesize_e2e.py
+++ b/paddlespeech/t2s/exps/speedyspeech/synthesize_e2e.py
--- a/paddlespeech/t2s/exps/speedyspeech/train.py
+++ b/paddlespeech/t2s/exps/speedyspeech/train.py
--- a/paddlespeech/t2s/exps/syn_utils.py
+++ b/paddlespeech/t2s/exps/syn_utils.py
--- a/paddlespeech/t2s/exps/synthesize.py
+++ b/paddlespeech/t2s/exps/synthesize.py
--- a/paddlespeech/t2s/exps/synthesize_e2e.py
+++ b/paddlespeech/t2s/exps/synthesize_e2e.py
--- a/paddlespeech/t2s/exps/synthesize_streaming.py
+++ b/paddlespeech/t2s/exps/synthesize_streaming.py
--- a/paddlespeech/t2s/exps/tacotron2/preprocess.py
+++ b/paddlespeech/t2s/exps/tacotron2/preprocess.py
--- a/paddlespeech/t2s/exps/transformer_tts/preprocess.py
+++ b/paddlespeech/t2s/exps/transformer_tts/preprocess.py
--- a/paddlespeech/t2s/exps/transformer_tts/train.py
+++ b/paddlespeech/t2s/exps/transformer_tts/train.py
--- a/paddlespeech/t2s/exps/vits/normalize.py
+++ b/paddlespeech/t2s/exps/vits/normalize.py
--- a/paddlespeech/t2s/exps/vits/preprocess.py
+++ b/paddlespeech/t2s/exps/vits/preprocess.py
--- a/paddlespeech/t2s/exps/vits/synthesize.py
+++ b/paddlespeech/t2s/exps/vits/synthesize.py
--- a/paddlespeech/t2s/exps/vits/synthesize_e2e.py
+++ b/paddlespeech/t2s/exps/vits/synthesize_e2e.py
--- a/paddlespeech/t2s/exps/vits/train.py
+++ b/paddlespeech/t2s/exps/vits/train.py
--- a/paddlespeech/t2s/exps/voice_cloning.py
+++ b/paddlespeech/t2s/exps/voice_cloning.py
--- a/paddlespeech/t2s/exps/wavernn/train.py
+++ b/paddlespeech/t2s/exps/wavernn/train.py
--- a/paddlespeech/t2s/frontend/tone_sandhi.py
+++ b/paddlespeech/t2s/frontend/tone_sandhi.py
--- a/paddlespeech/t2s/frontend/zh_frontend.py
+++ b/paddlespeech/t2s/frontend/zh_frontend.py
--- a/paddlespeech/t2s/frontend/zh_normalization/num.py
+++ b/paddlespeech/t2s/frontend/zh_normalization/num.py
--- a/paddlespeech/t2s/models/__init__.py
+++ b/paddlespeech/t2s/models/__init__.py
--- a/paddlespeech/t2s/models/hifigan/hifigan.py
+++ b/paddlespeech/t2s/models/hifigan/hifigan.py
--- a/paddlespeech/t2s/models/parallel_wavegan/parallel_wavegan_updater.py
+++ b/paddlespeech/t2s/models/parallel_wavegan/parallel_wavegan_updater.py
--- a/paddlespeech/t2s/models/speedyspeech/speedyspeech_updater.py
+++ b/paddlespeech/t2s/models/speedyspeech/speedyspeech_updater.py
--- a/paddlespeech/t2s/models/vits/__init__.py
+++ b/paddlespeech/t2s/models/vits/__init__.py
--- a/paddlespeech/t2s/models/vits/duration_predictor.py
+++ b/paddlespeech/t2s/models/vits/duration_predictor.py
--- a/paddlespeech/t2s/models/vits/flow.py
+++ b/paddlespeech/t2s/models/vits/flow.py
--- a/paddlespeech/t2s/models/vits/generator.py
+++ b/paddlespeech/t2s/models/vits/generator.py
--- a/paddlespeech/t2s/models/vits/monotonic_align/__init__.py
+++ b/paddlespeech/t2s/models/vits/monotonic_align/__init__.py
--- a/paddlespeech/server/ws/tts_socket.py
+++ b/paddlespeech/server/ws/tts_socket.py
--- a/paddlespeech/cli/st/pretrained_models.py
+++ b/paddlespeech/cli/st/pretrained_models.py
--- a/paddlespeech/t2s/models/vits/posterior_encoder.py
+++ b/paddlespeech/t2s/models/vits/posterior_encoder.py
--- a/paddlespeech/t2s/models/vits/residual_coupling.py
+++ b/paddlespeech/t2s/models/vits/residual_coupling.py
--- a/paddlespeech/t2s/models/vits/text_encoder.py
+++ b/paddlespeech/t2s/models/vits/text_encoder.py
--- a/paddlespeech/t2s/models/vits/transform.py
+++ b/paddlespeech/t2s/models/vits/transform.py
--- a/paddlespeech/t2s/models/vits/vits.py
+++ b/paddlespeech/t2s/models/vits/vits.py
--- a/paddlespeech/t2s/models/vits/vits_updater.py
+++ b/paddlespeech/t2s/models/vits/vits_updater.py
--- a/paddlespeech/t2s/models/vits/wavenet/__init__.py
+++ b/paddlespeech/t2s/models/vits/wavenet/__init__.py
--- a/paddlespeech/t2s/models/vits/wavenet/residual_block.py
+++ b/paddlespeech/t2s/models/vits/wavenet/residual_block.py
--- a/paddlespeech/t2s/models/vits/wavenet/wavenet.py
+++ b/paddlespeech/t2s/models/vits/wavenet/wavenet.py
--- a/paddlespeech/t2s/modules/losses.py
+++ b/paddlespeech/t2s/modules/losses.py
--- a/paddlespeech/t2s/modules/nets_utils.py
+++ b/paddlespeech/t2s/modules/nets_utils.py
--- a/paddlespeech/t2s/training/optimizer.py
+++ b/paddlespeech/t2s/training/optimizer.py
--- a/paddlespeech/t2s/utils/timeline.py
+++ b/paddlespeech/t2s/utils/timeline.py
--- a/examples/other/1xt2x/src_deepspeech2x/models/__init__.py
+++ b/examples/other/1xt2x/src_deepspeech2x/models/__init__.py
--- a/paddlespeech/t2s/utils/profile.py
+++ b/paddlespeech/t2s/utils/profile.py
--- a/paddlespeech/vector/exps/ecapa_tdnn/extract_emb.py
+++ b/paddlespeech/vector/exps/ecapa_tdnn/extract_emb.py
--- a/paddlespeech/vector/exps/ecapa_tdnn/test.py
+++ b/paddlespeech/vector/exps/ecapa_tdnn/test.py
--- a/paddlespeech/vector/exps/ecapa_tdnn/train.py
+++ b/paddlespeech/vector/exps/ecapa_tdnn/train.py
--- a/paddlespeech/vector/io/dataset.py
+++ b/paddlespeech/vector/io/dataset.py
--- a/paddlespeech/vector/io/dataset_from_json.py
+++ b/paddlespeech/vector/io/dataset_from_json.py
--- a/setup.py
+++ b/setup.py
--- a/speechx/CMakeLists.txt
+++ b/speechx/CMakeLists.txt
--- a/speechx/README.md
+++ b/speechx/README.md
--- a/speechx/examples/CMakeLists.txt
+++ b/speechx/examples/CMakeLists.txt
--- a/speechx/examples/README.md
+++ b/speechx/examples/README.md
--- a/speechx/examples/codelab/README.md
+++ b/speechx/examples/codelab/README.md
--- a/speechx/examples/ds2_ol/decoder/.gitignore
+++ b/speechx/examples/ds2_ol/decoder/.gitignore
--- a/speechx/examples/ds2_ol/decoder/README.md
+++ b/speechx/examples/ds2_ol/decoder/README.md
--- a/speechx/examples/ds2_ol/decoder/path.sh
+++ b/speechx/examples/ds2_ol/decoder/path.sh
--- a/speechx/examples/ds2_ol/decoder/run.sh
+++ b/speechx/examples/ds2_ol/decoder/run.sh
--- a/speechx/examples/ds2_ol/decoder/valgrind.sh
+++ b/speechx/examples/ds2_ol/decoder/valgrind.sh
--- a/speechx/examples/ds2_ol/feat/README.md
+++ b/speechx/examples/ds2_ol/feat/README.md
--- a/speechx/examples/dev/glog/path.sh
+++ b/speechx/examples/dev/glog/path.sh
--- a/speechx/examples/ds2_ol/feat/run.sh
+++ b/speechx/examples/ds2_ol/feat/run.sh
--- a/speechx/examples/ds2_ol/feat/valgrind.sh
+++ b/speechx/examples/ds2_ol/feat/valgrind.sh
--- a/speechx/examples/ds2_ol/nnet/.gitignore
+++ b/speechx/examples/ds2_ol/nnet/.gitignore
--- a/speechx/examples/ds2_ol/nnet/README.md
+++ b/speechx/examples/ds2_ol/nnet/README.md
--- a/speechx/examples/ds2_ol/feat/path.sh
+++ b/speechx/examples/ds2_ol/feat/path.sh
--- a/speechx/examples/ds2_ol/nnet/run.sh
+++ b/speechx/examples/ds2_ol/nnet/run.sh
--- a/speechx/examples/ds2_ol/nnet/valgrind.sh
+++ b/speechx/examples/ds2_ol/nnet/valgrind.sh
--- a/speechx/examples/custom_asr/README.md
+++ b/speechx/examples/custom_asr/README.md
--- a/speechx/examples/custom_asr/local/compile_lexicon_token_fst.sh
+++ b/speechx/examples/custom_asr/local/compile_lexicon_token_fst.sh
--- a/paddlespeech/cli/text/pretrained_models.py
+++ b/paddlespeech/cli/text/pretrained_models.py
--- a/speechx/examples/custom_asr/local/mk_tlg_with_slot.sh
+++ b/speechx/examples/custom_asr/local/mk_tlg_with_slot.sh
--- a/speechx/examples/custom_asr/local/train_lm_with_slot.sh
+++ b/speechx/examples/custom_asr/local/train_lm_with_slot.sh
--- a/speechx/examples/ngram/zh/path.sh
+++ b/speechx/examples/ngram/zh/path.sh
--- a/speechx/examples/custom_asr/run.sh
+++ b/speechx/examples/custom_asr/run.sh
--- a/speechx/examples/custom_asr/utils
+++ b/speechx/examples/custom_asr/utils
--- a/speechx/examples/dev/glog/CMakeLists.txt
+++ b/speechx/examples/dev/glog/CMakeLists.txt
--- a/speechx/examples/dev/glog/run.sh
+++ b/speechx/examples/dev/glog/run.sh
--- a/speechx/examples/ds2_ol/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/CMakeLists.txt
--- a/speechx/examples/ds2_ol/README.md
+++ b/speechx/examples/ds2_ol/README.md
--- a/speechx/examples/ds2_ol/aishell/README.md
+++ b/speechx/examples/ds2_ol/aishell/README.md
--- a/speechx/examples/ngram/zh/local/aishell_train_lms.sh
+++ b/speechx/examples/ngram/zh/local/aishell_train_lms.sh
--- a/speechx/examples/ds2_ol/aishell/path.sh
+++ b/speechx/examples/ds2_ol/aishell/path.sh
--- a/speechx/examples/ds2_ol/aishell/run.sh
+++ b/speechx/examples/ds2_ol/aishell/run.sh
--- a/speechx/examples/ngram/zh/run.sh
+++ b/speechx/examples/ngram/zh/run.sh
--- a/speechx/examples/ds2_ol/aishell/run_fbank.sh
+++ b/speechx/examples/ds2_ol/aishell/run_fbank.sh
--- a/speechx/examples/ds2_ol/decoder/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/decoder/CMakeLists.txt
--- a/speechx/examples/ds2_ol/decoder/local/model.sh
+++ b/speechx/examples/ds2_ol/decoder/local/model.sh
--- a/speechx/examples/ds2_ol/feat/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/feat/CMakeLists.txt
--- a/speechx/examples/ds2_ol/feat/.gitignore
+++ b/speechx/examples/ds2_ol/feat/.gitignore
--- a/speechx/examples/ds2_ol/onnx/README.md
+++ b/speechx/examples/ds2_ol/onnx/README.md
--- a/speechx/examples/ds2_ol/onnx/local/infer_check.py
+++ b/speechx/examples/ds2_ol/onnx/local/infer_check.py
--- a/speechx/examples/ds2_ol/onnx/local/netron.sh
+++ b/speechx/examples/ds2_ol/onnx/local/netron.sh
--- a/speechx/examples/ds2_ol/onnx/local/onnx_clone.sh
+++ b/speechx/examples/ds2_ol/onnx/local/onnx_clone.sh
--- a/speechx/examples/ds2_ol/onnx/local/onnx_infer_shape.py
+++ b/speechx/examples/ds2_ol/onnx/local/onnx_infer_shape.py
--- a/speechx/examples/ds2_ol/onnx/local/onnx_opt.sh
+++ b/speechx/examples/ds2_ol/onnx/local/onnx_opt.sh
--- a/speechx/examples/ds2_ol/onnx/local/onnx_prune_model.py
+++ b/speechx/examples/ds2_ol/onnx/local/onnx_prune_model.py
--- a/speechx/examples/ds2_ol/onnx/local/onnx_rename_model.py
+++ b/speechx/examples/ds2_ol/onnx/local/onnx_rename_model.py
--- a/speechx/examples/ds2_ol/onnx/local/ort_opt.py
+++ b/speechx/examples/ds2_ol/onnx/local/ort_opt.py
--- a/speechx/examples/ds2_ol/onnx/local/pd_infer_shape.py
+++ b/speechx/examples/ds2_ol/onnx/local/pd_infer_shape.py
--- a/speechx/examples/ds2_ol/onnx/local/pd_prune_model.py
+++ b/speechx/examples/ds2_ol/onnx/local/pd_prune_model.py
--- a/speechx/examples/ds2_ol/onnx/local/prune.sh
+++ b/speechx/examples/ds2_ol/onnx/local/prune.sh
--- a/speechx/examples/ds2_ol/onnx/local/tonnx.sh
+++ b/speechx/examples/ds2_ol/onnx/local/tonnx.sh
--- a/speechx/examples/ds2_ol/nnet/path.sh
+++ b/speechx/examples/ds2_ol/nnet/path.sh
--- a/speechx/examples/ds2_ol/onnx/run.sh
+++ b/speechx/examples/ds2_ol/onnx/run.sh
--- a/speechx/examples/ngram/zh/utils
+++ b/speechx/examples/ngram/zh/utils
--- a/speechx/examples/ds2_ol/websocket/path.sh
+++ b/speechx/examples/ds2_ol/websocket/path.sh
--- a/speechx/examples/ds2_ol/websocket/websocket_client.sh
+++ b/speechx/examples/ds2_ol/websocket/websocket_client.sh
--- a/speechx/examples/ds2_ol/websocket/websocket_server.sh
+++ b/speechx/examples/ds2_ol/websocket/websocket_server.sh
--- a/speechx/examples/ngram/en/README.md
+++ b/speechx/examples/ngram/en/README.md
--- a/speechx/examples/ngram/zh/README.md
+++ b/speechx/examples/ngram/zh/README.md
--- a/speechx/examples/ngram/zh/local/split_data.sh
+++ b/speechx/examples/ngram/zh/local/split_data.sh
--- a/speechx/examples/wfst/.gitignore
+++ b/speechx/examples/wfst/.gitignore
--- a/speechx/examples/wfst/README.md
+++ b/speechx/examples/wfst/README.md
--- a/speechx/examples/wfst/path.sh
+++ b/speechx/examples/wfst/path.sh
--- a/speechx/examples/wfst/run.sh
+++ b/speechx/examples/wfst/run.sh
--- a/speechx/examples/wfst/utils
+++ b/speechx/examples/wfst/utils
--- a/speechx/patch/README.md
+++ b/speechx/patch/README.md
--- a/speechx/speechx/CMakeLists.txt
+++ b/speechx/speechx/CMakeLists.txt
--- a/speechx/examples/dev/CMakeLists.txt
+++ b/speechx/examples/dev/CMakeLists.txt
--- a/speechx/speechx/codelab/README.md
+++ b/speechx/speechx/codelab/README.md
--- a/speechx/speechx/codelab/glog/CMakeLists.txt
+++ b/speechx/speechx/codelab/glog/CMakeLists.txt
--- a/speechx/examples/dev/glog/README.md
+++ b/speechx/examples/dev/glog/README.md
--- a/speechx/examples/dev/glog/glog_logtostderr_test.cc
+++ b/speechx/examples/dev/glog/glog_logtostderr_test.cc
--- a/speechx/examples/dev/glog/glog_test.cc
+++ b/speechx/examples/dev/glog/glog_test.cc
--- a/speechx/examples/ds2_ol/nnet/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/nnet/CMakeLists.txt
--- a/speechx/examples/ds2_ol/nnet/ds2-model-ol-test.cc
+++ b/speechx/examples/ds2_ol/nnet/ds2-model-ol-test.cc
--- a/speechx/speechx/decoder/CMakeLists.txt
+++ b/speechx/speechx/decoder/CMakeLists.txt
--- a/speechx/examples/ds2_ol/decoder/ctc-prefix-beam-search-decoder-ol.cc
+++ b/speechx/examples/ds2_ol/decoder/ctc-prefix-beam-search-decoder-ol.cc
--- a/speechx/speechx/decoder/ctc_tlg_decoder.cc
+++ b/speechx/speechx/decoder/ctc_tlg_decoder.cc
--- a/speechx/speechx/decoder/ctc_tlg_decoder.h
+++ b/speechx/speechx/decoder/ctc_tlg_decoder.h
--- a/speechx/examples/ds2_ol/decoder/nnet-logprob-decoder-test.cc
+++ b/speechx/examples/ds2_ol/decoder/nnet-logprob-decoder-test.cc
--- a/speechx/speechx/decoder/param.h
+++ b/speechx/speechx/decoder/param.h
--- a/speechx/speechx/decoder/recognizer.cc
+++ b/speechx/speechx/decoder/recognizer.cc
--- a/speechx/speechx/decoder/recognizer.h
+++ b/speechx/speechx/decoder/recognizer.h
--- a/speechx/examples/ds2_ol/decoder/recognizer_test_main.cc
+++ b/speechx/examples/ds2_ol/decoder/recognizer_test_main.cc
--- a/speechx/examples/ds2_ol/decoder/wfst-decoder-ol.cc
+++ b/speechx/examples/ds2_ol/decoder/wfst-decoder-ol.cc
--- a/speechx/speechx/frontend/audio/CMakeLists.txt
+++ b/speechx/speechx/frontend/audio/CMakeLists.txt
--- a/speechx/speechx/frontend/audio/assembler.cc
+++ b/speechx/speechx/frontend/audio/assembler.cc
--- a/speechx/speechx/frontend/audio/assembler.h
+++ b/speechx/speechx/frontend/audio/assembler.h
--- a/speechx/speechx/frontend/audio/audio_cache.h
+++ b/speechx/speechx/frontend/audio/audio_cache.h
--- a/speechx/examples/ds2_ol/feat/cmvn-json2kaldi.cc
+++ b/speechx/examples/ds2_ol/feat/cmvn-json2kaldi.cc
--- a/speechx/speechx/frontend/audio/compute_fbank_main.cc
+++ b/speechx/speechx/frontend/audio/compute_fbank_main.cc
--- a/speechx/examples/ds2_ol/feat/linear-spectrogram-wo-db-norm-ol.cc
+++ b/speechx/examples/ds2_ol/feat/linear-spectrogram-wo-db-norm-ol.cc
--- a/speechx/speechx/frontend/audio/fbank.cc
+++ b/speechx/speechx/frontend/audio/fbank.cc
--- a/speechx/speechx/frontend/audio/fbank.h
+++ b/speechx/speechx/frontend/audio/fbank.h
--- a/speechx/speechx/frontend/audio/feature_cache.cc
+++ b/speechx/speechx/frontend/audio/feature_cache.cc
--- a/speechx/speechx/frontend/audio/feature_cache.h
+++ b/speechx/speechx/frontend/audio/feature_cache.h
--- a/speechx/speechx/frontend/audio/feature_common.h
+++ b/speechx/speechx/frontend/audio/feature_common.h
--- a/speechx/speechx/frontend/audio/feature_common_inl.h
+++ b/speechx/speechx/frontend/audio/feature_common_inl.h
--- a/speechx/speechx/frontend/audio/feature_pipeline.cc
+++ b/speechx/speechx/frontend/audio/feature_pipeline.cc
--- a/speechx/speechx/frontend/audio/feature_pipeline.h
+++ b/speechx/speechx/frontend/audio/feature_pipeline.h
--- a/speechx/speechx/frontend/audio/linear_spectrogram.cc
+++ b/speechx/speechx/frontend/audio/linear_spectrogram.cc
--- a/speechx/speechx/frontend/audio/linear_spectrogram.h
+++ b/speechx/speechx/frontend/audio/linear_spectrogram.h
--- a/speechx/speechx/kaldi/CMakeLists.txt
+++ b/speechx/speechx/kaldi/CMakeLists.txt
--- a/speechx/speechx/kaldi/feat/CMakeLists.txt
+++ b/speechx/speechx/kaldi/feat/CMakeLists.txt
--- a/speechx/speechx/kaldi/feat/feature-fbank.h
+++ b/speechx/speechx/kaldi/feat/feature-fbank.h
--- a/speechx/speechx/kaldi/feat/mel-computations.cc
+++ b/speechx/speechx/kaldi/feat/mel-computations.cc
--- a/speechx/speechx/kaldi/fstbin/CMakeLists.txt
+++ b/speechx/speechx/kaldi/fstbin/CMakeLists.txt
--- a/speechx/tools/fstbin/fstaddselfloops.cc
+++ b/speechx/tools/fstbin/fstaddselfloops.cc
--- a/speechx/tools/fstbin/fstdeterminizestar.cc
+++ b/speechx/tools/fstbin/fstdeterminizestar.cc
--- a/speechx/tools/fstbin/fstisstochastic.cc
+++ b/speechx/tools/fstbin/fstisstochastic.cc
--- a/speechx/tools/fstbin/fstminimizeencoded.cc
+++ b/speechx/tools/fstbin/fstminimizeencoded.cc
--- a/speechx/tools/fstbin/fsttablecompose.cc
+++ b/speechx/tools/fstbin/fsttablecompose.cc
--- a/speechx/speechx/kaldi/fstext/CMakeLists.txt
+++ b/speechx/speechx/kaldi/fstext/CMakeLists.txt
--- a/speechx/speechx/kaldi/lm/CMakeLists.txt
+++ b/speechx/speechx/kaldi/lm/CMakeLists.txt
--- a/speechx/speechx/kaldi/lm/arpa-file-parser.cc
+++ b/speechx/speechx/kaldi/lm/arpa-file-parser.cc
--- a/speechx/speechx/kaldi/lm/arpa-file-parser.h
+++ b/speechx/speechx/kaldi/lm/arpa-file-parser.h
--- a/speechx/speechx/kaldi/lm/arpa-lm-compiler.cc
+++ b/speechx/speechx/kaldi/lm/arpa-lm-compiler.cc
--- a/speechx/speechx/kaldi/lm/arpa-lm-compiler.h
+++ b/speechx/speechx/kaldi/lm/arpa-lm-compiler.h
--- a/speechx/tools/lmbin/CMakeLists.txt
+++ b/speechx/tools/lmbin/CMakeLists.txt
--- a/speechx/tools/lmbin/arpa2fst.cc
+++ b/speechx/tools/lmbin/arpa2fst.cc
--- a/speechx/speechx/nnet/CMakeLists.txt
+++ b/speechx/speechx/nnet/CMakeLists.txt
--- a/speechx/speechx/nnet/nnet_forward_main.cc
+++ b/speechx/speechx/nnet/nnet_forward_main.cc
--- a/speechx/speechx/protocol/CMakeLists.txt
+++ b/speechx/speechx/protocol/CMakeLists.txt
--- a/speechx/examples/ds2_ol/websocket/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/websocket/CMakeLists.txt
--- a/speechx/speechx/websocket/websocket_client.cc
+++ b/speechx/speechx/websocket/websocket_client.cc
--- a/speechx/speechx/websocket/websocket_client.h
+++ b/speechx/speechx/websocket/websocket_client.h
--- a/speechx/examples/ds2_ol/websocket/websocket_client_main.cc
+++ b/speechx/examples/ds2_ol/websocket/websocket_client_main.cc
--- a/speechx/speechx/websocket/websocket_server.cc
+++ b/speechx/speechx/websocket/websocket_server.cc
--- a/speechx/speechx/websocket/websocket_server.h
+++ b/speechx/speechx/websocket/websocket_server.h
--- a/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+++ b/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
--- a/speechx/speechx/utils/CMakeLists.txt
+++ b/speechx/speechx/utils/CMakeLists.txt
--- a/speechx/speechx/utils/simdjson.cpp
+++ b/speechx/speechx/utils/simdjson.cpp
--- a/speechx/speechx/utils/simdjson.h
+++ b/speechx/speechx/utils/simdjson.h
--- a/speechx/speechx/websocket/CMakeLists.txt
+++ b/speechx/speechx/websocket/CMakeLists.txt
--- a/audio/tests/benchmark/README.md
+++ b/audio/tests/benchmark/README.md
--- a/audio/tests/benchmark/log_melspectrogram.py
+++ b/audio/tests/benchmark/log_melspectrogram.py
--- a/audio/tests/benchmark/melspectrogram.py
+++ b/audio/tests/benchmark/melspectrogram.py
--- a/audio/tests/benchmark/mfcc.py
+++ b/audio/tests/benchmark/mfcc.py
--- a/tests/test_tipc/prepare.sh
+++ b/tests/test_tipc/prepare.sh
--- a/tests/unit/audio/backends/__init__.py
+++ b/tests/unit/audio/backends/__init__.py
--- a/audio/tests/backends/base.py
+++ b/audio/tests/backends/base.py
--- a/tests/unit/audio/backends/soundfile/__init__.py
+++ b/tests/unit/audio/backends/soundfile/__init__.py
--- a/audio/tests/backends/soundfile/test_io.py
+++ b/audio/tests/backends/soundfile/test_io.py
--- a/tests/unit/audio/features/__init__.py
+++ b/tests/unit/audio/features/__init__.py
--- a/audio/tests/features/base.py
+++ b/audio/tests/features/base.py
--- a/audio/tests/features/test_istft.py
+++ b/audio/tests/features/test_istft.py
--- a/audio/tests/features/test_kaldi.py
+++ b/audio/tests/features/test_kaldi.py
--- a/audio/tests/features/test_librosa.py
+++ b/audio/tests/features/test_librosa.py
--- a/audio/tests/features/test_log_melspectrogram.py
+++ b/audio/tests/features/test_log_melspectrogram.py
--- a/audio/tests/features/test_spectrogram.py
+++ b/audio/tests/features/test_spectrogram.py
--- a/audio/tests/features/test_stft.py
+++ b/audio/tests/features/test_stft.py
--- a/tests/unit/cli/aishell_test_prepare.py
+++ b/tests/unit/cli/aishell_test_prepare.py
--- a/tests/unit/cli/calc_RTF_CER_by_aishell.sh
+++ b/tests/unit/cli/calc_RTF_CER_by_aishell.sh
--- a/examples/other/1xt2x/librispeech/path.sh
+++ b/examples/other/1xt2x/librispeech/path.sh
--- a/tests/unit/cli/test_cli.sh
+++ b/tests/unit/cli/test_cli.sh
--- a/tests/unit/cli/utils
+++ b/tests/unit/cli/utils
--- a/tests/unit/server/offline/change_yaml.py
+++ b/tests/unit/server/offline/change_yaml.py
--- a/tests/unit/server/offline/conf/application.yaml
+++ b/tests/unit/server/offline/conf/application.yaml
--- a/tests/unit/server/offline/test_server_client.sh
+++ b/tests/unit/server/offline/test_server_client.sh
--- a/tests/unit/server/online/tts/check_server/change_yaml.py
+++ b/tests/unit/server/online/tts/check_server/change_yaml.py
--- a/tests/unit/server/online/tts/check_server/conf/application.yaml
+++ b/tests/unit/server/online/tts/check_server/conf/application.yaml
--- a/tests/unit/server/online/tts/check_server/http_client.py
+++ b/tests/unit/server/online/tts/check_server/http_client.py
--- a/tests/unit/server/online/tts/check_server/test.sh
+++ b/tests/unit/server/online/tts/check_server/test.sh
--- a/tests/unit/server/online/tts/check_server/test_all.sh
+++ b/tests/unit/server/online/tts/check_server/test_all.sh
--- a/tests/unit/server/online/tts/check_server/tts_online_application.yaml
+++ b/tests/unit/server/online/tts/check_server/tts_online_application.yaml
--- a/tests/unit/server/online/tts/check_server/ws_client.py
+++ b/tests/unit/server/online/tts/check_server/ws_client.py
--- a/tests/unit/server/online/tts/test_server/test_http_client.py
+++ b/tests/unit/server/online/tts/test_server/test_http_client.py
--- a/third_party/README.md
+++ b/third_party/README.md
--- a/third_party/ctc_decoders/LICENSE
+++ b/third_party/ctc_decoders/LICENSE
--- a/utils/README.md
+++ b/utils/README.md
--- a/utils/compute-wer.py
+++ b/utils/compute-wer.py
--- a/speechx/examples/ngram/zh/local/text_to_lexicon.py
+++ b/speechx/examples/ngram/zh/local/text_to_lexicon.py
--- a/utils/zh_tn.py
+++ b/utils/zh_tn.py