Merge branch 'develop' of github.com:PaddlePaddle/PaddleSpeech into update_paddle2onnx

a7402203 · 小湉湉 · b55865b2 · 9121dfc0 · a7402203 · a7402203
131 changed files
--- a/.flake8
+++ b/.flake8
@@ -12,6 +12,8 @@ exclude =
    .git,
    # python cache
    __pycache__,
+    # third party
+    utils/compute-wer.py,
    third_party/,
 # Provide a comma-separate list of glob patterns to include for checks.
 filename =

--- a/demos/README.md
+++ b/demos/README.md
@@ -10,6 +10,7 @@ The directory containes many speech applications in multi scenarios.
 * metaverse - 2D AR with TTS  
 * punctuation_restoration - restore punctuation from raw text
 * speech recogintion - recognize text of an audio file 
+* speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * speech translation - end to end speech translation  
 * story talker - book reader based on OCR and TTS  
 * style_fs2 - multi style control for FastSpeech2 model  

--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -10,6 +10,7 @@
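 * Metaverse - 2D augmented reality based on speech synthesis.
 * Punctuation restoration - usually a text post-processing task for speech recognition that adds punctuation to plain text without punctuation.
 * Speech recognition - recognizes the spoken text contained in an audio clip.
+* Speech server - offline speech service, including ASR, TTS, CLS, etc.
 * Speech translation - recognizes the language in the audio in real time and translates it into the target language at the same time.
 * Story talker - a talking storybook based on OCR and speech synthesis.
 * Style FS2 - personalized speech synthesis based on the FastSpeech2 model. 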

--- a/demos/speaker_verification/README.md
+++ b/demos/speaker_verification/README.md
 ([简体中文](./README_cn.md)|English)
-# Speech Verification)
+# Speech Verification

 ## Introduction


--- a/demos/speech_server/README_cn.md
+++ b/demos/speech_server/README_cn.md
@@ -86,9 +86,6 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   ```
   paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav

-   # Streaming ASR
-   paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8091 --input ./zh.wav
-
   ```

    Usage help:

--- a/demos/streaming_asr_server/README.md
+++ b/demos/streaming_asr_server/README.md
--- a/demos/streaming_asr_server/README_cn.md
+++ b/demos/streaming_asr_server/README_cn.md
--- a/demos/streaming_asr_server/conf/ws_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online', 'tts_online']
+# protocol = ['websocket', 'http'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'deepspeech2online_aishell'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    force_yes: True
+
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        frame_duration_ms: 80
+        shift_ms: 40    # NOTE: duplicate key in this block; loaders such as PyYAML keep the later value (shift_ms: 10)
+        sample_rate: 16000
+        sample_width: 2
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 20   # ms
+        shift_ms: 10    # ms
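Review note: the `chunk_buffer_conf` values above translate into sample and byte counts per chunk. The sketch below shows the arithmetic, assuming mono PCM audio where `sample_width` is the number of bytes per sample. The helper name `chunk_sizes` is hypothetical and not part of PaddleSpeech.

```python
def chunk_sizes(frame_duration_ms: int, sample_rate: int, sample_width: int):
    """Return (samples, bytes) for one chunk of mono PCM audio."""
    # Duration in ms times samples per ms gives samples per chunk.
    samples = frame_duration_ms * sample_rate // 1000
    # Each sample occupies `sample_width` bytes.
    return samples, samples * sample_width


samples, nbytes = chunk_sizes(frame_duration_ms=80, sample_rate=16000, sample_width=2)
print(samples, nbytes)
```

With `frame_duration_ms: 80`, `sample_rate: 16000` and `sample_width: 2` this gives 1280 samples (2560 bytes) per chunk, which matches the `chunk_size = 80 * 16` used by the test client code removed later in this diff.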
--- a/demos/streaming_asr_server/conf/ws_conformer_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online', 'tts_online']
+# protocol = ['websocket', 'http'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_multicn'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    force_yes: True
+    device: # cpu or gpu:id
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
\ No newline at end of file
--- a/demos/streaming_asr_server/run.sh
+++ b/demos/streaming_asr_server/run.sh
+# start the streaming asr service
+paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
\ No newline at end of file
--- a/demos/streaming_asr_server/test.sh
+++ b/demos/streaming_asr_server/test.sh
+# download the test wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav 
+
+# read the wav and pass it to service
+python3 websocket_client.py --wavfile ./zh.wav
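Review note: `websocket_client.py` sends the wav to the service in fixed-size chunks. The `read_wave` method removed from the test client in this change pads the signal with zeros to a whole number of 80 ms chunks before yielding them. Below is a minimal pure-Python sketch of that padding and slicing, assuming raw 16-bit mono PCM bytes rather than a NumPy array. `iter_padded_chunks` is a hypothetical name, and the relocated `ASRAudioHandler` in `paddlespeech.server.utils.audio_handler` is not shown in this diff and may differ.

```python
from typing import Iterator


def iter_padded_chunks(pcm: bytes,
                       chunk_samples: int = 1280,
                       sample_width: int = 2) -> Iterator[bytes]:
    """Zero-pad raw PCM to a multiple of the chunk size, then yield chunks."""
    chunk_bytes = chunk_samples * sample_width
    remainder = len(pcm) % chunk_bytes
    if remainder:
        # Pad with silence so the last chunk has the full length.
        pcm += b"\x00" * (chunk_bytes - remainder)
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# 3000 samples of 16-bit audio do not divide evenly into 1280-sample chunks.
chunks = list(iter_padded_chunks(b"\x01\x00" * 3000))
print(len(chunks), [len(c) for c in chunks])
```

Padding on the client keeps every websocket message the same size, so the server side never has to handle a short final chunk.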
--- a/paddlespeech/server/tests/asr/online/web/app.py
+++ b/paddlespeech/server/tests/asr/online/web/app.py
--- a/paddlespeech/server/tests/asr/online/web/paddle_web_demo.png
+++ b/paddlespeech/server/tests/asr/online/web/paddle_web_demo.png
--- a/paddlespeech/server/tests/asr/online/web/readme.md
+++ b/paddlespeech/server/tests/asr/online/web/readme.md
--- a/paddlespeech/server/tests/asr/online/web/static/css/font-awesome.min.css
+++ b/paddlespeech/server/tests/asr/online/web/static/css/font-awesome.min.css
--- a/paddlespeech/server/tests/asr/online/web/static/css/style.css
+++ b/paddlespeech/server/tests/asr/online/web/static/css/style.css
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/FontAwesome.otf
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/FontAwesome.otf
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.eot
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.eot
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.svg
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.svg
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.ttf
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.ttf
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.woff
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.woff
--- a/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.woff2
+++ b/paddlespeech/server/tests/asr/online/web/static/fonts/fontawesome-webfont.woff2
--- a/paddlespeech/server/tests/asr/online/web/static/image/PaddleSpeech_logo.png
+++ b/paddlespeech/server/tests/asr/online/web/static/image/PaddleSpeech_logo.png
--- a/paddlespeech/server/tests/asr/online/web/static/image/voice-dictation.svg
+++ b/paddlespeech/server/tests/asr/online/web/static/image/voice-dictation.svg
--- a/paddlespeech/server/tests/asr/online/web/static/js/SoundRecognizer.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/SoundRecognizer.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/jquery-3.2.1.min.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/jquery-3.2.1.min.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/mp3.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/mp3.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/pcm.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/pcm.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/wav.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/engine/wav.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/extensions/frequency.histogram.view.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/extensions/frequency.histogram.view.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/extensions/lib.fft.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/extensions/lib.fft.js
--- a/paddlespeech/server/tests/asr/online/web/static/js/recorder/recorder-core.js
+++ b/paddlespeech/server/tests/asr/online/web/static/js/recorder/recorder-core.js
--- a/paddlespeech/server/tests/asr/online/web/static/paddle.ico
+++ b/paddlespeech/server/tests/asr/online/web/static/paddle.ico
--- a/paddlespeech/server/tests/asr/online/web/templates/index.html
+++ b/paddlespeech/server/tests/asr/online/web/templates/index.html
--- a/paddlespeech/server/tests/asr/online/websocket_client.py
+++ b/paddlespeech/server/tests/asr/online/websocket_client.py
@@ -16,99 +16,24 @@
 import argparse
 import asyncio
 import codecs
-import json
 import logging
 import os

-import numpy as np
-import soundfile
-import websockets
-
-
-class ASRAudioHandler:
-    def __init__(self, url="127.0.0.1", port=8090):
-        self.url = url
-        self.port = port
-        self.url = "ws://" + self.url + ":" + str(self.port) + "/ws/asr"
-
-    def read_wave(self, wavfile_path: str):
-        samples, sample_rate = soundfile.read(wavfile_path, dtype='int16')
-        x_len = len(samples)
-        # chunk_stride = 40 * 16  #40ms, sample_rate = 16kHz
-        chunk_size = 80 * 16  #80ms, sample_rate = 16kHz
-
-        if x_len % chunk_size != 0:
-            padding_len_x = chunk_size - x_len % chunk_size
-        else:
-            padding_len_x = 0
-
-        padding = np.zeros((padding_len_x), dtype=samples.dtype)
-        padded_x = np.concatenate([samples, padding], axis=0)
-
-        assert (x_len + padding_len_x) % chunk_size == 0
-        num_chunk = (x_len + padding_len_x) / chunk_size
-        num_chunk = int(num_chunk)
-
-        for i in range(0, num_chunk):
-            start = i * chunk_size
-            end = start + chunk_size
-            x_chunk = padded_x[start:end]
-            yield x_chunk
-
-    async def run(self, wavfile_path: str):
-        logging.info("send a message to the server")
-        async with websockets.connect(self.url) as ws:
-            audio_info = json.dumps(
-                {
-                    "name": "test.wav",
-                    "signal": "start",
-                    "nbest": 5
-                },
-                sort_keys=True,
-                indent=4,
-                separators=(',', ': '))
-            await ws.send(audio_info)
-            msg = await ws.recv()
-            logging.info("receive msg={}".format(msg))
-
-            # send chunk audio data to engine
-            for chunk_data in self.read_wave(wavfile_path):
-                await ws.send(chunk_data.tobytes())
-                msg = await ws.recv()
-                msg = json.loads(msg)
-                logging.info("receive msg={}".format(msg))
-
-            result = msg
-            # finished 
-            audio_info = json.dumps(
-                {
-                    "name": "test.wav",
-                    "signal": "end",
-                    "nbest": 5
-                },
-                sort_keys=True,
-                indent=4,
-                separators=(',', ': '))
-            await ws.send(audio_info)
-            msg = await ws.recv()
-            msg = json.loads(msg)
-            logging.info("receive msg={}".format(msg))
-
-        return result
+from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_handler import ASRAudioHandler


 def main(args):
-    logging.basicConfig(level=logging.INFO)
-    logging.info("asr websocket client start")
+    logger.info("asr websocket client start")
    handler = ASRAudioHandler("127.0.0.1", 8090)
    loop = asyncio.get_event_loop()

    # support to process single audio file
    if args.wavfile and os.path.exists(args.wavfile):
-        logging.info(f"start to process the wavscp: {args.wavfile}")
+        logger.info(f"start to process the wavfile: {args.wavfile}")
        result = loop.run_until_complete(handler.run(args.wavfile))
        result = result["asr_results"]
-        logging.info(f"asr websocket client finished : {result}")
+        logger.info(f"asr websocket client finished : {result}")

    # support to process batch audios from wav.scp 
    if args.wavscp and os.path.exists(args.wavscp):
@@ -123,6 +48,7 @@ def main(args):


 if __name__ == "__main__":
+    logger.info("Start to do streaming asr client")
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--wavfile",

--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -8,8 +8,8 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
 :-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----:  | :-----:  | :-----: 
 [Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 345 MB  | 2 Conv + 5 LSTM layers with only forward direction | 0.078 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0) 
 [Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.064 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0) 
-[Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB  | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0565 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1) 
-[Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB  | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0483 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1) 
+[Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB  | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1) 
+[Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB  | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0464 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1) 
 [Transformer Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 128 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0523 || 151 h | [Transformer  Aishell ASR1](../../examples/aishell/asr1) 
 [Ds2 Offline Librispeech ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr0/asr0_deepspeech2_librispeech_ckpt_0.1.1.model.tar.gz)| Librispeech Dataset | Char-based | 518 MB | 2 Conv + 3 bidirectional LSTM layers| - |0.0725| 960 h | [Ds2 Offline Librispeech ASR0](../../examples/librispeech/asr0) 
 [Conformer Librispeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz) | Librispeech Dataset | subword-based | 191 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0337 | 960 h | [Conformer Librispeech ASR1](../../examples/librispeech/asr1) 

--- a/examples/aishell/asr0/local/test.sh
+++ b/examples/aishell/asr0/local/test.sh
@@ -5,6 +5,8 @@ if [ $# != 4 ];then
    exit -1
 fi

+stage=0
+stop_stage=100
 ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

@@ -19,18 +21,45 @@ if [ $? -ne 0 ]; then
   exit 1
 fi

-python3 -u ${BIN_DIR}/test.py \
--ngpu ${ngpu} \
--config ${config_path} \
--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type}
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    # format the reference test file
+    python utils/format_rsl.py \
+        --origin_ref data/manifest.test.raw \
+        --trans_ref data/manifest.test.text

-if [ $? -ne 0 ]; then
-    echo "Failed in evaluation!"
-    exit 1
+    python3 -u ${BIN_DIR}/test.py \
+    --ngpu ${ngpu} \
+    --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
+    --result_file ${ckpt_prefix}.rsl \
+    --checkpoint_path ${ckpt_prefix} \
+    --model_type ${model_type}
+
+    if [ $? -ne 0 ]; then
+        echo "Failed in evaluation!"
+        exit 1
+    fi
+
+    # format the hyp file
+    python utils/format_rsl.py \
+        --origin_hyp ${ckpt_prefix}.rsl \
+        --trans_hyp ${ckpt_prefix}.rsl.text
+
+    python utils/compute-wer.py --char=1 --v=1 \
+        data/manifest.test.text ${ckpt_prefix}.rsl.text > ${ckpt_prefix}.error 
 fi

+if [ ${stage} -le 101 ] && [ ${stop_stage} -ge 101 ]; then
+    python utils/format_rsl.py \
+        --origin_ref data/manifest.test.raw \
+        --trans_ref_sclite data/manifest.test.text.sclite
+
+    python utils/format_rsl.py \
+        --origin_hyp ${ckpt_prefix}.rsl \
+        --trans_hyp_sclite ${ckpt_prefix}.rsl.text.sclite
+
+    mkdir -p ${ckpt_prefix}_sclite
+    sclite -i wsj -r data/manifest.test.text.sclite -h  ${ckpt_prefix}.rsl.text.sclite  -e utf-8 -o all -O ${ckpt_prefix}_sclite -c NOASCII
+fi

 exit 0
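Review note: the `stage` / `stop_stage` guards added in this script select a closed range of step numbers. The sketch below restates the shell test `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]` in Python. The function name `should_run` is hypothetical, and the defaults mirror the `stage=0` and `stop_stage=100` values set in this hunk.

```python
def should_run(step: int, stage: int = 0, stop_stage: int = 100) -> bool:
    """Mirror `[ ${stage} -le step ] && [ ${stop_stage} -ge step ]`."""
    return stage <= step <= stop_stage


for step in (0, 101):
    print(step, should_run(step))
```

With the defaults shown here, step 0 runs and the sclite block guarded by step 101 is skipped unless `stop_stage` is raised to at least 101.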
--- a/examples/aishell/asr1/RESULTS.md
+++ b/examples/aishell/asr1/RESULTS.md
@@ -2,26 +2,26 @@

 ## Conformer
 paddle version: 2.2.2  
-paddlespeech version: 0.1.2
+paddlespeech version: 0.2.0
 | Model | Params | Config | Augmentation| Test set | Decode method | Loss | CER |
 | --- | --- | --- | --- | --- | --- | --- | --- | 
-| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | attention | - | 0.0548 |
-| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | ctc_greedy_search | - | 0.05127 |
-| conformer | 47.07M  | conf/conformer.yaml | spec_aug| test | ctc_prefix_beam_search | - | 0.05131 | 
-| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | attention_rescoring | - | 0.04829 | 
+| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | attention | - | 0.0530 |
+| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | ctc_greedy_search | - | 0.0495 |
+| conformer | 47.07M  | conf/conformer.yaml | spec_aug| test | ctc_prefix_beam_search | - | 0.0494 | 
+| conformer | 47.07M  | conf/conformer.yaml | spec_aug | test | attention_rescoring | - | 0.0464 | 


 ## Chunk Conformer
 paddle version: 2.2.2  
-paddlespeech version: 0.1.2  
+paddlespeech version: 0.2.0  
 Need set `decoding.decoding_chunk_size=16` when decoding.

 | Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | CER |  
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |  
-| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention | 16, -1 | - | 0.0573884 |  
-| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_greedy_search | 16, -1 | - | 0.06599091 |  
-| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_prefix_beam_search | 16, -1 | - | 0.065991 |  
-| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention_rescoring | 16, -1 |  - | 0.056502 |  
+| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention | 16, -1 | - | 0.0551 |  
+| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_greedy_search | 16, -1 | - | 0.0629 |  
+| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_prefix_beam_search | 16, -1 | - | 0.0629 |  
+| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention_rescoring | 16, -1 |  - | 0.0544 |  


 ## Transformer 
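Review note: the size of the CER change between the 0.1.2 and 0.2.0 rows in the tables above can be expressed as a relative reduction. The sketch below uses only the `attention_rescoring` values from this diff, and the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(old: float, new: float) -> float:
    """Fraction by which `new` is lower than `old`."""
    return (old - new) / old


# (old CER, new CER) taken from the attention_rescoring rows in this diff.
results = {
    "conformer": (0.04829, 0.0464),
    "chunk conformer": (0.056502, 0.0544),
}
for name, (old, new) in results.items():
    print(f"{name}: {relative_reduction(old, new):.1%}")
```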

--- a/examples/aishell/asr1/local/test.sh
+++ b/examples/aishell/asr1/local/test.sh
@@ -5,6 +5,8 @@ if [ $# != 3 ];then
    exit -1
 fi

+stage=0
+stop_stage=100
 ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

@@ -24,49 +26,86 @@ fi
 #fi


-for type in attention ctc_greedy_search; do
-    echo "decoding ${type}"
-    if [ ${chunk_mode} == true ];then
-        # stream decoding only support batchsize=1
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    # format the reference test file
+    python utils/format_rsl.py \
+        --origin_ref data/manifest.test.raw \
+        --trans_ref data/manifest.test.text
+
+    for type in attention ctc_greedy_search; do
+        echo "decoding ${type}"
+        if [ ${chunk_mode} == true ];then
+            # stream decoding only support batchsize=1
+            batch_size=1
+        else
+            batch_size=64
+        fi
+        output_dir=${ckpt_prefix}
+        mkdir -p ${output_dir}
+        python3 -u ${BIN_DIR}/test.py \
+        --ngpu ${ngpu} \
+        --config ${config_path} \
+        --decode_cfg ${decode_config_path} \
+        --result_file ${output_dir}/${type}.rsl \
+        --checkpoint_path ${ckpt_prefix} \
+        --opts decode.decoding_method ${type} \
+        --opts decode.decode_batch_size ${batch_size}
+
+        if [ $? -ne 0 ]; then
+            echo "Failed in evaluation!"
+            exit 1
+
+        fi
+        # format the hyp file
+        python utils/format_rsl.py \
+            --origin_hyp ${output_dir}/${type}.rsl \
+            --trans_hyp ${output_dir}/${type}.rsl.text
+        python utils/compute-wer.py --char=1 --v=1 \
+            data/manifest.test.text ${output_dir}/${type}.rsl.text > ${output_dir}/${type}.error 
+
+    done
+
+    for type in ctc_prefix_beam_search attention_rescoring; do
+        echo "decoding ${type}"
        batch_size=1
-    else
-        batch_size=64
-    fi
-    output_dir=${ckpt_prefix}
-    mkdir -p ${output_dir}
-    python3 -u ${BIN_DIR}/test.py \
-    --ngpu ${ngpu} \
-    --config ${config_path} \
-    --decode_cfg ${decode_config_path} \
-    --result_file ${output_dir}/${type}.rsl \
-    --checkpoint_path ${ckpt_prefix} \
-    --opts decode.decoding_method ${type} \
-    --opts decode.decode_batch_size ${batch_size}
-
-    if [ $? -ne 0 ]; then
-        echo "Failed in evaluation!"
-        exit 1
-    fi
-done
-
-for type in ctc_prefix_beam_search attention_rescoring; do
-    echo "decoding ${type}"
-    batch_size=1
+        output_dir=${ckpt_prefix}
+        mkdir -p ${output_dir}
+        python3 -u ${BIN_DIR}/test.py \
+        --ngpu ${ngpu} \
+        --config ${config_path} \
+        --decode_cfg ${decode_config_path} \
+        --result_file ${output_dir}/${type}.rsl \
+        --checkpoint_path ${ckpt_prefix} \
+        --opts decode.decoding_method ${type} \
+        --opts decode.decode_batch_size ${batch_size}
+
+        if [ $? -ne 0 ]; then
+            echo "Failed in evaluation!"
+            exit 1
+        fi
+        python utils/format_rsl.py \
+            --origin_hyp ${output_dir}/${type}.rsl \
+            --trans_hyp ${output_dir}/${type}.rsl.text
+        python utils/compute-wer.py --char=1 --v=1 \
+            data/manifest.test.text ${output_dir}/${type}.rsl.text > ${output_dir}/${type}.error 
+    done
+fi
+
+if [ ${stage} -le 101 ] && [ ${stop_stage} -ge 101 ]; then
+    # format the reference test file for sclite
+    python utils/format_rsl.py \
+        --origin_ref data/manifest.test.raw \
+        --trans_ref_sclite data/manifest.test.text.sclite
+    
    output_dir=${ckpt_prefix}
-    mkdir -p ${output_dir}
-    python3 -u ${BIN_DIR}/test.py \
-    --ngpu ${ngpu} \
-    --config ${config_path} \
-    --decode_cfg ${decode_config_path} \
-    --result_file ${output_dir}/${type}.rsl \
-    --checkpoint_path ${ckpt_prefix} \
-    --opts decode.decoding_method ${type} \
-    --opts decode.decode_batch_size ${batch_size}
-
-    if [ $? -ne 0 ]; then
-        echo "Failed in evaluation!"
-        exit 1
-    fi
-done
+    for type in attention ctc_greedy_search ctc_prefix_beam_search attention_rescoring; do
+        python utils/format_rsl.py \
+            --origin_hyp ${output_dir}/${type}.rsl \
+            --trans_hyp_sclite ${output_dir}/${type}.rsl.text.sclite
+
+        mkdir -p ${output_dir}/${type}_sclite
+        sclite -i wsj -r data/manifest.test.text.sclite -h  ${output_dir}/${type}.rsl.text.sclite  -e utf-8 -o all -O ${output_dir}/${type}_sclite -c NOASCII
+    done
+fi

 exit 0
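Review note: the scoring step added above calls `utils/compute-wer.py` with `--char=1`, which suggests character-level scoring. The sketch below is the textbook character error rate (Levenshtein distance divided by reference length), offered only to illustrate the metric. It is not the code of `utils/compute-wer.py`, which may normalize or tokenize text differently, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("abcd", "abed"))
```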
--- a/examples/aishell/asr1/run.sh
+++ b/examples/aishell/asr1/run.sh
@@ -7,7 +7,7 @@ stage=0
 stop_stage=50
 conf_path=conf/conformer.yaml
 decode_conf_path=conf/tuning/decode.yaml
-avg_num=20
+avg_num=30
 audio_file=data/demo_01_03.wav

 source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

--- a/examples/csmsc/tts3/local/synthesize_e2e.sh
+++ b/examples/csmsc/tts3/local/synthesize_e2e.sh
@@ -109,6 +109,6 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
        --lang=zh \
        --text=${BIN_DIR}/../sentences.txt \
        --output_dir=${train_output_path}/test_e2e \
-        --phones_dict=dump/phone_id_map.txt #\
-        # --inference_dir=${train_output_path}/inference
+        --phones_dict=dump/phone_id_map.txt \
+        --inference_dir=${train_output_path}/inference
 fi
--- a/examples/csmsc/voc6/README.md
+++ b/examples/csmsc/voc6/README.md
@@ -114,6 +114,7 @@ The pretrained model can be downloaded here:

 The static model can be downloaded here:
 - [wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip)
+- [wavernn_csmsc_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_1.0.0.zip) (fixes a bug with Paddle 2.3)

 Model | Step | eval/loss
 :-------------:|:------------:| :------------:

--- a/examples/voxceleb/sv0/README.md
+++ b/examples/voxceleb/sv0/README.md
+# ECAPA-TDNN with VoxCeleb
+This example contains code used to train an ECAPA-TDNN model with the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/index.html#about).
+
+## Overview
+All the scripts you need are in `run.sh`. It is divided into several stages, each with its own function.
+| Stage | Function                                                     |
+|:---- |:----------------------------------------------------------- |
+| 0     | Process data. It includes: <br>       (1) Download the VoxCeleb1 dataset <br>       (2) Download the VoxCeleb2 dataset  <br>       (3) Convert the VoxCeleb2 m4a to wav format <br>       (4) Get the manifest files of the train, development and test datasets <br> (5) Download the RIR Noise dataset and get the noise manifest files for augmentation |
+| 1     | Train the model                                              |
+| 2     | Test speaker verification with the VoxCeleb trials |
+
+You can choose to run a range of stages by setting `stage` and `stop_stage`.
+
+For example, if you want to execute the code in stage 1 and stage 2, you can run this script:
+```bash
+bash run.sh --stage 1 --stop_stage 2
+```
+Or you can set `stage` equal to `stop_stage` to run only one stage.
+For example, if you only want to run stage 0, you can use the script below:
+```bash
+bash run.sh --stage 0 --stop_stage 0
+```
+The sections below describe the scripts in `run.sh` in detail.
+## The environment variables
+`path.sh` sets the environment variables. 
+```bash
+source path.sh
+```
+This script needs to be run first.  
+
+Another script is also needed:
+```bash
+source ${MAIN_ROOT}/utils/parse_options.sh
+```
+It enables passing options as `--variable value` to the shell scripts.
+
+## The local variables
+Some local variables are set in `run.sh`. 
+`gpus` denotes the GPU IDs you want to use. If you set `gpus=`, only the CPU is used. 
+`stage` denotes the stage you want to start from in the experiments.
+`stop_stage` denotes the stage you want to end at in the experiments. 
+`conf_path` denotes the config path of the model.
+`exp_dir` denotes the experiment directory, e.g. "exp/ecapa-tdnn-vox12-big/".
+
+You can set the local variables when you run `run.sh`.
+
+For example, you can set `gpus` on the command line:
+```bash
+bash run.sh --gpus 0,1 
+```
+## Stage 0: Data processing
+To use this example, you first need to process the data; stage 0 in `run.sh` does this. The code is shown below:
+
+```bash
+ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+     # prepare data
+     bash ./local/data.sh || exit -1
+ fi
+```
+Stage 0 processes the data. If you only want to process the data, you can run:
+```bash
+bash run.sh --stage 0 --stop_stage 0
+```
+You can also run these scripts directly on the command line:
+```bash
+source path.sh
+bash ./local/data.sh
+```
+After processing the data, the `data` directory will look like this:
+```bash
+data/
+├── rir_noise
+│   ├── csv
+│   │   ├── noise.csv
+│   │   └── rir.csv
+│   ├── manifest.pointsource_noises
+│   ├── manifest.real_rirs_isotropic_noises
+│   └── manifest.simulated_rirs
+├── vox
+│   ├── csv
+│   │   ├── dev.csv
+│   │   ├── enroll.csv
+│   │   ├── test.csv
+│   │   └── train.csv
+│   └── meta
+│       └── label2id.txt
+└── vox1
+    ├── list_test_all2.txt
+    ├── list_test_all.txt
+    ├── list_test_hard2.txt
+    ├── list_test_hard.txt
+    ├── manifest.dev
+    ├── manifest.test
+    ├── veri_test2.txt
+    ├── veri_test.txt
+    ├── voxceleb1.dev.meta
+    └── voxceleb1.test.meta
+```
+## Stage 1: Model training
+If you want to train the model, you can use stage 1 in `run.sh`. The code is shown below. 
+```bash
+ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+     # train model, all `ckpt` under `exp` dir
+     CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path}  ${ckpt}
+ fi
+```
+If you want to train the model, you can use the script below to execute stage 0 and stage 1:
+```bash
+bash run.sh --stage 0 --stop_stage 1
+```
+Or you can run these scripts on the command line (CPU only):
+```bash
+source path.sh
+bash ./local/data.sh ./data/ conf/ecapa_tdnn.yaml
+CUDA_VISIBLE_DEVICES= ./local/train.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
+```
+## Stage 2: Model Testing
+The test stage is to evaluate the model performance. The code of the test stage is shown below:
+```bash
+ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+     # test ckpt avg_n
+     CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${dir} ${exp_dir} ${conf_path} || exit -1
+ fi
+```
+If you want to train a model and test it, you can use the script below to execute stage 0, stage 1 and stage 2:
+```bash
+bash run.sh --stage 0 --stop_stage 2
+```
+Alternatively, you can run these scripts from the command line (CPU only):
+```bash
+source path.sh
+bash ./local/data.sh ./data/ conf/ecapa_tdnn.yaml
+CUDA_VISIBLE_DEVICES= ./local/train.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
+CUDA_VISIBLE_DEVICES= ./local/test.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
+```
+
+## 3: Pretrained Model
+You can get the pretrained models from the [released model list](../../../docs/source/released_model.md).
+
+Use `tar` to unpack the model, then use the test script to evaluate it.
+
+For example:
+```
+wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
+tar xzvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
+source path.sh
+# If you have already processed the data and have the manifest file, you can skip the following 2 steps
+
+CUDA_VISIBLE_DEVICES= ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2 conf/ecapa_tdnn.yaml
+```
+The performance of the released models is shown in [RESULTS.md](./RESULTS.md).
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
@@ -125,6 +125,7 @@ class ASRExecutor(BaseExecutor):
        """
        Init model and other resources from a specific path.
        """
+        logger.info("start to init the model")
        if hasattr(self, 'model'):
            logger.info('Model had been initialized.')
            return
@@ -140,13 +141,14 @@ class ASRExecutor(BaseExecutor):
                res_path,
                self.pretrained_models[tag]['ckpt_path'] + ".pdparams")
            logger.info(res_path)
-            logger.info(self.cfg_path)
-            logger.info(self.ckpt_path)
+
        else:
            self.cfg_path = os.path.abspath(cfg_path)
            self.ckpt_path = os.path.abspath(ckpt_path + ".pdparams")
            self.res_path = os.path.dirname(
                os.path.dirname(os.path.abspath(self.cfg_path)))
+        logger.info(self.cfg_path)
+        logger.info(self.ckpt_path)

        #Init body.
        self.config = CfgNode(new_allowed=True)
@@ -176,7 +178,6 @@ class ASRExecutor(BaseExecutor):
                    vocab=self.config.vocab_filepath,
                    spm_model_prefix=self.config.spm_model_prefix)
                self.config.decode.decoding_method = decode_method
-
            else:
                raise Exception("wrong type")
        model_name = model_type[:model_type.rindex(
@@ -254,12 +255,14 @@ class ASRExecutor(BaseExecutor):
        else:
            raise Exception("wrong type")

+        logger.info("audio feat process success")
+
    @paddle.no_grad()
    def infer(self, model_type: str):
        """
        Model inference and result stored in self.output.
        """
-
+        logger.info("start to infer the model to get the output")
        cfg = self.config.decode
        audio = self._inputs["audio"]
        audio_len = self._inputs["audio_len"]
@@ -276,17 +279,23 @@ class ASRExecutor(BaseExecutor):
            self._outputs["result"] = result_transcripts[0]

        elif "conformer" in model_type or "transformer" in model_type:
-            result_transcripts = self.model.decode(
-                audio,
-                audio_len,
-                text_feature=self.text_feature,
-                decoding_method=cfg.decoding_method,
-                beam_size=cfg.beam_size,
-                ctc_weight=cfg.ctc_weight,
-                decoding_chunk_size=cfg.decoding_chunk_size,
-                num_decoding_left_chunks=cfg.num_decoding_left_chunks,
-                simulate_streaming=cfg.simulate_streaming)
-            self._outputs["result"] = result_transcripts[0][0]
+            logger.info(
+                f"we will use the transformer like model : {model_type}")
+            try:
+                result_transcripts = self.model.decode(
+                    audio,
+                    audio_len,
+                    text_feature=self.text_feature,
+                    decoding_method=cfg.decoding_method,
+                    beam_size=cfg.beam_size,
+                    ctc_weight=cfg.ctc_weight,
+                    decoding_chunk_size=cfg.decoding_chunk_size,
+                    num_decoding_left_chunks=cfg.num_decoding_left_chunks,
+                    simulate_streaming=cfg.simulate_streaming)
+                self._outputs["result"] = result_transcripts[0][0]
+            except Exception as e:
+                logger.exception(e)
+
        else:
            raise Exception("invalid model name")


--- a/paddlespeech/cli/asr/pretrained_models.py
+++ b/paddlespeech/cli/asr/pretrained_models.py
@@ -88,6 +88,8 @@ model_alias = {
    "paddlespeech.s2t.models.ds2_online:DeepSpeech2ModelOnline",
    "conformer":
    "paddlespeech.s2t.models.u2:U2Model",
+    "conformer_online":
+    "paddlespeech.s2t.models.u2:U2Model",
    "transformer":
    "paddlespeech.s2t.models.u2:U2Model",
    "wenetspeech":

--- a/paddlespeech/s2t/exps/deepspeech2/model.py
+++ b/paddlespeech/s2t/exps/deepspeech2/model.py
@@ -278,7 +278,7 @@ class DeepSpeech2Tester(DeepSpeech2Trainer):
            len_refs += len_ref
            num_ins += 1
            if fout:
-                fout.write({"utt": utt, "ref": target, "hyp": result})
+                fout.write({"utt": utt, "refs": [target], "hyps": [result]})
            logger.info(f"Utt: {utt}")
            logger.info(f"Ref: {target}")
            logger.info(f"Hyp: {result}")

--- a/paddlespeech/s2t/models/u2/u2.py
+++ b/paddlespeech/s2t/models/u2/u2.py
@@ -286,7 +286,6 @@ class U2BaseModel(ASRInterface, nn.Layer):
            # logp: (B*N, vocab)
            logp, cache = self.decoder.forward_one_step(
                encoder_out, encoder_mask, hyps, hyps_mask, cache)
-
            # 2.2 First beam prune: select topk best prob at current time
            top_k_logp, top_k_index = logp.topk(beam_size)  # (B*N, N)
            top_k_logp = mask_finished_scores(top_k_logp, end_flag)
@@ -708,11 +707,11 @@ class U2BaseModel(ASRInterface, nn.Layer):
        batch_size = feats.shape[0]
        if decoding_method in ['ctc_prefix_beam_search',
                               'attention_rescoring'] and batch_size > 1:
-            logger.fatal(
+            logger.error(
                f'decoding mode {decoding_method} must be running with batch_size == 1'
            )
+            logger.error(f"current batch_size is {batch_size}")
            sys.exit(1)
-
        if decoding_method == 'attention':
            hyps = self.recognize(
                feats,

--- a/paddlespeech/s2t/modules/align.py
+++ b/paddlespeech/s2t/modules/align.py
@@ -105,7 +105,6 @@ class Conv1D(nn.Conv1D):
                 data_format='NCL'):
        if weight_attr is None:
            if global_init_type == "kaiming_uniform":
-                print("set kaiming_uniform")
                weight_attr = paddle.ParamAttr(initializer=KaimingUniform())
        if bias_attr is None:
            if global_init_type == "kaiming_uniform":

--- a/paddlespeech/server/README.md
+++ b/paddlespeech/server/README.md
@@ -35,3 +35,16 @@
 ```bash
 paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
 ```
+
+## Online ASR Server
+
+### Launch online asr server
+```
+paddlespeech_server start --config_file conf/ws_conformer_application.yaml
+```
+
+### Access online asr server
+
+```
+paddlespeech_client asr_online  --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
--- a/paddlespeech/server/README_cn.md
+++ b/paddlespeech/server/README_cn.md
@@ -35,3 +35,17 @@
 ```bash
 paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
 ```
+
+## Streaming ASR
+
+### Start the streaming speech recognition service
+
+```
+paddlespeech_server start --config_file conf/ws_conformer_application.yaml
+```
+
+### Access the streaming speech recognition service
+
+```
+paddlespeech_client asr_online  --server_ip 127.0.0.1 --port 8090 --input zh.wav
+```
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
@@ -30,11 +30,14 @@ from ..executor import BaseExecutor
 from ..util import cli_client_register
 from ..util import stats_wrapper
 from paddlespeech.cli.log import logger
-from paddlespeech.server.tests.asr.online.websocket_client import ASRAudioHandler
+from paddlespeech.server.utils.audio_handler import ASRAudioHandler
 from paddlespeech.server.utils.audio_process import wav2pcm
 from paddlespeech.server.utils.util import wav2base64

-__all__ = ['TTSClientExecutor', 'ASRClientExecutor', 'CLSClientExecutor']
+__all__ = [
+    'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor',
+    'CLSClientExecutor'
+]


 @cli_client_register(
@@ -236,11 +239,11 @@ class ASRClientExecutor(BaseExecutor):
 @cli_client_register(
    name='paddlespeech_client.asr_online',
    description='visit asr online service')
-class ASRClientExecutor(BaseExecutor):
+class ASROnlineClientExecutor(BaseExecutor):
    def __init__(self):
-        super(ASRClientExecutor, self).__init__()
+        super(ASROnlineClientExecutor, self).__init__()
        self.parser = argparse.ArgumentParser(
-            prog='paddlespeech_client.asr', add_help=True)
+            prog='paddlespeech_client.asr_online', add_help=True)
        self.parser.add_argument(
            '--server_ip', type=str, default='127.0.0.1', help='server ip')
        self.parser.add_argument(
@@ -277,11 +280,12 @@ class ASRClientExecutor(BaseExecutor):
                lang=lang,
                audio_format=audio_format)
            time_end = time.time()
-            logger.info(res.json())
+            logger.info(res)
            logger.info("Response time %f s." % (time_end - time_start))
            return True
        except Exception as e:
            logger.error("Failed to speech recognition.")
+            logger.error(e)
            return False

    @stats_wrapper
@@ -299,9 +303,11 @@ class ASRClientExecutor(BaseExecutor):
        logging.info("asr websocket client start")
        handler = ASRAudioHandler(server_ip, port)
        loop = asyncio.get_event_loop()
-        loop.run_until_complete(handler.run(input))
+        res = loop.run_until_complete(handler.run(input))
        logging.info("asr websocket client finished")

+        return res['asr_results']
+

 @cli_client_register(
    name='paddlespeech_client.cls', description='visit cls service')

--- a/paddlespeech/server/conf/ws_application.yaml
+++ b/paddlespeech/server/conf/ws_application.yaml
@@ -41,11 +41,7 @@ asr_online:
        shift_ms: 40
        sample_rate: 16000
        sample_width: 2
-
-    vad_conf:
-        aggressiveness: 2
-        sample_rate: 16000
-        frame_duration_ms: 20
-        sample_width: 2
-        padding_ms: 200
-        padding_ratio: 0.9
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 20   # ms
+        shift_ms: 10    # ms
--- a/paddlespeech/server/conf/ws_conformer_application.yaml
+++ b/paddlespeech/server/conf/ws_conformer_application.yaml
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online', 'tts_online']
+# protocol = ['websocket', 'http'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_multicn'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 
+    force_yes: True
+    device:  # cpu or gpu:id
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
\ No newline at end of file
--- a/paddlespeech/server/engine/asr/online/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/asr_engine.py
--- a/paddlespeech/server/engine/asr/online/ctc_search.py
+++ b/paddlespeech/server/engine/asr/online/ctc_search.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from collections import defaultdict
+
+import paddle
+
+from paddlespeech.cli.log import logger
+from paddlespeech.s2t.utils.utility import log_add
+
+__all__ = ['CTCPrefixBeamSearch']
+
+
+class CTCPrefixBeamSearch:
+    def __init__(self, config):
+        """Implement the CTC prefix beam search.
+
+        Args:
+            config (yacs.config.CfgNode): the decoding config; `beam_size` is read from it
+        """
+        self.config = config
+        self.reset()
+
+    @paddle.no_grad()
+    def search(self, ctc_probs, device, blank_id=0):
+        """Decode one chunk of CTC probabilities with CTC prefix beam search.
+
+        Args:
+            ctc_probs (paddle.Tensor): the CTC log-probabilities of all the tokens, shape (T, vocab_size)
+            device (paddle.fluid.core_avx.Place): the feature host device, such as CUDAPlace(0).
+            blank_id (int, optional): the blank id in the vocab. Defaults to 0.
+
+        Returns:
+            list: the search result, as (prefix, score) pairs
+        """
+        # decode 
+        logger.info("start to ctc prefix search")
+
+        batch_size = 1
+        beam_size = self.config.beam_size
+        maxlen = ctc_probs.shape[0]
+
+        assert len(ctc_probs.shape) == 2
+
+        # cur_hyps: (prefix, (blank_ending_score, none_blank_ending_score))
+        # blank_ending_score and  none_blank_ending_score in ln domain
+        if self.cur_hyps is None:
+            self.cur_hyps = [(tuple(), (0.0, -float('inf')))]
+        # 2. CTC beam search step by step
+        for t in range(0, maxlen):
+            logp = ctc_probs[t]  # (vocab_size,)
+            # key: prefix, value (pb, pnb), default value(-inf, -inf)
+            next_hyps = defaultdict(lambda: (-float('inf'), -float('inf')))
+
+            # 2.1 First beam prune: select topk best
+            #     do token passing process
+            top_k_logp, top_k_index = logp.topk(beam_size)  # (beam_size,)
+            for s in top_k_index:
+                s = s.item()
+                ps = logp[s].item()
+                for prefix, (pb, pnb) in self.cur_hyps:
+                    last = prefix[-1] if len(prefix) > 0 else None
+                    if s == blank_id:  # blank
+                        n_pb, n_pnb = next_hyps[prefix]
+                        n_pb = log_add([n_pb, pb + ps, pnb + ps])
+                        next_hyps[prefix] = (n_pb, n_pnb)
+                    elif s == last:
+                        #  Update *ss -> *s;
+                        n_pb, n_pnb = next_hyps[prefix]
+                        n_pnb = log_add([n_pnb, pnb + ps])
+                        next_hyps[prefix] = (n_pb, n_pnb)
+                        # Update *s-s -> *ss, - is for blank
+                        n_prefix = prefix + (s, )
+                        n_pb, n_pnb = next_hyps[n_prefix]
+                        n_pnb = log_add([n_pnb, pb + ps])
+                        next_hyps[n_prefix] = (n_pb, n_pnb)
+                    else:
+                        n_prefix = prefix + (s, )
+                        n_pb, n_pnb = next_hyps[n_prefix]
+                        n_pnb = log_add([n_pnb, pb + ps, pnb + ps])
+                        next_hyps[n_prefix] = (n_pb, n_pnb)
+
+            # 2.2 Second beam prune
+            next_hyps = sorted(
+                next_hyps.items(),
+                key=lambda x: log_add(list(x[1])),
+                reverse=True)
+            self.cur_hyps = next_hyps[:beam_size]
+
+        self.hyps = [(y[0], log_add([y[1][0], y[1][1]])) for y in self.cur_hyps]
+        logger.info("ctc prefix search success")
+        return self.hyps
+
+    def get_one_best_hyps(self):
+        """Return the one best result
+
+        Returns:
+            list: the one best result
+        """
+        return [self.hyps[0][0]]
+
+    def get_hyps(self):
+        """Return the search hyps
+
+        Returns:
+            list: return the search hyps
+        """
+        return self.hyps
+
+    def reset(self):
+        """Reset the search cache value
+        """
+        self.cur_hyps = None
+        self.hyps = None
+
+    def finalize_search(self):
+        """do nothing in ctc_prefix_beam_search
+        """
+        pass
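The recurrence in `CTCPrefixBeamSearch.search` can be checked without Paddle. The following is a minimal pure-Python sketch, not the code added by this commit: `log_add`, `prefix_beam_step` and `decode` are illustrative helpers, frames are plain lists of log-probabilities instead of tensors, and the real `log_add` from `paddlespeech.s2t.utils.utility` is assumed here to behave like a log-sum-exp.

```python
import math
from collections import defaultdict

NEG_INF = -float("inf")


def log_add(values):
    """Log-sum-exp over a list of log-domain scores (assumed behavior)."""
    if all(v == NEG_INF for v in values):
        return NEG_INF
    v_max = max(values)
    return v_max + math.log(sum(math.exp(v - v_max) for v in values))


def prefix_beam_step(cur_hyps, logp, beam_size, blank_id=0):
    """Advance every hypothesis by one frame of log-probabilities."""
    # prefix -> (score ending in blank, score ending in non-blank)
    next_hyps = defaultdict(lambda: (NEG_INF, NEG_INF))
    # First prune: keep only the top-k tokens of this frame.
    top_k = sorted(range(len(logp)), key=lambda i: logp[i], reverse=True)
    for s in top_k[:beam_size]:
        ps = logp[s]
        for prefix, (pb, pnb) in cur_hyps:
            last = prefix[-1] if prefix else None
            if s == blank_id:
                # Blank keeps the prefix and moves mass to "ends in blank".
                n_pb, n_pnb = next_hyps[prefix]
                next_hyps[prefix] = (log_add([n_pb, pb + ps, pnb + ps]), n_pnb)
            elif s == last:
                # Repeated token without a blank collapses into the same prefix.
                n_pb, n_pnb = next_hyps[prefix]
                next_hyps[prefix] = (n_pb, log_add([n_pnb, pnb + ps]))
                # Repeated token after a blank extends the prefix.
                n_prefix = prefix + (s, )
                n_pb, n_pnb = next_hyps[n_prefix]
                next_hyps[n_prefix] = (n_pb, log_add([n_pnb, pb + ps]))
            else:
                # A different token always extends the prefix.
                n_prefix = prefix + (s, )
                n_pb, n_pnb = next_hyps[n_prefix]
                next_hyps[n_prefix] = (n_pb,
                                       log_add([n_pnb, pb + ps, pnb + ps]))
    # Second prune: keep the best prefixes by total score.
    ranked = sorted(
        next_hyps.items(), key=lambda x: log_add(list(x[1])), reverse=True)
    return ranked[:beam_size]


def decode(frames, beam_size=3, blank_id=0):
    """Run the step over all frames and return (prefix, score) pairs."""
    hyps = [(tuple(), (0.0, NEG_INF))]
    for logp in frames:
        hyps = prefix_beam_step(hyps, logp, beam_size, blank_id)
    return [(prefix, log_add(list(scores))) for prefix, scores in hyps]


# Two identical frames over the vocabulary {0: blank, 1: "a", 2: "b"}.
frames = [[math.log(p) for p in (0.1, 0.8, 0.1)]] * 2
best_prefix, best_score = decode(frames)[0]
print(best_prefix, round(math.exp(best_score), 3))  # (1,) 0.8
```

The best prefix `(1,)` collects the probability of the three alignments "aa", "a-" and "-a" (0.64 + 0.08 + 0.08), which is the merging of alignments that the `pb`/`pnb` bookkeeping is there to perform.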
--- a/paddlespeech/server/engine/tts/online/tts_engine.py
+++ b/paddlespeech/server/engine/tts/online/tts_engine.py
--- a/paddlespeech/server/tests/__init__.py
+++ b/paddlespeech/server/tests/__init__.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/paddlespeech/server/tests/asr/__init__.py
+++ b/paddlespeech/server/tests/asr/__init__.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/paddlespeech/server/tests/asr/offline/__init__.py
+++ b/paddlespeech/server/tests/asr/offline/__init__.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/paddlespeech/server/tests/asr/online/README.md
+++ b/paddlespeech/server/tests/asr/online/README.md
+([简体中文](./README_cn.md)|English)
+
+# Speech Service
+
+## Introduction
+
+This document introduces a client for the streaming ASR service: the microphone client.
+
+
+## Usage
+### 1. Install
+Refer to [Install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+It is recommended to use **paddlepaddle 2.2.1** or above.
+You can choose either the medium or the hard way to install paddlespeech.
+
+
+### 2. Prepare config File
+
+
+The input of the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the model's.
+
+A sample file for this ASR client demo can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+
+### 3. Streaming ASR Client Usage
+
+- microphone
+   ```
+   python microphone_client.py
+
+   ```
--- a/paddlespeech/server/tests/asr/online/README_cn.md
+++ b/paddlespeech/server/tests/asr/online/README_cn.md
-([简体中文](./README_cn.md)|English)
+([English](./README.md)|中文)

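 # Speech Service

 ## Introduction
-This document describes how to use three different clients for streaming ASR: web page, microphone, and a Python-simulated streaming service.
+This document describes how to use one client for streaming ASR: the microphone.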


 ## Usage
@@ -20,7 +20,7 @@

 Sample audio for this ASR client can be downloaded:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```

 ### 2. Streaming ASR Client Usage
@@ -40,10 +40,3 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   python microphone_client.py

   ```
-
-
- Web page
-   ```
-   # After entering the web directory, refer to the relevant readme.md
-
-   ```
--- a/paddlespeech/server/utils/buffer.py
+++ b/paddlespeech/server/utils/buffer.py
@@ -63,12 +63,12 @@ class ChunkBuffer(object):
        the sample rate.
        Yields Frames of the requested duration.
        """
+
        audio = self.remained_audio + audio
        self.remained_audio = b''

        offset = 0
        timestamp = 0.0
-
        while offset + self.window_bytes <= len(audio):
            yield Frame(audio[offset:offset + self.window_bytes], timestamp,
                        self.window_sec)
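The windowing loop above, combined with the `chunk_buffer_conf` values from `ws_conformer_application.yaml`, can be sketched in plain Python. This is only an illustration: `window_and_shift_bytes` and `split_frames` are hypothetical helpers, and the sizing rule (a chunk spans `window_n` feature frames and advances by `shift_n` frames) is an assumption about `ChunkBuffer`, since its constructor is not part of this diff.

```python
def window_and_shift_bytes(window_n, shift_n, window_ms, shift_ms,
                           sample_rate, sample_width):
    """Assumed sizing rule: window_n frames per chunk, advanced by shift_n frames."""
    chunk_ms = (window_n - 1) * shift_ms + window_ms
    step_ms = shift_n * shift_ms
    bytes_per_ms = sample_rate * sample_width // 1000
    return chunk_ms * bytes_per_ms, step_ms * bytes_per_ms


def split_frames(audio, remained, window_bytes, shift_bytes):
    """Cut full windows out of the buffered bytes and return the leftover."""
    audio = remained + audio
    frames = []
    offset = 0
    while offset + window_bytes <= len(audio):
        frames.append(audio[offset:offset + window_bytes])
        offset += shift_bytes
    # Bytes that do not fill a window wait for the next packet.
    return frames, audio[offset:]


window_bytes, shift_bytes = window_and_shift_bytes(
    window_n=7, shift_n=4, window_ms=25, shift_ms=10,
    sample_rate=16000, sample_width=2)
frames, rest = split_frames(b"\x00" * 4000, b"", window_bytes, shift_bytes)
print(window_bytes, shift_bytes, len(frames), len(rest))  # 2720 1280 2 1440
```

Under these assumptions, a 4000-byte packet yields two overlapping 85 ms windows, and the 1440 leftover bytes are prepended to the next packet, which is the role `remained_audio` plays in the diff.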

--- a/paddlespeech/server/utils/util.py
+++ b/paddlespeech/server/utils/util.py
@@ -52,6 +52,10 @@ def get_chunks(data, block_size, pad_size, step):
    Returns:
        list: chunks list
    """
+
+    if block_size == -1:
+        return [data]
+
    if step == "am":
        data_len = data.shape[1]
    elif step == "voc":

--- a/paddlespeech/server/ws/asr_socket.py
+++ b/paddlespeech/server/ws/asr_socket.py
@@ -13,12 +13,12 @@
 # limitations under the License.
 import json

-import numpy as np
 from fastapi import APIRouter
 from fastapi import WebSocket
 from fastapi import WebSocketDisconnect
 from starlette.websockets import WebSocketState as WebSocketState

+from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.engine.engine_pool import get_engine_pool
 from paddlespeech.server.utils.buffer import ChunkBuffer
 from paddlespeech.server.utils.vad import VADAudio
@@ -28,26 +28,29 @@ router = APIRouter()

 @router.websocket('/ws/asr')
 async def websocket_endpoint(websocket: WebSocket):
-
    await websocket.accept()

    engine_pool = get_engine_pool()
    asr_engine = engine_pool['asr']
+    connection_handler = None
    # init buffer
+    # each websocket connection has its own chunk buffer
    chunk_buffer_conf = asr_engine.config.chunk_buffer_conf
    chunk_buffer = ChunkBuffer(
-        window_n=7,
-        shift_n=4,
-        window_ms=20,
-        shift_ms=10,
-        sample_rate=chunk_buffer_conf['sample_rate'],
-        sample_width=chunk_buffer_conf['sample_width'])
+        window_n=chunk_buffer_conf.window_n,
+        shift_n=chunk_buffer_conf.shift_n,
+        window_ms=chunk_buffer_conf.window_ms,
+        shift_ms=chunk_buffer_conf.shift_ms,
+        sample_rate=chunk_buffer_conf.sample_rate,
+        sample_width=chunk_buffer_conf.sample_width)
+
    # init vad
-    vad_conf = asr_engine.config.vad_conf
-    vad = VADAudio(
-        aggressiveness=vad_conf['aggressiveness'],
-        rate=vad_conf['sample_rate'],
-        frame_duration_ms=vad_conf['frame_duration_ms'])
+    vad_conf = asr_engine.config.get('vad_conf', None)
+    if vad_conf:
+        vad = VADAudio(
+            aggressiveness=vad_conf['aggressiveness'],
+            rate=vad_conf['sample_rate'],
+            frame_duration_ms=vad_conf['frame_duration_ms'])

    try:
        while True:
@@ -64,13 +67,21 @@ async def websocket_endpoint(websocket: WebSocket):
                if message['signal'] == 'start':
                    resp = {"status": "ok", "signal": "server_ready"}
                    # do something at begining here
+                    # create the instance to process the audio
+                    connection_handler = PaddleASRConnectionHanddler(asr_engine)
                    await websocket.send_json(resp)
                elif message['signal'] == 'end':
-                    engine_pool = get_engine_pool()
-                    asr_engine = engine_pool['asr']
                    # reset single  engine for an new connection
-                    asr_engine.reset()
-                    resp = {"status": "ok", "signal": "finished"}
+                    connection_handler.decode(is_finished=True)
+                    connection_handler.rescoring()
+                    asr_results = connection_handler.get_result()
+                    connection_handler.reset()
+
+                    resp = {
+                        "status": "ok",
+                        "signal": "finished",
+                        'asr_results': asr_results
+                    }
                    await websocket.send_json(resp)
                    break
                else:
@@ -79,21 +90,11 @@ async def websocket_endpoint(websocket: WebSocket):
            elif "bytes" in message:
                message = message["bytes"]

-                engine_pool = get_engine_pool()
-                asr_engine = engine_pool['asr']
-                asr_results = ""
-                frames = chunk_buffer.frame_generator(message)
-                for frame in frames:
-                    samples = np.frombuffer(frame.bytes, dtype=np.int16)
-                    sample_rate = asr_engine.config.sample_rate
-                    x_chunk, x_chunk_lens = asr_engine.preprocess(samples,
-                                                                  sample_rate)
-                    asr_engine.run(x_chunk, x_chunk_lens)
-                    asr_results = asr_engine.postprocess()
+                connection_handler.extract_feat(message)
+                connection_handler.decode(is_finished=False)
+                asr_results = connection_handler.get_result()

-                asr_results = asr_engine.postprocess()
                resp = {'asr_results': asr_results}
-
                await websocket.send_json(resp)
    except WebSocketDisconnect:
        pass
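The message flow of the reworked `/ws/asr` endpoint (start signal, audio bytes, end signal) can be exercised without FastAPI. The sketch below is an assumption-laden model of that flow, not the endpoint itself: `FakeHandler` is a hypothetical stand-in for `PaddleASRConnectionHanddler`, `handle_message` is an illustrative function, the text payload is assumed to be JSON with a `signal` field, and the error replies for unexpected input are additions that do not appear in the diff.

```python
import json


class FakeHandler:
    """Hypothetical stand-in for the per-connection ASR handler."""

    def __init__(self):
        self.n_bytes = 0

    def extract_feat(self, chunk):
        self.n_bytes += len(chunk)

    def decode(self, is_finished):
        pass

    def rescoring(self):
        pass

    def get_result(self):
        return f"{self.n_bytes} bytes"

    def reset(self):
        self.n_bytes = 0


def handle_message(state, message):
    """Return (response, done) for one incoming websocket message."""
    if "text" in message:
        signal = json.loads(message["text"])["signal"]
        if signal == "start":
            # One handler per connection, created on the start signal.
            state["handler"] = FakeHandler()
            return {"status": "ok", "signal": "server_ready"}, False
        if signal == "end":
            handler = state["handler"]
            handler.decode(is_finished=True)
            handler.rescoring()
            result = handler.get_result()
            handler.reset()
            return {
                "status": "ok",
                "signal": "finished",
                "asr_results": result
            }, True
        # Not in the diff: illustrative reply for an unknown signal.
        return {"status": "error", "signal": signal}, False
    if "bytes" in message:
        handler = state.get("handler")
        if handler is None:
            # Not in the diff: guard against audio arriving before "start".
            return {"status": "error", "signal": "not_started"}, False
        handler.extract_feat(message["bytes"])
        handler.decode(is_finished=False)
        return {"asr_results": handler.get_result()}, False
    return {"status": "error", "signal": "unsupported_message"}, False


state = {}
print(handle_message(state, {"text": json.dumps({"signal": "start"})}))
print(handle_message(state, {"bytes": b"\x00" * 320}))
print(handle_message(state, {"text": json.dumps({"signal": "end"})}))
```

One design point the sketch makes visible: because the handler only exists after the start signal, audio bytes sent earlier would reach a `None` handler in the endpoint as written, so a guard like the one above may be worth considering.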
--- a/paddlespeech/t2s/exps/synthesize.py
+++ b/paddlespeech/t2s/exps/synthesize.py
@@ -52,7 +52,7 @@ def evaluate(args):
    # acoustic model
    am_name = args.am[:args.am.rindex('_')]
    am_dataset = args.am[args.am.rindex('_') + 1:]
-    
+
    am_inference = get_am_inference(
        am=args.am,
        am_config=am_config,

--- a/paddlespeech/t2s/models/wavernn/wavernn.py
+++ b/paddlespeech/t2s/models/wavernn/wavernn.py
@@ -360,7 +360,9 @@ class WaveRNN(nn.Layer):
                x = sample.transpose([1, 0, 2])

            elif self.mode == 'RAW':
-                posterior = F.softmax(logits, axis=1)
+                # fix bug for paddle 2.3, see https://github.com/PaddlePaddle/Paddle/commit/01f606b4f1ca3e184a59111084ed460ee0798a5a
+                # posterior = F.softmax(logits, axis=1)
+                posterior = logits
                distrib = paddle.distribution.Categorical(posterior)
                # corresponding operate [np.floor((fx + 1) / 2 * mu + 0.5)] in enocde_mu_law
                # distrib.sample([1])[0].cast('float32'): [0, 2**bits-1]

--- a/paddlespeech/vector/cluster/diarization.py
+++ b/paddlespeech/vector/cluster/diarization.py
@@ -20,11 +20,11 @@ A few sklearn functions are modified in this script as per requirement.
 import argparse
 import copy
 import warnings
-from distutils.util import strtobool

 import numpy as np
 import scipy
 import sklearn
+from distutils.util import strtobool
 from scipy import linalg
 from scipy import sparse
 from scipy.sparse.csgraph import connected_components

--- a/paddlespeech/vector/modules/layer.py
+++ b/paddlespeech/vector/modules/layer.py
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle
+import paddle.nn as nn
+from paddle.autograd import PyLayer
+
+
+class GradientReversalFunction(PyLayer):
+    """Gradient Reversal Layer from:
+    Unsupervised Domain Adaptation by Backpropagation (Ganin & Lempitsky, 2015)
+
+    Forward pass is the identity function. In the backward pass,
+    the upstream gradients are multiplied by -lambda (i.e. gradient is reversed)
+    """
+
+    @staticmethod
+    def forward(ctx, x, lambda_=1):
+        """Forward in networks
+        """
+        ctx.save_for_backward(lambda_)
+        return x.clone()
+
+    @staticmethod
+    def backward(ctx, grads):
+        """Backward in networks
+        """
+        lambda_, = ctx.saved_tensor()
+        dx = -lambda_ * grads
+        return dx
+
+
+class GradientReversalLayer(nn.Layer):
+    """Gradient Reversal Layer from:
+    Unsupervised Domain Adaptation by Backpropagation (Ganin & Lempitsky, 2015)
+
+    Forward pass is the identity function. In the backward pass,
+    the upstream gradients are multiplied by -lambda (i.e. gradient is reversed)
+    """
+
+    def __init__(self, lambda_=1):
+        super(GradientReversalLayer, self).__init__()
+        self.lambda_ = lambda_
+
+    def forward(self, x):
+        """Forward in networks
+        """
+        return GradientReversalFunction.apply(x, self.lambda_)
+
+
+if __name__ == "__main__":
+    paddle.set_device("cpu")
+
+    data = paddle.randn([2, 3], dtype="float64")
+    data.stop_gradient = False
+    grl = GradientReversalLayer(1)
+    out = grl(data)
+    out.mean().backward()
+    print(data.grad)
+
+    data = paddle.randn([2, 3], dtype="float64")
+    data.stop_gradient = False
+    grl = GradientReversalLayer(-1)
+    out = grl(data)
+    out.mean().backward()
+    print(data.grad)
--- a/paddlespeech/vector/modules/loss.py
+++ b/paddlespeech/vector/modules/loss.py
@@ -91,3 +91,199 @@ class LogSoftmaxWrapper(nn.Layer):
        predictions = F.log_softmax(predictions, axis=1)
        loss = self.criterion(predictions, targets) / targets.sum()
        return loss
+
+
+class NCELoss(nn.Layer):
+    """Noise Contrastive Estimation loss function
+
+    Noise Contrastive Estimation (NCE) is an approximation method that is used to
+    work around the huge computational cost of a large softmax layer.
+    The basic idea is to convert the prediction problem into a classification problem
+    at the training stage. It has been proved that these two criteria converge to
+    the same minimal point as long as the noise distribution is close enough to the real one.
+
+    NCE bridges the gap between generative models and discriminative models,
+    rather than simply speeding up the softmax layer.
+    With NCE, you can turn almost anything into a posterior with less effort (I think).
+
+    Refs:
+    NCE: http://www.cs.helsinki.fi/u/ahyvarin/papers/Gutmann10AISTATS.pdf
+    Thanks: https://github.com/mingen-pan/easy-to-use-NCE-RNN-for-Pytorch/blob/master/nce.py
+
+    Examples:
+    Q = Q_from_tokens(output_dim)
+    NCELoss(Q)
+    """
+
+    def __init__(self, Q, noise_ratio=100, Z_offset=9.5):
+        """Noise Contrastive Estimation loss function
+
+        Args:
+            Q (tensor): prior model, uniform or gaussian
+            noise_ratio (int, optional): noise sampling times. Defaults to 100.
+            Z_offset (float, optional): scale of post processing the score. Defaults to 9.5.
+        """
+        super(NCELoss, self).__init__()
+        assert type(noise_ratio) is int
+        self.Q = paddle.to_tensor(Q, stop_gradient=False)
+        self.N = self.Q.shape[0]
+        self.K = noise_ratio
+        self.Z_offset = Z_offset
+
+    def forward(self, output, target):
+        """Forward inference
+
+        Args:
+            output (tensor): the model output, which is the input of loss function
+            target (tensor): the target class indices, one per row of output
+        """
+        output = paddle.reshape(output, [-1, self.N])
+        B = output.shape[0]
+        noise_idx = self.get_noise(B)
+        idx = self.get_combined_idx(target, noise_idx)
+        P_target, P_noise = self.get_prob(idx, output, sep_target=True)
+        Q_target, Q_noise = self.get_Q(idx)
+        loss = self.nce_loss(P_target, P_noise, Q_noise, Q_target)
+        return loss.mean()
+
+    def get_Q(self, idx, sep_target=True):
+        """Get prior model of batchsize data
+        """
+        idx_size = idx.size
+        prob_model = paddle.to_tensor(
+            self.Q.numpy()[paddle.reshape(idx, [-1]).numpy()])
+        prob_model = paddle.reshape(prob_model, [idx.shape[0], idx.shape[1]])
+        if sep_target:
+            return prob_model[:, 0], prob_model[:, 1:]
+        else:
+            return prob_model
+
+    def get_prob(self, idx, scores, sep_target=True):
+        """Post processing the score of post model(output of nn) of batchsize data
+        """
+        scores = self.get_scores(idx, scores)
+        scale = paddle.to_tensor([self.Z_offset], dtype='float64')
+        scores = paddle.add(scores, -scale)
+        prob = paddle.exp(scores)
+        if sep_target:
+            return prob[:, 0], prob[:, 1:]
+        else:
+            return prob
+
+    def get_scores(self, idx, scores):
+        """Get the score of post model(output of nn) of batchsize data
+        """
+        B, N = scores.shape
+        K = idx.shape[1]
+        idx_increment = paddle.to_tensor(
+            N * paddle.reshape(paddle.arange(B), [B, 1]) * paddle.ones([1, K]),
+            dtype="int64",
+            stop_gradient=False)
+        new_idx = idx_increment + idx
+        new_scores = paddle.index_select(
+            paddle.reshape(scores, [-1]), paddle.reshape(new_idx, [-1]))
+
+        return paddle.reshape(new_scores, [B, K])
+
+    def get_noise(self, batch_size, uniform=True):
+        """Select noise sample
+        """
+        if uniform:
+            noise = np.random.randint(self.N, size=self.K * batch_size)
+        else:
+            noise = np.random.choice(
+                self.N, self.K * batch_size, replace=True, p=self.Q.data)
+        noise = paddle.to_tensor(noise, dtype='int64', stop_gradient=False)
+        noise_idx = paddle.reshape(noise, [batch_size, self.K])
+        return noise_idx
+
+    def get_combined_idx(self, target_idx, noise_idx):
+        """Combine target and noise indices
+        """
+        target_idx = paddle.reshape(target_idx, [-1, 1])
+        return paddle.concat((target_idx, noise_idx), 1)
+
+    def nce_loss(self, prob_model, prob_noise_in_model, prob_noise,
+                 prob_target_in_noise):
+        """Combine the loss of target and noise
+        """
+
+        def safe_log(tensor):
+            """Safe log
+            """
+            EPSILON = 1e-10
+            return paddle.log(EPSILON + tensor)
+
+        model_loss = safe_log(prob_model /
+                              (prob_model + self.K * prob_target_in_noise))
+        model_loss = paddle.reshape(model_loss, [-1])
+
+        noise_loss = paddle.sum(
+            safe_log((self.K * prob_noise) /
+                     (prob_noise_in_model + self.K * prob_noise)), -1)
+        noise_loss = paddle.reshape(noise_loss, [-1])
+
+        loss = -(model_loss + noise_loss)
+
+        return loss
+
+
+class FocalLoss(nn.Layer):
+    """This criterion is an implementation of Focal Loss, which is proposed in
+    Focal Loss for Dense Object Detection.
+
+        Loss(x, class) = -alpha * (1 - softmax(x)[class])^gamma * log(softmax(x)[class])
+
+    The losses are averaged across observations for each minibatch.
+
+    Args:
+        alpha(1D Tensor, Variable) : the scalar factor for this criterion
+        gamma(float, double) : gamma > 0; reduces the relative loss for well-classified examples (p > .5),
+                                putting more focus on hard, misclassified examples
+        size_average(bool): By default, the losses are averaged over observations for each minibatch.
+                            However, if the field size_average is set to False, the losses are
+                            instead summed for each minibatch.
+    """
+
+    def __init__(self, alpha=1, gamma=0, size_average=True, ignore_index=-100):
+        super(FocalLoss, self).__init__()
+        self.alpha = alpha
+        self.gamma = gamma
+        self.size_average = size_average
+        self.ce = nn.CrossEntropyLoss(
+            ignore_index=ignore_index, reduction="none")
+
+    def forward(self, outputs, targets):
+        """Forward inference.
+
+        Args:
+            outputs: input tensor
+            targets: target label tensor
+        """
+        ce_loss = self.ce(outputs, targets)
+        pt = paddle.exp(-ce_loss)
+        focal_loss = self.alpha * (1 - pt)**self.gamma * ce_loss
+        if self.size_average:
+            return focal_loss.mean()
+        else:
+            return focal_loss.sum()
+
+
+if __name__ == "__main__":
+    import numpy as np
+    from paddlespeech.vector.utils.vector_utils import Q_from_tokens
+    paddle.set_device("cpu")
+
+    input_data = paddle.uniform([5, 100], dtype="float64")
+    label_data = np.random.randint(0, 100, size=(5)).astype(np.int64)
+
+    input = paddle.to_tensor(input_data)
+    label = paddle.to_tensor(label_data)
+
+    loss1 = FocalLoss()
+    loss = loss1.forward(input, label)
+    print("loss: %.5f" % (loss))
+
+    Q = Q_from_tokens(100)
+    loss2 = NCELoss(Q)
+    loss = loss2.forward(input, label)
+    print("loss: %.5f" % (loss))
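
The two criteria added in this file can be sanity-checked without Paddle. The sketch below recomputes, for a single sample and in plain Python, the same expressions that `FocalLoss.forward` and `NCELoss.nce_loss` use. It is illustrative only: the function names, the toy logits and the toy probabilities are assumptions, not values from the diff.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, alpha=1.0, gamma=0.0):
    """Focal loss for one sample: alpha * (1 - pt)**gamma * ce, with pt = exp(-ce)."""
    ce = -math.log(softmax(logits)[target])
    pt = math.exp(-ce)
    return alpha * (1.0 - pt) ** gamma * ce


def nce_loss(p_target, p_noise, q_target, q_noise, eps=1e-10):
    """NCE loss for one sample with K = len(p_noise) noise draws.

    p_* are model probabilities, q_* are prior (noise distribution) probabilities.
    """
    k = len(p_noise)
    model_term = math.log(eps + p_target / (p_target + k * q_target))
    noise_term = sum(
        math.log(eps + (k * q) / (p + k * q)) for p, q in zip(p_noise, q_noise))
    return -(model_term + noise_term)


logits = [2.0, 0.5, -1.0]
ce_only = focal_loss(logits, target=0, gamma=0.0)  # gamma=0 reduces to cross entropy
focused = focal_loss(logits, target=0, gamma=2.0)  # an easy example is down-weighted

uniform_q = 1.0 / 100  # the value a uniform prior over 100 tokens assigns to each token
confident = nce_loss(0.9, [1e-4] * 5, uniform_q, [uniform_q] * 5)
confused = nce_loss(1e-4, [0.9] * 5, uniform_q, [uniform_q] * 5)
```

With the default `gamma=0` used in the self-test, `FocalLoss` behaves like plain cross entropy; the focusing effect only appears for `gamma > 0`. For NCE, a model that gives the target a high probability and the noise samples a low one should score a lower loss than the reverse, which is what `confident` versus `confused` compares.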
--- a/paddlespeech/vector/utils/vector_utils.py
+++ b/paddlespeech/vector/utils/vector_utils.py
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import paddle


 def get_chunks(seg_dur, audio_id, audio_duration):
@@ -30,3 +31,11 @@ def get_chunks(seg_dur, audio_id, audio_duration):
        for i in range(num_chunks)
    ]
    return chunk_lst
+
+
+def Q_from_tokens(token_num):
+    """Get the prior model. Currently uniform; others (e.g. gaussian) may be supported in the future.
+    """
+    freq = [1] * token_num
+    Q = paddle.to_tensor(freq, dtype='float64')
+    return Q / Q.sum()
--- a/speechx/CMakeLists.txt
+++ b/speechx/CMakeLists.txt
@@ -63,7 +63,8 @@ include(libsndfile)
 # include(boost) # not work
 set(boost_SOURCE_DIR ${fc_patch}/boost-src)
 set(BOOST_ROOT ${boost_SOURCE_DIR})
-# #find_package(boost REQUIRED PATHS ${BOOST_ROOT})
+include_directories(${boost_SOURCE_DIR})
+link_directories(${boost_SOURCE_DIR}/stage/lib)

 # Eigen
 include(eigen)
@@ -141,4 +142,4 @@ set(DEPS ${DEPS}
 set(SPEECHX_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/speechx)

 add_subdirectory(speechx)
-add_subdirectory(examples)
\ No newline at end of file
+add_subdirectory(examples)
--- a/speechx/README.md
+++ b/speechx/README.md
@@ -3,7 +3,7 @@
 ## Environment

 We develop under:
-* docker - registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.2-cudnn7
+* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7`
 * os - Ubuntu 16.04.7 LTS
 * gcc/g++/gfortran - 8.2.0
 * cmake - 3.16.0
@@ -19,7 +19,7 @@ We develop under:
 1. First to launch docker container.

 ```
-nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/workspace --name=dev registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.2-cudnn7 /bin/bash
+docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/workspace --name=dev registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 /bin/bash
 ```

 * More `Paddle` docker images you can see [here](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html).
@@ -60,4 +60,5 @@ popd

 ## TODO

+### Deepspeech2 with linear feature
 * DecibelNormalizer: the offline and online db norm give slightly different results. The online db norm reads the feature chunk by chunk, so the number of samples it sees differs from the offline db norm. In normalizer.cc:73, samples.size() differs, which causes the difference in the result.
--- a/speechx/build.sh
+++ b/speechx/build.sh
@@ -22,6 +22,6 @@ cd build
 cmake .. -DBOOST_ROOT:STRING=${boost_SOURCE_DIR}
 #cmake .. 

-make -j10
+make -j

 cd -
--- a/speechx/examples/ds2_ol/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/CMakeLists.txt
@@ -2,4 +2,5 @@ cmake_minimum_required(VERSION 3.14 FATAL_ERROR)

 add_subdirectory(feat)
 add_subdirectory(nnet)
-add_subdirectory(decoder)
\ No newline at end of file
+add_subdirectory(decoder)
+add_subdirectory(websocket)
--- a/speechx/examples/ds2_ol/README.md
+++ b/speechx/examples/ds2_ol/README.md
@@ -4,6 +4,8 @@ Please go to `aishell` to test it.

 * aishell
 Deepspeech2 Streaming Decoding under aishell dataset.
+* websocket
+Streaming ASR with websocket.

 The below is for developing and offline testing:
 * nnet

--- a/speechx/examples/ds2_ol/aishell/README.md
+++ b/speechx/examples/ds2_ol/aishell/README.md
@@ -8,7 +8,7 @@ Mandarin -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465
 Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
 ```

-## CTC Prefix Beam Search w LM
+## CTC Prefix Beam Search w/ LM

 LM: zh_giga.no_cna_cmn.prune01244.klm
 ```
@@ -19,9 +19,18 @@ Other -> 0.00 % N=0 C=0 S=0 D=0 I=0

 ## CTC WFST

-LM: aishell train
+LM: [aishell train](http://paddlespeech.bj.bcebos.com/speechx/examples/ds2_ol/aishell/aishell_graph.zip)
+--acoustic_scale=1.2
 ```
 Overall -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1819
 Mandarin -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1818
 Other -> 0.00 % N=0 C=0 S=0 D=0 I=1
 ```
+
+LM: [wenetspeech](http://paddlespeech.bj.bcebos.com/speechx/examples/ds2_ol/aishell/wenetspeech_graph.zip)
+--acoustic_scale=1.5
+```
+Overall -> 10.93 % N=104765 C=93410 S=9780 D=1575 I=95
+Mandarin -> 10.93 % N=104762 C=93410 S=9779 D=1573 I=95
+Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
+```
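
The percentages in these result blocks can be reproduced from the counts, assuming the usual definition of error rate as (S + D + I) / N, where N is the reference length and S, D, I are substitutions, deletions and insertions. The function name `error_rate` is hypothetical; the counts are copied from the two CTC WFST `Overall` lines above.

```python
def error_rate(n, s, d, i):
    """Error rate in percent: (substitutions + deletions + insertions) / reference length."""
    return 100.0 * (s + d + i) / n


aishell_lm = error_rate(n=103017, s=9583, d=71, i=1819)
wenetspeech_lm = error_rate(n=104765, s=9780, d=1575, i=95)
print("%.2f %.2f" % (aishell_lm, wenetspeech_lm))
```

Note that the two runs report different N, so the 11.14 % and 10.93 % figures are not computed over exactly the same reference length.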
--- a/speechx/examples/ds2_ol/aishell/local/split_data.sh
+++ b/speechx/examples/ds2_ol/aishell/local/split_data.sh
 #!/usr/bin/env bash

+set -eo pipefail
+
 data=$1
-feat_scp=$2
-split_feat_name=$3
+scp=$2
+split_name=$3
 numsplit=$4

+# save in $data/split{n}
+# $scp to split
+# 

-if ! [ "$numsplit" -gt 0 ]; then
+if [[ ! $numsplit -gt 0 ]]; then
  echo "Invalid num-split argument";
  exit 1;
 fi

 directories=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n; done)
-feat_split_scp=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_feat_name}; done)
-echo $feat_split_scp
+scp_splits=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_name}; done)
+
 # if this mkdir fails due to argument-list being too long, iterate.
 if ! mkdir -p $directories >&/dev/null; then
  for n in `seq $numsplit`; do
@@ -21,4 +26,5 @@ if ! mkdir -p $directories >&/dev/null; then
  done
 fi

-utils/split_scp.pl $feat_scp $feat_split_scp
+echo "utils/split_scp.pl $scp $scp_splits"
+utils/split_scp.pl $scp $scp_splits
--- a/speechx/examples/ds2_ol/aishell/path.sh
+++ b/speechx/examples/ds2_ol/aishell/path.sh
 # This contains the locations of the built binaries required for running the examples.

-SPEECHX_ROOT=$PWD/../../../
+SPEECHX_ROOT=$PWD/../../..
 SPEECHX_EXAMPLES=$SPEECHX_ROOT/build/examples

 SPEECHX_TOOLS=$SPEECHX_ROOT/tools
@@ -10,5 +10,5 @@ TOOLS_BIN=$SPEECHX_TOOLS/valgrind/install/bin

 export LC_ALL=C

-SPEECHX_BIN=$SPEECHX_EXAMPLES/ds2_ol/decoder:$SPEECHX_EXAMPLES/ds2_ol/feat
-export PATH=$PATH:$SPEECHX_BIN:$TOOLS_BIN
\ No newline at end of file
+SPEECHX_BIN=$SPEECHX_EXAMPLES/ds2_ol/decoder:$SPEECHX_EXAMPLES/ds2_ol/feat:$SPEECHX_EXAMPLES/ds2_ol/websocket
+export PATH=$PATH:$SPEECHX_BIN:$TOOLS_BIN
--- a/speechx/examples/ds2_ol/aishell/run.sh
+++ b/speechx/examples/ds2_ol/aishell/run.sh
@@ -29,8 +29,8 @@ vocb_dir=$ckpt_dir/data/lang_char/
 mkdir -p exp
 exp=$PWD/exp

-if [ $stage -le 0 ] && [ $stop_stage -ge 0 ];then
-    aishell_wav_scp=aishell_test.scp
+aishell_wav_scp=aishell_test.scp
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then
    if [ ! -d $data/test ]; then
        pushd $data
        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_test.zip
@@ -42,11 +42,12 @@ if [ $stage -le 0 ] && [ $stop_stage -ge 0 ];then
        paste $data/utt_id $data/wavlist > $data/$aishell_wav_scp
    fi

-
-    if [ ! -d $ckpt_dir ]; then
+    if [ ! -f $ckpt_dir/data/mean_std.json ]; then
        mkdir -p $ckpt_dir
-        wget -P $ckpt_dir -c https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz
-        tar xzfv $model_dir/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz -C $ckpt_dir
+        pushd $ckpt_dir
+        wget -c https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz
+        tar xzfv asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz 
+        popd
    fi

    lm=$data/zh_giga.no_cna_cmn.prune01244.klm
@@ -65,7 +66,7 @@ wer=./aishell_wer
 export GLOG_logtostderr=1


-if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # 3. gen linear feat
    cmvn=$data/cmvn.ark
    cmvn-json2kaldi --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn
@@ -80,7 +81,7 @@ if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
        --streaming_chunk=0.36
 fi

-if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    #  recognizer
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.wolm.log \
    ctc-prefix-beam-search-decoder-ol \
@@ -92,10 +93,10 @@ if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result

    cat $data/split${nj}/*/result > $exp/${label_file}
-    utils/compute-wer.py --char=1 --v=1 $exp/${label_file} $text > $exp/${wer}
+    utils/compute-wer.py --char=1 --v=1 $text $exp/${label_file} > $exp/${wer}
 fi

-if [ $stage -le 3 ] && [ $stop_stage -ge 3 ];then
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    #  decode with lm
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.lm.log \
    ctc-prefix-beam-search-decoder-ol \
@@ -108,21 +109,22 @@ if [ $stage -le 3 ] && [ $stop_stage -ge 3 ];then
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_lm
 
    cat $data/split${nj}/*/result_lm > $exp/${label_file}_lm
-    utils/compute-wer.py --char=1 --v=1 $exp/${label_file}_lm $text > $exp/${wer}_lm
+    utils/compute-wer.py --char=1 --v=1 $text $exp/${label_file}_lm > $exp/${wer}.lm
 fi

+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    wfst=$data/wfst/
+    mkdir -p $wfst
+    if [ ! -f $wfst/aishell_graph.zip ]; then
+        pushd $wfst
+        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph.zip
+        unzip aishell_graph.zip
+        mv aishell_graph/* $wfst
+        popd
+    fi

-wfst=$data/wfst/
-mkdir -p $wfst
-if [ ! -f $wfst/aishell_graph.zip ]; then
-    pushd $wfst
-    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph.zip
-    unzip aishell_graph.zip
-    popd
-fi
+    graph_dir=$wfst/

-graph_dir=$wfst/aishell_graph
-if [ $stage -le 4 ] && [ $stop_stage -ge 4 ]; then
    #  TLG decoder
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.wfst.log \
    wfst-decoder-ol \
@@ -136,5 +138,44 @@ if [ $stage -le 4 ] && [ $stop_stage -ge 4 ]; then
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_tlg

    cat $data/split${nj}/*/result_tlg > $exp/${label_file}_tlg
-    utils/compute-wer.py --char=1 --v=1 $exp/${label_file}_tlg $text > $exp/${wer}_tlg
-fi
\ No newline at end of file
+    utils/compute-wer.py --char=1 --v=1 $text $exp/${label_file}_tlg > $exp/${wer}.tlg
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+
+    cmvn=$data/cmvn.ark
+    if [ ! -f $data/split${nj}/1/${aishell_wav_scp} ]; then
+        cmvn-json2kaldi --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn
+        ./local/split_data.sh $data ${data}/${aishell_wav_scp} $aishell_wav_scp $nj
+    fi
+
+    wfst=$data/wfst/
+    mkdir -p $wfst
+    if [ ! -f $wfst/aishell_graph.zip ]; then
+        pushd $wfst
+        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph.zip
+        unzip aishell_graph.zip
+        popd
+    fi
+
+    graph_dir=$wfst/aishell_graph
+
+    #  TLG decoder
+    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recognizer.log \
+    recognizer_test_main \
+        --wav_rspecifier=scp:$data/split${nj}/JOB/${aishell_wav_scp} \
+        --cmvn_file=$cmvn \
+        --model_path=$model_dir/avg_1.jit.pdmodel \
+        --convert2PCM32=true \
+        --streaming_chunk=30 \
+        --params_path=$model_dir/avg_1.jit.pdiparams \
+        --word_symbol_table=$graph_dir/words.txt \
+        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+        --graph_path=$graph_dir/TLG.fst --max_active=7500 \
+        --acoustic_scale=1.2 \
+        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_recognizer
+
+    cat $data/split${nj}/*/result_recognizer > $exp/${label_file}_recognizer
+    utils/compute-wer.py --char=1 --v=1 $text $exp/${label_file}_recognizer > $exp/${wer}.recognizer
+fi
+
--- a/speechx/examples/ds2_ol/decoder/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/decoder/CMakeLists.txt
@@ -17,3 +17,6 @@ add_executable(${bin_name} ${CMAKE_CURRENT_SOURCE_DIR}/${bin_name}.cc)
 target_include_directories(${bin_name} PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
 target_link_libraries(${bin_name} PUBLIC nnet decoder fst utils gflags glog kaldi-base kaldi-matrix kaldi-util ${DEPS})

+add_executable(recognizer_test_main ${CMAKE_CURRENT_SOURCE_DIR}/recognizer_test_main.cc)
+target_include_directories(recognizer_test_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
+target_link_libraries(recognizer_test_main PUBLIC frontend kaldi-feat-common nnet decoder fst utils gflags glog kaldi-base kaldi-matrix kaldi-util kaldi-decoder ${DEPS})
--- a/speechx/examples/ds2_ol/decoder/ctc-prefix-beam-search-decoder-ol.cc
+++ b/speechx/examples/ds2_ol/decoder/ctc-prefix-beam-search-decoder-ol.cc
@@ -58,12 +58,11 @@ int main(int argc, char* argv[]) {
    kaldi::SequentialBaseFloatMatrixReader feature_reader(
        FLAGS_feature_rspecifier);
    kaldi::TokenWriter result_writer(FLAGS_result_wspecifier);
-
-    std::string model_graph = FLAGS_model_path;
+    std::string model_path = FLAGS_model_path;
    std::string model_params = FLAGS_param_path;
    std::string dict_file = FLAGS_dict_file;
    std::string lm_path = FLAGS_lm_path;
-    LOG(INFO) << "model path: " << model_graph;
+    LOG(INFO) << "model path: " << model_path;
    LOG(INFO) << "model param: " << model_params;
    LOG(INFO) << "dict path: " << dict_file;
    LOG(INFO) << "lm path: " << lm_path;
@@ -76,8 +75,8 @@ int main(int argc, char* argv[]) {
    ppspeech::CTCBeamSearch decoder(opts);

    ppspeech::ModelOptions model_opts;
-    model_opts.model_path = model_graph;
-    model_opts.params_path = model_params;
+    model_opts.model_path = model_path;
+    model_opts.param_path = model_params;
    model_opts.cache_shape = FLAGS_model_cache_names;
    model_opts.input_names = FLAGS_model_input_names;
    model_opts.output_names = FLAGS_model_output_names;
@@ -125,7 +124,6 @@ int main(int argc, char* argv[]) {
            if (feature_chunk_size < receptive_field_length) break;

            int32 start = chunk_idx * chunk_stride;
-            int32 end = start + chunk_size;

            for (int row_id = 0; row_id < chunk_size; ++row_id) {
                kaldi::SubVector<kaldi::BaseFloat> tmp(feature, start);

--- a/speechx/examples/ds2_ol/decoder/recognizer_test_main.cc
+++ b/speechx/examples/ds2_ol/decoder/recognizer_test_main.cc
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "decoder/recognizer.h"
+#include "decoder/param.h"
+#include "kaldi/feat/wave-reader.h"
+#include "kaldi/util/table-types.h"
+
+DEFINE_string(wav_rspecifier, "", "test wav rspecifier");
+DEFINE_string(result_wspecifier, "", "test result wspecifier");
+
+int main(int argc, char* argv[]) {
+    gflags::ParseCommandLineFlags(&argc, &argv, false);
+    google::InitGoogleLogging(argv[0]);
+
+    ppspeech::RecognizerResource resource = ppspeech::InitRecognizerResoure();
+    ppspeech::Recognizer recognizer(resource);
+
+    kaldi::SequentialTableReader<kaldi::WaveHolder> wav_reader(
+        FLAGS_wav_rspecifier);
+    kaldi::TokenWriter result_writer(FLAGS_result_wspecifier);
+    int sample_rate = 16000;
+    float streaming_chunk = FLAGS_streaming_chunk;
+    int chunk_sample_size = streaming_chunk * sample_rate;
+    LOG(INFO) << "sr: " << sample_rate;
+    LOG(INFO) << "chunk size (s): " << streaming_chunk;
+    LOG(INFO) << "chunk size (sample): " << chunk_sample_size;
+
+    int32 num_done = 0, num_err = 0;
+
+    for (; !wav_reader.Done(); wav_reader.Next()) {
+        std::string utt = wav_reader.Key();
+        const kaldi::WaveData& wave_data = wav_reader.Value();
+
+        int32 this_channel = 0;
+        kaldi::SubVector<kaldi::BaseFloat> waveform(wave_data.Data(),
+                                                    this_channel);
+        int tot_samples = waveform.Dim();
+        LOG(INFO) << "wav len (sample): " << tot_samples;
+
+        int sample_offset = 0;
+        std::vector<kaldi::Vector<BaseFloat>> feats;
+        int feature_rows = 0;
+        while (sample_offset < tot_samples) {
+            int cur_chunk_size =
+                std::min(chunk_sample_size, tot_samples - sample_offset);
+
+            kaldi::Vector<kaldi::BaseFloat> wav_chunk(cur_chunk_size);
+            for (int i = 0; i < cur_chunk_size; ++i) {
+                wav_chunk(i) = waveform(sample_offset + i);
+            }
+            // wav_chunk = waveform.Range(sample_offset + i, cur_chunk_size);
+
+            recognizer.Accept(wav_chunk);
+            if (cur_chunk_size < chunk_sample_size) {
+                recognizer.SetFinished();
+            }
+            recognizer.Decode();
+
+            // no overlap
+            sample_offset += cur_chunk_size;
+        }
+
+        std::string result;
+        result = recognizer.GetFinalResult();
+        recognizer.Reset();
+        if (result.empty()) {
+            // the TokenWriter cannot write an empty string.
+            ++num_err;
+            KALDI_LOG << " the result of " << utt << " is empty";
+            continue;
+        }
+        KALDI_LOG << " the result of " << utt << " is " << result;
+        result_writer.Write(utt, result);
+        ++num_done;
+    }
+}
\ No newline at end of file
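
The streaming loop in `recognizer_test_main.cc` feeds the recognizer fixed-size, non-overlapping chunks and flags the end of input when a chunk comes out shorter than the nominal size. A minimal Python sketch of that chunking logic follows; the function name `split_into_chunks` and the sample counts are assumptions for illustration, and no recognizer is involved.

```python
def split_into_chunks(tot_samples, streaming_chunk, sample_rate=16000):
    """Yield (offset, size, is_short) for consecutive, non-overlapping chunks.

    is_short mirrors the `cur_chunk_size < chunk_sample_size` check that
    triggers SetFinished() in the C++ loop.
    """
    chunk_sample_size = int(streaming_chunk * sample_rate)
    if chunk_sample_size <= 0:
        raise ValueError("chunk size must cover at least one sample")
    offset = 0
    while offset < tot_samples:
        size = min(chunk_sample_size, tot_samples - offset)
        yield offset, size, size < chunk_sample_size
        offset += size


# 2.5 s of audio in 1 s chunks: the last chunk is short.
chunks = list(split_into_chunks(tot_samples=40000, streaming_chunk=1.0))
# Exactly 2 s of audio: no chunk is short.
exact = list(split_into_chunks(tot_samples=32000, streaming_chunk=1.0))
```

In the `exact` case no chunk is shorter than the nominal size, so a loop written this way never takes the `SetFinished()` branch. Whether `GetFinalResult()` copes with that is not visible in this diff, so it may be worth checking for utterances whose length is an exact multiple of the chunk size.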
--- a/speechx/examples/ds2_ol/decoder/run.sh
+++ b/speechx/examples/ds2_ol/decoder/run.sh
@@ -48,7 +48,6 @@ if [ ! -f $lm ]; then
    popd
 fi

-
 feat_wspecifier=$exp_dir/feats.ark
 cmvn=$exp_dir/cmvn.ark

@@ -57,7 +56,7 @@ export GLOG_logtostderr=1
 # dump json cmvn to kaldi
 cmvn-json2kaldi \
    --json_file  $ckpt_dir/data/mean_std.json \
-    --cmvn_write_path $exp_dir/cmvn.ark \
+    --cmvn_write_path $cmvn \
    --binary=false
 echo "convert json cmvn to kaldi ark."

@@ -66,7 +65,7 @@ echo "convert json cmvn to kaldi ark."
 linear-spectrogram-wo-db-norm-ol \
    --wav_rspecifier=scp:$data/wav.scp \
    --feature_wspecifier=ark,t:$feat_wspecifier \
-    --cmvn_file=$exp_dir/cmvn.ark
+    --cmvn_file=$cmvn
 echo "compute linear spectrogram feature."

 # run ctc beam search decoder as streaming

--- a/speechx/examples/ds2_ol/decoder/wfst-decoder-ol.cc
+++ b/speechx/examples/ds2_ol/decoder/wfst-decoder-ol.cc
@@ -37,10 +37,12 @@ DEFINE_int32(receptive_field_length,
 DEFINE_int32(downsampling_rate,
             4,
             "two CNN(kernel=5) module downsampling rate.");
+DEFINE_string(
+    model_input_names,
+    "audio_chunk,audio_chunk_lens,chunk_state_h_box,chunk_state_c_box",
+    "model input names");
 DEFINE_string(model_output_names,
-              "save_infer_model/scale_0.tmp_1,save_infer_model/"
-              "scale_1.tmp_1,save_infer_model/scale_2.tmp_1,save_infer_model/"
-              "scale_3.tmp_1",
+              "softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0",
              "model output names");
 DEFINE_string(model_cache_names, "5-1-1024,5-1-1024", "model cache names");

@@ -77,8 +79,9 @@ int main(int argc, char* argv[]) {

    ppspeech::ModelOptions model_opts;
    model_opts.model_path = model_graph;
-    model_opts.params_path = model_params;
+    model_opts.param_path = model_params;
    model_opts.cache_shape = FLAGS_model_cache_names;
+    model_opts.input_names = FLAGS_model_input_names;
    model_opts.output_names = FLAGS_model_output_names;
    std::shared_ptr<ppspeech::PaddleNnet> nnet(
        new ppspeech::PaddleNnet(model_opts));

--- a/speechx/examples/ds2_ol/feat/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/feat/CMakeLists.txt
@@ -9,4 +9,4 @@ target_link_libraries(${bin_name} frontend kaldi-util kaldi-feat-common gflags g
 set(bin_name cmvn-json2kaldi)
 add_executable(${bin_name} ${CMAKE_CURRENT_SOURCE_DIR}/${bin_name}.cc)
 target_include_directories(${bin_name} PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
-target_link_libraries(${bin_name} utils kaldi-util kaldi-matrix gflags glog)
+target_link_libraries(${bin_name} utils kaldi-util kaldi-matrix gflags glog ${DEPS})
\ No newline at end of file
--- a/speechx/examples/ds2_ol/feat/cmvn-json2kaldi.cc
+++ b/speechx/examples/ds2_ol/feat/cmvn-json2kaldi.cc
@@ -14,18 +14,20 @@

 // Note: Do not print/log ondemand object.

+#include "base/common.h"
 #include "base/flags.h"
 #include "base/log.h"
 #include "kaldi/matrix/kaldi-matrix.h"
 #include "kaldi/util/kaldi-io.h"
 #include "utils/file_utils.h"
-#include "utils/simdjson.h"
+// #include "boost/json.hpp"
+#include <boost/json/src.hpp>

 DEFINE_string(json_file, "", "cmvn json file");
 DEFINE_string(cmvn_write_path, "./cmvn.ark", "write cmvn");
 DEFINE_bool(binary, true, "write cmvn in binary (true) or text(false)");

-using namespace simdjson;
+using namespace boost::json;  // from <boost/json.hpp>

 int main(int argc, char* argv[]) {
    gflags::ParseCommandLineFlags(&argc, &argv, false);
@@ -33,49 +35,51 @@ int main(int argc, char* argv[]) {

    LOG(INFO) << "cmvn json path: " << FLAGS_json_file;

-    try {
-        padded_string json = padded_string::load(FLAGS_json_file);
-
-        ondemand::parser parser;
-        ondemand::document doc = parser.iterate(json);
-        ondemand::value val = doc;
+    auto ifs = std::ifstream(FLAGS_json_file);
+    std::string json_str = ppspeech::ReadFile2String(FLAGS_json_file);
+    auto value = boost::json::parse(json_str);
+    if (!value.is_object()) {
+        LOG(ERROR) << "Input json file format error.";
+    }

-        ondemand::array mean_stat = val["mean_stat"];
-        std::vector<kaldi::BaseFloat> mean_stat_vec;
-        for (double x : mean_stat) {
-            mean_stat_vec.push_back(x);
+    for (auto obj : value.as_object()) {
+        if (obj.key() == "mean_stat") {
+            LOG(INFO) << "mean_stat:" << obj.value();
        }
-        // LOG(INFO) << mean_stat; this line will casue
-        // simdjson::simdjson_error("Objects and arrays can only be iterated
-        // when
-        // they are first encountered")
-
-        ondemand::array var_stat = val["var_stat"];
-        std::vector<kaldi::BaseFloat> var_stat_vec;
-        for (double x : var_stat) {
-            var_stat_vec.push_back(x);
+        if (obj.key() == "var_stat") {
+            LOG(INFO) << "var_stat: " << obj.value();
        }
-
-        kaldi::int32 frame_num = uint64_t(val["frame_num"]);
-        LOG(INFO) << "nframe: " << frame_num;
-
-        size_t mean_size = mean_stat_vec.size();
-        kaldi::Matrix<double> cmvn_stats(2, mean_size + 1);
-        for (size_t idx = 0; idx < mean_size; ++idx) {
-            cmvn_stats(0, idx) = mean_stat_vec[idx];
-            cmvn_stats(1, idx) = var_stat_vec[idx];
+        if (obj.key() == "frame_num") {
+            LOG(INFO) << "frame_num: " << obj.value();
        }
-        cmvn_stats(0, mean_size) = frame_num;
-        LOG(INFO) << cmvn_stats;
+    }
+
+    boost::json::array mean_stat = value.at("mean_stat").as_array();
+    std::vector<kaldi::BaseFloat> mean_stat_vec;
+    for (auto it = mean_stat.begin(); it != mean_stat.end(); it++) {
+        mean_stat_vec.push_back(it->as_double());
+    }

-        kaldi::WriteKaldiObject(
-            cmvn_stats, FLAGS_cmvn_write_path, FLAGS_binary);
-        LOG(INFO) << "cmvn stats have write into: " << FLAGS_cmvn_write_path;
-        LOG(INFO) << "Binary: " << FLAGS_binary;
-    } catch (simdjson::simdjson_error& err) {
-        LOG(ERR) << err.what();
+    boost::json::array var_stat = value.at("var_stat").as_array();
+    std::vector<kaldi::BaseFloat> var_stat_vec;
+    for (auto it = var_stat.begin(); it != var_stat.end(); it++) {
+        var_stat_vec.push_back(it->as_double());
    }

+    kaldi::int32 frame_num = uint64_t(value.at("frame_num").as_int64());
+    LOG(INFO) << "nframe: " << frame_num;
+
+    size_t mean_size = mean_stat_vec.size();
+    kaldi::Matrix<double> cmvn_stats(2, mean_size + 1);
+    for (size_t idx = 0; idx < mean_size; ++idx) {
+        cmvn_stats(0, idx) = mean_stat_vec[idx];
+        cmvn_stats(1, idx) = var_stat_vec[idx];
+    }
+    cmvn_stats(0, mean_size) = frame_num;
+    LOG(INFO) << cmvn_stats;

+    kaldi::WriteKaldiObject(cmvn_stats, FLAGS_cmvn_write_path, FLAGS_binary);
+    LOG(INFO) << "cmvn stats have been written to: " << FLAGS_cmvn_write_path;
+    LOG(INFO) << "Binary: " << FLAGS_binary;
    return 0;
-}
\ No newline at end of file
+}
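
The converter above reads `mean_stat`, `var_stat` and `frame_num` from the JSON file and packs them into a 2 x (dim + 1) matrix. The layout can be sketched in Python with the standard `json` module. This is an illustration under assumptions: the function name `cmvn_stats_from_json` and the example numbers are made up, and the trailing 0 in row 1 assumes the kaldi matrix starts out zero-initialised.

```python
import json


def cmvn_stats_from_json(json_str):
    """Build the 2 x (dim + 1) stats layout written by the converter.

    Row 0 holds mean_stat followed by frame_num; row 1 holds var_stat
    followed by 0 (that last cell is never assigned in the C++ code).
    """
    value = json.loads(json_str)
    if not isinstance(value, dict):
        raise ValueError("Input json file format error.")
    mean_stat = [float(x) for x in value["mean_stat"]]
    var_stat = [float(x) for x in value["var_stat"]]
    if len(mean_stat) != len(var_stat):
        raise ValueError("mean_stat and var_stat must have the same length")
    frame_num = int(value["frame_num"])
    return [mean_stat + [float(frame_num)], var_stat + [0.0]]


# Made-up numbers, only to show the layout.
example = '{"mean_stat": [1.5, 2.5], "var_stat": [4.0, 9.0], "frame_num": 10}'
stats = cmvn_stats_from_json(example)
```

Unlike this sketch, the C++ version logs an error for a non-object input but carries on. Also, as far as I know Boost.JSON's `as_double()` only accepts values stored as doubles, so the converter presumably relies on the statistics being written with a decimal point; `float(x)` here is more lenient.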
--- a/speechx/examples/ds2_ol/feat/linear-spectrogram-wo-db-norm-ol.cc
+++ b/speechx/examples/ds2_ol/feat/linear-spectrogram-wo-db-norm-ol.cc
@@ -32,7 +32,6 @@ DEFINE_string(feature_wspecifier, "", "output feats wspecifier");
 DEFINE_string(cmvn_file, "./cmvn.ark", "read cmvn");
 DEFINE_double(streaming_chunk, 0.36, "streaming feature chunk size");

-
 int main(int argc, char* argv[]) {
    gflags::ParseCommandLineFlags(&argc, &argv, false);
    google::InitGoogleLogging(argv[0]);
@@ -66,7 +65,13 @@ int main(int argc, char* argv[]) {
    std::unique_ptr<ppspeech::FrontendInterface> cmvn(
        new ppspeech::CMVN(FLAGS_cmvn_file, std::move(linear_spectrogram)));

-    ppspeech::FeatureCache feature_cache(kint16max, std::move(cmvn));
+    ppspeech::FeatureCacheOptions feat_cache_opts;
+    // The feature cache outputs features chunk by chunk.
+    // frame_chunk_size: number of frames in a chunk.
+    // frame_chunk_stride: stride of the chunk sliding window.
+    feat_cache_opts.frame_chunk_stride = 1;
+    feat_cache_opts.frame_chunk_size = 1;
+    ppspeech::FeatureCache feature_cache(feat_cache_opts, std::move(cmvn));
    LOG(INFO) << "feat dim: " << feature_cache.Dim();

    int sample_rate = 16000;
@@ -105,12 +110,13 @@ int main(int argc, char* argv[]) {
            if (cur_chunk_size < chunk_sample_size) {
                feature_cache.SetFinished();
            }
-            feature_cache.Read(&features);
-            if (features.Dim() == 0) break;
-
-            feats.push_back(features);
+            // Drain every chunk the cache can currently produce; stop on a
+            // failed or empty read so that no empty vector is stored.
+            while (feature_cache.Read(&features) &&
+                   features.Dim() != 0) {
+                feats.push_back(features);
+                feature_rows += features.Dim() / feature_cache.Dim();
+            }
            sample_offset += cur_chunk_size;
-            feature_rows += features.Dim() / feature_cache.Dim();
        }

        int cur_idx = 0;

--- a/speechx/examples/ds2_ol/websocket/.gitignore
+++ b/speechx/examples/ds2_ol/websocket/.gitignore
+data
+exp
--- a/speechx/examples/ds2_ol/websocket/CMakeLists.txt
+++ b/speechx/examples/ds2_ol/websocket/CMakeLists.txt
+cmake_minimum_required(VERSION 3.14 FATAL_ERROR)
+
+add_executable(websocket_server_main ${CMAKE_CURRENT_SOURCE_DIR}/websocket_server_main.cc)
+target_include_directories(websocket_server_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
+target_link_libraries(websocket_server_main PUBLIC frontend kaldi-feat-common nnet decoder fst utils gflags glog kaldi-base kaldi-matrix kaldi-util kaldi-decoder websocket ${DEPS})
+
+add_executable(websocket_client_main ${CMAKE_CURRENT_SOURCE_DIR}/websocket_client_main.cc)
+target_include_directories(websocket_client_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
+target_link_libraries(websocket_client_main PUBLIC frontend kaldi-feat-common nnet decoder fst utils gflags glog kaldi-base kaldi-matrix kaldi-util kaldi-decoder websocket ${DEPS})
\ No newline at end of file
--- a/speechx/examples/ds2_ol/websocket/path.sh
+++ b/speechx/examples/ds2_ol/websocket/path.sh
+# This contains the locations of the built binaries required for running the examples.
+
+SPEECHX_ROOT=$PWD/../../..
+SPEECHX_EXAMPLES=$SPEECHX_ROOT/build/examples
+
+SPEECHX_TOOLS=$SPEECHX_ROOT/tools
+TOOLS_BIN=$SPEECHX_TOOLS/valgrind/install/bin
+
+[ -d $SPEECHX_EXAMPLES ] || { echo "Error: 'build/examples' directory not found. Please ensure that the project was built successfully."; }
+
+export LC_ALL=C
+
+SPEECHX_BIN=$SPEECHX_EXAMPLES/ds2_ol/websocket:$SPEECHX_EXAMPLES/ds2_ol/feat
+export PATH=$PATH:$SPEECHX_BIN:$TOOLS_BIN
--- a/speechx/examples/ds2_ol/websocket/websocket_client.sh
+++ b/speechx/examples/ds2_ol/websocket/websocket_client.sh
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+
+# 1. compile
+if [ ! -d ${SPEECHX_EXAMPLES} ]; then
+    pushd ${SPEECHX_ROOT} 
+    bash build.sh
+    popd
+fi
+
+# input
+mkdir -p data
+data=$PWD/data
+ckpt_dir=$data/model
+model_dir=$ckpt_dir/exp/deepspeech2_online/checkpoints/
+vocb_dir=$ckpt_dir/data/lang_char
+# output
+aishell_wav_scp=aishell_test.scp
+if [ ! -d $data/test ]; then
+    pushd $data
+    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_test.zip
+    unzip  aishell_test.zip
+    popd
+
+    realpath $data/test/*/*.wav > $data/wavlist
+    awk -F '/' '{ print $(NF) }' $data/wavlist | awk -F '.' '{ print $1 }' > $data/utt_id
+    paste $data/utt_id $data/wavlist > $data/$aishell_wav_scp
+fi
+
+export GLOG_logtostderr=1
+
+# websocket client
+websocket_client_main \
+    --wav_rspecifier=scp:$data/$aishell_wav_scp --streaming_chunk=0.36
--- a/speechx/examples/ds2_ol/websocket/websocket_client_main.cc
+++ b/speechx/examples/ds2_ol/websocket/websocket_client_main.cc
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "websocket/websocket_client.h"
+#include "kaldi/feat/wave-reader.h"
+#include "kaldi/util/kaldi-io.h"
+#include "kaldi/util/table-types.h"
+
+DEFINE_string(host, "127.0.0.1", "host of websocket server");
+DEFINE_int32(port, 8082, "port of websocket server");
+DEFINE_string(wav_rspecifier, "", "test wav scp path");
+DEFINE_double(streaming_chunk, 0.1, "streaming audio chunk size in seconds");
+
+using kaldi::int16;
+int main(int argc, char* argv[]) {
+    gflags::ParseCommandLineFlags(&argc, &argv, false);
+    google::InitGoogleLogging(argv[0]);
+
+    kaldi::SequentialTableReader<kaldi::WaveHolder> wav_reader(
+        FLAGS_wav_rspecifier);
+
+    const int sample_rate = 16000;
+    const float streaming_chunk = FLAGS_streaming_chunk;
+    const int chunk_sample_size = streaming_chunk * sample_rate;
+
+    for (; !wav_reader.Done(); wav_reader.Next()) {
+        ppspeech::WebSocketClient client(FLAGS_host, FLAGS_port);
+
+        client.SendStartSignal();
+        std::string utt = wav_reader.Key();
+        const kaldi::WaveData& wave_data = wav_reader.Value();
+        CHECK_EQ(wave_data.SampFreq(), sample_rate);
+
+        int32 this_channel = 0;
+        kaldi::SubVector<kaldi::BaseFloat> waveform(wave_data.Data(),
+                                                    this_channel);
+        const int tot_samples = waveform.Dim();
+        int sample_offset = 0;
+
+        while (sample_offset < tot_samples) {
+            int cur_chunk_size =
+                std::min(chunk_sample_size, tot_samples - sample_offset);
+
+            std::vector<int16> wav_chunk(cur_chunk_size);
+            for (int i = 0; i < cur_chunk_size; ++i) {
+                wav_chunk[i] = static_cast<int16>(waveform(sample_offset + i));
+            }
+            client.SendBinaryData(wav_chunk.data(),
+                                  wav_chunk.size() * sizeof(int16));
+
+
+            sample_offset += cur_chunk_size;
+            LOG(INFO) << "Send " << cur_chunk_size << " samples";
+            std::this_thread::sleep_for(
+                std::chrono::milliseconds(static_cast<int>(1 * 1000)));
+
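+            // Signal the end after the last chunk, even when the audio length
+            // is an exact multiple of the chunk size.
+            if (sample_offset >= tot_samples) {
+                client.SendEndSignal();
+            }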
+        }
+
+        while (!client.Done()) {
+        }
+        std::string result = client.GetResult();
+        LOG(INFO) << "utt: " << utt << " " << result;
+
+        client.Join();
+    }
+
+    return 0;
+}
--- a/speechx/examples/ds2_ol/websocket/websocket_server.sh
+++ b/speechx/examples/ds2_ol/websocket/websocket_server.sh
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+
+
+# 1. compile
+if [ ! -d ${SPEECHX_EXAMPLES} ]; then
+    pushd ${SPEECHX_ROOT} 
+    bash build.sh
+    popd
+fi
+
+# input
+mkdir -p data
+data=$PWD/data
+ckpt_dir=$data/model
+model_dir=$ckpt_dir/exp/deepspeech2_online/checkpoints/
+vocb_dir=$ckpt_dir/data/lang_char/
+
+if [ ! -f $ckpt_dir/data/mean_std.json ]; then
+        mkdir -p $ckpt_dir
+        pushd $ckpt_dir
+        wget -c https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz
+        tar xzfv asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz 
+        popd
+fi
+
+export GLOG_logtostderr=1
+
+# 3. gen cmvn 
+cmvn=$data/cmvn.ark
+cmvn-json2kaldi --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn
+
+
+wfst=$data/wfst/
+mkdir -p $wfst
+if [ ! -f $wfst/aishell_graph.zip ]; then
+    pushd $wfst
+    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph.zip
+    unzip aishell_graph.zip
+    mv aishell_graph/* $wfst
+    popd
+fi
+
+# 5. test websocket server 
+websocket_server_main \
+    --cmvn_file=$cmvn \
+    --model_path=$model_dir/avg_1.jit.pdmodel \
+    --streaming_chunk=0.1 \
+    --convert2PCM32=true \
+    --param_path=$model_dir/avg_1.jit.pdiparams \
+    --word_symbol_table=$data/wfst/words.txt \
+    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+    --graph_path=$data/wfst/TLG.fst --max_active=7500 \
+    --acoustic_scale=1.2 
--- a/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+++ b/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "websocket/websocket_server.h"
+#include "decoder/param.h"
+
+DEFINE_int32(port, 8082, "websocket listening port");
+
+int main(int argc, char *argv[]) {
+    gflags::ParseCommandLineFlags(&argc, &argv, false);
+    google::InitGoogleLogging(argv[0]);
+
+    ppspeech::RecognizerResource resource = ppspeech::InitRecognizerResoure();
+
+    ppspeech::WebSocketServer server(FLAGS_port, resource);
+    LOG(INFO) << "Listening at port " << FLAGS_port;
+    server.Start();
+    return 0;
+}
--- a/speechx/examples/ngram/zh/local/aishell_train_lms.sh
+++ b/speechx/examples/ngram/zh/local/aishell_train_lms.sh
@@ -3,6 +3,7 @@
 # To be run from one directory above this script.
 . ./path.sh

+nj=40
 text=data/local/lm/text
 lexicon=data/local/dict/lexicon.txt

@@ -31,21 +32,27 @@ cleantext=$dir/text.no_oov
 # oov to <SPOKEN_NOISE>
 # lexicon line: word char0 ... charn
 # text line: utt word0 ... wordn -> line: <SPOKEN_NOISE> word0 ... wordn
-cat $text | awk -v lex=$lexicon 'BEGIN{while((getline<lex) >0){ seen[$1]=1; } }
-  {for(n=1; n<=NF;n++) {  if (seen[$n]) { printf("%s ", $n); } else {printf("<SPOKEN_NOISE> ");} } printf("\n");}' \
-  > $cleantext || exit 1;
+text_dir=$(dirname $text)
+split_name=$(basename $text)
+./local/split_data.sh $text_dir $text $split_name $nj
+
+utils/run.pl JOB=1:$nj $text_dir/split${nj}/JOB/${split_name}.no_oov.log \
+  cat ${text_dir}/split${nj}/JOB/${split_name} \| awk -v lex=$lexicon 'BEGIN{while((getline<lex) >0){ seen[$1]=1; } }
+    {for(n=1; n<=NF;n++) {  if (seen[$n]) { printf("%s ", $n); } else {printf("<SPOKEN_NOISE> ");} } printf("\n");}' \
+    \> ${text_dir}/split${nj}/JOB/${split_name}.no_oov || exit 1;
+cat ${text_dir}/split${nj}/*/${split_name}.no_oov  > $cleantext

 # compute word counts, sort in descending order
 # line: count word
-cat $cleantext | awk '{for(n=2;n<=NF;n++) print $n; }' | sort | uniq -c | \
-   sort -nr > $dir/word.counts || exit 1;
+cat $cleantext | awk '{for(n=2;n<=NF;n++) print $n; }' | sort --parallel=`nproc` | uniq -c | \
+   sort --parallel=`nproc` -nr > $dir/word.counts || exit 1;

 # Get counts from acoustic training transcripts, and add  one-count
 # for each word in the lexicon (but not silence, we don't want it
 # in the LM-- we'll add it optionally later).
 cat $cleantext | awk '{for(n=2;n<=NF;n++) print $n; }' | \
  cat - <(grep -w -v '!SIL' $lexicon | awk '{print $1}') | \
-   sort | uniq -c | sort -nr > $dir/unigram.counts || exit 1;
+   sort --parallel=`nproc` | uniq -c | sort --parallel=`nproc` -nr > $dir/unigram.counts || exit 1;

 # word with <s> </s>
 cat $dir/unigram.counts | awk '{print $2}' | cat - <(echo "<s>"; echo "</s>" ) > $dir/wordlist

--- a/speechx/examples/ngram/zh/local/split_data.sh
+++ b/speechx/examples/ngram/zh/local/split_data.sh
+#!/usr/bin/env bash
+
+set -eo pipefail
+
+data=$1
+scp=$2
+split_name=$3
+numsplit=$4
+
+# Split $scp into $numsplit parts.
+# Part n is saved as $data/split${numsplit}/n/$split_name.
+#
+
+if [[ ! $numsplit -gt 0 ]]; then
+  echo "Invalid num-split argument";
+  exit 1;
+fi
+
+directories=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n; done)
+scp_splits=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_name}; done)
+
+# if this mkdir fails due to argument-list being too long, iterate.
+if ! mkdir -p $directories >&/dev/null; then
+  for n in `seq $numsplit`; do
+    mkdir -p $data/split${numsplit}/$n
+  done
+fi
+
+echo "utils/split_scp.pl $scp $scp_splits"
+utils/split_scp.pl $scp $scp_splits
--- a/speechx/examples/ngram/zh/local/text_to_lexicon.py
+++ b/speechx/examples/ngram/zh/local/text_to_lexicon.py
@@ -2,6 +2,7 @@
 import argparse
 from collections import Counter

+
 def main(args):
    counter = Counter()
    with open(args.text, 'r') as fin, open(args.lexicon, 'w') as fout:
@@ -12,7 +13,7 @@ def main(args):
                words = text.split()
            else:
                words = line.split()
-            
+
            counter.update(words)

        for word in counter:
@@ -20,21 +21,16 @@ def main(args):
            fout.write(f"{word}\t{val}\n")
            fout.flush()

+
 if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='text(line:utt1 中国 人) to lexicon（line:中国 中 国).')
    parser.add_argument(
-        '--has_key',
-        default=True,
-        help='text path, with utt or not')
+        '--has_key', default=True, help='text path, with utt or not')
    parser.add_argument(
-        '--text',
-        required=True,
-        help='text path. line: utt1 中国 人 or 中国 人')
+        '--text', required=True, help='text path. line: utt1 中国 人 or 中国 人')
    parser.add_argument(
-        '--lexicon',
-        required=True,
-        help='lexicon path. line:中国 中 国')
+        '--lexicon', required=True, help='lexicon path. line:中国 中 国')
    args = parser.parse_args()
    print(args)


--- a/speechx/examples/text_lm/README.md
+++ b/speechx/examples/text_lm/README.md
 # Text PreProcess for building ngram LM

-Output `text` file like this:
+## Input
+
+```
+data/
+|-- text
+```
+
+The input file is Kaldi-style, with the utterance ID (`utt`) in the first column:
+```
+Y0000000000_--5llN02F84_S00000  怎么样这些日子住得还习惯吧
+Y0000000000_--5llN02F84_S00002  挺好的
+Y0000000000_--5llN02F84_S00003  对了美静这段日子经常不和我们一起用餐
+Y0000000000_--5llN02F84_S00004  是不是对我回来有什么想法啊
+Y0000000000_--5llN02F84_S00005  哪有的事啊
+Y0000000000_--5llN02F84_S00006  她这两天挺累的身体也不太舒服
+Y0000000000_--5llN02F84_S00007  我让她多睡一会那就好如果要是觉得不方便
+Y0000000000_--5llN02F84_S00009  我就搬出去住
+Y0000000000_--5llN02F84_S00010  你看你这个人你就是疑心太重
+Y0000000000_--5llN02F84_S00011  你现在多好一切都井然有序的
+```
+
+
+## Output
+
+```
+data/
+`-- text.tn
+```
+
+The output file looks like this:

 ```
 BAC009S0002W0122 而 对 楼市 成交 抑制 作用 最 大 的 限 购

--- a/speechx/examples/text_lm/local/data/chars.dic
+++ b/speechx/examples/text_lm/local/data/chars.dic
--- a/speechx/examples/text_lm/local/data/words.dic
+++ b/speechx/examples/text_lm/local/data/words.dic
--- a/speechx/examples/text_lm/local/mmseg.py
+++ b/speechx/examples/text_lm/local/mmseg.py
--- a/speechx/examples/wfst/README.md
+++ b/speechx/examples/wfst/README.md
--- a/speechx/speechx/CMakeLists.txt
+++ b/speechx/speechx/CMakeLists.txt
--- a/speechx/speechx/base/common.h
+++ b/speechx/speechx/base/common.h
--- a/speechx/speechx/decoder/CMakeLists.txt
+++ b/speechx/speechx/decoder/CMakeLists.txt
--- a/speechx/speechx/decoder/ctc_tlg_decoder.cc
+++ b/speechx/speechx/decoder/ctc_tlg_decoder.cc
--- a/speechx/speechx/decoder/param.h
+++ b/speechx/speechx/decoder/param.h
--- a/speechx/speechx/decoder/recognizer.cc
+++ b/speechx/speechx/decoder/recognizer.cc
--- a/speechx/speechx/decoder/recognizer.h
+++ b/speechx/speechx/decoder/recognizer.h
--- a/speechx/speechx/frontend/audio/CMakeLists.txt
+++ b/speechx/speechx/frontend/audio/CMakeLists.txt
--- a/speechx/speechx/frontend/audio/audio_cache.cc
+++ b/speechx/speechx/frontend/audio/audio_cache.cc
--- a/speechx/speechx/frontend/audio/audio_cache.h
+++ b/speechx/speechx/frontend/audio/audio_cache.h
--- a/speechx/speechx/frontend/audio/feature_cache.cc
+++ b/speechx/speechx/frontend/audio/feature_cache.cc
--- a/speechx/speechx/frontend/audio/feature_cache.h
+++ b/speechx/speechx/frontend/audio/feature_cache.h
--- a/speechx/speechx/frontend/audio/feature_pipeline.cc
+++ b/speechx/speechx/frontend/audio/feature_pipeline.cc
--- a/speechx/speechx/frontend/audio/feature_pipeline.h
+++ b/speechx/speechx/frontend/audio/feature_pipeline.h
--- a/speechx/speechx/frontend/audio/linear_spectrogram.cc
+++ b/speechx/speechx/frontend/audio/linear_spectrogram.cc
--- a/speechx/speechx/frontend/audio/linear_spectrogram.h
+++ b/speechx/speechx/frontend/audio/linear_spectrogram.h
--- a/speechx/speechx/nnet/decodable.cc
+++ b/speechx/speechx/nnet/decodable.cc
--- a/speechx/speechx/nnet/paddle_nnet.cc
+++ b/speechx/speechx/nnet/paddle_nnet.cc
--- a/speechx/speechx/nnet/paddle_nnet.h
+++ b/speechx/speechx/nnet/paddle_nnet.h
--- a/speechx/speechx/websocket/CMakeLists.txt
+++ b/speechx/speechx/websocket/CMakeLists.txt
--- a/speechx/speechx/websocket/websocket_client.cc
+++ b/speechx/speechx/websocket/websocket_client.cc
--- a/speechx/speechx/websocket/websocket_client.h
+++ b/speechx/speechx/websocket/websocket_client.h
--- a/speechx/speechx/websocket/websocket_server.cc
+++ b/speechx/speechx/websocket/websocket_server.cc
--- a/speechx/speechx/websocket/websocket_server.h
+++ b/speechx/speechx/websocket/websocket_server.h
--- a/utils/DER.py
+++ b/utils/DER.py
--- a/utils/compute-wer.py
+++ b/utils/compute-wer.py
--- a/utils/format_rsl.py
+++ b/utils/format_rsl.py
--- a/utils/fst/prepare_dict.py
+++ b/utils/fst/prepare_dict.py
--- a/utils/generate_infer_yaml.py
+++ b/utils/generate_infer_yaml.py