# Customized Automatic Speech Recognition
## Introduction
In some scenarios, specific rare words must be recognized with high accuracy, for example address recognition in navigation apps. Customized ASR can solve this problem.
This demo is customized for taxi expense claims, which requires recognizing rare addresses.
* G with slot: the grammar contains a placeholder for the address, e.g. 打车到 "address_slot" ("take a taxi to address_slot").
* Address slot WFST: you can add the addresses that you want to recognize.
* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
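To make the effect of the replace operation concrete, here is a text-level analogy in shell: every address in a list is substituted into the slot of a template sentence. It only illustrates which sentences the customized graph is meant to accept and does not build or modify any WFST. The helper name `expand_slot` and the sample addresses are hypothetical and not part of the demo.

```shell
#!/bin/bash
# Text-level analogy of G = fstreplace(G_with_slot, address_slot).
# expand_slot is a hypothetical helper, not part of the demo.
# Usage: expand_slot TEMPLATE SLOT_NAME < address_list
expand_slot() {
    local template=$1 slot=$2 prefix suffix addr
    case $template in
        *"$slot"*) ;;
        *) echo "slot '$slot' not found in template" >&2; return 1 ;;
    esac
    prefix=${template%%"$slot"*}   # text before the first slot occurrence
    suffix=${template#*"$slot"}    # text after the first slot occurrence
    while IFS= read -r addr; do
        [ -n "$addr" ] || continue # skip blank lines in the address list
        printf '%s%s%s\n' "$prefix" "$addr" "$suffix"
    done
}

# The two addresses below are made up for illustration.
printf '%s\n' '北京南站' '首都机场' | expand_slot '打车到address_slot' 'address_slot'
```

Each output line is one sentence the template covers once the slot has been filled, which is the text-level counterpart of what the replaced graph encodes.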
## Usage
### 1. Installation
Pull the paddle:2.2.2 Docker image and start a container:
sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
### 2. Demo
* Run websocket_server.sh. This script downloads the resources and libraries, then launches the service.
bash websocket_server.sh
This script runs in two steps:
1. Download resource.tar.gz. After extraction, the following directories are found in the resource directory:
    * model: acoustic model
    * graph: the decoder graph (TLG.fst)
    * lib: libraries
    * bin: binaries
    * data: audio and wav.scp
2. websocket_server_main launches the service. Some parameters:
    * port: the service port
    * graph_path: the decoder graph path
    * model_path: acoustic model path

For the other parameters, please refer to those files:
* In another terminal, run the script websocket_client.sh. The client sends the audio data and receives the recognition results.
bash websocket_client.sh
websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the service port of the server.
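Because the client can only connect once the server is listening, it can help to wait for the port before starting the client. The sketch below assumes bash (it relies on bash's `/dev/tcp` redirection), that the server runs on the same host, and that it uses port 8881 as set in websocket_server.sh. The helper name `wait_for_port` is hypothetical.

```shell
#!/bin/bash
# wait_for_port: hypothetical helper, not part of the demo.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds,
# 1 if the port is still closed after TIMEOUT_SECONDS (default 30).
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30} waited=0
    # The subshell opens the connection and closes it again on exit.
    while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (assumes the server was started with --port=8881 on this machine):
# wait_for_port 127.0.0.1 8881 60 && bash websocket_client.sh
```

The timeout keeps the client terminal from blocking forever if the server failed to start, for instance because the resource download did not finish.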
* Result:
The client log shows messages like the following:
I0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
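The final transcript is buried among the other log lines. As a small sketch, the filter below keeps only the lines containing "the result of ... is ..." and prints the utterance id followed by the recognized text. It assumes the log format shown above; the helper name `extract_results` is hypothetical.

```shell
#!/bin/bash
# extract_results: hypothetical helper, not part of the demo.
# Reads a log on stdin and prints "<utterance id> <recognized text>"
# for every line of the form "... the result of <id> is <text>".
extract_results() {
    sed -n 's/^.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}

# Example: merge stderr into stdout so that all log lines are scanned.
# bash websocket_client.sh 2>&1 | extract_results
```

Lines that do not match the pattern, such as the feature and tensor messages, are dropped.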
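# Customized Speech Recognition Demo
## Introduction
This demo covers recognition for taxi expense claims, where some rare place names must be recognized. This can be achieved with the following steps.
* G with slot: 打车到 "address_slot" ("take a taxi to address_slot").
* This is the address slot WFST; you can add the place names that need to be recognized.
* With the replace operation, G = fstreplace(G_with_slot, address_slot), we finally obtain the customized decoding graph.
## Usage
### 1. Environment setup
Install the paddle:2.2.2 Docker image.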
sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
### 2. Demo
* Run the following command to download the required resources and libraries and to start the service.
bash websocket_server.sh
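1. Download resource.tar.gz. After extraction, the resource directory contains the following directories:
    * model: acoustic model
    * graph: decoding graph
    * lib: related libraries
    * bin: executables
    * data: audio data
2. Launch the service with websocket_server_main.
* In another terminal, send data through the client and get the results. Run the following command: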
bash websocket_client.sh
* Result:
I0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
export LD_LIBRARY_PATH=$PWD/resource/lib
export PATH=$PATH:$PWD/resource/bin
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
set +x
set -e
. path.sh
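# input: directory with the audio and wav.scp (assumed to be the ./data link created by websocket_server.sh)
data=./data
# wav list inside $data (assumed to be wav.scp, as named in the README)
wav_scp=wav.scp
export GLOG_logtostderr=1
# websocket client: the port must match the port of the server
websocket_client_main \
    --wav_rspecifier=scp:$data/$wav_scp \
    --streaming_chunk=0.36 \
    --port=8881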
set +x
set -e
export GLOG_logtostderr=1
. path.sh
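# test websocket server
# The paths below are assumptions based on the resource layout described in the README.
model_dir=./resource/model
graph_dir=./resource/graph
# Assumed CMVN file location; adjust it to the file shipped in resource.tar.gz.
cmvn=./data/cmvn.ark
# Download and unpack the resources on the first run.
if [ ! -f "$cmvn" ]; then
    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
    tar xzfv resource.tar.gz
    ln -s ./resource/data .
fi
# Launch the websocket server.
websocket_server_main \
    --cmvn_file=$cmvn \
    --streaming_chunk=0.1 \
    --use_fbank=true \
    --model_path=$model_dir/avg_10.jit.pdmodel \
    --param_path=$model_dir/avg_10.jit.pdiparams \
    --model_cache_shapes="5-1-2048,5-1-2048" \
    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
    --word_symbol_table=$graph_dir/words.txt \
    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
    --port=8881 \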
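Before the server is launched, it can be useful to confirm that the download step produced the layout the README describes. The sketch below checks for the five directories and for graph/TLG.fst. The helper name `check_resource_layout` and the default path ./resource are assumptions; the directory names are taken from the README.

```shell
#!/bin/bash
# check_resource_layout: hypothetical helper, not part of the demo.
# Usage: check_resource_layout [RESOURCE_DIR]   (default: ./resource)
# Returns 0 if all expected entries exist, 1 otherwise.
check_resource_layout() {
    local root=${1:-./resource} missing=0 name
    # Directories listed in the README.
    for name in model graph lib bin data; do
        if [ ! -d "$root/$name" ]; then
            echo "missing directory: $root/$name" >&2
            missing=1
        fi
    done
    # The README names TLG.fst as the decoder graph.
    if [ ! -f "$root/graph/TLG.fst" ]; then
        echo "missing file: $root/graph/TLG.fst" >&2
        missing=1
    fi
    return "$missing"
}

# Example:
# check_resource_layout ./resource || echo "resource.tar.gz was not unpacked correctly"
```

Reporting every missing entry rather than stopping at the first one makes a partial extraction easier to diagnose.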
