Commit c1198106
Authored May 13, 2022 by Hui Zhang
Committed by GitHub on May 13, 2022
Merge pull request #1891 from SmileGoat/add_demos
[speechx] add custom_streaming_asr
Parents: 8ed8c9c1, 8126ae72
Showing 6 changed files with 181 additions and 0 deletions (+181 -0)
demos/custom_streaming_asr/README.md (+64 -0)
demos/custom_streaming_asr/README_cn.md (+63 -0)
demos/custom_streaming_asr/path.sh (+2 -0)
demos/custom_streaming_asr/setup_docker.sh (+1 -0)
demos/custom_streaming_asr/websocket_client.sh (+18 -0)
demos/custom_streaming_asr/websocket_server.sh (+33 -0)
demos/custom_streaming_asr/README.md
0 → 100644
([简体中文](./README_cn.md)|English)
# Customized Automatic Speech Recognition
## Introduction
Some applications need to recognize specific rare words with high accuracy, for example addresses in navigation apps. Customized ASR can solve this problem.
This demo is customized for expense claims, which require recognizing rare addresses.
* G with slot: 打车到 "address_slot" ("take a taxi to address_slot").
![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
* This is the address slot WFST. You can add the addresses you want to recognize to it.
![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
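The effect of the replace operation can be pictured at the text level. The sketch below is only an illustration, not how the demo builds its graph: the real operation works on WFSTs, while this works on plain strings. The function name `expand_slot` and the example addresses are hypothetical, and the template is assumed to contain the slot exactly once.

```shell
#!/bin/sh
# Hypothetical text-level illustration of slot replacement.
# It lists the sentences a customized graph is meant to accept;
# it does not touch any FST files.

# Usage: expand_slot TEMPLATE SLOT_NAME VALUE...
# The body runs in a subshell so its variables do not leak to the caller.
expand_slot() (
    template=$1
    slot=$2
    shift 2
    prefix=${template%%"$slot"*}   # text before the slot
    suffix=${template#*"$slot"}    # text after the slot
    for value in "$@"; do
        printf '%s%s%s\n' "$prefix" "$value" "$suffix"
    done
)

# Example addresses below are made up for illustration.
expand_slot '打车到 address_slot' 'address_slot' '西二旗' '后厂村路'
```

Each line printed corresponds to one path through the customized graph: the fixed part of the sentence followed by one entry from the address list.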
## Usage
### 1. Installation
Pull and start the paddle:2.2.2 Docker image.
```
sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run websocket_server.sh. This script downloads the resources and libraries, then launches the service.
```
bash websocket_server.sh
```
The script runs in two steps:
1. Download and extract resource.tar.gz. The resource directory then contains the following directories:
   model: acoustic model
   graph: the decoder graph (TLG.fst)
   lib: libraries
   bin: binaries
   data: audio and wav.scp
2. Launch the service with websocket_server_main. Some of its parameters:
   port: the service port
   graph_path: the decoder graph path
   model_path: acoustic model path
   For the other parameters, refer to these files:
   PaddleSpeech/speechx/speechx/decoder/param.h
   PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
* In another terminal, run websocket_client.sh. The client sends the audio data and receives the results.
```
bash websocket_client.sh
```
websocket_client_main launches the client; wav_scp is the set of wav files to send, and port is the server's service port.
* Result:
In the client log, you will see messages like the following:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```
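When many utterances are sent, the final transcripts can be hard to spot among the other log lines. The sketch below pulls them out, assuming every result line has the form `the result of <id> is <text>` as in the sample log above. The function name `extract_results` is hypothetical and not part of the demo.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print
# "<utterance-id><TAB><transcript>" for each result line.
extract_results() {
    awk '{
        marker = "the result of "
        i = index($0, marker)
        if (i == 0) next                      # not a result line
        rest = substr($0, i + length(marker))
        j = index(rest, " is ")
        if (j == 0) next                      # unexpected format, skip
        printf "%s\t%s\n", substr(rest, 1, j - 1), substr(rest, j + 4)
    }'
}

# Example on the result line from the sample log.
printf '%s\n' 'LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元' | extract_results
```

Because the scripts set GLOG_logtostderr=1, the log is probably written to stderr, so a real run would likely need `bash websocket_client.sh 2>&1 | extract_results`. That redirection is an assumption and has not been checked against the demo.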
demos/custom_streaming_asr/README_cn.md
0 → 100644
(简体中文|[English](./README.md))
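# Customized Speech Recognition Demo
## Introduction
In some scenarios, a recognition system must recognize rare words with high accuracy, such as place names in navigation software. Customized recognition meets this need.
This demo covers recognition for taxi expense claims, which requires recognizing rare place names. This can be achieved with the following steps.
* G with slot: 打车到 "address_slot" ("take a taxi to address_slot").
![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
* This is the address slot WFST. You can add the place names that need to be recognized.
![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
* With the replace operation, G = fstreplace(G_with_slot, address_slot), we obtain the customized decoding graph.
![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)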
## Usage
### 1. Set Up the Environment
Install the paddle:2.2.2 Docker image.
```
sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run the following command to download the required resources and libraries and start the service.
```
bash websocket_server.sh
```
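The script above does two things:
1. Download resource.tar.gz. After extraction, the resource directory contains the following directories:
   model: acoustic model
   graph: decoding graph
   lib: libraries
   bin: executables
   data: audio data
2. Start the service with websocket_server_main. A brief description of a few parameters:
   port is the service port,
   graph_path specifies the decoding graph file,
   the model-related parameters specify the acoustic model files.
   For the other parameters, see the code:
   PaddleSpeech/speechx/speechx/decoder/param.h
   PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
* In another terminal, send data through the client and get the results. Run the following command: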
```
bash websocket_client.sh
```
websocket_client_main starts the client; $wav_scp is the set of utterances to send, and port is the service port.
* Result:
The client log shows output similar to the following:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```
demos/custom_streaming_asr/path.sh
0 → 100644
export LD_LIBRARY_PATH=$PWD/resource/lib
export PATH=$PATH:$PWD/resource/bin
demos/custom_streaming_asr/setup_docker.sh
0 → 100644
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
demos/custom_streaming_asr/websocket_client.sh
0 → 100755
#!/bin/bash
set +x
set -e

. path.sh

# input
data=$PWD/data

# output
wav_scp=wav.scp

export GLOG_logtostderr=1

# websocket client
websocket_client_main \
    --wav_rspecifier=scp:$data/$wav_scp \
    --streaming_chunk=0.36 \
    --port=8881
demos/custom_streaming_asr/websocket_server.sh
0 → 100755
#!/bin/bash
set +x
set -e

export GLOG_logtostderr=1

. path.sh
#test websocket server

model_dir=./resource/model
graph_dir=./resource/graph
cmvn=./data/cmvn.ark

#paddle_asr_online/resource.tar.gz
if [ ! -f $cmvn ]; then
    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
    tar xzfv resource.tar.gz
    ln -s ./resource/data .
fi

websocket_server_main \
    --cmvn_file=$cmvn \
    --streaming_chunk=0.1 \
    --use_fbank=true \
    --model_path=$model_dir/avg_10.jit.pdmodel \
    --param_path=$model_dir/avg_10.jit.pdiparams \
    --model_cache_shapes="5-1-2048,5-1-2048" \
    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
    --word_symbol_table=$graph_dir/words.txt \
    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
    --port=8881 \
    --acoustic_scale=12