Commit 81ae5ffd, authored by Yang Zhou

add readme

Parent a5f52d6d
@@ -3,14 +3,60 @@
# Customized Auto Speech Recognition
## Introduction
In some cases, we need to recognize specific sentences or rare words with high accuracy, e.g. customized keyword spotting or address recognition in navigation apps. Customized ASR can solve these problems.
This demo is customized for taxi expense reports, which require recognizing rare addresses.
* G with slot: 打车到 "address_slot" ("take a taxi to <address>").
![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
* This is the address slot WFST; you can add the addresses you want to recognize.
![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
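To make the effect of the replace operation concrete, here is a toy sketch in plain shell. It does not use OpenFst or build any WFST; it only mimics, at the sentence level, what filling the slot achieves: every entry of an address list is spliced into the template in place of the `address_slot` placeholder. The function name `expand_slot` and the sample addresses are assumptions made for illustration.

```shell
#!/bin/sh
# Toy illustration only: mimics slot replacement on plain strings, not on WFSTs.
# Usage: expand_slot TEMPLATE ADDRESS...
# Prints one sentence per address, with "address_slot" replaced by that address.
expand_slot() {
    template=$1
    shift
    # Refuse templates that do not contain the placeholder.
    case $template in
        *address_slot*) ;;
        *) echo "template has no address_slot placeholder" >&2; return 1 ;;
    esac
    # Text before and after the first occurrence of the placeholder.
    prefix=${template%%address_slot*}
    suffix=${template#*address_slot}
    for address in "$@"; do
        printf '%s%s%s\n' "$prefix" "$address" "$suffix"
    done
}

# Hypothetical address list; substitute the addresses you want to recognize.
expand_slot '打车到address_slot' '公司' '机场'
```

In the real graph the same idea is applied to paths rather than strings: each address added to the slot WFST becomes an accepted continuation of the template sentence.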
## Usage
### 1. Installation
Install Docker by running the script setup_docker.sh, then install tmux (apt-get install tmux).
Start the paddle:2.2.2 Docker container:
```
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run websocket_server.sh. This script downloads the resources and libs, then launches the service.
```
bash websocket_server.sh
```
This script runs in two steps:
1. Download resources.tar.gz. The following directories are then found in the resource directory:
model: acoustic model
graph: the decoder graph (TLG.fst)
lib: libraries
bin: binaries
data: audio and wav.scp
2. websocket_server_main launches the service.
Some parameters:
port: the service port
graph_path: the decoder graph path
model_path: the acoustic model path
For the other parameters, refer to these files:
PaddleSpeech/speechx/speechx/decoder/param.h
PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
* In another terminal of the Docker container, run websocket_client.sh. The client sends the data and receives the results.
```
bash websocket_client.sh
```
websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the server's service port.
* Result:
In the client log, you will see messages like the following:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```
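When many utterances are sent, it can help to pull only the recognition results out of the client log. The sketch below assumes result lines have the form shown in the sample above (`... the result of <id> is <text>`). The function name `extract_results` and the log file name `client.log` are hypothetical.

```shell
#!/bin/sh
# Sketch: read a client log on stdin and print "<utterance id> <text>" for
# every line of the form "... the result of <id> is <text>".
# All other log lines are dropped.
extract_results() {
    sed -n 's/^.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}

# Example: filter a saved log (client.log is a hypothetical file name).
if [ -f client.log ]; then
    extract_results < client.log
fi
```

Applied to the sample log above, this keeps only the `case_10` line and discards the feature and tensor messages.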
(简体中文|[English](./README.md))
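# Customized Speech Recognition Demo
## Introduction
Customized speech recognition is a technique for recognizing the sentences that occur in specific scenarios.
For a brief tutorial on the underlying principles, see:
https://aistudio.baidu.com/aistudio/projectdetail/3986429
This demo recognizes speech for taxi expense reports, with customized locations.
## Usage
### 1. Environment setup
Install the image with setup_docker.sh. Once inside the container, install tmux (apt-get install tmux) to make the rest of the demo easier.
Start the paddle:2.2.2 Docker container: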
```
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run the following command to download the required resources and libraries and start the service.
```
bash websocket_server.sh
```
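The script above performs two steps:
1. Download resource.tar.gz. After extraction, the following directories are found in resource:
model: acoustic model
graph: decoding graph
lib: libraries
bin: executables
data: audio data
2. Start the service with websocket_server_main.
A brief description of a few parameters:
port is the service port,
graph_path specifies the decoding graph file,
the model-related parameters specify the acoustic model files.
For the other parameters, see the code: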
PaddleSpeech/speechx/speechx/decoder/param.h
PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
* In another terminal, send data with the client and get the results. Run the following command:
```
bash websocket_client.sh
```
websocket_client_main launches the client; $wav_scp is the set of utterances to send, and port is the service port.
* Result:
The client log shows output similar to the following:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```