PaddlePaddle / DeepSpeech

Commit 5bbe6e98
Authored Sep 29, 2022 by tianhao zhang
support u2pp cli and server, optimiz code of u2pp decode, test=asr
Parent: d3e59375
Showing 10 changed files with 83 additions and 32 deletions (+83 -32)
Changed files:

- demos/streaming_asr_server/conf/application.yaml (+1 -1)
- docs/source/released_model.md (+1 -0)
- paddlespeech/cli/asr/infer.py (+2 -2)
- paddlespeech/resource/model_alias.py (+2 -0)
- paddlespeech/resource/pretrained_models.py (+40 -0)
- paddlespeech/s2t/exps/u2/bin/test_wav.py (+1 -3)
- paddlespeech/s2t/exps/u2/model.py (+1 -3)
- paddlespeech/s2t/models/u2/u2.py (+15 -18)
- paddlespeech/server/conf/ws_conformer_application.yaml (+1 -1)
- paddlespeech/server/engine/asr/online/python/asr_engine.py (+19 -4)
demos/streaming_asr_server/conf/application.yaml

```diff
@@ -21,7 +21,7 @@ engine_list: ['asr_online']
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
-    model_type: 'conformer_online_wenetspeech'
+    model_type: 'conformer_u2pp_online_wenetspeech'
     am_model:   # the pdmodel file of am static model [optional]
     am_params:  # the pdiparams file of am static model [optional]
     lang: 'zh'
```
docs/source/released_model.md

```diff
@@ -9,6 +9,7 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
 [Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz) | Aishell Dataset | Char-based | 491 MB | 2 Conv + 5 LSTM layers | 0.0666 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0) | onnx/inference/python |
 [Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_offline_aishell_ckpt_1.0.1.model.tar.gz) | Aishell Dataset | Char-based | 1.4 GB | 2 Conv + 5 bidirectional LSTM layers| 0.0554 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0) | inference/python |
 [Conformer Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz) | WenetSpeech Dataset | Char-based | 457 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.11 (test\_net) 0.1879 (test\_meeting) |-| 10000 h |- | python |
+[Conformer U2PP Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.1.model.tar.gz) | WenetSpeech Dataset | Char-based | 476 MB | Encoder:Conformer, Decoder:BiTransformer, Decoding method: Attention rescoring| 0.047198 (aishell test\_-1) 0.059212 (aishell test\_16) |-| 10000 h |- | python |
 [Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1) | python |
 [Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_1.0.1.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0460 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1) | python |
 [Transformer Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 128 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0523 || 151 h | [Transformer Aishell ASR1](../../examples/aishell/asr1) | python |
```
paddlespeech/cli/asr/infer.py

```diff
@@ -51,7 +51,7 @@ class ASRExecutor(BaseExecutor):
         self.parser.add_argument(
             '--model',
             type=str,
-            default='conformer_wenetspeech',
+            default='conformer_u2pp_wenetspeech',
             choices=[
                 tag[:tag.index('-')]
                 for tag in self.task_resource.pretrained_models.keys()
@@ -465,7 +465,7 @@ class ASRExecutor(BaseExecutor):
     @stats_wrapper
     def __call__(self,
                  audio_file: os.PathLike,
-                 model: str='conformer_wenetspeech',
+                 model: str='conformer_u2pp_wenetspeech',
                  lang: str='zh',
                  sample_rate: int=16000,
                  config: os.PathLike=None,
```
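The `choices=` expression above derives the CLI's `--model` options by trimming each pretrained-model tag at its first `-`. A small self-contained sketch of that trimming, using the two tags this commit registers:

```python
# Pretrained-model tags registered by this commit; the CLI strips everything
# from the first '-' onward to form the --model choices.
tags = [
    "conformer_u2pp_wenetspeech-zh-16k",
    "conformer_u2pp_online_wenetspeech-zh-16k",
]
choices = [tag[:tag.index('-')] for tag in tags]
print(choices)
# ['conformer_u2pp_wenetspeech', 'conformer_u2pp_online_wenetspeech']
```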
paddlespeech/resource/model_alias.py

```diff
@@ -25,6 +25,8 @@ model_alias = {
     "deepspeech2online": ["paddlespeech.s2t.models.ds2:DeepSpeech2Model"],
     "conformer": ["paddlespeech.s2t.models.u2:U2Model"],
     "conformer_online": ["paddlespeech.s2t.models.u2:U2Model"],
+    "conformer_u2pp": ["paddlespeech.s2t.models.u2:U2Model"],
+    "conformer_u2pp_online": ["paddlespeech.s2t.models.u2:U2Model"],
     "transformer": ["paddlespeech.s2t.models.u2:U2Model"],
     "wenetspeech": ["paddlespeech.s2t.models.u2:U2Model"],
```
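Each alias value above is a `module.path:ClassName` string. A minimal sketch of splitting such an entry into an importable module path and class name (the dynamic-import step itself is omitted; `split_alias` is a hypothetical helper, not paddlespeech API):

```python
# The two aliases added in this commit, in the same format as model_alias.
model_alias = {
    "conformer_u2pp": ["paddlespeech.s2t.models.u2:U2Model"],
    "conformer_u2pp_online": ["paddlespeech.s2t.models.u2:U2Model"],
}

def split_alias(entry: str):
    """Split 'pkg.module:Class' into (module path, class name)."""
    module_path, _, cls_name = entry.partition(":")
    return module_path, cls_name

print(split_alias(model_alias["conformer_u2pp"][0]))
# ('paddlespeech.s2t.models.u2', 'U2Model')
```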
paddlespeech/resource/pretrained_models.py

```diff
@@ -68,6 +68,46 @@ asr_dynamic_pretrained_models = {
             '',
         },
     },
+    "conformer_u2pp_wenetspeech-zh-16k": {
+        '1.0': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.1.model.tar.gz',
+            'md5':
+            'eae678c04ed3b3f89672052fdc0c5e10',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10',
+            'model':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'params':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'lm_url':
+            '',
+            'lm_md5':
+            '',
+        },
+    },
+    "conformer_u2pp_online_wenetspeech-zh-16k": {
+        '1.0': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.2.model.tar.gz',
+            'md5':
+            '925d047e9188dea7f421a718230c9ae3',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10',
+            'model':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'params':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'lm_url':
+            '',
+            'lm_md5':
+            '',
+        },
+    },
     "conformer_online_multicn-zh-16k": {
         '1.0': {
             'url':
```
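The registry above is keyed by tags of the form `<model>-<lang>-<rate>k`. A small sketch of composing such a tag (`model_tag` is a hypothetical helper for illustration, not a paddlespeech API):

```python
def model_tag(model: str, lang: str, sample_rate: int) -> str:
    """Compose a registry key like 'conformer_u2pp_online_wenetspeech-zh-16k'.

    Hypothetical helper: mirrors the key pattern seen in
    asr_dynamic_pretrained_models above.
    """
    return f"{model}-{lang}-{sample_rate // 1000}k"

print(model_tag("conformer_u2pp_online_wenetspeech", "zh", 16000))
# conformer_u2pp_online_wenetspeech-zh-16k
```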
paddlespeech/s2t/exps/u2/bin/test_wav.py

```diff
@@ -40,7 +40,6 @@ class U2Infer():
         self.preprocess_conf = config.preprocess_config
         self.preprocess_args = {"train": False}
         self.preprocessing = Transformation(self.preprocess_conf)
-        self.reverse_weight = getattr(config.model_conf, 'reverse_weight', 0.0)
         self.text_feature = TextFeaturizer(
             unit_type=config.unit_type,
             vocab=config.vocab_filepath,
@@ -89,8 +88,7 @@ class U2Infer():
                 ctc_weight=decode_config.ctc_weight,
                 decoding_chunk_size=decode_config.decoding_chunk_size,
                 num_decoding_left_chunks=decode_config.num_decoding_left_chunks,
-                simulate_streaming=decode_config.simulate_streaming,
-                reverse_weight=self.reverse_weight)
+                simulate_streaming=decode_config.simulate_streaming)
             rsl = result_transcripts[0][0]
             utt = Path(self.audio_file).name
             logger.info(f"hyp: {utt} {result_transcripts[0][0]}")
```
paddlespeech/s2t/exps/u2/model.py

```diff
@@ -316,7 +316,6 @@ class U2Tester(U2Trainer):
             vocab=self.config.vocab_filepath,
             spm_model_prefix=self.config.spm_model_prefix)
         self.vocab_list = self.text_feature.vocab_list
-        self.reverse_weight = getattr(config.model_conf, 'reverse_weight', 0.0)

     def id2token(self, texts, texts_len, text_feature):
         """ ord() id to chr() chr """
@@ -351,8 +350,7 @@ class U2Tester(U2Trainer):
             ctc_weight=decode_config.ctc_weight,
             decoding_chunk_size=decode_config.decoding_chunk_size,
             num_decoding_left_chunks=decode_config.num_decoding_left_chunks,
-            simulate_streaming=decode_config.simulate_streaming,
-            reverse_weight=self.reverse_weight)
+            simulate_streaming=decode_config.simulate_streaming)
         decode_time = time.time() - start_time

         for utt, target, result, rec_tids in zip(
```
paddlespeech/s2t/models/u2/u2.py

```diff
@@ -507,16 +507,14 @@ class U2BaseModel(ASRInterface, nn.Layer):
                                     num_decoding_left_chunks,
                                     simulate_streaming)
         return hyps[0][0]

-    def attention_rescoring(
-            self,
-            speech: paddle.Tensor,
-            speech_lengths: paddle.Tensor,
-            beam_size: int,
-            decoding_chunk_size: int=-1,
-            num_decoding_left_chunks: int=-1,
-            ctc_weight: float=0.0,
-            simulate_streaming: bool=False,
-            reverse_weight: float=0.0, ) -> List[int]:
+    def attention_rescoring(
+            self,
+            speech: paddle.Tensor,
+            speech_lengths: paddle.Tensor,
+            beam_size: int,
+            decoding_chunk_size: int=-1,
+            num_decoding_left_chunks: int=-1,
+            ctc_weight: float=0.0,
+            simulate_streaming: bool=False) -> List[int]:
         """ Apply attention rescoring decoding, CTC prefix beam search
         is applied first to get nbest, then we resoring the nbest on
         attention decoder with corresponding encoder out
@@ -536,7 +534,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
         """
         assert speech.shape[0] == speech_lengths.shape[0]
         assert decoding_chunk_size != 0
-        if reverse_weight > 0.0:
+        if self.reverse_weight > 0.0:
             # decoder should be a bitransformer decoder if reverse_weight > 0.0
             assert hasattr(self.decoder, 'right_decoder')
         device = speech.place
@@ -574,7 +572,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                                        self.eos)
         decoder_out, r_decoder_out, _ = self.decoder(
             encoder_out, encoder_mask, hyps_pad, hyps_lens, r_hyps_pad,
-            reverse_weight)  # (beam_size, max_hyps_len, vocab_size)
+            self.reverse_weight)  # (beam_size, max_hyps_len, vocab_size)
         # ctc score in ln domain
         decoder_out = paddle.nn.functional.log_softmax(decoder_out, axis=-1)
         decoder_out = decoder_out.numpy()
@@ -594,12 +592,13 @@ class U2BaseModel(ASRInterface, nn.Layer):
                 score += decoder_out[i][j][w]
             # last decoder output token is `eos`, for laste decoder input token.
             score += decoder_out[i][len(hyp[0])][self.eos]
-            if reverse_weight > 0:
+            if self.reverse_weight > 0:
                 r_score = 0.0
                 for j, w in enumerate(hyp[0]):
                     r_score += r_decoder_out[i][len(hyp[0]) - j - 1][w]
                 r_score += r_decoder_out[i][len(hyp[0])][self.eos]
-                score = score * (1 - reverse_weight) + r_score * reverse_weight
+                score = score * (1 - self.reverse_weight
+                                 ) + r_score * self.reverse_weight
             # add ctc score (which in ln domain)
             score += hyp[1] * ctc_weight
             if score > best_score:
@@ -748,8 +747,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                ctc_weight: float=0.0,
                decoding_chunk_size: int=-1,
                num_decoding_left_chunks: int=-1,
-               simulate_streaming: bool=False,
-               reverse_weight: float=0.0):
+               simulate_streaming: bool=False):
         """u2 decoding.

         Args:
@@ -821,8 +819,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                 decoding_chunk_size=decoding_chunk_size,
                 num_decoding_left_chunks=num_decoding_left_chunks,
                 ctc_weight=ctc_weight,
-                simulate_streaming=simulate_streaming,
-                reverse_weight=reverse_weight)
+                simulate_streaming=simulate_streaming)
             hyps = [hyp]
         else:
             raise ValueError(f"Not support decoding method: {decoding_method}")
```
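The rescoring loop above blends the left-to-right decoder score with the right-to-left score by `reverse_weight`, then adds the weighted CTC prefix score. The arithmetic, isolated with toy numbers (not real model outputs; `rescore` is an illustrative helper, not the model's API):

```python
def rescore(fwd: float, rev: float, ctc: float,
            reverse_weight: float, ctc_weight: float) -> float:
    """Blend forward/reverse decoder log-scores, then add the weighted CTC score."""
    score = fwd * (1 - reverse_weight) + rev * reverse_weight
    return score + ctc * ctc_weight

# Pick the best of two toy hypotheses, as the loop over the nbest does.
hyps = [(-3.2, -3.5, -1.0), (-2.9, -4.1, -1.4)]  # (fwd, rev, ctc) per hypothesis
best_index = max(
    range(len(hyps)),
    key=lambda i: rescore(*hyps[i], reverse_weight=0.3, ctc_weight=0.5))
print(best_index)  # 0
```

With `reverse_weight=0` the reverse term vanishes and the formula reduces to plain attention rescoring, which is why a conventional (non-BiTransformer) decoder still works.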
paddlespeech/server/conf/ws_conformer_application.yaml

```diff
@@ -30,7 +30,7 @@ asr_online:
     decode_method:
     num_decoding_left_chunks: -1
     force_yes: True
-    device:  # cpu or gpu:id
+    device: gpu  # cpu or gpu:id
     continuous_decoding: True  # enable continue decoding when endpoint detected

     am_predictor_conf:
```
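The inline comment documents a `cpu or gpu:id` convention for `device`. A hypothetical helper sketching how such a value could be parsed (this is illustration only, not an API from the server code):

```python
def parse_device(spec: str):
    """Parse the 'cpu or gpu:id' convention from the YAML comment.

    'cpu' -> ('cpu', None); 'gpu' -> ('gpu', 0); 'gpu:1' -> ('gpu', 1).
    Hypothetical helper for illustration.
    """
    if spec == "cpu":
        return ("cpu", None)
    kind, _, idx = spec.partition(":")
    return (kind, int(idx) if idx else 0)

print(parse_device("gpu"), parse_device("gpu:1"))
```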
paddlespeech/server/engine/asr/online/python/asr_engine.py

```diff
@@ -22,6 +22,7 @@ from numpy import float32
 from yacs.config import CfgNode

 from paddlespeech.audio.transform.transformation import Transformation
+from paddlespeech.audio.utils.tensor_utils import st_reverse_pad_list
 from paddlespeech.cli.asr.infer import ASRExecutor
 from paddlespeech.cli.log import logger
 from paddlespeech.resource import CommonTaskResource
@@ -603,24 +604,31 @@ class PaddleASRConnectionHanddler:
         hyps_pad = pad_sequence(
             hyp_list, batch_first=True, padding_value=self.model.ignore_id)
+        ori_hyps_pad = hyps_pad
         hyps_lens = paddle.to_tensor(
             [len(hyp[0]) for hyp in hyps], place=self.device,
             dtype=paddle.long)  # (beam_size,)
         hyps_pad, _ = add_sos_eos(hyps_pad, self.model.sos, self.model.eos,
                                   self.model.ignore_id)
         hyps_lens = hyps_lens + 1  # Add <sos> at begining

         encoder_out = self.encoder_out.repeat(beam_size, 1, 1)
         encoder_mask = paddle.ones(
             (beam_size, 1, encoder_out.shape[1]), dtype=paddle.bool)
-        decoder_out, _, _ = self.model.decoder(
-            encoder_out, encoder_mask, hyps_pad,
-            hyps_lens)  # (beam_size, max_hyps_len, vocab_size)
+
+        r_hyps_pad = st_reverse_pad_list(ori_hyps_pad, hyps_lens - 1,
+                                         self.model.sos, self.model.eos)
+        decoder_out, r_decoder_out, _ = self.model.decoder(
+            encoder_out, encoder_mask, hyps_pad, hyps_lens, r_hyps_pad,
+            self.model.reverse_weight)  # (beam_size, max_hyps_len, vocab_size)
         # ctc score in ln domain
         decoder_out = paddle.nn.functional.log_softmax(decoder_out, axis=-1)
         decoder_out = decoder_out.numpy()
+
+        # r_decoder_out will be 0.0, if reverse_weight is 0.0 or decoder is a
+        # conventional transformer decoder.
+        r_decoder_out = paddle.nn.functional.log_softmax(r_decoder_out, axis=-1)
+        r_decoder_out = r_decoder_out.numpy()

         # Only use decoder score for rescoring
         best_score = -float('inf')
         best_index = 0
@@ -632,6 +640,13 @@ class PaddleASRConnectionHanddler:
             # last decoder output token is `eos`, for laste decoder input token.
             score += decoder_out[i][len(hyp[0])][self.model.eos]
+            if self.model.reverse_weight > 0:
+                r_score = 0.0
+                for j, w in enumerate(hyp[0]):
+                    r_score += r_decoder_out[i][len(hyp[0]) - j - 1][w]
+                r_score += r_decoder_out[i][len(hyp[0])][self.model.eos]
+                score = score * (1 - self.model.reverse_weight
+                                 ) + r_score * self.model.reverse_weight
             # add ctc score (which in ln domain)
             score += hyp[1] * self.ctc_decode_config.ctc_weight
```
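`st_reverse_pad_list` prepares inputs for the right-to-left decoder by reversing each hypothesis while leaving padding in place. A list-based sketch of the idea (the real function operates on paddle tensors and also handles sos/eos; this toy version does not):

```python
PAD = -1  # stands in for the model's ignore_id

def reverse_pad_list(hyps, lengths):
    """Reverse each hypothesis up to its true length; padding stays at the end."""
    return [list(reversed(hyp[:n])) + hyp[n:] for hyp, n in zip(hyps, lengths)]

print(reverse_pad_list([[1, 2, 3, PAD], [4, 5, PAD, PAD]], [3, 2]))
# [[3, 2, 1, -1], [5, 4, -1, -1]]
```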