PaddlePaddle
DeepSpeech
Commit 3456ae4a
Authored on April 12, 2022 by Yang Zhou
add log & rename LogFrameLikelihood
Parent: 1f23c4bd
Showing 4 changed files with 18 additions and 7 deletions
speechx/speechx/decoder/ctc_beam_search_decoder.cc (+1 -1)
speechx/speechx/kaldi/decoder/decodable-itf.h (+1 -1)
speechx/speechx/nnet/decodable.cc (+9 -2)
speechx/speechx/nnet/decodable.h (+7 -3)
speechx/speechx/decoder/ctc_beam_search_decoder.cc @ 3456ae4a
@@ -93,7 +93,7 @@ void CTCBeamSearch::AdvanceDecode(
         vector<vector<BaseFloat>> likelihood;
         vector<BaseFloat> frame_prob;
         bool flag =
-            decodable->FrameLogLikelihood(num_frame_decoded_, &frame_prob);
+            decodable->FrameLikelihood(num_frame_decoded_, &frame_prob);
         if (flag == false) break;
         likelihood.push_back(frame_prob);
         AdvanceDecoding(likelihood);
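The call site above pulls one frame at a time and stops when the decodable returns false. A minimal sketch of that pull loop follows, assuming a toy in-memory decodable; `ToyDecodable` and `ConsumeAllFrames` are hypothetical names for illustration and are not part of speechx.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the decodable object; not the speechx class.
class ToyDecodable {
  public:
    explicit ToyDecodable(std::vector<std::vector<float>> frames)
        : frames_(std::move(frames)) {}

    // Copies the probabilities of `frame` into `likelihood`.
    // Returns false once `frame` is outside the available frames.
    bool FrameLikelihood(int frame, std::vector<float>* likelihood) const {
        assert(likelihood != nullptr);
        if (frame < 0 || static_cast<std::size_t>(frame) >= frames_.size()) {
            return false;
        }
        *likelihood = frames_[frame];
        return true;
    }

  private:
    std::vector<std::vector<float>> frames_;
};

// Pulls frames one at a time until the decodable reports no more data.
// Returns the number of frames consumed.
int ConsumeAllFrames(const ToyDecodable& decodable,
                     std::vector<std::vector<float>>* consumed) {
    assert(consumed != nullptr);
    int num_frame_decoded = 0;
    while (true) {
        std::vector<float> frame_prob;
        bool flag = decodable.FrameLikelihood(num_frame_decoded, &frame_prob);
        if (!flag) break;
        consumed->push_back(frame_prob);
        ++num_frame_decoded;
    }
    return num_frame_decoded;
}
```

In this sketch the loop owns the frame counter; in the diff the counter is the member `num_frame_decoded_`, and where it is advanced is not visible in this hunk.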
speechx/speechx/kaldi/decoder/decodable-itf.h @ 3456ae4a
@@ -143,7 +143,7 @@ class DecodableInterface {
     /// this is for compatibility with OpenFst).
     virtual int32 NumIndices() const = 0;
-    virtual bool FrameLogLikelihood(
+    virtual bool FrameLikelihood(
         int32 frame, std::vector<kaldi::BaseFloat>* likelihood) = 0;
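Renaming a pure virtual method means every implementer has to be renamed with it. The sketch below shows one way to let the compiler catch a missed rename, using `override` on the derived class; `ToyDecodableInterface` and `UniformDecodable` are hypothetical and much smaller than the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down interface; the real one is in decodable-itf.h.
class ToyDecodableInterface {
  public:
    virtual ~ToyDecodableInterface() = default;
    virtual bool FrameLikelihood(int frame, std::vector<float>* likelihood) = 0;
};

// Toy implementation that returns a uniform distribution for every frame.
class UniformDecodable : public ToyDecodableInterface {
  public:
    UniformDecodable(int num_frames, int num_classes)
        : num_frames_(num_frames), num_classes_(num_classes) {
        assert(num_frames >= 0 && num_classes > 0);
    }

    // `override` makes the compiler reject this declaration if the base
    // class method is renamed and this one is not.
    bool FrameLikelihood(int frame, std::vector<float>* likelihood) override {
        assert(likelihood != nullptr);
        if (frame < 0 || frame >= num_frames_) return false;
        likelihood->assign(static_cast<std::size_t>(num_classes_),
                           1.0f / static_cast<float>(num_classes_));
        return true;
    }

  private:
    int num_frames_;
    int num_classes_;
};
```

The declarations in this commit repeat `virtual` without `override`, so a mismatch there would show up as an abstract-class error at instantiation, not at the declaration.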
speechx/speechx/nnet/decodable.cc @ 3456ae4a
@@ -49,11 +49,18 @@ bool Decodable::IsLastFrame(int32 frame) {
 int32 Decodable::NumIndices() const { return 0; }
+// the ilable(TokenId) of wfst(TLG) insert <eps>(id = 0) in front of Nnet prob id.
+int32 Decodable::TokenId2NnetId(int32 token_id) {
+    return token_id - 1;
+}
 BaseFloat Decodable::LogLikelihood(int32 frame, int32 index) {
     CHECK_LE(index, nnet_cache_.NumCols());
     CHECK_LE(frame, frames_ready_);
     int32 frame_idx = frame - frame_offset_;
-    return acoustic_scale_ * std::log(nnet_cache_(frame_idx, index - 1) +
+    // the nnet output is prob ranther than log prob
+    // the index - 1, because the ilabel
+    return acoustic_scale_ * std::log(nnet_cache_(frame_idx, TokenId2NnetId(index)) +
                                       std::numeric_limits<float>::min());
 }
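The hunk above does two things in one expression: it shifts the graph's token id down by one to skip `<eps>`, and it adds a tiny constant before taking the log so that a probability of zero does not produce negative infinity. A minimal sketch of both steps, assuming a single frame held in a `std::vector<float>`; `ScaledLogProb` is a hypothetical free function, not the speechx method.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Graph token ids are shifted by one because id 0 is reserved for <eps>.
inline int TokenId2NnetId(int token_id) { return token_id - 1; }

// Hypothetical free function with the same shape as LogLikelihood.
// `frame_probs` holds the network's probabilities for a single frame.
inline float ScaledLogProb(const std::vector<float>& frame_probs,
                           int token_id,
                           float acoustic_scale) {
    int nnet_id = TokenId2NnetId(token_id);
    // Token id 0 (<eps>) would map to -1, which has no column.
    assert(nnet_id >= 0 && nnet_id < static_cast<int>(frame_probs.size()));
    // Adding the smallest positive normal float keeps log() finite at prob 0.
    return acoustic_scale *
           std::log(frame_probs[nnet_id] + std::numeric_limits<float>::min());
}
```

Note that `std::numeric_limits<float>::min()` is the smallest positive normal value, not the most negative float, which is why it works as a floor here.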
@@ -81,7 +88,7 @@ bool Decodable::AdvanceChunk() {
     return true;
 }
-bool Decodable::FrameLogLikelihood(int32 frame, vector<BaseFloat>* likelihood) {
+bool Decodable::FrameLikelihood(int32 frame, vector<BaseFloat>* likelihood) {
     std::vector<BaseFloat> result;
     if (EnsureFrameHaveComputed(frame) == false) return false;
     likelihood->resize(nnet_cache_.NumCols());
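The rest of `FrameLikelihood` is elided in this hunk. The sketch below is not a reconstruction of it. It only illustrates the indexing idea visible elsewhere in the file, where a global frame index is turned into a cache row with `frame - frame_offset_`. `ChunkCache` and its fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache holding the most recent chunk of network outputs.
struct ChunkCache {
    int frame_offset = 0;                  // global index of rows[0]
    std::vector<std::vector<float>> rows;  // one row of probabilities per frame

    // Copies the row for global frame `frame` into `likelihood`.
    // Returns false when the frame is not inside the cached chunk.
    bool FrameLikelihood(int frame, std::vector<float>* likelihood) const {
        assert(likelihood != nullptr);
        int frame_idx = frame - frame_offset;
        if (frame_idx < 0 ||
            static_cast<std::size_t>(frame_idx) >= rows.size()) {
            return false;
        }
        const std::vector<float>& row = rows[frame_idx];
        likelihood->resize(row.size());
        for (std::size_t i = 0; i < row.size(); ++i) {
            (*likelihood)[i] = row[i];  // raw probabilities, no log applied
        }
        return true;
    }
};
```

The values are copied without taking a log, which matches the `// not logprob` note that accompanies the rename in the header.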
speechx/speechx/nnet/decodable.h @ 3456ae4a
@@ -31,24 +31,28 @@ class Decodable : public kaldi::DecodableInterface {
     virtual kaldi::BaseFloat LogLikelihood(int32 frame, int32 index);
     virtual bool IsLastFrame(int32 frame);
     virtual int32 NumIndices() const;
-    virtual bool FrameLogLikelihood(int32 frame,
-                                    std::vector<kaldi::BaseFloat>* likelihood);
+    // not logprob
+    virtual bool FrameLikelihood(int32 frame,
+                                 std::vector<kaldi::BaseFloat>* likelihood);
     virtual int32 NumFramesReady() const;
     // for offline test
     void Acceptlikelihood(const kaldi::Matrix<kaldi::BaseFloat>& likelihood);
     void Reset();
     bool IsInputFinished() const { return frontend_->IsFinished(); }
     bool EnsureFrameHaveComputed(int32 frame);
+    int32 TokenId2NnetId(int32 token_id);
   private:
     bool AdvanceChunk();
     std::shared_ptr<FrontendInterface> frontend_;
     std::shared_ptr<NnetInterface> nnet_;
     kaldi::Matrix<kaldi::BaseFloat> nnet_cache_;
+    // the frame is nnet prob frame rather than audio feature frame
+    // nnet frame subsample the feature frame
+    // eg: 35 frame features output 8 frame inferences
     int32 frame_offset_;
     int32 frames_ready_;
     // todo: feature frame mismatch with nnet inference frame
-    // eg: 35 frame features output 8 frame inferences
     // so use subsampled_frame
     int32 current_log_post_subsampled_offset_;
     int32 num_chunk_computed_;
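The comment's figure of 35 feature frames producing 8 inference frames can be checked with simple arithmetic. The sketch below assumes a front end of two stacked convolutions over time, each with kernel 3, stride 2, and no padding; that architecture is an assumption made for illustration and is not stated anywhere in this diff. `ConvOutLen` and `SubsampledFrames` are hypothetical helpers.

```cpp
#include <cassert>

// Output length of one convolution over time with no padding.
inline int ConvOutLen(int in_len, int kernel, int stride) {
    assert(kernel > 0 && stride > 0);
    if (in_len < kernel) return 0;
    return (in_len - kernel) / stride + 1;
}

// Assumed front end: two stacked convolutions, each kernel 3 and stride 2.
inline int SubsampledFrames(int feature_frames) {
    return ConvOutLen(ConvOutLen(feature_frames, 3, 2), 3, 2);
}
```

Under that assumption, 35 feature frames give (35 - 3) / 2 + 1 = 17 frames after the first layer and (17 - 3) / 2 + 1 = 8 after the second, which agrees with the comment. This is why `frame_offset_` and `frames_ready_` count network output frames and cannot be compared directly with feature frame indices.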