PaddlePaddle / DeepSpeech
Commit 9fda521e
Authored Jun 07, 2017 by Yibing Liu

improve external scorer

Parent: b046e651

Showing 1 changed file with 11 additions and 13 deletions

decoder.py (+11 -13) @ 9fda521e
@@ -6,6 +6,7 @@ from itertools import groupby
 import numpy as np
 import copy
 import kenlm
+import os
 
 
 def ctc_best_path_decode(probs_seq, vocabulary):
@@ -54,19 +55,16 @@ class Scorer(object):
     def __init__(self, alpha, beta, model_path):
         self._alpha = alpha
         self._beta = beta
+        if not os.path.isfile(model_path):
+            raise IOError("Invaid language model path: %s" % model_path)
         self._language_model = kenlm.LanguageModel(model_path)
 
-    # language model scoring
-    def language_model_score(self, sentence, bos=True, eos=False):
-        words = sentence.strip().split(' ')
-        length = len(words)
-        if length == 1:
-            log_prob = self._language_model.score(sentence, bos, eos)
-        else:
-            prefix_sent = ' '.join(words[0:length - 1])
-            log_prob = self._language_model.score(sentence, bos, eos) \
-                    - self._language_model.score(prefix_sent, bos, eos)
-        return np.power(10, log_prob)
+    # n-gram language model scoring
+    def language_model_score(self, sentence):
+        #log prob of last word
+        log_cond_prob = list(
+            self._language_model.full_scores(sentence, eos=False))[-1][0]
+        return np.power(10, log_cond_prob)
 
     # word insertion term
     def word_count(self, sentence):
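Note on the hunk above: the removed code obtained the log probability of the last word by scoring the whole sentence and subtracting the score of its prefix, while the added code reads it directly from the last tuple yielded by `full_scores`. The sketch below illustrates why the two should agree. It does not load a real KenLM model; `ToyBigramLM` is a hypothetical stand-in that only mimics the `score` and `full_scores` call shapes, and its bigram table is made up for illustration.

```python
import math


class ToyBigramLM(object):
    """Hypothetical stand-in for kenlm.LanguageModel, backed by a toy bigram table."""

    def __init__(self, bigram_probs, default_prob=1e-3):
        self._probs = bigram_probs
        self._default = default_prob

    def full_scores(self, sentence, bos=True, eos=True):
        # Yields one (log10 probability, n-gram length, is_oov) tuple per word.
        prev = "<s>" if bos else None
        for word in sentence.split() + (["</s>"] if eos else []):
            known = (prev, word) in self._probs
            prob = self._probs.get((prev, word), self._default)
            yield math.log10(prob), 2 if known else 1, not known
            prev = word

    def score(self, sentence, bos=True, eos=True):
        # Sentence log10 probability is the sum of the per-word terms.
        return sum(s for s, _, _ in self.full_scores(sentence, bos, eos))


def last_word_prob_old(lm, sentence):
    # Approach removed by the commit: difference of two sentence scores.
    words = sentence.strip().split(' ')
    log_prob = lm.score(sentence, True, False)
    if len(words) > 1:
        log_prob -= lm.score(' '.join(words[:-1]), True, False)
    return 10 ** log_prob


def last_word_prob_new(lm, sentence):
    # Approach added by the commit: take the last per-word score directly.
    return 10 ** list(lm.full_scores(sentence, eos=False))[-1][0]


toy_lm = ToyBigramLM({("<s>", "the"): 0.5, ("the", "cat"): 0.2, ("cat", "sat"): 0.1})
old_prob = last_word_prob_old(toy_lm, "the cat sat")
new_prob = last_word_prob_new(toy_lm, "the cat sat")
```

Under these assumptions both functions return the conditional probability of "sat" given its history. One visible difference remains for single-word input: the removed code returned the full sentence score in that case, which for one word is the same single term.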
@@ -74,8 +72,8 @@ class Scorer(object):
         return len(words)
 
     # execute evaluation
-    def evaluate(self, sentence, bos=True, eos=False):
-        lm = self.language_model_score(sentence, bos, eos)
+    def evaluate(self, sentence):
+        lm = self.language_model_score(sentence)
         word_cnt = self.word_count(sentence)
         score = np.power(lm, self._alpha) \
                 * np.power(word_cnt, self._beta)
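Note on the last hunk: `evaluate` combines the language model probability and the word count as `lm ** alpha * word_cnt ** beta`. The sketch below works through that arithmetic with plain Python floats in place of `np.power`. The function name `external_score` and the values chosen for `alpha`, `beta`, the probability and the word count are assumptions for illustration only. The hunk is truncated after the `score` assignment, so nothing here describes what `evaluate` does next.

```python
import math


def external_score(lm_prob, word_cnt, alpha, beta):
    """Hypothetical helper mirroring the `score` expression in the diff."""
    return (lm_prob ** alpha) * (word_cnt ** beta)


def external_score_log(lm_prob, word_cnt, alpha, beta):
    """Same quantity computed in the log domain, then mapped back."""
    return math.exp(alpha * math.log(lm_prob) + beta * math.log(word_cnt))


# Illustrative values only: a last-word probability of 0.1 in a 3-word prefix.
direct = external_score(0.1, 3, alpha=2.0, beta=1.0)
via_log = external_score_log(0.1, 3, alpha=2.0, beta=1.0)
```

With these inputs the score is 0.1 squared times 3. A larger `alpha` gives the language model more influence, and a larger `beta` gives longer word sequences a higher score.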