PaddlePaddle / DeepSpeech
Commit 78968af6 (unverified)
Authored by Yibing Liu on Dec 02, 2017
Committed by GitHub on Dec 02, 2017
Merge pull request #47 from pkuyym/fix-46
Expose edit distance for error_rate.py
Parents: fe1501cc, 0f9b3ebf
Showing 1 changed file with 65 additions and 23 deletions
utils/error_rate.py (+65, -23)
@@ -56,6 +56,62 @@ def _levenshtein_distance(ref, hyp):
     return distance[m % 2][n]
 
 
+def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '):
+    """Compute the levenshtein distance between reference sequence and
+    hypothesis sequence in word-level.
+
+    :param reference: The reference sentence.
+    :type reference: basestring
+    :param hypothesis: The hypothesis sentence.
+    :type hypothesis: basestring
+    :param ignore_case: Whether case-sensitive or not.
+    :type ignore_case: bool
+    :param delimiter: Delimiter of input sentences.
+    :type delimiter: char
+    :return: Levenshtein distance and word number of reference sentence.
+    :rtype: list
+    """
+    if ignore_case == True:
+        reference = reference.lower()
+        hypothesis = hypothesis.lower()
+
+    ref_words = filter(None, reference.split(delimiter))
+    hyp_words = filter(None, hypothesis.split(delimiter))
+
+    edit_distance = _levenshtein_distance(ref_words, hyp_words)
+    return float(edit_distance), len(ref_words)
+
+
+def char_errors(reference, hypothesis, ignore_case=False, remove_space=False):
+    """Compute the levenshtein distance between reference sequence and
+    hypothesis sequence in char-level.
+
+    :param reference: The reference sentence.
+    :type reference: basestring
+    :param hypothesis: The hypothesis sentence.
+    :type hypothesis: basestring
+    :param ignore_case: Whether case-sensitive or not.
+    :type ignore_case: bool
+    :param remove_space: Whether remove internal space characters
+    :type remove_space: bool
+    :return: Levenshtein distance and length of reference sentence.
+    :rtype: list
+    """
+    if ignore_case == True:
+        reference = reference.lower()
+        hypothesis = hypothesis.lower()
+
+    join_char = ' '
+    if remove_space == True:
+        join_char = ''
+
+    reference = join_char.join(filter(None, reference.split(' ')))
+    hypothesis = join_char.join(filter(None, hypothesis.split(' ')))
+
+    edit_distance = _levenshtein_distance(reference, hypothesis)
+    return float(edit_distance), len(reference)
+
+
 def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
     """Calculate word error rate (WER). WER compares reference text and
     hypothesis text in word-level. WER is defined as:
@@ -85,20 +141,15 @@ def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
     :type delimiter: char
     :return: Word error rate.
     :rtype: float
-    :raises ValueError: If the reference length is zero.
+    :raises ValueError: If word number of reference is zero.
     """
-    if ignore_case == True:
-        reference = reference.lower()
-        hypothesis = hypothesis.lower()
+    edit_distance, ref_len = word_errors(reference, hypothesis, ignore_case,
+                                         delimiter)
 
-    ref_words = filter(None, reference.split(delimiter))
-    hyp_words = filter(None, hypothesis.split(delimiter))
-
-    if len(ref_words) == 0:
+    if ref_len == 0:
         raise ValueError("Reference's word number should be greater than 0.")
 
-    edit_distance = _levenshtein_distance(ref_words, hyp_words)
-    wer = float(edit_distance) / len(ref_words)
+    wer = float(edit_distance) / ref_len
     return wer
 
 
@@ -135,20 +186,11 @@ def cer(reference, hypothesis, ignore_case=False, remove_space=False):
     :rtype: float
     :raises ValueError: If the reference length is zero.
     """
-    if ignore_case == True:
-        reference = reference.lower()
-        hypothesis = hypothesis.lower()
+    edit_distance, ref_len = char_errors(reference, hypothesis, ignore_case,
+                                         remove_space)
 
-    join_char = ' '
-    if remove_space == True:
-        join_char = ''
-
-    reference = join_char.join(filter(None, reference.split(' ')))
-    hypothesis = join_char.join(filter(None, hypothesis.split(' ')))
-
-    if len(reference) == 0:
+    if ref_len == 0:
         raise ValueError("Length of reference should be greater than 0.")
 
-    edit_distance = _levenshtein_distance(reference, hypothesis)
-    cer = float(edit_distance) / len(reference)
+    cer = float(edit_distance) / ref_len
     return cer
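
Review note: the sketch below is not part of the commit. It is a minimal, self-contained Python 3 port of the two helpers this change exposes, written to show one plausible reason for returning the raw edit distance and reference length: summing them across utterances gives a corpus-level error rate. The committed code targets Python 2 (`basestring`, `filter` returning a list), so the port wraps `filter` in `list(...)` before calling `len`. `levenshtein_distance` is a stand-in for the repository's `_levenshtein_distance`, whose body this diff does not show, and `corpus_wer` is a hypothetical helper introduced only for illustration.

```python
def levenshtein_distance(ref, hyp):
    """Edit distance between two sequences (lists of words or strings).

    Stand-in for the repository's _levenshtein_distance (an assumption).
    """
    # Dynamic programming over two rows: prev is row i-1, curr is row i.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '):
    """Return (word-level edit distance, number of reference words)."""
    if ignore_case:
        reference = reference.lower()
        hypothesis = hypothesis.lower()
    # On Python 3, filter() returns an iterator, so materialize it for len().
    ref_words = list(filter(None, reference.split(delimiter)))
    hyp_words = list(filter(None, hypothesis.split(delimiter)))
    return float(levenshtein_distance(ref_words, hyp_words)), len(ref_words)


def char_errors(reference, hypothesis, ignore_case=False, remove_space=False):
    """Return (char-level edit distance, length of normalized reference)."""
    if ignore_case:
        reference = reference.lower()
        hypothesis = hypothesis.lower()
    # Collapse runs of spaces to one space, or drop spaces entirely.
    join_char = '' if remove_space else ' '
    reference = join_char.join(filter(None, reference.split(' ')))
    hypothesis = join_char.join(filter(None, hypothesis.split(' ')))
    return float(levenshtein_distance(reference, hypothesis)), len(reference)


def corpus_wer(pairs):
    """Hypothetical helper: WER over many (reference, hypothesis) pairs."""
    total_errors, total_words = 0.0, 0
    for reference, hypothesis in pairs:
        errors, ref_len = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += ref_len
    if total_words == 0:
        raise ValueError("Total reference word count should be greater than 0.")
    return total_errors / total_words
```

With these definitions, `corpus_wer([("the cat sat", "the cat sit"), ("hello", "hello")])` weights each utterance by its reference length, which averaging per-utterance `wer` values would not do.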