PaddlePaddle / DeepSpeech
Commit 81f29359 (unverified)
Authored Aug 19, 2022 by liangym; committed by GitHub on Aug 19, 2022

Merge pull request #2273 from lym0302/r1.1
[cherry-pick] [r1.1] fix point bug

Parents: 256f13ca, 9b7bf4bb

Showing 1 changed file with 36 additions and 7 deletions

paddlespeech/t2s/frontend/mix_frontend.py (+36 -7)
@@ -62,9 +62,31 @@ class MixFrontend():
     def _split(self, text: str) -> List[str]:
         text = re.sub(r'[《》【】<=>{}()()#&@“”^_|…\\]', '', text)
+        # Replace the period "." of English sentences with "。" for later sentence splitting
+        point = "."
+        point_indexs = []
+        index = -1
+        for i in range(text.count(point)):
+            index = text.find(".", index + 1, len(text))
+            point_indexs.append(index)
+        print(point_indexs)
+        for point_index in point_indexs:
+            # Leave a period at the very start or very end of the text untouched
+            if point_index == 0 or point_index == len(text) - 1:
+                pass
+            else:
+                if ((self.is_alphabet(text[point_index - 1]) or
+                     text[point_index - 1] == " ") and
+                    (self.is_alphabet(text[point_index + 1]) or
+                     text[point_index + 1] == " ")):
+                    text = text.replace(text[point_index], "。")
         text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
         text = text.strip()
         sentences = [sentence.strip() for sentence in re.split(r'\n+', text)]
         return sentences
     def _distinguish(self, text: str) -> List[str]:
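Review note: `str.replace(old, new)` substitutes every occurrence of `old`, so once a single period in `_split` qualifies as a sentence end, every other "." in the text (for example the decimal point in "3.5") is converted to "。" as well. The block below is a minimal sketch of a position-wise alternative, not the code that was merged. `replace_sentence_periods` and `is_ascii_letter` are hypothetical names introduced here for illustration; `is_ascii_letter` only stands in for `MixFrontend.is_alphabet`, whose actual definition is not shown in this diff and may differ.

```python
def is_ascii_letter(ch: str) -> bool:
    # Hypothetical stand-in for MixFrontend.is_alphabet (an assumption).
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def replace_sentence_periods(text: str) -> str:
    """Turn "." into "。" only where both neighbours are a letter or a space."""
    chars = list(text)
    # Skip the first and last positions, mirroring the check in the commit.
    for i in range(1, len(chars) - 1):
        if chars[i] != ".":
            continue
        prev_ok = is_ascii_letter(chars[i - 1]) or chars[i - 1] == " "
        next_ok = is_ascii_letter(chars[i + 1]) or chars[i + 1] == " "
        if prev_ok and next_ok:
            chars[i] = "。"  # replace this occurrence only
    return "".join(chars)
```

With this variant, the period in "now. ok" is converted while the decimal point in "3.5" in the same string is left alone.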
@@ -77,9 +99,11 @@ class MixFrontend():
         temp_seg = ""
         temp_lang = ""
-        # Determine the type of each character. type: blank, chinese, alphabet, number, unk.
+        # Determine the type of each character. type: blank, chinese, alphabet, number, unk and point.
         for ch in text:
-            if self.is_chinese(ch):
+            if ch == ".":
+                types.append("point")
+            elif self.is_chinese(ch):
                 types.append("zh")
             elif self.is_alphabet(ch):
                 types.append("en")
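To make the labelling step in this hunk easier to follow in isolation, here is a rough standalone sketch. The label strings ("point", "zh", "en", "blank", "num", "unk") are the ones that appear in this diff. The function name `classify_chars` and all of the character tests (the CJK range, the ASCII letter and digit ranges, the space check) are assumptions made for illustration; the real `is_chinese` and `is_alphabet` helpers and the remaining branches are not visible here.

```python
from typing import List


def classify_chars(text: str) -> List[str]:
    """Label each character; the "." check comes first, as in the commit."""
    types = []
    for ch in text:
        if ch == ".":
            types.append("point")
        elif "\u4e00" <= ch <= "\u9fa5":  # assumed CJK range
            types.append("zh")
        elif ("a" <= ch <= "z") or ("A" <= ch <= "Z"):  # assumed letter test
            types.append("en")
        elif ch == " ":  # assumed blank test
            types.append("blank")
        elif "0" <= ch <= "9":  # assumed digit test
            types.append("num")
        else:
            types.append("unk")
    return types
```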
@@ -96,21 +120,26 @@ class MixFrontend():
             # find the first char of the seg
             if flag == 0:
-                if types[i] != "unk" and types[i] != "blank":
+                # The first character is Chinese, English, or a digit
+                if types[i] == "zh" or types[i] == "en" or types[i] == "num":
                     temp_seg += text[i]
                     temp_lang = types[i]
                     flag = 1
             else:
-                if types[i] == temp_lang or types[i] == "num":
+                # Digits and decimal points are merged with the preceding characters and take the preceding character's type
+                if types[i] == temp_lang or types[i] == "num" or types[i] == "point":
                     temp_seg += text[i]
-                elif temp_lang == "num" and types[i] != "unk":
+                # A number is concatenated with whatever character follows it
+                elif temp_lang == "num":
                     temp_seg += text[i]
                     if types[i] == "zh" or types[i] == "en":
                         temp_lang = types[i]
-                elif temp_lang == "en" and types[i] == "blank":
+                # A blank is concatenated with the preceding characters
+                elif types[i] == "blank":
                     temp_seg += text[i]
                 elif types[i] == "unk":
@@ -119,7 +148,7 @@ class MixFrontend():
                 else:
                     segments.append((temp_seg, temp_lang))
-                    if types[i] != "unk" and types[i] != "blank":
+                    if types[i] == "zh" or types[i] == "en":
                         temp_seg = text[i]
                         temp_lang = types[i]
                         flag = 1
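The merging rules after this change can be read as a small state machine over the per-character labels. The block below is a standalone sketch of that reading, not the `_distinguish` method itself. The function name `merge_segments` is hypothetical, and three parts are assumptions because the diff elides them: what happens on an "unk" character, the reset branch after a language switch, and the final flush of the last segment.

```python
from typing import List, Tuple


def merge_segments(text: str, types: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (segment, language) pairs from per-char labels."""
    segments = []
    temp_seg, temp_lang, flag = "", "", 0
    for ch, t in zip(text, types):
        if flag == 0:
            # A segment starts only on Chinese, English, or a digit.
            if t in ("zh", "en", "num"):
                temp_seg, temp_lang, flag = ch, t, 1
        elif t == temp_lang or t in ("num", "point"):
            # Same language, digits, and "." extend the current segment.
            temp_seg += ch
        elif temp_lang == "num":
            # A leading number absorbs what follows and adopts its language.
            temp_seg += ch
            if t in ("zh", "en"):
                temp_lang = t
        elif t == "blank":
            temp_seg += ch
        elif t == "unk":
            pass  # assumption: unknown characters are skipped
        else:
            # Language switch: close the current segment.
            segments.append((temp_seg, temp_lang))
            if t in ("zh", "en"):
                temp_seg, temp_lang, flag = ch, t, 1
            else:
                # assumption: reset state (branch not visible in the diff)
                temp_seg, temp_lang, flag = "", "", 0
    if temp_seg:
        segments.append((temp_seg, temp_lang))  # assumption: final flush
    return segments
```

Under these assumptions, the decimal in "我有2.5GB data" stays attached to the Chinese segment that precedes it, and the blank stays inside the English segment.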