PaddlePaddle / DeepSpeech, commit bd01bc15
Authored Nov 28, 2022 by David An (An Hongliang)
Committed Nov 28, 2022 by GitHub
add greek char and fix issue2571 (#2683)

Co-authored-by: TianYuan <white-sky@qq.com>
Parent: 58309aa9
Showing 1 changed file with 28 additions and 3 deletions (+28 -3)
paddlespeech/t2s/frontend/zh_normalization/text_normlization.py
@@ -65,7 +65,7 @@ class TextNormalizer():
         if lang == "zh":
             text = text.replace(" ", "")
             # filter out special characters
-            text = re.sub(r'[《》【】<=>{}()()#&@“”^_|…\\]', '', text)
+            text = re.sub(r'[——《》【】<=>{}()()#&@“”^_|…\\]', '', text)
         text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
         text = text.strip()
         sentences = [sentence.strip() for sentence in re.split(r'\n+', text)]
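The effect of the one-line change in the hunk above can be checked in isolation. The following is a minimal sketch, not code from the commit: the two character classes are copied from the removed and added lines, while the helper name `strip_special` and the sample string are assumptions made for illustration only.

```python
import re

# Character classes copied from the hunk above: the old pattern, and the new
# one that additionally lists the full-width dash.
OLD_PATTERN = re.compile(r'[《》【】<=>{}()()#&@“”^_|…\\]')
NEW_PATTERN = re.compile(r'[——《》【】<=>{}()()#&@“”^_|…\\]')


def strip_special(text: str, pattern: re.Pattern) -> str:
    """Hypothetical helper: delete every character matched by `pattern`."""
    return pattern.sub('', text)


# Hypothetical sample containing book-title marks and a full-width dash.
sample = "《测试》——你好"
old_result = strip_special(sample, OLD_PATTERN)  # dash survives
new_result = strip_special(sample, NEW_PATTERN)  # dash is removed as well
print(old_result)
print(new_result)
```

With the old class the dash is left in the text and passed on to sentence splitting; with the new class it is dropped together with the other special characters.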
@@ -85,7 +85,33 @@ class TextNormalizer():
         sentence = sentence.replace('⑧', '八')
         sentence = sentence.replace('⑨', '九')
         sentence = sentence.replace('⑩', '十')
+        sentence = sentence.replace('α', '阿尔法')
+        sentence = sentence.replace('β', '贝塔')
+        sentence = sentence.replace('γ', '伽玛').replace('Γ', '伽玛')
+        sentence = sentence.replace('δ', '德尔塔').replace('Δ', '德尔塔')
+        sentence = sentence.replace('ε', '艾普西龙')
+        sentence = sentence.replace('ζ', '捷塔')
+        sentence = sentence.replace('η', '依塔')
+        sentence = sentence.replace('θ', '西塔').replace('Θ', '西塔')
+        sentence = sentence.replace('ι', '艾欧塔')
+        sentence = sentence.replace('κ', '喀帕')
+        sentence = sentence.replace('λ', '拉姆达').replace('Λ', '拉姆达')
+        sentence = sentence.replace('μ', '缪')
+        sentence = sentence.replace('ν', '拗')
+        sentence = sentence.replace('ξ', '克西').replace('Ξ', '克西')
+        sentence = sentence.replace('ο', '欧米克伦')
+        sentence = sentence.replace('π', '派').replace('Π', '派')
+        sentence = sentence.replace('ρ', '肉')
+        sentence = sentence.replace('ς', '西格玛').replace('Σ', '西格玛').replace('σ', '西格玛')
+        sentence = sentence.replace('τ', '套')
+        sentence = sentence.replace('υ', '宇普西龙')
+        sentence = sentence.replace('φ', '服艾').replace('Φ', '服艾')
+        sentence = sentence.replace('χ', '器')
+        sentence = sentence.replace('ψ', '普赛').replace('Ψ', '普赛')
+        sentence = sentence.replace('ω', '欧米伽').replace('Ω', '欧米伽')
+        # re filter special characters, have one more character "-" than line 68
+        sentence = re.sub(r'[-——《》【】<=>{}()()#&@“”^_|…\\]', '', sentence)
         return sentence
     def normalize_sentence(self, sentence: str) -> str:
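The chain of `replace` calls added above could also be written as a single lookup table. The sketch below is an alternative formulation, not what the commit does: the letter-to-reading pairs are copied from the hunk, but the names `GREEK_READINGS`, `GREEK_TABLE` and `replace_greek` are hypothetical. Because none of the Chinese readings contains a Greek letter, one `str.translate` pass should give the same result as the sequential replacements.

```python
# Readings copied from the hunk above. Letter forms that share a reading
# (upper/lower case, final sigma) are grouped into one key string.
GREEK_READINGS = {
    'α': '阿尔法', 'β': '贝塔', 'γΓ': '伽玛', 'δΔ': '德尔塔',
    'ε': '艾普西龙', 'ζ': '捷塔', 'η': '依塔', 'θΘ': '西塔',
    'ι': '艾欧塔', 'κ': '喀帕', 'λΛ': '拉姆达', 'μ': '缪',
    'ν': '拗', 'ξΞ': '克西', 'ο': '欧米克伦', 'πΠ': '派',
    'ρ': '肉', 'ςΣσ': '西格玛', 'τ': '套', 'υ': '宇普西龙',
    'φΦ': '服艾', 'χ': '器', 'ψΨ': '普赛', 'ωΩ': '欧米伽',
}

# Expand the grouped keys into a one-character-per-entry translation table.
GREEK_TABLE = str.maketrans({
    letter: reading
    for letters, reading in GREEK_READINGS.items()
    for letter in letters
})


def replace_greek(sentence: str) -> str:
    """Hypothetical helper: spell out Greek letters with their readings."""
    return sentence.translate(GREEK_TABLE)


print(replace_greek('2πr'))
print(replace_greek('Ω和ω'))
```

A table keeps the data separate from the control flow and scans the sentence once, at the cost of diverging from the style of the surrounding circled-number replacements.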
@@ -124,6 +150,5 @@ class TextNormalizer():
     def normalize(self, text: str) -> List[str]:
         sentences = self._split(text)
         sentences = [self.normalize_sentence(sent) for sent in sentences]
         return sentences