PaddlePaddle / DeepSpeech — commit 8c7859d3

Authored by Shuangchi He, committed via GitHub on Apr 21, 2023. Parent commit: 35d874c5.

Fix some typos. (#3178)

Signed-off-by: Yulv-git <yulvchi@qq.com>

Showing 41 changed files with 63 additions and 63 deletions (+63 −63).
- .github/CONTRIBUTING.md (+1 −1)
- audio/paddleaudio/backends/soundfile_backend.py (+1 −1)
- demos/TTSAndroid/README.md (+1 −1)
- demos/TTSArmLinux/front.conf (+2 −2)
- demos/TTSCppFrontend/front_demo/front.conf (+2 −2)
- demos/TTSCppFrontend/front_demo/front_demo.cpp (+1 −1)
- demos/TTSCppFrontend/front_demo/gentools/word2phones.py (+3 −3)
- demos/TTSCppFrontend/src/front/front_interface.cpp (+4 −4)
- demos/TTSCppFrontend/src/front/front_interface.h (+1 −1)
- demos/speech_web/README.md (+1 −1)
- demos/speech_web/speech_server/main.py (+1 −1)
- docs/tutorial/st/st_tutorial.ipynb (+1 −1)
- docs/tutorial/tts/tts_tutorial.ipynb (+1 −1)
- examples/librispeech/asr2/README.md (+1 −1)
- examples/other/mfa/local/generate_lexicon.py (+1 −1)
- examples/tiny/asr1/README.md (+1 −1)
- paddlespeech/s2t/__init__.py (+1 −1)
- paddlespeech/s2t/frontend/augmentor/augmentation.py (+1 −1)
- paddlespeech/s2t/io/speechbrain/sampler.py (+1 −1)
- paddlespeech/s2t/models/u2/u2.py (+2 −2)
- paddlespeech/s2t/models/u2_st/u2_st.py (+1 −1)
- paddlespeech/server/engine/asr/online/python/asr_engine.py (+1 −1)
- paddlespeech/server/ws/asr_api.py (+1 −1)
- paddlespeech/t2s/frontend/generate_lexicon.py (+1 −1)
- paddlespeech/t2s/models/waveflow.py (+4 −4)
- paddlespeech/t2s/modules/transformer/lightconv.py (+1 −1)
- paddlespeech/vector/exps/ecapa_tdnn/train.py (+2 −2)
- paddlespeech/vector/exps/ge2e/preprocess.py (+1 −1)
- speechx/examples/ds2_ol/onnx/local/onnx_infer_shape.py (+1 −1)
- speechx/speechx/frontend/audio/db_norm.cc (+1 −1)
- speechx/speechx/kaldi/base/kaldi-types.h (+1 −1)
- speechx/speechx/kaldi/feat/pitch-functions.cc (+1 −1)
- speechx/speechx/kaldi/lat/lattice-functions.h (+8 −8)
- speechx/speechx/kaldi/matrix/kaldi-matrix.cc (+1 −1)
- speechx/speechx/kaldi/matrix/sparse-matrix.cc (+1 −1)
- speechx/speechx/kaldi/util/kaldi-table-inl.h (+1 −1)
- speechx/speechx/nnet/ds2_nnet.cc (+1 −1)
- speechx/speechx/protocol/websocket/websocket_server.cc (+4 −4)
- tools/extras/install_mkl.sh (+1 −1)
- utils/fst/ctc_token_fst.py (+1 −1)
- utils/tokenizer.perl (+1 −1)
.github/CONTRIBUTING.md

```diff
@@ -27,4 +27,4 @@ git commit -m "xxxxxx, test=doc"
 1. 虽然跳过了 CI,但是还要先排队排到才能跳过,所以非自己方向看到 pending 不要着急 🤣
 2. 在 `git commit --amend` 的时候才加 `test=xxx` 可能不太有效
 3. 一个 pr 多次提交 commit 注意每次都要加 `test=xxx`,因为每个 commit 都会触发 CI
-4. 删除 python 环境中已经安装好的 的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
+4. 删除 python 环境中已经安装好的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
```
audio/paddleaudio/backends/soundfile_backend.py

```diff
@@ -191,7 +191,7 @@ def soundfile_save(y: np.ndarray, sr: int, file: os.PathLike) -> None:
     if sr <= 0:
         raise ParameterError(
-            f'Sample rate should be larger than 0, recieved sr = {sr}')
+            f'Sample rate should be larger than 0, received sr = {sr}')
 
     if y.dtype not in ['int16', 'int8']:
         warnings.warn(
```
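The corrected message above belongs to `soundfile_save`'s argument checks. A minimal standalone sketch of the same guard, with a hypothetical `ParameterError` stand-in rather than PaddleAudio's actual exception class:

```python
import warnings

import numpy as np


class ParameterError(Exception):
    """Hypothetical stand-in for paddleaudio's ParameterError."""


def check_save_args(y: np.ndarray, sr: int) -> None:
    # A sample rate must be positive; report the received value.
    if sr <= 0:
        raise ParameterError(
            f'Sample rate should be larger than 0, received sr = {sr}')
    # Non-integer PCM dtypes are accepted, but a warning is emitted.
    if y.dtype not in [np.dtype('int16'), np.dtype('int8')]:
        warnings.warn(f'dtype {y.dtype} is not int16 or int8')


try:
    check_save_args(np.zeros(4, dtype='int16'), 0)
except ParameterError as e:
    print(e)  # Sample rate should be larger than 0, received sr = 0
```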
demos/TTSAndroid/README.md

```diff
 # 语音合成 Java API Demo 使用指南
 
-在 Android 上实现语音合成功能,此 Demo 有很好的 的易用性和开放性,如在 Demo 中跑自己训练好的模型等。
+在 Android 上实现语音合成功能,此 Demo 有很好的易用性和开放性,如在 Demo 中跑自己训练好的模型等。
 
 本文主要介绍语音合成 Demo 运行方法。
```
demos/TTSArmLinux/front.conf

```diff
@@ -6,13 +6,13 @@
 --jieba_stop_word_path=./dict/jieba/stop_words.utf8
 
 # dict conf fastspeech2_0.4
---seperate_tone=false
+--separate_tone=false
 --word2phone_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict
 --phone2id_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
 --tone2id_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict
 
 # dict conf speedyspeech_0.5
-#--seperate_tone=true
+#--separate_tone=true
 #--word2phone_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/word2phone.dict
 #--phone2id_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/phone_id_map.txt
 #--tone2id_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/tone_id_map.txt
```
demos/TTSCppFrontend/front_demo/front.conf

```diff
@@ -6,13 +6,13 @@
 --jieba_stop_word_path=./front_demo/dict/jieba/stop_words.utf8
 
 # dict conf fastspeech2_0.4
---seperate_tone=false
+--separate_tone=false
 --word2phone_path=./front_demo/dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict
 --phone2id_path=./front_demo/dict/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
 --tone2id_path=./front_demo/dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict
 
 # dict conf speedyspeech_0.5
-#--seperate_tone=true
+#--separate_tone=true
 #--word2phone_path=./front_demo/dict/speedyspeech_nosil_baker_ckpt_0.5/word2phone.dict
 #--phone2id_path=./front_demo/dict/speedyspeech_nosil_baker_ckpt_0.5/phone_id_map.txt
 #--tone2id_path=./front_demo/dict/speedyspeech_nosil_baker_ckpt_0.5/tone_id_map.txt
```
demos/TTSCppFrontend/front_demo/front_demo.cpp

```diff
@@ -20,7 +20,7 @@
 DEFINE_string(sentence, "你好,欢迎使用语音合成服务", "Text to be synthesized");
 DEFINE_string(front_conf, "./front_demo/front.conf", "Front conf file");
-// DEFINE_string(seperate_tone, "true", "If true, get phoneids and tonesid");
+// DEFINE_string(separate_tone, "true", "If true, get phoneids and tonesid");
 
 int main(int argc, char** argv) {
```
demos/TTSCppFrontend/front_demo/gentools/word2phones.py

```diff
@@ -20,7 +20,7 @@ worddict = "./dict/jieba_part.dict.utf8"
 newdict = "./dict/word_phones.dict"
 
 
-def GenPhones(initials, finals, seperate=True):
+def GenPhones(initials, finals, separate=True):
 
     phones = []
     for c, v in zip(initials, finals):
@@ -30,9 +30,9 @@ def GenPhones(initials, finals, seperate=True):
         elif c in ['zh', 'ch', 'sh', 'r']:
             v = re.sub('i', 'iii', v)
         if c:
-            if seperate is True:
+            if separate is True:
                 phones.append(c + '0')
-            elif seperate is False:
+            elif separate is False:
                 phones.append(c)
             else:
                 print("Not sure whether phone and tone need to be separated")
```
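The renamed `separate` flag decides whether each initial carries an explicit `'0'` tone slot (speedyspeech-style) or is kept bare (fastspeech2-style). A self-contained sketch of that behavior, simplified from the script above and not its exact code:

```python
import re


def gen_phones(initials, finals, separate=True):
    """Pair each initial with its final. With separate=True the initial
    gets an explicit '0' tone marker; otherwise the bare initial is kept.
    Simplified sketch of GenPhones in word2phones.py."""
    phones = []
    for c, v in zip(initials, finals):
        # 'zh'/'ch'/'sh'/'r' take the apical vowel spelled 'iii' in this scheme
        if c in ['zh', 'ch', 'sh', 'r']:
            v = re.sub('i', 'iii', v)
        if c:
            phones.append(c + '0' if separate else c)
        phones.append(v)
    return phones


print(gen_phones(['zh', ''], ['i1', 'a4']))        # ['zh0', 'iii1', 'a4']
print(gen_phones(['zh'], ['i1'], separate=False))  # ['zh', 'iii1']
```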
demos/TTSCppFrontend/src/front/front_interface.cpp

```diff
@@ -126,7 +126,7 @@ int FrontEngineInterface::init() {
     }
 
     // 生成音调字典(音调到音调id的映射)
-    if (_seperate_tone == "true") {
+    if (_separate_tone == "true") {
         if (0 != GenDict(_tone2id_path, &tone_id_map)) {
             LOG(ERROR) << "Genarate tone2id dict failed";
             return -1;
@@ -168,7 +168,7 @@ int FrontEngineInterface::ReadConfFile() {
     _jieba_stop_word_path = conf_map["jieba_stop_word_path"];
 
     // dict path
-    _seperate_tone = conf_map["seperate_tone"];
+    _separate_tone = conf_map["separate_tone"];
     _word2phone_path = conf_map["word2phone_path"];
     _phone2id_path = conf_map["phone2id_path"];
     _tone2id_path = conf_map["tone2id_path"];
@@ -295,7 +295,7 @@ int FrontEngineInterface::GetWordsIds(
             }
         }
     } else {  // 标点符号
-        if (_seperate_tone == "true") {
+        if (_separate_tone == "true") {
             phone = "sp0";  // speedyspeech
         } else {
             phone = "sp";  // fastspeech2
@@ -354,7 +354,7 @@ int FrontEngineInterface::Phone2Phoneid(const std::string &phone,
     std::string temp_phone;
     for (int i = 0; i < phone_vec.size(); i++) {
         temp_phone = phone_vec[i];
-        if (_seperate_tone == "true") {
+        if (_separate_tone == "true") {
             phoneid->push_back(atoi(
                 (phone_id_map[temp_phone.substr(0, temp_phone.length() - 1)])
                     .c_str()));
```
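The last hunk above looks up phone ids while stripping the trailing tone digit (`substr(0, length() - 1)`) when tones are handled separately. A hypothetical Python simplification of that lookup, with illustrative names:

```python
def phones_to_ids(phone_vec, phone_id_map, separate_tone=True):
    """Map phone strings to integer ids. With separate_tone, the trailing
    tone digit is stripped before the lookup ('ma1' -> 'ma'), mirroring the
    substr call in FrontEngineInterface::Phone2Phoneid. Sketch only."""
    ids = []
    for phone in phone_vec:
        key = phone[:-1] if separate_tone else phone
        ids.append(int(phone_id_map[key]))
    return ids


print(phones_to_ids(['ma1', 'sp0'], {'ma': '12', 'sp': '1'}))  # [12, 1]
```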
demos/TTSCppFrontend/src/front/front_interface.h

```diff
@@ -182,7 +182,7 @@ class FrontEngineInterface : public TextNormalizer {
     std::string _jieba_idf_path;
     std::string _jieba_stop_word_path;
 
-    std::string _seperate_tone;
+    std::string _separate_tone;
     std::string _word2phone_path;
     std::string _phone2id_path;
     std::string _tone2id_path;
```
demos/speech_web/README.md

```diff
@@ -23,7 +23,7 @@ Paddle Speech Demo 是一个以 PaddleSpeech 的语音交互功能为主体开
 + ERNIE-SAT:语言-语音跨模态大模型 ERNIE-SAT 可视化展示示例,支持个性化合成,跨语言语音合成(音频为中文则输入英文文本进行合成),语音编辑(修改音频文字中间的结果)功能。 ERNIE-SAT 更多实现细节,可以参考:
   + [【ERNIE-SAT with AISHELL-3 dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/ernie_sat)
-  + [【ERNIE-SAT with with AISHELL3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
+  + [【ERNIE-SAT with AISHELL3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
   + [【ERNIE-SAT with VCTK dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/ernie_sat)
 
 运行效果:
```
demos/speech_web/speech_server/main.py

```diff
@@ -260,7 +260,7 @@ async def websocket_endpoint_online(websocket: WebSocket):
             # and we break the loop
             if message['signal'] == 'start':
                 resp = {"status": "ok", "signal": "server_ready"}
-                # do something at begining here
+                # do something at beginning here
                 # create the instance to process the audio
                 # connection_handler = chatbot.asr.connection_handler
                 connection_handler = PaddleASRConnectionHanddler(engine)
```
docs/tutorial/st/st_tutorial.ipynb

```diff
@@ -62,7 +62,7 @@
    "collapsed": false
   },
   "source": [
-    "# 使用Transformer进行端到端语音翻译的 的基本流程\n",
+    "# 使用Transformer进行端到端语音翻译的基本流程\n",
    "## 基础模型\n",
    "由于 ASR 章节已经介绍了 Transformer 以及语音特征抽取,在此便不做过多介绍,感兴趣的同学可以去相关章节进行了解。\n",
    "\n",
```
docs/tutorial/tts/tts_tutorial.ipynb

```diff
@@ -464,7 +464,7 @@
    "<br><center> FastSpeech2 网络结构图</center></br>\n",
    "\n",
    "\n",
-    "PaddleSpeech TTS 实现的 FastSpeech2 与论文不同的地方在于,我们使用的 的是 phone 级别的 `pitch` 和 `energy`(与 FastPitch 类似),这样的合成结果可以更加**稳定**。\n",
+    "PaddleSpeech TTS 实现的 FastSpeech2 与论文不同的地方在于,我们使用的是 phone 级别的 `pitch` 和 `energy`(与 FastPitch 类似),这样的合成结果可以更加**稳定**。\n",
    "<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/862c21456c784c41a83a308b7d9707f0810cc3b3c6f94ed48c60f5d32d0072f0\"></center>\n",
    "<br><center> FastPitch 网络结构图</center></br>\n",
    "\n",
```
examples/librispeech/asr2/README.md

````diff
@@ -153,7 +153,7 @@ After training the model, we need to get the final model for testing and inferen
 ```bash
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
     # avg n best model
-    avg.sh lastest exp/${ckpt}/checkpoints ${avg_num}
+    avg.sh latest exp/${ckpt}/checkpoints ${avg_num}
 fi
 ```
 The `avg.sh` is in the `../../../utils/` which is define in the `path.sh`.
````
examples/other/mfa/local/generate_lexicon.py

```diff
@@ -48,7 +48,7 @@ def rule(C, V, R, T):
     'i' is distinguished when appeared in phonemes, and separated into 3 categories, 'i', 'ii' and 'iii'.
 
-    Erhua is is possibly applied to every finals, except for finals that already ends with 'r'.
+    Erhua is possibly applied to every finals, except for finals that already ends with 'r'.
 
     When a syllable is impossible or does not have any characters with this pronunciation, return None
     to filter it out.
```
examples/tiny/asr1/README.md

```diff
@@ -37,7 +37,7 @@ It will support the way of using `--variable value` in the shell scripts.
 Some local variables are set in `run.sh`.
 `gpus` denotes the GPU number you want to use. If you set `gpus=`, it means you only use CPU.
 `stage` denotes the number of stage you want the start from in the experiments.
-`stop stage` denotes the number of stage you want the stop at in the expriments.
+`stop stage` denotes the number of stage you want the stop at in the experiments.
 `conf_path` denotes the config path of the model.
 `avg_num` denotes the number K of top-K models you want to average to get the final model.
 `ckpt` denotes the checkpoint prefix of the model, e.g. "transformerr"
```
paddlespeech/s2t/__init__.py

```diff
@@ -267,7 +267,7 @@ def to(x: paddle.Tensor, *args, **kwargs) -> paddle.Tensor:
 
 if not hasattr(paddle.Tensor, 'to'):
-    logger.debug("register user to to paddle.Tensor, remove this when fixed!")
+    logger.debug("register user to paddle.Tensor, remove this when fixed!")
     setattr(paddle.Tensor, 'to', to)
     setattr(paddle.static.Variable, 'to', to)
```
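The cleaned-up log line documents a monkey-patch: a user-defined `to` function is attached to `paddle.Tensor` only when the class lacks one. The same `hasattr`/`setattr` pattern, demonstrated on a hypothetical stand-in class:

```python
class Tensor:
    """Hypothetical stand-in for paddle.Tensor."""

    def __init__(self, data):
        self.data = data


def to(x, *args, **kwargs):
    """User-supplied `.to` shim; the real one handles dtype/place moves."""
    return x  # no-op placeholder


# Register the method only if the class does not already provide one,
# exactly the guard used in paddlespeech/s2t/__init__.py.
if not hasattr(Tensor, 'to'):
    setattr(Tensor, 'to', to)

t = Tensor([1, 2])
print(t.to() is t)  # True
```

Guarding with `hasattr` means the shim silently disappears once the framework ships its own `.to`, which is why the log line says "remove this when fixed".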
paddlespeech/s2t/frontend/augmentor/augmentation.py

```diff
@@ -45,7 +45,7 @@ class AugmentationPipeline():
     samples to make the model invariant to certain types of perturbations in the
     real world, improving model's generalization ability.
 
-    The pipeline is built according the the augmentation configuration in json
+    The pipeline is built according to the augmentation configuration in json
     string, e.g.
 
     .. code-block::
```
paddlespeech/s2t/io/speechbrain/sampler.py

```diff
@@ -283,7 +283,7 @@ class DynamicBatchSampler(Sampler):
             num_quantiles, )
         # get quantiles using lognormal distribution
         quantiles = lognorm.ppf(latent_boundaries, 1)
-        # scale up to to max_batch_length
+        # scale up to max_batch_length
         bucket_boundaries = quantiles * max_batch_length / quantiles[-1]
         # compute resulting bucket length multipliers
         length_multipliers = [
```
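The fixed comment sits inside `DynamicBatchSampler`'s bucket construction: lognormal quantiles are rescaled so the last boundary lands exactly on `max_batch_length`. A stdlib-only sketch of that idea (for shape 1, `scipy.stats.lognorm.ppf(q, 1)` equals `exp(NormalDist().inv_cdf(q))`); the function name and signature are illustrative, not the repo's:

```python
import math
from statistics import NormalDist


def bucket_boundaries(num_buckets, max_batch_length):
    """Log-normally spaced boundaries scaled up to max_batch_length."""
    nd = NormalDist()
    # interior quantile points in (0, 1); endpoints excluded to stay finite
    latent = [i / (num_buckets + 1) for i in range(1, num_buckets + 1)]
    quantiles = [math.exp(nd.inv_cdf(q)) for q in latent]  # lognorm.ppf(q, 1)
    scale = max_batch_length / quantiles[-1]  # scale up to max_batch_length
    return [q * scale for q in quantiles]


bounds = bucket_boundaries(4, 1000)
print(round(bounds[-1], 6))  # 1000.0
```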
paddlespeech/s2t/models/u2/u2.py

```diff
@@ -560,7 +560,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
             [len(hyp[0]) for hyp in hyps], place=device,
             dtype=paddle.long)  # (beam_size,)
         hyps_pad, _ = add_sos_eos(hyps_pad, self.sos, self.eos, self.ignore_id)
-        hyps_lens = hyps_lens + 1  # Add <sos> at begining
+        hyps_lens = hyps_lens + 1  # Add <sos> at beginning
         logger.debug(
             f"hyps pad: {hyps_pad} {self.sos} {self.eos} {self.ignore_id}")
@@ -709,7 +709,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
             hypothesis from ctc prefix beam search and one encoder output
         Args:
             hyps (paddle.Tensor): hyps from ctc prefix beam search, already
-                pad sos at the begining, (B, T)
+                pad sos at the beginning, (B, T)
             hyps_lens (paddle.Tensor): length of each hyp in hyps, (B)
             encoder_out (paddle.Tensor): corresponding encoder output, (B=1, T, D)
         Returns:
```
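The corrected comment explains why `hyps_lens` grows by one: a `<sos>` token is prepended to every hypothesis before rescoring. A plain-Python sketch of the idea (the real `add_sos_eos` operates on padded paddle Tensors and returns separate in/out targets; this simplified helper and its name are illustrative):

```python
def pad_sos_eos(hyps, sos, eos, ignore_id):
    """Prepend <sos> and append <eos> to each hypothesis, padding ragged
    rows with ignore_id. Simplified sketch of the tensor helper."""
    max_len = max(len(h) for h in hyps)
    padded = []
    for h in hyps:
        row = [sos] + list(h) + [eos]
        row += [ignore_id] * (max_len - len(h))  # pad shorter hypotheses
        padded.append(row)
    return padded


hyps = [[7, 8, 9], [5]]
print(pad_sos_eos(hyps, sos=1, eos=2, ignore_id=-1))
# [[1, 7, 8, 9, 2], [1, 5, 2, -1, -1]]
hyps_lens = [len(h) + 1 for h in hyps]  # Add <sos> at beginning
print(hyps_lens)  # [4, 2]
```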
paddlespeech/s2t/models/u2_st/u2_st.py

```diff
@@ -455,7 +455,7 @@ class U2STBaseModel(nn.Layer):
             hypothesis from ctc prefix beam search and one encoder output
         Args:
             hyps (paddle.Tensor): hyps from ctc prefix beam search, already
-                pad sos at the begining, (B, T)
+                pad sos at the beginning, (B, T)
             hyps_lens (paddle.Tensor): length of each hyp in hyps, (B)
             encoder_out (paddle.Tensor): corresponding encoder output, (B=1, T, D)
         Returns:
```
paddlespeech/server/engine/asr/online/python/asr_engine.py

```diff
@@ -609,7 +609,7 @@ class PaddleASRConnectionHanddler:
             dtype=paddle.long)  # (beam_size,)
         hyps_pad, _ = add_sos_eos(hyps_pad, self.model.sos, self.model.eos,
                                   self.model.ignore_id)
-        hyps_lens = hyps_lens + 1  # Add <sos> at begining
+        hyps_lens = hyps_lens + 1  # Add <sos> at beginning
 
         # ctc score in ln domain
         # (beam_size, max_hyps_len, vocab_size)
```
paddlespeech/server/ws/asr_api.py

```diff
@@ -67,7 +67,7 @@ async def websocket_endpoint(websocket: WebSocket):
             # and we break the loop
             if message['signal'] == 'start':
                 resp = {"status": "ok", "signal": "server_ready"}
-                # do something at begining here
+                # do something at beginning here
                 # create the instance to process the audio
                 #connection_handler = PaddleASRConnectionHanddler(asr_model)
                 connection_handler = asr_model.new_handler()
```
paddlespeech/t2s/frontend/generate_lexicon.py

```diff
@@ -45,7 +45,7 @@ def rule(C, V, R, T):
     'u' in syllables when certain conditions are satisfied.
     'i' is distinguished when appeared in phonemes, and separated into 3 categories, 'i', 'ii' and 'iii'.
 
-    Erhua is is possibly applied to every finals, except for finals that already ends with 'r'.
+    Erhua is possibly applied to every finals, except for finals that already ends with 'r'.
 
     When a syllable is impossible or does not have any characters with this pronunciation, return None
     to filter it out.
     """
```
paddlespeech/t2s/models/waveflow.py

```diff
@@ -236,7 +236,7 @@ class ResidualBlock(nn.Layer):
         Returns:
             res (Tensor):
-                A row of the the residual output. shape=(batch_size, channel, 1, width)
+                A row of the residual output. shape=(batch_size, channel, 1, width)
             skip (Tensor):
                 A row of the skip output. shape=(batch_size, channel, 1, width)
@@ -343,7 +343,7 @@ class ResidualNet(nn.LayerList):
         Returns:
             res (Tensor):
-                A row of the the residual output. shape=(batch_size, channel, 1, width)
+                A row of the residual output. shape=(batch_size, channel, 1, width)
             skip (Tensor):
                 A row of the skip output. shape=(batch_size, channel, 1, width)
@@ -465,7 +465,7 @@ class Flow(nn.Layer):
         self.resnet.start_sequence()
 
     def inverse(self, z, condition):
-        """Sampling from the the distrition p(X). It is done by sample form
+        """Sampling from the distrition p(X). It is done by sample form
         p(Z) and transform the sample. It is a auto regressive transformation.
 
         Args:
@@ -600,7 +600,7 @@ class WaveFlow(nn.LayerList):
         return z, log_det_jacobian
 
     def inverse(self, z, condition):
-        """Sampling from the the distrition p(X).
+        """Sampling from the distrition p(X).
 
         It is done by sample a ``z`` form p(Z) and transform it into ``x``.
         Each Flow transform .. math:: `z_{i-1}` to .. math:: `z_{i}` in an
```
paddlespeech/t2s/modules/transformer/lightconv.py

```diff
@@ -110,7 +110,7 @@ class LightweightConvolution(nn.Layer):
                 (batch, time1, time2) mask
 
         Return:
-            Tensor: ouput. (batch, time1, d_model)
+            Tensor: output. (batch, time1, d_model)
         """
         # linear -> GLU -> lightconv -> linear
```
paddlespeech/vector/exps/ecapa_tdnn/train.py

```diff
@@ -51,7 +51,7 @@ def main(args, config):
     # stage0: set the training device, cpu or gpu
     paddle.set_device(args.device)
-    # stage1: we must call the paddle.distributed.init_parallel_env() api at the begining
+    # stage1: we must call the paddle.distributed.init_parallel_env() api at the beginning
     paddle.distributed.init_parallel_env()
     nranks = paddle.distributed.get_world_size()
     rank = paddle.distributed.get_rank()
@@ -146,7 +146,7 @@ def main(args, config):
     timer.start()
 
     for epoch in range(start_epoch + 1, config.epochs + 1):
-        # at the begining, model must set to train mode
+        # at the beginning, model must set to train mode
         model.train()
 
         avg_loss = 0
```
paddlespeech/vector/exps/ge2e/preprocess.py

```diff
@@ -42,7 +42,7 @@ if __name__ == "__main__":
     parser.add_argument(
         "--skip_existing",
         action="store_true",
-        help="Whether to skip ouput files with the same name. Useful if this script was interrupted."
+        help="Whether to skip output files with the same name. Useful if this script was interrupted."
     )
     parser.add_argument(
         "--no_trim",
```
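`store_true` flags like `--skip_existing` default to `False` and flip to `True` only when the flag is present. A minimal runnable sketch of that argparse pattern:

```python
import argparse

parser = argparse.ArgumentParser(description="preprocess flags (sketch)")
parser.add_argument(
    "--skip_existing",
    action="store_true",
    help="Whether to skip output files with the same name. "
    "Useful if this script was interrupted.")

print(parser.parse_args([]).skip_existing)                   # False
print(parser.parse_args(["--skip_existing"]).skip_existing)  # True
```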
speechx/examples/ds2_ol/onnx/local/onnx_infer_shape.py

```diff
@@ -2078,7 +2078,7 @@ class SymbolicShapeInference:
         output_tensor_ranks = get_attribute(node, 'output_tensor_ranks')
         assert output_tensor_ranks
 
-        # set the context output seperately.
+        # set the context output separately.
         # The first output is autograd's context.
         vi = self.known_vi_[node.output[0]]
         vi.CopyFrom(
```
speechx/speechx/frontend/audio/db_norm.cc
浏览文件 @
8c7859d3
...
@@ -76,7 +76,7 @@ bool DecibelNormalizer::Compute(VectorBase<BaseFloat>* waves) const {
...
@@ -76,7 +76,7 @@ bool DecibelNormalizer::Compute(VectorBase<BaseFloat>* waves) const {
if
(
gain
>
opts_
.
max_gain_db
)
{
if
(
gain
>
opts_
.
max_gain_db
)
{
LOG
(
ERROR
)
LOG
(
ERROR
)
<<
"Unable to normalize segment to "
<<
opts_
.
target_db
<<
"dB,"
<<
"Unable to normalize segment to "
<<
opts_
.
target_db
<<
"dB,"
<<
"because the
the probable gain have exceeds
opts_.max_gain_db"
<<
"because the
probable gain has exceeded
opts_.max_gain_db"
<<
opts_
.
max_gain_db
<<
"dB."
;
<<
opts_
.
max_gain_db
<<
"dB."
;
return
false
;
return
false
;
}
}
...
...
speechx/speechx/kaldi/base/kaldi-types.h
浏览文件 @
8c7859d3
...
@@ -40,7 +40,7 @@ typedef float BaseFloat;
...
@@ -40,7 +40,7 @@ typedef float BaseFloat;
#include <stdint.h>
#include <stdint.h>
// for discussion on what to do if you need compile kaldi
// for discussion on what to do if you need compile kaldi
// without OpenFST, see the bottom of this
this
file
// without OpenFST, see the bottom of this file
#ifndef COMPILE_WITHOUT_OPENFST
#ifndef COMPILE_WITHOUT_OPENFST
...
...
speechx/speechx/kaldi/feat/pitch-functions.cc
浏览文件 @
8c7859d3
...
@@ -746,7 +746,7 @@ OnlinePitchFeatureImpl::OnlinePitchFeatureImpl(
...
@@ -746,7 +746,7 @@ OnlinePitchFeatureImpl::OnlinePitchFeatureImpl(
Vector
<
BaseFloat
>
lags_offset
(
lags_
);
Vector
<
BaseFloat
>
lags_offset
(
lags_
);
// lags_offset equals lags_ (which are the log-spaced lag values we want to
// lags_offset equals lags_ (which are the log-spaced lag values we want to
// measure the NCCF at) with nccf_first_lag_ / opts.resample_freq subtracted
// measure the NCCF at) with nccf_first_lag_ / opts.resample_freq subtracted
// from each element, so we can treat the measured NCCF values as
as
starting
// from each element, so we can treat the measured NCCF values as starting
// from sample zero in a signal that starts at the point start /
// from sample zero in a signal that starts at the point start /
// opts.resample_freq. This is necessary because the ArbitraryResample code
// opts.resample_freq. This is necessary because the ArbitraryResample code
// assumes that the input signal starts from sample zero.
// assumes that the input signal starts from sample zero.
...
...
speechx/speechx/kaldi/lat/lattice-functions.h
浏览文件 @
8c7859d3
...
@@ -355,12 +355,12 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
...
@@ -355,12 +355,12 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
//
//
//
//
// /// This function returns the number of words in the longest sentence in a
// /// This function returns the number of words in the longest sentence in a
// /// CompactLattice (i.e. the
the
maximum of any path, of the count of
// /// CompactLattice (i.e. the maximum of any path, of the count of
// /// olabels on that path).
// /// olabels on that path).
// int32 LongestSentenceLength(const Lattice &lat);
// int32 LongestSentenceLength(const Lattice &lat);
//
//
// /// This function returns the number of words in the longest sentence in a
// /// This function returns the number of words in the longest sentence in a
// /// CompactLattice, i.e. the
the
maximum of any path, of the count of
// /// CompactLattice, i.e. the maximum of any path, of the count of
// /// labels on that path... note, in CompactLattice, the ilabels and olabels
// /// labels on that path... note, in CompactLattice, the ilabels and olabels
// /// are identical because it is an acceptor.
// /// are identical because it is an acceptor.
// int32 LongestSentenceLength(const CompactLattice &lat);
// int32 LongestSentenceLength(const CompactLattice &lat);
...
@@ -408,7 +408,7 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
...
@@ -408,7 +408,7 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
//
//
// /// This function computes the mapping from the pair
// /// This function computes the mapping from the pair
// /// (frame-index, transition-id) to the pair
// /// (frame-index, transition-id) to the pair
// /// (sum-of-acoustic-scores, num-of-occur
ences) over all occu
rences of the
// /// (sum-of-acoustic-scores, num-of-occur
rences) over all occur
rences of the
// /// transition-id in that frame.
// /// transition-id in that frame.
// /// frame-index in the lattice.
// /// frame-index in the lattice.
// /// This function is useful for retaining the acoustic scores in a
// /// This function is useful for retaining the acoustic scores in a
...
@@ -422,13 +422,13 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
...
@@ -422,13 +422,13 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
// /// @param [out] acoustic_scores
// /// @param [out] acoustic_scores
// /// Pointer to a map from the pair (frame-index,
// /// Pointer to a map from the pair (frame-index,
// /// transition-id) to a pair (sum-of-acoustic-scores,
// /// transition-id) to a pair (sum-of-acoustic-scores,
// /// num-of-occurences).
// /// num-of-occur
r
ences).
// /// Usually the acoustic scores for a pdf-id (and hence
// /// Usually the acoustic scores for a pdf-id (and hence
// /// transition-id) on a frame will be the same for all the
// /// transition-id) on a frame will be the same for all the
// /// occurences of the pdf-id in that frame.
// /// occur
r
ences of the pdf-id in that frame.
// /// But if not, we will take the average of the acoustic
// /// But if not, we will take the average of the acoustic
// /// scores. Hence, we store both the sum-of-acoustic-scores
// /// scores. Hence, we store both the sum-of-acoustic-scores
// /// and the num-of-occurences of the transition-id in that
// /// and the num-of-occur
r
ences of the transition-id in that
// /// frame.
// /// frame.
// void ComputeAcousticScoresMap(
// void ComputeAcousticScoresMap(
// const Lattice &lat,
// const Lattice &lat,
...
@@ -440,8 +440,8 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
...
@@ -440,8 +440,8 @@ bool PruneLattice(BaseFloat beam, LatticeType *lat);
// ///
// ///
// /// @param [in] acoustic_scores
// /// @param [in] acoustic_scores
// /// A map from the pair (frame-index, transition-id) to a
// /// A map from the pair (frame-index, transition-id) to a
// /// pair (sum-of-acoustic-scores, num-of-occurences) of
// /// pair (sum-of-acoustic-scores, num-of-occur
r
ences) of
// /// the occurences of the transition-id in that frame.
// /// the occur
r
ences of the transition-id in that frame.
// /// See the comments for ComputeAcousticScoresMap for
// /// See the comments for ComputeAcousticScoresMap for
// /// details.
// /// details.
// /// @param [out] lat Pointer to the output lattice.
// /// @param [out] lat Pointer to the output lattice.
...
...
speechx/speechx/kaldi/matrix/kaldi-matrix.cc
浏览文件 @
8c7859d3
...
@@ -1646,7 +1646,7 @@ SubMatrix<Real>::SubMatrix(const MatrixBase<Real> &M,
...
@@ -1646,7 +1646,7 @@ SubMatrix<Real>::SubMatrix(const MatrixBase<Real> &M,
static_cast
<
UnsignedMatrixIndexT
>
(
M
.
num_rows_
-
ro
)
&&
static_cast
<
UnsignedMatrixIndexT
>
(
M
.
num_rows_
-
ro
)
&&
static_cast
<
UnsignedMatrixIndexT
>
(
c
)
<=
static_cast
<
UnsignedMatrixIndexT
>
(
c
)
<=
static_cast
<
UnsignedMatrixIndexT
>
(
M
.
num_cols_
-
co
));
static_cast
<
UnsignedMatrixIndexT
>
(
M
.
num_cols_
-
co
));
// point to the begining of window
// point to the begin
n
ing of window
MatrixBase
<
Real
>::
num_rows_
=
r
;
MatrixBase
<
Real
>::
num_rows_
=
r
;
MatrixBase
<
Real
>::
num_cols_
=
c
;
MatrixBase
<
Real
>::
num_cols_
=
c
;
MatrixBase
<
Real
>::
stride_
=
M
.
Stride
();
MatrixBase
<
Real
>::
stride_
=
M
.
Stride
();
...
...
speechx/speechx/kaldi/matrix/sparse-matrix.cc

@@ -998,7 +998,7 @@ void FilterCompressedMatrixRows(const CompressedMatrix &in,
     // iterating row-wise versus column-wise in compressed-matrix uncompression.
     if (num_kept_rows > heuristic * in.NumRows()) {
-        // if quite a few of the the rows are kept, it may be more efficient
+        // if quite a few of the rows are kept, it may be more efficient
         // to uncompress the entire compressed matrix, since per-column operation
         // is more efficient.
         Matrix<BaseFloat> full_mat(in);
...
speechx/speechx/kaldi/util/kaldi-table-inl.h

@@ -1587,7 +1587,7 @@ template<class Holder> class RandomAccessTableReaderImplBase {
 // this from a pipe. In principle we could read it on-demand as for the
 // archives, but this would probably be overkill.
-// Note: the code for this this class is similar to TableWriterScriptImpl:
+// Note: the code for this class is similar to TableWriterScriptImpl:
 // try to keep them in sync.
 template<class Holder>
 class RandomAccessTableReaderScriptImpl:
...
speechx/speechx/nnet/ds2_nnet.cc

@@ -105,7 +105,7 @@ paddle_infer::Predictor* PaddleNnet::GetPredictor() {
     while (pred_id < pool_usages.size()) {
         if (pool_usages[pred_id] == false) {
-            predictor = pool->Retrive(pred_id);
+            predictor = pool->Retrieve(pred_id);
             break;
         }
         ++pred_id;
...
speechx/speechx/protocol/websocket/websocket_server.cc

@@ -32,14 +32,14 @@ void ConnectionHandler::OnSpeechStart() {
     decode_thread_ = std::make_shared<std::thread>(
         &ConnectionHandler::DecodeThreadFunc, this);
     got_start_tag_ = true;
-    LOG(INFO) << "Server: Recieved speech start signal, start reading speech";
+    LOG(INFO) << "Server: Received speech start signal, start reading speech";
     json::value rv = {{"status", "ok"}, {"type", "server_ready"}};
     ws_.text(true);
     ws_.write(asio::buffer(json::serialize(rv)));
 }
 void ConnectionHandler::OnSpeechEnd() {
-    LOG(INFO) << "Server: Recieved speech end signal";
+    LOG(INFO) << "Server: Received speech end signal";
     if (recognizer_ != nullptr) {
         recognizer_->SetFinished();
     }
...
@@ -70,8 +70,8 @@ void ConnectionHandler::OnSpeechData(const beast::flat_buffer& buffer) {
         pcm_data(i) = static_cast<float>(*pdata);
         pdata++;
     }
-    VLOG(2) << "Server: Recieved " << num_samples << " samples";
-    LOG(INFO) << "Server: Recieved " << num_samples << " samples";
+    VLOG(2) << "Server: Received " << num_samples << " samples";
+    LOG(INFO) << "Server: Received " << num_samples << " samples";
     CHECK(recognizer_ != nullptr);
     recognizer_->Accept(pcm_data);
...
tools/extras/install_mkl.sh

@@ -166,7 +166,7 @@ variable, sudo might not allow it to propagate to the command that it invokes."
 fi
 # The install variants, each in a function to simplify error reporting.
-# Each one invokes a subshell with a 'set -x' to to show system-modifying
+# Each one invokes a subshell with a 'set -x' to show system-modifying
 # commands it runs. The subshells simply limit the scope of this diagnostics
 # and avoid creating noise (if we were using 'set +x', it would be printed).
 Install_redhat() {
...
utils/fst/ctc_token_fst.py

@@ -6,7 +6,7 @@ def main(args):
     """Token Transducer"""
     # <eps> entry
    print('0 1 <eps> <eps>')
-    # skip begining and ending <blank>
+    # skip beginning and ending <blank>
     print('1 1 <blank> <eps>')
     print('2 2 <blank> <eps>')
     # <eps> exit
...
utils/tokenizer.perl

@@ -296,7 +296,7 @@ sub tokenize
     $text =~ s/DOTMULTI\./DOTDOTMULTI/g;
 }
-# seperate out "," except if within numbers (5,300)
+# separate out "," except if within numbers (5,300)
 #$text =~ s/([^\p{IsN}])[,]([^\p{IsN}])/$1 , $2/g;
 # separate out "," except if within numbers (5,300)
...