PaddlePaddle
DeepSpeech
Commit be99807d
Authored Jan 11, 2022 by Jerryuhoo
Add durations to gen_gta_mel.py inference
Parent: 61b68ed3
Showing 2 changed files with 34 additions and 24 deletions
paddlespeech/t2s/exps/speedyspeech/gen_gta_mel.py (+4 -2)
paddlespeech/t2s/models/speedyspeech/speedyspeech.py (+30 -22)
paddlespeech/t2s/exps/speedyspeech/gen_gta_mel.py
@@ -73,7 +73,7 @@ def evaluate(args, speedyspeech_config):
     speedyspeech_normalizer = ZScore(mu, std)
     speedyspeech_inference = SpeedySpeechInference(speedyspeech_normalizer,
-                                                   model)
+                                                   model)
     speedyspeech_inference.eval()
     output_dir = Path(args.output_dir)
@@ -138,6 +138,8 @@ def evaluate(args, speedyspeech_config):
             speaker_id = None
+        durations = paddle.to_tensor(np.array(durations))
+        durations = paddle.unsqueeze(durations, axis=0)
         # generated and ground-truth mels may differ by 1 or 2 frames, but batch_fn will fix that
         # split data into 3 sections
@@ -153,7 +155,7 @@ def evaluate(args, speedyspeech_config):
         sub_output_dir.mkdir(parents=True, exist_ok=True)
         with paddle.no_grad():
-            mel = speedyspeech_inference(phone_ids, tone_ids, spk_id=speaker_id)
+            mel = speedyspeech_inference(phone_ids, tone_ids, durations=durations, spk_id=speaker_id)
         np.save(sub_output_dir / (utt_id + "_feats.npy"), mel)
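The two added lines in `gen_gta_mel.py` turn a flat list of per-phone frame counts into a tensor with a leading batch axis (`paddle.unsqueeze(..., axis=0)` maps shape `[T]` to `[1, T]`) before passing it to inference. The following is a minimal pure-Python sketch of that shape change, not the script's code; `add_batch_axis` and `expected_num_frames` are hypothetical helper names introduced only for illustration.

```python
def add_batch_axis(durations):
    """Wrap a flat list of per-phone frame counts as a batch of size 1.

    Hypothetical stand-in for to_tensor + unsqueeze(axis=0): [T] -> [1, T].
    """
    return [list(durations)]


def expected_num_frames(batched_durations):
    """Number of mel frames each utterance expands to (sum of its durations)."""
    return [sum(row) for row in batched_durations]


durations = [3, 0, 5, 2]          # frames per phone, made-up values
batched = add_batch_axis(durations)
print(batched)                    # [[3, 0, 5, 2]]
print(expected_num_frames(batched))  # [10]
```

Because the supplied durations fix the output length, the generated mel should line up with the ground-truth features up to the 1 or 2 frame difference mentioned in the code comment.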
paddlespeech/t2s/models/speedyspeech/speedyspeech.py
@@ -222,7 +222,7 @@ class SpeedySpeech(nn.Layer):
         decoded = self.decoder(encodings)
         return decoded, pred_durations
-    def inference(self, text, tones=None, spk_id=None):
+    def inference(self, text, tones=None, durations=None, spk_id=None):
         # text: [T]
         # tones: [T]
         # input of embedding must be int64
@@ -234,24 +234,28 @@ class SpeedySpeech(nn.Layer):
         encodings = self.encoder(text, tones, spk_id)
-        pred_durations = self.duration_predictor(encodings)  # (1, T)
-        durations_to_expand = paddle.round(pred_durations.exp())
-        durations_to_expand = (durations_to_expand).astype(paddle.int64)
-        slens = paddle.sum(durations_to_expand, -1)  # [1]
-        t_dec = slens[0]  # [1]
-        t_enc = paddle.shape(pred_durations)[-1]
-        M = paddle.zeros([1, t_dec, t_enc])
-        k = paddle.full([1], 0, dtype=paddle.int64)
-        for j in range(t_enc):
-            d = durations_to_expand[0, j]
-            # If the d == 0, slice action is meaningless and not supported
-            if d >= 1:
-                M[0, k:k + d, j] = 1
-            k += d
-        encodings = paddle.matmul(M, encodings)
+        if type(durations) == type(None):
+            pred_durations = self.duration_predictor(encodings)  # (1, T)
+            durations_to_expand = paddle.round(pred_durations.exp())
+            durations_to_expand = (durations_to_expand).astype(paddle.int64)
+            slens = paddle.sum(durations_to_expand, -1)  # [1]
+            t_dec = slens[0]  # [1]
+            t_enc = paddle.shape(pred_durations)[-1]
+            M = paddle.zeros([1, t_dec, t_enc])
+            k = paddle.full([1], 0, dtype=paddle.int64)
+            for j in range(t_enc):
+                d = durations_to_expand[0, j]
+                # If the d == 0, slice action is meaningless and not supported
+                if d >= 1:
+                    M[0, k:k + d, j] = 1
+                k += d
+            encodings = paddle.matmul(M, encodings)
+        else:
+            durations_to_expand = durations
+            encodings = expand(encodings, durations_to_expand)
         shape = paddle.shape(encodings)
         t_dec, feature_size = shape[1], shape[2]
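The predicted-duration branch above expands the encoder output by building a 0/1 matrix `M` of shape `[t_dec, t_enc]`, where column `j` holds `d_j` consecutive ones, and multiplying it with the encodings. The sketch below reproduces that idea in pure Python on a single utterance and checks it against plain row repetition. It is an illustration under assumptions, not the repository code: `build_alignment_matrix`, `matmul`, and `expand_by_repeat` are hypothetical names, and the body of the `expand` helper called in the `else` branch is not shown in this diff.

```python
def build_alignment_matrix(durations):
    """Return a t_dec x t_enc 0/1 matrix with d_j consecutive ones in column j."""
    t_enc = len(durations)
    t_dec = sum(durations)
    M = [[0] * t_enc for _ in range(t_dec)]
    k = 0  # index of the next decoder frame to fill
    for j, d in enumerate(durations):
        # A zero duration contributes no rows, mirroring the `if d >= 1` guard.
        for row in range(k, k + d):
            M[row][j] = 1
        k += d
    return M


def matmul(A, B):
    """Naive matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def expand_by_repeat(encodings, durations):
    """Repeat each encoder vector d_j times (equivalent length regulation)."""
    out = []
    for vec, d in zip(encodings, durations):
        out.extend(list(vec) for _ in range(d))
    return out


encodings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # t_enc = 3, feature size 2
durations = [2, 0, 1]                              # made-up frame counts

via_matrix = matmul(build_alignment_matrix(durations), encodings)
via_repeat = expand_by_repeat(encodings, durations)
print(via_matrix)                # [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0]]
print(via_matrix == via_repeat)  # True
```

Either way, the number of output frames equals the sum of the durations, which is why passing ground-truth durations pins the mel length to the reference alignment.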
@@ -266,7 +270,11 @@ class SpeedySpeechInference(nn.Layer):
         self.normalizer = normalizer
         self.acoustic_model = speedyspeech_model
-    def forward(self, phones, tones, spk_id=None):
-        normalized_mel = self.acoustic_model.inference(phones, tones, spk_id)
+    def forward(self, phones, tones, durations=None, spk_id=None):
+        normalized_mel = self.acoustic_model.inference(
+            phones, tones, durations=durations, spk_id=spk_id)
         logmel = self.normalizer.inverse(normalized_mel)
-        return logmel
+        return logmel
\ No newline at end of file
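The new `durations` parameter is inserted before `spk_id`, and the wrapper switches from a positional `spk_id` to keyword arguments in the same commit. The sketch below shows why that matters, using hypothetical stub functions (`inference_old`, `inference_new`) that only echo their arguments; they are not the model's methods. It also uses `durations is None`, the usual Python spelling of the `type(durations) == type(None)` check in the diff.

```python
def inference_old(text, tones=None, spk_id=None):
    """Stub with the pre-commit parameter order."""
    return {"durations": None, "spk_id": spk_id}


def inference_new(text, tones=None, durations=None, spk_id=None):
    """Stub with the post-commit parameter order."""
    use_predicted = durations is None  # idiomatic None check
    return {"durations": durations, "spk_id": spk_id,
            "use_predicted": use_predicted}


# A positional call written against the old signature...
print(inference_old("phones", "tones", 7))
# {'durations': None, 'spk_id': 7}

# ...silently binds the speaker id to `durations` under the new signature.
print(inference_new("phones", "tones", 7))
# {'durations': 7, 'spk_id': None, 'use_predicted': False}

# Keyword arguments keep the call correct regardless of parameter order.
print(inference_new("phones", "tones", spk_id=7))
# {'durations': None, 'spk_id': 7, 'use_predicted': True}
```

Any other caller that still passes `spk_id` positionally would need the same keyword-argument update.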