mrywhh / Real-Time-Voice-Cloning
Commit 3df589c1
Authored Mar 29, 2019 by Corentin Jemine
Parent: 5597d67b

Showing 2 changed files with 80 additions and 5 deletions (+80, -5):
  notes/vocoder.txt            (+5, -2)
  tacotron2/inference_demo.py  (+75, -3)
notes/vocoder.txt @ 3df589c1

@@ -3,9 +3,12 @@ How about a nonlinear encoding? https://i.imgur.com/xRWn6AE.png
 NOTES:
-- On eddard, tacotron_model.ckpt-486000 was the model used to generate GTA.
+- tacotron_model.ckpt-486000 was the model used to generate GTA.
+- best lost on mu_law: 2.935
 TODO:
 - Meanwhile, work on a rough inference demo (don't forget to show side-by-side generated and original sample)
 - Pruning
 - Begin merging the three projects
-- Clean up the rest of the code
\ No newline at end of file
+- Clean up the rest of the code
+- Batch inputs?
\ No newline at end of file
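
For context on the mu_law entries above: µ-law is a standard nonlinear companding scheme that spends its quantization levels more densely near zero amplitude, which makes it a common target encoding for WaveRNN-style vocoders. Below is a minimal sketch of the conventional definition (µ = 255, i.e. 8 bits); the constants and helper names are illustrative and not taken from the wave-rnn code.

import numpy as np

MU = 255  # conventional 8-bit mu-law, as in ITU-T G.711

def mu_law_encode(x, mu=MU):
    # Compand a float waveform in [-1, 1], then quantize to mu + 1
    # integer levels; quiet samples get proportionally finer resolution.
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((companded + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(y, mu=MU):
    # Undo the quantization, then invert the companding curve.
    companded = 2 * (y.astype(np.float64) / mu) - 1
    return np.sign(companded) * ((1 + mu) ** np.abs(companded) - 1) / mu

wav = 0.3 * np.sin(np.linspace(0, 8 * np.pi, 16000))  # toy signal
assert np.abs(mu_law_decode(mu_law_encode(wav)) - wav).max() < 1e-2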
tacotron2/inference_demo.py @ 3df589c1

@@ -5,9 +5,18 @@ from vlibs import fileio
import sounddevice as sd
import tensorflow as tf
import numpy as np
import sys
sys.path.append('../wave-rnn')
from vocoder import inference as vocoder
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

use_griffin_lim = False
if not use_griffin_lim:
    vocoder.load_model('../wave-rnn/checkpoints/mu_law.pt')

all_embeds_fpaths = fileio.get_files(r"E:\Datasets\Synthesizer\embed", "embed")

def get_speaker_embed(speaker_id):
    embed_root = r"E:\Datasets\Synthesizer\embed"
    embeds = [np.load(f) for f in fileio.get_files(embed_root, "embed-%d-" % speaker_id)]
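
The first hunk ends mid-function: the reduction of the per-utterance embeds to a single speaker_embed falls between the two hunks and is not shown. A self-contained sketch of the presumable pattern follows; the np.mean step and the 256-dimensional size are assumptions, and only the normalization and return lines appear in the next hunk.

import numpy as np

def speaker_embed_from_utterances(embeds):
    # Average the per-utterance embeddings into one speaker embedding
    # (assumed step), then L2-normalize it back onto the unit sphere.
    speaker_embed = np.mean(embeds, axis=0)
    speaker_embed /= np.linalg.norm(speaker_embed, 2)
    return speaker_embed[None, ...]  # prepend a batch dimension

utterances = [np.random.rand(256).astype(np.float32) for _ in range(4)]
print(speaker_embed_from_utterances(utterances).shape)  # (1, 256)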
@@ -15,19 +24,82 @@ def get_speaker_embed(speaker_id):
    speaker_embed /= np.linalg.norm(speaker_embed, 2)
    return speaker_embed[None, ...]

def get_random_embed():
    fpath = np.random.choice(all_embeds_fpaths)
    return np.load(fpath)[None, ...], fpath

if __name__ == '__main__':
    checkpoint_dir = os.path.join('logs-two_asr', 'taco_pretrained')
    checkpoint_fpath = tf.train.get_checkpoint_state(checkpoint_dir).model_checkpoint_path
    synth = synthesizer.Synthesizer()
    synth.load(checkpoint_fpath, hparams)

    from datasets.audio import save_wav
    while True:
        speaker_id = int(input("Speaker ID: "))
        speaker_embed = get_speaker_embed(speaker_id)
        # Retrieve the embedding
        # speaker_id = int(input("Speaker ID: "))
        # speaker_embed = get_speaker_embed(speaker_id)
        speaker_embed, embed_fpath = get_random_embed()
        print(embed_fpath)
        a = embed_fpath[embed_fpath.find('embed-') + 6:]
        speaker_id = int(a[:a.find('-')])
        print(speaker_id)

        # Synthesize the text with the embedding
        text = input("Text: ")
        mel = synth.my_synthesize(speaker_embed, text)

        wav = inv_mel_spectrogram(mel.T, hparams)
        wav = np.concatenate((wav, [0] * hparams.sample_rate))
        print("Griffin-lim:")
        sd.play(wav, 16000)
        wav1 = wav

        wav = vocoder.infer_waveform(mel.T)
        wav = np.concatenate((wav, [0] * hparams.sample_rate))
        sd.wait()
        print("\nWave-RNN:")
        sd.play(wav, 16000)
        sd.wait()

        save_wav(wav1, "%s_%s.wav" % (speaker_id, 'griffin'), 16000)
        save_wav(wav, "%s_%s.wav" % (speaker_id, 'wavernn'), 16000)

        # # Synthesize the text with the embedding
        # speaker_embed = get_speaker_embed(speaker_id)
        #
        # mel = synth.my_synthesize(speaker_embed, text)
        #
        # wav = inv_mel_spectrogram(mel.T, hparams)
        # wav = np.concatenate((wav, [0] * hparams.sample_rate))
        # print("Griffin-lim:")
        # sd.play(wav, 16000)
        #
        # wav = vocoder.infer_waveform(mel.T)
        # wav = np.concatenate((wav, [0] * hparams.sample_rate))
        # sd.wait()
        # print("\nWave-RNN:")
        # sd.play(wav, 16000)
        # sd.wait()
        # # Infer the waveform of the synthsized spectrogram
        # if use_griffin_lim:
        #     wav = inv_mel_spectrogram(mel.T, hparams)
        # else:
        #     wav = vocoder.infer_waveform(mel.T)
        # print('')
        #
        # # Pad the end of the waveform
        # wav = np.concatenate((wav, [0] * hparams.sample_rate))
        #
        # # Play the audio
        # sd.play(wav, 16000)
        # sd.wait()
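
The loop above plays the two inversions of the same mel spectrogram back to back: inv_mel_spectrogram (Griffin-Lim, saved as *_griffin.wav) against the Wave-RNN vocoder (saved as *_wavernn.wav). For reference, here is a minimal sketch of the Griffin-Lim phase-recovery loop itself, written against librosa; inv_mel_spectrogram also maps the mel spectrogram back to linear frequencies before this step, which the sketch skips, and n_iter, hop_length and win_length are placeholder values rather than the project's hparams.

import numpy as np
import librosa

def griffin_lim(mag, n_iter=60, hop_length=200, win_length=800):
    # `mag` is a linear-frequency magnitude spectrogram (freq_bins, frames).
    # Start from random phase and alternate between the time and STFT
    # domains, keeping the known magnitude and refining only the phase.
    n_fft = (mag.shape[0] - 1) * 2
    angles = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        wav = librosa.istft(mag * angles, hop_length=hop_length, win_length=win_length)
        stft = librosa.stft(wav, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
        angles = np.exp(1j * np.angle(stft))
    return librosa.istft(mag * angles, hop_length=hop_length, win_length=win_length)

Each iteration keeps the known magnitude and replaces only the phase with that of the previous reconstruction, so the mismatch between mag and the STFT of the output shrinks as the loop runs.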