Commit bb2a370b (unverified)
Authored by Hui Zhang on Dec 28, 2021; committed via GitHub on Dec 28, 2021.
[asr] remove useless conf of librispeech (#1227)
* remove useless conf
* format code
* update conf
* update conf
* update conf
Parent: 425b085f
Showing 8 changed files with 76 additions and 88 deletions (+76, -88).
examples/csmsc/voc5/README.md                          +3  -3
examples/librispeech/asr1/conf/chunk_conformer.yaml    +16 -28
examples/librispeech/asr1/conf/chunk_transformer.yaml  +16 -19
examples/librispeech/asr1/conf/conformer.yaml          +16 -16
examples/librispeech/asr1/conf/transformer.yaml        +14 -15
paddlespeech/s2t/exps/u2/model.py                      +2  -2
paddlespeech/t2s/frontend/zh_frontend.py               +1  -1
paddlespeech/t2s/models/fastspeech2/fastspeech2.py     +8  -4
examples/csmsc/voc5/README.md
@@ -127,10 +127,10 @@ HiFiGAN checkpoint contains files listed below.
 ```text
 hifigan_csmsc_ckpt_0.1.1
 ├── default.yaml              # default config used to train hifigan
 ├── feats_stats.npy           # generator parameters of hifigan
 └── snapshot_iter_2500000.pdz # statistics used to normalize spectrogram when training hifigan
 ```
 ## Acknowledgement
 We adapted some code from https://github.com/kan-bayashi/ParallelWaveGAN.
\ No newline at end of file
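The checkpoint layout documented above can be inspected directly; a minimal sketch, assuming the archive has been unpacked into `hifigan_csmsc_ckpt_0.1.1/` as shown (the loading code is illustrative, not PaddleSpeech's own API):

```python
# Minimal sketch: inspect the unpacked HiFiGAN checkpoint directory.
# Assumes the layout shown in the README excerpt above; illustrative only.
from pathlib import Path

import numpy as np
import yaml

ckpt_dir = Path("hifigan_csmsc_ckpt_0.1.1")

# default.yaml: the training config, loaded as a plain dict
config = yaml.safe_load((ckpt_dir / "default.yaml").read_text())
print(sorted(config)[:5])

# feats_stats.npy: normalization statistics as a NumPy array
stats = np.load(ckpt_dir / "feats_stats.npy")
print(stats.shape)
```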
examples/librispeech/asr1/conf/chunk_conformer.yaml
@@ -47,63 +47,51 @@ data:
   dev_manifest: data/manifest.dev
   test_manifest: data/manifest.test
 
 collator:
   vocab_filepath: data/lang_char/vocab.txt
   unit_type: 'spm'
   spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
   mean_std_filepath: ""
   augmentation_config: conf/preprocess.yaml
-  batch_size: 16
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  #linear, mfcc, fbank
   feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
   stride_ms: 10.0
   window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+  sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+  batch_size: 16
+  maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+  maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+  minibatches: 0  # for debug
+  batch_count: auto
+  batch_bins: 0
+  batch_frames_in: 0
+  batch_frames_out: 0
+  batch_frames_inout: 0
+  augmentation_config: conf/preprocess.yaml
+  num_workers: 0
+  subsampling_factor: 1
+  num_encs: 1
 
 training:
-  n_epoch: 240
+  n_epoch: 120
   accum_grad: 8
   global_grad_clip: 5.0
   optim: adam
   optim_conf:
     lr: 0.001
     weight_decay: 1e-06
   scheduler: warmuplr
   scheduler_conf:
     warmup_steps: 25000
-    lr_decay: 1.0
   log_interval: 100
   checkpoint:
     kbest_n: 50
     latest_n: 5
 
 decoding:
   batch_size: 128
   error_rate_type: wer
   decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
   beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
   ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
   decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
       # <0: for decoding, use full chunk.
...
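The trainer consumes this file through an attribute-style config object (note the `config.collator.*` accesses in `paddlespeech/s2t/exps/u2/model.py` below); a minimal sketch of that access pattern, using plain PyYAML and a namespace wrapper in place of the project's real config loader:

```python
# Minimal sketch: load the YAML above and expose keys with dotted
# access, mirroring the config.collator.* pattern used in
# paddlespeech/s2t/exps/u2/model.py. PyYAML stands in for the
# project's actual config machinery; illustrative only.
from types import SimpleNamespace

import yaml

def to_namespace(obj):
    """Recursively convert dicts to attribute-accessible namespaces."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj

with open("examples/librispeech/asr1/conf/chunk_conformer.yaml") as f:
    config = to_namespace(yaml.safe_load(f))

print(config.collator.batch_size)   # 16
print(config.collator.batch_count)  # 'auto'
```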
examples/librispeech/asr1/conf/chunk_transformer.yaml
@@ -34,36 +34,35 @@ model:
     lsm_weight: 0.1  # label smoothing option
     length_normalized_loss: false
 
 data:
   train_manifest: data/manifest.train
   dev_manifest: data/manifest.dev
   test_manifest: data/manifest.test
 
 collator:
   vocab_filepath: data/lang_char/vocab.txt
   unit_type: 'spm'
   spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
   mean_std_filepath: ""
   augmentation_config: conf/preprocess.yaml
-  batch_size: 64
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  #linear, mfcc, fbank
   feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
   stride_ms: 10.0
   window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+  sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+  batch_size: 64
+  maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+  maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+  minibatches: 0  # for debug
+  batch_count: auto
+  batch_bins: 0
+  batch_frames_in: 0
+  batch_frames_out: 0
+  batch_frames_inout: 0
+  augmentation_config: conf/preprocess.yaml
+  num_workers: 0
+  subsampling_factor: 1
+  num_encs: 1
 
 training:
...
@@ -101,6 +100,4 @@ decoding:
       # >0: for decoding, use fixed chunk size as set.
       # 0: used for training, it's prohibited here.
   num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
   simulate_streaming: true  # simulate streaming inference. Defaults to False.
\ No newline at end of file
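The `decoding_chunk_size` semantics spelled out in the comments above (`<0` decodes the full utterance, `>0` uses a fixed chunk, `0` is training-only) can be illustrated in isolation; a sketch of the documented behavior, not the real U2 streaming decoder:

```python
# Sketch of the decoding_chunk_size convention documented above.
# Illustrative only; the actual chunking logic lives in paddlespeech.
from typing import List

def split_into_chunks(frames: List[float], decoding_chunk_size: int) -> List[List[float]]:
    if decoding_chunk_size < 0:    # full-utterance (non-streaming) decoding
        return [frames]
    if decoding_chunk_size == 0:   # reserved for training; prohibited at decode time
        raise ValueError("decoding_chunk_size=0 is only valid for training")
    return [frames[i:i + decoding_chunk_size]
            for i in range(0, len(frames), decoding_chunk_size)]

print(len(split_into_chunks(list(range(100)), -1)))  # 1 chunk
print(len(split_into_chunks(list(range(100)), 16)))  # 7 chunks
```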
examples/librispeech/asr1/conf/conformer.yaml
@@ -34,6 +34,7 @@ model:
     # hybrid CTC/attention
     model_conf:
       ctc_weight: 0.3
+      ctc_grad_norm_type: null
       lsm_weight: 0.1  # label smoothing option
       length_normalized_loss: false
...
@@ -50,25 +51,24 @@ collator:
   spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
   mean_std_filepath: ""
   augmentation_config: conf/preprocess.yaml
-  batch_size: 16
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  #linear, mfcc, fbank
   feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
   stride_ms: 10.0
   window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+  sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+  batch_size: 16
+  maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+  maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+  minibatches: 0  # for debug
+  batch_count: auto
+  batch_bins: 0
+  batch_frames_in: 0
+  batch_frames_out: 0
+  batch_frames_inout: 0
+  augmentation_config: conf/preprocess.yaml
+  num_workers: 0
+  subsampling_factor: 1
+  num_encs: 1
 
 training:
   n_epoch: 70
...
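The `# hybrid CTC/attention` block configures the standard interpolated objective that `ctc_weight` controls: the total loss is a weighted sum of the CTC and attention losses. A sketch of that formula (the function and variable names are illustrative, not the project's implementation):

```python
# Sketch of the hybrid CTC/attention objective configured by
# ctc_weight above: loss = w * ctc + (1 - w) * attention.
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    if ctc_weight == 0.0:
        return loss_att
    if ctc_weight == 1.0:
        return loss_ctc
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

print(hybrid_loss(2.0, 1.0))  # 0.3 * 2.0 + 0.7 * 1.0 = 1.3
```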
examples/librispeech/asr1/conf/transformer.yaml
@@ -51,24 +51,23 @@ collator:
   spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
   mean_std_filepath: ""
   augmentation_config: conf/preprocess.yaml
-  batch_size: 32
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  #linear, mfcc, fbank
   feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
   stride_ms: 10.0
   window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+  sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+  batch_size: 32
+  maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+  maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+  minibatches: 0  # for debug
+  batch_count: auto
+  batch_bins: 0
+  batch_frames_in: 0
+  batch_frames_out: 0
+  batch_frames_inout: 0
+  augmentation_config: conf/preprocess.yaml
+  num_workers: 0
+  subsampling_factor: 1
+  num_encs: 1
 
 training:
...
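The `maxlen_in` / `maxlen_out` comments above describe automatic batch-size reduction for long utterances. A sketch of that rule in the ESPnet-style batching convention these configs follow; the exact formula in paddlespeech may differ:

```python
# Sketch of the maxlen_in / maxlen_out rule commented above: when an
# utterance's input or output length exceeds its threshold, the batch
# size is scaled down proportionally. Illustrative, not the project's
# actual implementation.
def reduced_batch_size(batch_size: int, ilen: int, olen: int,
                       maxlen_in: int = 512, maxlen_out: int = 150) -> int:
    factor = max(ilen / maxlen_in, olen / maxlen_out, 1.0)
    return max(1, int(batch_size / factor))

print(reduced_batch_size(32, ilen=256, olen=75))   # 32 (under both limits)
print(reduced_batch_size(32, ilen=1024, olen=75))  # 16 (input twice the limit)
```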
paddlespeech/s2t/exps/u2/model.py
@@ -265,7 +265,7 @@ class U2Trainer(Trainer):
             batch_frames_in=config.collator.batch_frames_in,
             batch_frames_out=config.collator.batch_frames_out,
             batch_frames_inout=config.collator.batch_frames_inout,
             preprocess_conf=config.collator.augmentation_config,
             n_iter_processes=config.collator.num_workers,
             subsampling_factor=1,
             num_encs=1)
...
@@ -284,7 +284,7 @@ class U2Trainer(Trainer):
             batch_frames_in=0,
             batch_frames_out=0,
             batch_frames_inout=0,
             preprocess_conf=config.collator.augmentation_config,
             n_iter_processes=config.collator.num_workers,
             subsampling_factor=1,
             num_encs=1)
...
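The `batch_count`, `batch_bins`, and `batch_frames_*` values forwarded here come straight from the new collator keys in the YAML diffs above. In the ESPnet-style batching these options are modeled on, `batch_count: auto` typically selects a strategy from whichever size limit is non-zero; a sketch under that assumption (names are illustrative, not paddlespeech's actual logic):

```python
# Sketch of how batch_count / batch_bins / batch_frames_* typically
# interact in ESPnet-style batching: 'auto' picks a strategy from the
# first non-zero size limit. Illustrative only.
def pick_batch_strategy(batch_count: str,
                        batch_bins: int = 0,
                        batch_frames_in: int = 0,
                        batch_frames_out: int = 0,
                        batch_frames_inout: int = 0) -> str:
    if batch_count == "auto":
        if batch_bins > 0:
            return "bin"    # cap total padded bins per batch
        if batch_frames_in or batch_frames_out or batch_frames_inout:
            return "frame"  # cap total frames per batch
        return "seq"        # fall back to a fixed number of sequences
    return batch_count

print(pick_batch_strategy("auto"))                     # 'seq'
print(pick_batch_strategy("auto", batch_bins=800000))  # 'bin'
```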
paddlespeech/t2s/frontend/zh_frontend.py
@@ -106,7 +106,7 @@ class Frontend():
         for seg in segments:
             phones = []
             # Replace all English words in the sentence
             seg = re.sub('[a-zA-Z]+', '', seg)
             seg_cut = psg.lcut(seg)
             initials = []
             finals = []
...
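The `re.sub` call above deletes every run of Latin letters from a segment before it reaches the Chinese word segmenter. A standalone example of that behavior (the sample sentence is invented):

```python
# The same pattern as in zh_frontend.py: strip runs of Latin letters
# so only Chinese text is passed on to segmentation.
import re

seg = "我们使用HiFiGAN生成waveform"
print(re.sub('[a-zA-Z]+', '', seg))  # -> 我们使用生成
```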
paddlespeech/t2s/models/fastspeech2/fastspeech2.py
@@ -942,7 +942,12 @@ class StyleFastSpeech2Inference(FastSpeech2Inference):
         """
         spk_id = paddle.to_tensor(spk_id)
-        normalized_mel, d_outs, p_outs, e_outs = self.acoustic_model.inference(
-            text, durations=None, pitch=None, energy=None, spk_emb=spk_emb, spk_id=spk_id)
+        normalized_mel, d_outs, p_outs, e_outs = self.acoustic_model.inference(
+            text,
+            durations=None,
+            pitch=None,
+            energy=None,
+            spk_emb=spk_emb,
+            spk_id=spk_id)
         # priority: groundtruth > scale/bias > previous output
         # set durations
         if isinstance(durations, np.ndarray):
...
@@ -995,9 +1000,8 @@ class StyleFastSpeech2Inference(FastSpeech2Inference):
             pitch=pitch,
             energy=energy,
             use_teacher_forcing=True,
             spk_emb=spk_emb,
-            spk_id=spk_id
-        )
+            spk_id=spk_id)
         logmel = self.normalizer.inverse(normalized_mel)
         return logmel
...
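The `# priority: groundtruth > scale/bias > previous output` comment describes how the inference wrapper resolves durations: a ground-truth array wins outright, otherwise an optional scale/bias is applied to the model's own prediction. A sketch of that priority order (function and parameter names are illustrative, not the class's real code):

```python
# Sketch of the "groundtruth > scale/bias > previous output" priority
# noted in the diff above. Illustrative only.
import numpy as np

def resolve_durations(d_outs, durations=None,
                      durations_scale=None, durations_bias=None):
    if isinstance(durations, np.ndarray):  # 1. ground truth wins
        return durations
    d = d_outs.astype(np.float32)          # 3. model's previous output ...
    if durations_scale is not None:        # 2. ... optionally rescaled
        d = d * durations_scale
    if durations_bias is not None:
        d = d + durations_bias
    return d

print(resolve_durations(np.array([2, 3]), durations_scale=1.5))  # [3.  4.5]
```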