PaddlePaddle / DeepSpeech
Commit c4a79cce (unverified), authored by Hui Zhang on Dec 14, 2021; committed via GitHub on Dec 14, 2021.
[asr] update librispeech conformer result (#1116)
* update librispeech result
* change conf order
Parent: 65f68480
Showing 4 changed files with 105 additions and 126 deletions (+105 −126)
examples/librispeech/asr1/RESULTS.md (+8 −5)
examples/librispeech/asr1/conf/chunk_conformer.yaml (+33 −38)
examples/librispeech/asr1/conf/chunk_transformer.yaml (+31 −38)
examples/librispeech/asr1/conf/conformer.yaml (+33 −45)
examples/librispeech/asr1/RESULTS.md
# LibriSpeech
## Conformer
train: Epoch 70, 4 V100-32G, best avg: 20
| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug + shift | test-clean | attention | 6.738649845123291 | 0.041159 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug + shift | test-clean | ctc_greedy_search | 6.738649845123291 | 0.039847 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug + shift | test-clean | ctc_prefix_beam_search | 6.738649845123291 | 0.039790 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug + shift | test-clean | attention_rescoring | 6.738649845123291 | 0.034617 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention | 6.433612394332886 | 0.039771 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.433612394332886 | 0.040342 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.433612394332886 | 0.040342 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 |
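The WER column above is a word error rate. As a rough illustration of the metric (my own sketch, not the project's scorer), WER is the word-level Levenshtein distance divided by the reference length:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion / 6 words
```

A WER of 0.034617 therefore means roughly 3.5 word errors per 100 reference words.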
## Chunk Conformer
| Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | WER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| conformer | 47.63 M | conf/chunk_conformer.yaml | spec_aug + shift | test-clean | attention | 16, -1 | 7.11 | 0.063193 |
...
@@ -20,7 +23,7 @@
## Transformer
train: Epoch 120, 4 V100-32G, 27 Day, best avg: 10
| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
| --- | --- | --- | --- | --- | --- | --- | --- |
...
examples/librispeech/asr1/conf/chunk_conformer.yaml
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test
  min_input_len: 0.5
  max_input_len: 30.0
  min_output_len: 0.0
  max_output_len: 400.0
  min_output_input_ratio: 0.05
  max_output_input_ratio: 100.0

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 16
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

# network architecture
model:
  cmvn_file:
...
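The `use_dB_normalization: True` / `target_dB: -20` pair scales each utterance so its RMS energy sits at a fixed decibel level before feature extraction. A minimal sketch of that idea (my own illustration of RMS normalization, not PaddleSpeech's implementation):

```python
import math

def normalize_to_target_db(samples, target_db=-20.0):
    """Scale a waveform so its RMS level is target_db dBFS (full scale = 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    current_db = 20.0 * math.log10(max(rms, 1e-12))  # current RMS in dB
    gain = 10.0 ** ((target_db - current_db) / 20.0)  # linear gain to reach target
    return [s * gain for s in samples]

# a square wave at RMS 0.5 (-6 dB) is scaled down to RMS 0.1 (-20 dB)
out = normalize_to_target_db([0.5, -0.5, 0.5, -0.5], target_db=-20.0)
```

Real implementations also clip the gain to avoid amplifying near-silent inputs into noise; that detail is omitted here.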
@@ -80,6 +42,39 @@ model:
  length_normalized_loss: false

data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 16
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

training:
  n_epoch: 240
  accum_grad: 8
...
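The `sortagrad: True` / `shuffle_method: batch_shuffle` combination means: sort utterances by length, cut them into batches of similar duration (which reduces padding waste), then shuffle the batch order rather than individual samples. A schematic version, assuming utterances are (id, duration) pairs:

```python
import random

def batch_shuffle(utts, batch_size, sortagrad=True, seed=0):
    """Sort by duration, form length-homogeneous batches, then shuffle batch order."""
    if sortagrad:
        utts = sorted(utts, key=lambda u: u[1])  # shortest utterances first
    batches = [utts[i:i + batch_size] for i in range(0, len(utts), batch_size)]
    random.Random(seed).shuffle(batches)  # randomize order, keep similar lengths together
    return batches

utts = [("u1", 3.0), ("u2", 1.0), ("u3", 2.0), ("u4", 5.0)]
for batch in batch_shuffle(utts, batch_size=2):
    print([u[0] for u in batch])  # u2/u3 stay together, u1/u4 stay together
```

In the real trainer sortagrad typically applies only to the first epoch; this sketch ignores the epoch counter.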
examples/librispeech/asr1/conf/chunk_transformer.yaml
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test
  min_input_len: 0.5  # second
  max_input_len: 30.0  # second
  min_output_len: 0.0  # tokens
  max_output_len: 400.0  # tokens
  min_output_input_ratio: 0.05
  max_output_input_ratio: 100.0

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 64
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

# network architecture
model:
  cmvn_file:
...
@@ -73,6 +35,37 @@ model:
  length_normalized_loss: false

data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 64
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

training:
  n_epoch: 120
  accum_grad: 1
...
examples/librispeech/asr1/conf/conformer.yaml
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test-clean
  min_input_len: 0.5  # seconds
  max_input_len: 30.0  # seconds
  min_output_len: 0.0  # tokens
  max_output_len: 400.0  # tokens
  min_output_input_ratio: 0.05
  max_output_input_ratio: 100.0

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 16
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

# network architecture
model:
  cmvn_file:
...
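With `stride_ms: 10.0`, `window_ms: 25.0`, and a 16 kHz sample rate, each fbank frame spans 400 samples and consecutive frames start 160 samples apart, so the frame count for an utterance follows directly. A small sketch of the arithmetic (no-padding convention assumed; actual frontends may pad the edges):

```python
def num_frames(n_samples, sample_rate=16000, window_ms=25.0, stride_ms=10.0):
    """Number of full analysis windows that fit in the signal (no padding)."""
    win = int(sample_rate * window_ms / 1000)  # 400 samples per window
    hop = int(sample_rate * stride_ms / 1000)  # 160 samples per hop
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop

print(num_frames(16000))  # one second of 16 kHz audio -> 98 frames
```

Each of those frames then yields an 80-dimensional fbank vector (`feat_dim: 80`).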
@@ -76,8 +38,40 @@ model:
  length_normalized_loss: false

data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test-clean

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
  mean_std_filepath: ""
  augmentation_config: conf/preprocess.yaml
  batch_size: 16
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  # linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

training:
  n_epoch: 70
  accum_grad: 8
  global_grad_clip: 3.0
  optim: adam
...
@@ -98,13 +92,7 @@ decoding:
  batch_size: 64
  error_rate_type: wer
  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
  alpha: 2.5
  beta: 0.3
  beam_size: 10
  cutoff_prob: 1.0
  cutoff_top_n: 0
  num_proc_bsearch: 8
  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
      # <0: for decoding, use full chunk.
...
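Of the decoding methods listed above, `ctc_greedy_search` is the simplest: take the argmax token per frame, collapse consecutive repeats, and drop blanks. A toy sketch of that collapse step, assuming blank id 0 and working on pre-computed per-frame argmaxes rather than log-probabilities:

```python
def ctc_greedy_decode(frame_argmax, blank=0):
    """Collapse consecutive repeated tokens, then remove blanks (standard CTC rule)."""
    out, prev = [], None
    for tok in frame_argmax:
        # emit only on a change of token, and never emit the blank
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out

# a blank between two identical tokens separates them into two emissions
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

`ctc_prefix_beam_search` generalizes this by keeping `beam_size` prefix hypotheses per frame, and `attention_rescoring` re-ranks those CTC candidates with the attention decoder weighted by `ctc_weight`.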