PaddlePaddle
DeepSpeech
Commit c907a8de
Authored on Dec 31, 2021 by huangyuxin

change all recipes

Parent: 5d6494de
Showing 122 changed files with 2427 additions and 2359 deletions (+2427 -2359)
examples/aishell/asr0/conf/deepspeech2.yaml  +60 -64
examples/aishell/asr0/conf/deepspeech2_online.yaml  +62 -64
examples/aishell/asr0/conf/tuning/chunk_decode.yaml  +10 -0
examples/aishell/asr0/conf/tuning/decode.yaml  +10 -0
examples/aishell/asr0/local/test.sh  +6 -4
examples/aishell/asr0/local/test_export.sh  +6 -4
examples/aishell/asr0/local/test_hub_ori  +47 -0
examples/aishell/asr0/local/test_wav.sh  +7 -5
examples/aishell/asr0/run.sh  +4 -3
examples/aishell/asr1/conf/chunk_conformer.yaml  +4 -3
examples/aishell/asr1/conf/conformer.yaml  +3 -2
examples/aishell/asr1/conf/transformer.yaml  +4 -3
examples/aishell/asr1/local/align.sh  +1 -1
examples/aishell/asr1/local/test.sh  +3 -3
examples/aishell/asr1/local/test_wav.sh  +1 -1
examples/callcenter/asr1/conf/chunk_conformer.yaml  +91 -113
examples/callcenter/asr1/conf/conformer.yaml  +84 -109
examples/callcenter/asr1/conf/preprocess.yaml  +1 -1
examples/callcenter/asr1/conf/tuning/chunk_decode.yaml  +11 -0
examples/callcenter/asr1/conf/tuning/decode.yaml  +13 -0
examples/callcenter/asr1/local/align.sh  +6 -4
examples/callcenter/asr1/local/test.sh  +11 -7
examples/callcenter/asr1/run.sh  +3 -2
examples/librispeech/asr0/conf/deepspeech2.yaml  +60 -63
examples/librispeech/asr0/conf/deepspeech2_online.yaml  +62 -65
examples/librispeech/asr0/conf/tuning/chunk_decode.yaml  +10 -0
examples/librispeech/asr0/conf/tuning/decode.yaml  +10 -0
examples/librispeech/asr0/local/test.sh  +6 -4
examples/librispeech/asr0/local/test_wav.sh  +7 -5
examples/librispeech/asr0/run.sh  +3 -2
examples/librispeech/asr1/conf/chunk_conformer.yaml  +4 -4
examples/librispeech/asr1/conf/chunk_transformer.yaml  +2 -3
examples/librispeech/asr1/conf/conformer.yaml  +2 -3
examples/librispeech/asr1/conf/transformer.yaml  +2 -3
examples/librispeech/asr1/local/align.sh  +1 -1
examples/librispeech/asr1/local/test.sh  +3 -3
examples/librispeech/asr1/local/test_wav.sh  +1 -1
examples/librispeech/asr2/conf/decode/decode_base.yaml  +11 -0
examples/librispeech/asr2/conf/transformer.yaml  +70 -81
examples/librispeech/asr2/local/align.sh  +7 -5
examples/librispeech/asr2/local/test.sh  +6 -4
examples/librispeech/asr2/run.sh  +5 -3
examples/other/1xt2x/aishell/conf/deepspeech2.yaml  +60 -62
examples/other/1xt2x/aishell/conf/tuning/decode.yaml  +10 -0
examples/other/1xt2x/aishell/local/test.sh  +6 -4
examples/other/1xt2x/aishell/run.sh  +2 -1
examples/other/1xt2x/baidu_en8k/conf/deepspeech2.yaml  +60 -63
examples/other/1xt2x/baidu_en8k/conf/tuning/decode.yaml  +10 -0
examples/other/1xt2x/baidu_en8k/local/test.sh  +6 -4
examples/other/1xt2x/baidu_en8k/run.sh  +2 -1
examples/other/1xt2x/librispeech/conf/deepspeech2.yaml  +60 -63
examples/other/1xt2x/librispeech/conf/tuning/decode.yaml  +10 -0
examples/other/1xt2x/librispeech/local/test.sh  +6 -4
examples/other/1xt2x/librispeech/run.sh  +2 -1
examples/other/1xt2x/src_deepspeech2x/bin/test.py  +5 -0
examples/other/1xt2x/src_deepspeech2x/models/ds2/deepspeech2.py  +6 -6
examples/other/1xt2x/src_deepspeech2x/test_model.py  +29 -61
examples/ted_en_zh/st0/conf/transformer.yaml  +89 -102
examples/ted_en_zh/st0/conf/transformer_mtl_noam.yaml  +89 -101
examples/ted_en_zh/st0/conf/tuning/decode.yaml  +11 -0
examples/ted_en_zh/st0/local/test.sh  +7 -5
examples/ted_en_zh/st0/run.sh  +2 -1
examples/ted_en_zh/st1/conf/transformer.yaml  +89 -102
examples/ted_en_zh/st1/conf/transformer_mtl_noam.yaml  +89 -102
examples/ted_en_zh/st1/conf/tuning/decode.yaml  +12 -0
examples/ted_en_zh/st1/local/test.sh  +7 -5
examples/ted_en_zh/st1/run.sh  +2 -1
examples/timit/asr1/conf/transformer.yaml  +80 -101
examples/timit/asr1/conf/tuning/decode.yaml  +11 -0
examples/timit/asr1/local/align.sh  +6 -4
examples/timit/asr1/local/test.sh  +13 -9
examples/timit/asr1/run.sh  +7 -6
examples/tiny/asr0/conf/deepspeech2.yaml  +59 -62
examples/tiny/asr0/conf/deepspeech2_online.yaml  +61 -65
examples/tiny/asr0/conf/tuning/chunk_decode.yaml  +10 -0
examples/tiny/asr0/conf/tuning/decode.yaml  +10 -0
examples/tiny/asr0/local/test.sh  +6 -4
examples/tiny/asr0/run.sh  +2 -1
examples/tiny/asr1/conf/chunk_confermer.yaml  +92 -114
examples/tiny/asr1/conf/chunk_transformer.yaml  +84 -106
examples/tiny/asr1/conf/conformer.yaml  +36 -44
examples/tiny/asr1/conf/transformer.yaml  +34 -42
examples/tiny/asr1/conf/tuning/chunk_decode.yaml  +11 -0
examples/tiny/asr1/conf/tuning/decode.yaml  +11 -0
examples/tiny/asr1/local/align.sh  +6 -4
examples/tiny/asr1/local/test.sh  +10 -7
examples/tiny/asr1/run.sh  +3 -2
examples/wenetspeech/asr1/conf/conformer.yaml  +85 -104
examples/wenetspeech/asr1/conf/tuning/decode.yaml  +11 -0
examples/wenetspeech/asr1/local/test.sh  +10 -7
examples/wenetspeech/asr1/local/test_wav.sh  +8 -6
examples/wenetspeech/asr1/run.sh  +4 -4
paddlespeech/s2t/exps/deepspeech2/bin/deploy/runtime.py  +18 -14
paddlespeech/s2t/exps/deepspeech2/bin/deploy/server.py  +18 -14
paddlespeech/s2t/exps/deepspeech2/bin/test.py  +6 -0
paddlespeech/s2t/exps/deepspeech2/bin/test_export.py  +6 -0
paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py  +13 -8
paddlespeech/s2t/exps/deepspeech2/config.py  +0 -11
paddlespeech/s2t/exps/deepspeech2/model.py  +65 -67
paddlespeech/s2t/exps/u2/bin/alignment.py  +2 -2
paddlespeech/s2t/exps/u2/bin/test.py  +2 -2
paddlespeech/s2t/exps/u2/bin/test_wav.py  +3 -3
paddlespeech/s2t/exps/u2/config.py  +5 -5
paddlespeech/s2t/exps/u2/model.py  +9 -7
paddlespeech/s2t/exps/u2/trainer.py  +29 -29
paddlespeech/s2t/exps/u2_kaldi/bin/test.py  +4 -0
paddlespeech/s2t/exps/u2_kaldi/model.py  +39 -36
paddlespeech/s2t/exps/u2_st/bin/test.py  +8 -2
paddlespeech/s2t/exps/u2_st/config.py  +5 -5
paddlespeech/s2t/exps/u2_st/model.py  +56 -54
paddlespeech/s2t/io/collator.py  +28 -30
paddlespeech/s2t/io/dataset.py  +9 -9
paddlespeech/s2t/models/ds2/deepspeech2.py  +7 -7
paddlespeech/s2t/models/ds2_online/deepspeech2.py  +9 -9
paddlespeech/s2t/training/cli.py  +1 -1
tests/benchmark/conformer/run.sh  +3 -2
tests/benchmark/conformer/run_benchmark.sh  +11 -9
tests/chains/ds2/ds2_params_lite_train_infer.txt  +2 -2
tests/chains/ds2/ds2_params_whole_train_infer.txt  +1 -1
tests/chains/ds2/lite_train_infer.sh  +2 -2
tests/chains/ds2/prepare.sh  +4 -4
tests/chains/ds2/test.sh  +1 -0
examples/aishell/asr0/conf/deepspeech2.yaml

 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.0
-  max_input_len: 27.0 # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 0.0
+max_input_len: 27.0 # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 64 # one gpu
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 64 # one gpu
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 1024
-  use_gru: True
-  share_rnn_weights: False
-  blank_id: 0
-  ctc_grad_norm_type: instance
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 1024
+use_gru: True
+share_rnn_weights: False
+blank_id: 0
+ctc_grad_norm_type: instance

-training:
-  n_epoch: 80
-  accum_grad: 1
-  lr: 2e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 3.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 80
+accum_grad: 1
+lr: 2e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 3.0
+log_interval: 100
+checkpoint:
   kbest_n: 50
   latest_n: 5

-decoding:
-  batch_size: 128
-  error_rate_type: cer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
-  alpha: 1.9
-  beta: 5.0
-  beam_size: 300
-  cutoff_prob: 0.99
-  cutoff_top_n: 40
-  num_proc_bsearch: 10
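The deepspeech2.yaml change above is the pattern this commit repeats across all 122 files: the old nested `data:` / `collator:` / `model:` / `training:` sections are hoisted to top-level keys under banner comments, and the `decoding:` section moves out into `conf/tuning/decode.yaml`. A minimal sketch of that transformation as a dict operation (illustrative only; `hoist_sections` is not PaddleSpeech code):

```python
def hoist_sections(cfg, sections=("data", "collator", "model", "training")):
    """Flatten the old nested config layout into the new flat layout."""
    flat = {}
    for key, value in cfg.items():
        if key in sections and isinstance(value, dict):
            flat.update(value)       # hoist nested keys to top level
        elif key != "decoding":      # decoding now lives in conf/tuning/decode.yaml
            flat[key] = value
    return flat

old = {
    "data": {"train_manifest": "data/manifest.train"},
    "training": {"n_epoch": 80, "lr": "2e-3"},
    "decoding": {"beam_size": 300},
}
new = hoist_sections(old)
```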
examples/aishell/asr0/conf/deepspeech2_online.yaml

 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.0
-  max_input_len: 27.0 # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 0.0
+max_input_len: 27.0 # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 64 # one gpu
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear #linear, mfcc, fbank
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 0
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 64 # one gpu
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear #linear, mfcc, fbank
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 0

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 5
-  rnn_layer_size: 1024
-  rnn_direction: forward # [forward, bidirect]
-  num_fc_layers: 0
-  fc_layers_size_list: -1,
-  use_gru: False
-  blank_id: 0
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 5
+rnn_layer_size: 1024
+rnn_direction: forward # [forward, bidirect]
+num_fc_layers: 0
+fc_layers_size_list: -1,
+use_gru: False
+blank_id: 0

-training:
-  n_epoch: 65
-  accum_grad: 1
-  lr: 5e-4
-  lr_decay: 0.93
-  weight_decay: 1e-06
-  global_grad_clip: 3.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 65
+accum_grad: 1
+lr: 5e-4
+lr_decay: 0.93
+weight_decay: 1e-06
+global_grad_clip: 3.0
+log_interval: 100
+checkpoint:
   kbest_n: 50
   latest_n: 5

-decoding:
-  batch_size: 32
-  error_rate_type: cer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
-  alpha: 2.2 #1.9
-  beta: 4.3
-  beam_size: 300
-  cutoff_prob: 0.99
-  cutoff_top_n: 40
-  num_proc_bsearch: 10
examples/aishell/asr0/conf/tuning/chunk_decode.yaml (new file, mode 100644)

+chunk_batch_size: 32
+error_rate_type: cer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
+alpha: 2.2 #1.9
+beta: 4.3
+beam_size: 300
+cutoff_prob: 0.99
+cutoff_top_n: 40
+num_proc_bsearch: 10
examples/aishell/asr0/conf/tuning/decode.yaml (new file, mode 100644)

+decode_batch_size: 128
+error_rate_type: cer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
+alpha: 1.9
+beta: 5.0
+beam_size: 300
+cutoff_prob: 0.99
+cutoff_top_n: 40
+num_proc_bsearch: 10
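The two new tuning files above hold exactly the keys the old in-file `decoding:` sections carried. A hedged sketch of reading such a flat key/value file, using a toy parser (hypothetical helper for illustration; the actual PaddleSpeech loader is not this):

```python
def parse_flat_yaml(text):
    """Naive parser for flat 'key: value' lines, good enough for these tuning files."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments like '#1.9'
        if ":" in line:
            key, _, value = line.partition(":")
            cfg[key.strip()] = value.strip()
    return cfg

decode_cfg = parse_flat_yaml("""\
decode_batch_size: 128
error_rate_type: cer
decoding_method: ctc_beam_search
beam_size: 300
""")
```

Note the values stay strings here; a real YAML loader would resolve `128` and `300` to ints.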
examples/aishell/asr0/local/test.sh

 #!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
     exit -1
 fi

@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

 config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

 # download language model
 bash local/download_lm_ch.sh

@@ -21,6 +22,7 @@ fi
 python3 -u ${BIN_DIR}/test.py \
 --ngpu ${ngpu} \
 --config ${config_path} \
+--decode_cfg ${decode_config_path} \
 --result_file ${ckpt_prefix}.rsl \
 --checkpoint_path ${ckpt_prefix} \
 --model_type ${model_type}
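test.sh now forwards a fourth positional argument to test.py as `--decode_cfg`. Sketched in argparse terms (a hypothetical mirror of the flags visible in the diff; the real parser lives in paddlespeech/s2t/training/cli.py, which this commit also touches):

```python
import argparse

# Hypothetical reconstruction of the test.py CLI surface after this commit.
parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)      # model/data config
parser.add_argument("--decode_cfg", required=True)  # decoding config (new in this commit)
parser.add_argument("--checkpoint_path", required=True)
parser.add_argument("--model_type", default="offline")

args = parser.parse_args([
    "--config", "conf/deepspeech2.yaml",
    "--decode_cfg", "conf/tuning/decode.yaml",
    "--checkpoint_path", "exp/deepspeech2/checkpoints/avg_1",
])
```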
examples/aishell/asr0/local/test_export.sh

 #!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
     exit -1
 fi

@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

 config_path=$1
-jit_model_export_path=$2
-model_type=$3
+decode_config_path=$2
+jit_model_export_path=$3
+model_type=$4

 # download language model
 bash local/download_lm_ch.sh > /dev/null 2>&1

@@ -21,6 +22,7 @@ fi
 python3 -u ${BIN_DIR}/test_export.py \
 --ngpu ${ngpu} \
 --config ${config_path} \
+--decode_cfg ${decode_config_path} \
 --result_file ${jit_model_export_path}.rsl \
 --export_path ${jit_model_export_path} \
 --model_type ${model_type}
examples/aishell/asr0/local/test_hub_ori (new file, mode 100755)

#!/bin/bash

if [ $# != 4 ];then
    echo "usage: ${0} config_path ckpt_path_prefix model_type audio_file"
    exit -1
fi

ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2
model_type=$3
audio_file=$4

mkdir -p data
wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/demo_01_03.wav -P data/
if [ $? -ne 0 ]; then
    exit 1
fi

if [ ! -f ${audio_file} ]; then
    echo "Plase input the right audio_file path"
    exit 1
fi

# download language model
bash local/download_lm_ch.sh
if [ $? -ne 0 ]; then
    exit 1
fi

python3 -u ${BIN_DIR}/test_hub.py \
--nproc ${ngpu} \
--config ${config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type} \
--audio_file ${audio_file}

if [ $? -ne 0 ]; then
    echo "Failed in evaluation!"
    exit 1
fi

exit 0
examples/aishell/asr0/local/test_wav.sh

 #!/bin/bash

-if [ $# != 4 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type audio_file"
+if [ $# != 5 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type audio_file"
     exit -1
 fi

@@ -9,9 +9,10 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

 config_path=$1
-ckpt_prefix=$2
-model_type=$3
-audio_file=$4
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4
+audio_file=$5

 mkdir -p data
 wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/demo_01_03.wav -P data/

@@ -33,6 +34,7 @@ fi
 python3 -u ${BIN_DIR}/test_wav.py \
 --ngpu ${ngpu} \
 --config ${config_path} \
+--decode_cfg ${decode_config_path} \
 --result_file ${ckpt_prefix}.rsl \
 --checkpoint_path ${ckpt_prefix} \
 --model_type ${model_type} \
examples/aishell/asr0/run.sh

@@ -6,6 +6,7 @@ gpus=0,1,2,3
 stage=0
 stop_stage=100
 conf_path=conf/deepspeech2.yaml #conf/deepspeech2.yaml or conf/deepspeeech2_online.yaml
+decode_conf_path=conf/tuning/decode.yaml
 avg_num=1
 model_type=offline # offline or online
 audio_file=data/demo_01_03.wav

@@ -34,7 +35,7 @@ fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
 fi

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then

@@ -44,11 +45,11 @@ fi
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
     # test export ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test_export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}.jit ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test_export.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}.jit ${model_type} || exit -1
 fi

 # Optionally, you can add LM and test it with runtime.
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
     # test a single .wav file
-    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
 fi
examples/aishell/asr1/conf/chunk_conformer.yaml

@@ -54,8 +54,9 @@ test_manifest: data/manifest.test
 ###########################################
 vocab_filepath: data/lang_char/vocab.txt
+spm_model_prefix: ''
 unit_type: 'char'
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
 feat_dim: 80
 stride_ms: 10.0
 window_ms: 25.0

@@ -74,7 +75,7 @@ subsampling_factor: 1
 num_encs: 1
 ###########################################
-# training                                #
+# Training                                #
 ###########################################
 n_epoch: 240
 accum_grad: 2

@@ -82,7 +83,7 @@ global_grad_clip: 5.0
 optim: adam
 optim_conf:
     lr: 0.002
-    weight_decay: 1e-6
+    weight_decay: 1.0e-6
 scheduler: warmuplr
 scheduler_conf:
     warmup_steps: 25000
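The `weight_decay: 1e-6` to `1.0e-6` edit above is not cosmetic: per the `# https://yaml.org/type/float.html` note these configs carry, a YAML 1.1 base-10 float must contain a `.`, so a bare `1e-6` resolves to a string rather than a float in YAML 1.1 loaders. This can be checked against the spec's base-10 float pattern with stdlib `re` alone (the pattern below is only the base-10 branch of the spec's regex, omitting `.inf`, `.nan`, and sexagesimal forms):

```python
import re

# Base-10 branch of the YAML 1.1 float tag resolver regex.
YAML11_FLOAT = re.compile(r"^[-+]?([0-9][0-9_]*)?\.[0-9_]*([eE][-+][0-9]+)?$")

def is_yaml11_float(token):
    """True if `token` would resolve to a float under YAML 1.1 base-10 rules."""
    return YAML11_FLOAT.match(token) is not None

# '1.0e-6' has the required '.', '1e-6' and '2e-3' do not.
```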
examples/aishell/asr1/conf/conformer.yaml

@@ -49,8 +49,9 @@ test_manifest: data/manifest.test
 # Dataloader                              #
 ###########################################
 vocab_filepath: data/lang_char/vocab.txt
+spm_model_prefix: ''
 unit_type: 'char'
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
 feat_dim: 80
 stride_ms: 10.0
 window_ms: 25.0

@@ -69,7 +70,7 @@ subsampling_factor: 1
 num_encs: 1
 ###########################################
-# training                                #
+# Training                                #
 ###########################################
 n_epoch: 240
 accum_grad: 2
examples/aishell/asr1/conf/transformer.yaml

@@ -46,6 +46,7 @@ test_manifest: data/manifest.test
 ###########################################
 unit_type: 'char'
 vocab_filepath: data/lang_char/vocab.txt
+spm_model_prefix: ''
 feat_dim: 80
 stride_ms: 10.0
 window_ms: 25.0

@@ -59,13 +60,13 @@ batch_bins: 0
 batch_frames_in: 0
 batch_frames_out: 0
 batch_frames_inout: 0
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
 num_workers: 0
 subsampling_factor: 1
 num_encs: 1
 ###########################################
-# training                                #
+# Training                                #
 ###########################################
 n_epoch: 240
 accum_grad: 2

@@ -73,7 +74,7 @@ global_grad_clip: 5.0
 optim: adam
 optim_conf:
     lr: 0.002
-    weight_decay: 1e-6
+    weight_decay: 1.0e-6
 scheduler: warmuplr
 scheduler_conf:
     warmup_steps: 25000
examples/aishell/asr1/local/align.sh

@@ -21,7 +21,7 @@ mkdir -p ${output_dir}
 python3 -u ${BIN_DIR}/alignment.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${output_dir}/${type}.align \
     --checkpoint_path ${ckpt_prefix} \
     --opts decode.decode_batch_size ${batch_size}
examples/aishell/asr1/local/test.sh

@@ -30,14 +30,14 @@ for type in attention ctc_greedy_search; do
         # stream decoding only support batchsize=1
         batch_size=1
     else
-        batch_size=1
+        batch_size=64
     fi
     output_dir=${ckpt_prefix}
     mkdir -p ${output_dir}
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${output_dir}/${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
     --opts decode.decoding_method ${type} \

@@ -57,7 +57,7 @@ for type in ctc_prefix_beam_search attention_rescoring; do
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${output_dir}/${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
     --opts decode.decoding_method ${type} \
examples/aishell/asr1/local/test_wav.sh

@@ -43,7 +43,7 @@ for type in attention_rescoring; do
     python3 -u ${BIN_DIR}/test_wav.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${output_dir}/${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
     --opts decode.decoding_method ${type} \
examples/callcenter/asr1/conf/chunk_conformer.yaml

 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.5
-  max_input_len: 20.0 # second
-  min_output_len: 0.0
-  max_output_len: 400.0
-  min_output_input_ratio: 0.05
-  max_output_input_ratio: 10.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test

-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'char'
-  spm_model_prefix: ''
-  augmentation_config: conf/preprocess.yaml
-  batch_size: 32
-  raw_wav: True # use raw_wav or kaldi feature
-  spectrum_type: fbank #linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 8000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'char'
+spm_model_prefix: ''
+preprocess_config: conf/preprocess.yaml
+batch_size: 32
+raw_wav: True # use raw_wav or kaldi feature
+spectrum_type: fbank #linear, mfcc, fbank
+feat_dim: 80
+delta_delta: False
+dither: 1.0
+target_sample_rate: 8000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 25.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-# network architecture
-model:
-  cmvn_file:
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: conformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file:
+cmvn_file_type: "json"
+# encoder related
+encoder: conformer
+encoder_conf:
     output_size: 256    # dimension of attention
     attention_heads: 4
     linear_units: 2048  # the number of units of position-wise feed forward

@@ -62,9 +61,9 @@ model:
     cnn_module_norm: 'layer_norm' # using nn.LayerNorm makes model converge faster
     use_dynamic_left_chunk: false
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
     attention_heads: 4
     linear_units: 2048
     num_blocks: 6

@@ -73,48 +72,27 @@ model:
     self_attention_dropout_rate: 0.0
     src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
     ctc_weight: 0.3
     lsm_weight: 0.1     # label smoothing option
     length_normalized_loss: false

-training:
-  n_epoch: 240
-  accum_grad: 4
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 240
+accum_grad: 4
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
     lr: 0.001
-    weight_decay: 1e-6
-  scheduler: warmuplr
-  scheduler_conf:
+    weight_decay: 1.0e-6
+scheduler: warmuplr
+scheduler_conf:
     warmup_steps: 25000
     lr_decay: 1.0
-  log_interval: 100
-  checkpoint:
+log_interval: 100
+checkpoint:
     kbest_n: 50
     latest_n: 5

-decoding:
-  batch_size: 128
-  error_rate_type: cer
-  decoding_method: attention # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5 # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1 # decoding chunk size. Defaults to -1.
-      # <0: for decoding, use full chunk.
-      # >0: for decoding, use fixed chunk size as set.
-      # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1 # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: true # simulate streaming inference. Defaults to False.
examples/callcenter/asr1/conf/conformer.yaml
浏览文件 @
c907a8de
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.5
-  max_input_len: 20.0  # second
-  min_output_len: 0.0
-  max_output_len: 400.0
-  min_output_input_ratio: 0.0
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test

-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'char'
-  spm_model_prefix: ''
-  augmentation_config: conf/preprocess.yaml
-  batch_size: 32
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  #linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 8000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'char'
+spm_model_prefix: ''
+preprocess_config: conf/preprocess.yaml
+feat_dim: 80
+stride_ms: 10.0
+window_ms: 25.0
+sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+batch_size: 64
+maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0  # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1

-# network architecture
-model:
-  cmvn_file:
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: conformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file:
+cmvn_file_type: "json"
+# encoder related
+encoder: conformer
+encoder_conf:
      output_size: 256    # dimension of attention
      attention_heads: 4
      linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -57,9 +54,9 @@ model:
      pos_enc_layer_type: 'rel_pos'
      selfattention_layer_type: 'rel_selfattn'
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
      attention_heads: 4
      linear_units: 2048
      num_blocks: 6
...
@@ -68,50 +65,28 @@ model:
      self_attention_dropout_rate: 0.0
      src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
      ctc_weight: 0.3
      lsm_weight: 0.1     # label smoothing option
      length_normalized_loss: false
-training:
-  n_epoch: 100  # 50 will be lowest
-  accum_grad: 4
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 100  # 50 will be lowest
+accum_grad: 4
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
      lr: 0.002
-    weight_decay: 1e-6
-  scheduler: warmuplr
-  scheduler_conf:
+  weight_decay: 1.0e-6
+scheduler: warmuplr
+scheduler_conf:
      warmup_steps: 25000
      lr_decay: 1.0
-  log_interval: 100
-  checkpoint:
+log_interval: 100
+checkpoint:
      kbest_n: 50
      latest_n: 5
-decoding:
-  batch_size: 128
-  error_rate_type: cer
-  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-      # <0: for decoding, use full chunk.
-      # >0: for decoding, use fixed chunk size as set.
-      # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/callcenter/asr1/conf/preprocess.yaml
process:
  # extract kaldi fbank from PCM
  - type: fbank_kaldi
-    fs: 16000
+    fs: 8000
    n_mels: 80
    n_shift: 160
    win_length: 400
...
examples/callcenter/asr1/conf/tuning/chunk_decode.yaml
0 → 100644
decode_batch_size: 128
error_rate_type: cer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: true  # simulate streaming inference. Defaults to False.
\ No newline at end of file
examples/callcenter/asr1/conf/tuning/decode.yaml
0 → 100644
decode_batch_size: 128
error_rate_type: cer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False  # simulate streaming inference. Defaults to False.
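The `local/*.sh` scripts in this commit pass overrides such as `--opts decode.decoding_method ${type}` on top of these decode configs. A minimal sketch of how such dotted-key overrides can be merged into a loaded config (hypothetical helper, not PaddleSpeech's actual `CfgNode` implementation):

```python
# Hypothetical helper: merges "--opts key value" pairs into a config dict;
# a sketch only, not the real PaddleSpeech option-merging code.
def apply_opts(cfg, opts):
    """opts is a flat list [key1, val1, key2, val2, ...]; dotted keys descend."""
    for key, raw in zip(opts[::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:
            node = node.setdefault(p, {})
        try:
            val = int(raw)  # CLI values arrive as strings
        except ValueError:
            val = raw
        node[leaf] = val
    return cfg

decode_cfg = {"decode_batch_size": 128, "decoding_method": "attention"}
apply_opts(decode_cfg, ["decoding_method", "ctc_greedy_search",
                        "decode_batch_size", "1"])
print(decode_cfg)  # {'decode_batch_size': 1, 'decoding_method': 'ctc_greedy_search'}
```

This mirrors why the scripts below could simply rename `decoding.batch_size` to `decode.decode_batch_size`: only the dotted key path changed, not the override mechanism.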
examples/callcenter/asr1/local/align.sh
#! /usr/bin/env bash

-if [ $# != 2 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix"
+if [ $# != 3 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
+decode_config_path=$2
+ckpt_prefix=$3

ckpt_name=$(basename ${ckpt_prefxi})
...
@@ -25,9 +26,10 @@ mkdir -p ${output_dir}
python3 -u ${BIN_DIR}/alignment.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${output_dir}/${type}.align \
--checkpoint_path ${ckpt_prefix} \
---opts decoding.batch_size ${batch_size}
+--opts decode.decode_batch_size ${batch_size}

if [ $? -ne 0 ]; then
    echo "Failed in ctc alignment!"
...
examples/callcenter/asr1/local/test.sh
#! /usr/bin/env bash

-if [ $# != 2 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix"
+if [ $# != 3 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,7 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
+decode_config_path=$2
+ckpt_prefix=$3

ckpt_name=$(basename ${ckpt_prefxi})
...
@@ -30,10 +32,11 @@ for type in attention ctc_greedy_search; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
@@ -49,10 +52,11 @@ for type in ctc_prefix_beam_search attention_rescoring; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
examples/callcenter/asr1/run.sh
...
@@ -6,6 +6,7 @@ gpus=0,1,2,3
stage=0
stop_stage=100
conf_path=conf/conformer.yaml
+decode_conf_path=conf/tuning/decode.yaml
avg_num=20

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
...
@@ -31,12 +32,12 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    # ctc alignment of test data
-    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
...
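Across these recipes the pattern is the same: every decode-stage script now takes the tuning config as its second argument. A quick sketch of the updated calling convention (all paths here are illustrative, not taken from the commit):

```shell
# Illustrative only: mirrors the new argument order used by run.sh --
# model config first, then the decode (tuning) config, then the checkpoint prefix.
conf_path=conf/conformer.yaml
decode_conf_path=conf/tuning/decode.yaml
avg_ckpt=avg_20
cmd="./local/test.sh ${conf_path} ${decode_conf_path} exp/conformer/checkpoints/${avg_ckpt}"
echo "${cmd}"
```

Keeping the decode options in a separate file means sweeps over beam size or LM weights no longer touch the training config.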
examples/librispeech/asr0/conf/deepspeech2.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev-clean
-  test_manifest: data/manifest.test-clean
-  min_input_len: 0.0
-  max_input_len: 30.0  # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev-clean
+test_manifest: data/manifest.test-clean
+min_input_len: 0.0
+max_input_len: 30.0  # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 20
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 20.0
-  delta_delta: False
-  dither: 1.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 20
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 20.0
+delta_delta: False
+dither: 1.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 2048
-  use_gru: False
-  share_rnn_weights: True
-  blank_id: 0
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 2048
+use_gru: False
+share_rnn_weights: True
+blank_id: 0

-training:
-  n_epoch: 50
-  accum_grad: 1
-  lr: 1e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 5.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 50
+accum_grad: 1
+lr: 1e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 5.0
+log_interval: 100
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 128
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 1.9
-  beta: 0.3
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
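For context, the `alpha`/`beta` pair in the decoding options above (and in the new `conf/tuning/decode.yaml`) are the usual shallow-fusion weights for CTC beam search with an n-gram LM: `alpha` scales the LM log-probability and `beta` is a per-word insertion bonus. A hedged sketch of that conventional scoring rule (illustrative only, not the exact scorer in this repo):

```python
# Conventional CTC beam-search shallow fusion: acoustic + alpha*LM + beta*words.
# Illustrative sketch; the real DeepSpeech2 scorer works on prefixes, not
# whole hypotheses.
def fusion_score(log_p_acoustic, log_p_lm, word_count, alpha=1.9, beta=0.3):
    return log_p_acoustic + alpha * log_p_lm + beta * word_count

print(round(fusion_score(-10.0, -5.0, 4), 2))  # -18.3
```

Raising `alpha` trusts the common_crawl LM more; raising `beta` counteracts the LM's bias toward short transcriptions.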
examples/librispeech/asr0/conf/deepspeech2_online.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev-clean
-  test_manifest: data/manifest.test-clean
-  min_input_len: 0.0
-  max_input_len: 30.0  # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev-clean
+test_manifest: data/manifest.test-clean
+min_input_len: 0.0
+max_input_len: 30.0  # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 15
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 20.0
-  delta_delta: False
-  dither: 1.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 0
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 15
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 20.0
+delta_delta: False
+dither: 1.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 0

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 2048
-  rnn_direction: forward
-  num_fc_layers: 2
-  fc_layers_size_list: 512, 256
-  use_gru: False
-  blank_id: 0
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 2048
+rnn_direction: forward
+num_fc_layers: 2
+fc_layers_size_list: 512, 256
+use_gru: False
+blank_id: 0

-training:
-  n_epoch: 50
-  accum_grad: 4
-  lr: 1e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 5.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 50
+accum_grad: 4
+lr: 1e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 5.0
+log_interval: 100
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 128
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 1.9
-  beta: 0.3
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
examples/librispeech/asr0/conf/tuning/chunk_decode.yaml
0 → 100644
decode_batch_size: 128
error_rate_type: wer
decoding_method: ctc_beam_search
lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
alpha: 1.9
beta: 0.3
beam_size: 500
cutoff_prob: 1.0
cutoff_top_n: 40
num_proc_bsearch: 8
\ No newline at end of file
examples/librispeech/asr0/conf/tuning/decode.yaml
0 → 100644
decode_batch_size: 128
error_rate_type: wer
decoding_method: ctc_beam_search
lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
alpha: 1.9
beta: 0.3
beam_size: 500
cutoff_prob: 1.0
cutoff_top_n: 40
num_proc_bsearch: 8
\ No newline at end of file
examples/librispeech/asr0/local/test.sh
#!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

# download language model
bash local/download_lm_en.sh
...
@@ -21,6 +22,7 @@ fi
python3 -u ${BIN_DIR}/test.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type}
...
examples/librispeech/asr0/local/test_wav.sh
#!/bin/bash

-if [ $# != 4 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type audio_file"
+if [ $# != 5 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type audio_file"
    exit -1
fi
...
@@ -9,9 +9,10 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
-model_type=$3
-audio_file=$4
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4
+audio_file=$5

mkdir -p data
wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/en/demo_002_en.wav -P data/
...
@@ -33,6 +34,7 @@ fi
python3 -u ${BIN_DIR}/test_wav.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type} \
...
examples/librispeech/asr0/run.sh
...
@@ -6,6 +6,7 @@ gpus=0,1,2,3,4,5,6,7
stage=0
stop_stage=100
conf_path=conf/deepspeech2.yaml
+decode_conf_path=conf/tuning/decode.yaml
avg_num=30
model_type=offline
audio_file=data/demo_002_en.wav
...
@@ -33,7 +34,7 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
...
@@ -43,5 +44,5 @@ fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    # test a single .wav file
-    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
fi
examples/librispeech/asr1/conf/chunk_conformer.yaml
...
@@ -57,7 +57,7 @@ vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
mean_std_filepath: ""
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
...
@@ -71,7 +71,6 @@ batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
-augmentation_config: conf/preprocess.yaml
num_workers: 0
subsampling_factor: 1
num_encs: 1
...
@@ -85,10 +84,11 @@ global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.001
-  weight_decay: 1e-06
+  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
  lr_decay: 1.0
log_interval: 100
checkpoint:
  kbest_n: 50
...
examples/librispeech/asr1/conf/chunk_transformer.yaml
...
@@ -50,7 +50,7 @@ vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
mean_std_filepath: ""
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
...
@@ -64,7 +64,6 @@ batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
-augmentation_config: conf/preprocess.yaml
num_workers: 0
subsampling_factor: 1
num_encs: 1
...
@@ -79,7 +78,7 @@ global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.001
-  weight_decay: 1e-06
+  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
...
examples/librispeech/asr1/conf/conformer.yaml
...
@@ -55,7 +55,7 @@ vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
mean_std_filepath: ""
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
...
@@ -69,7 +69,6 @@ batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
-augmentation_config: conf/preprocess.yaml
num_workers: 0
subsampling_factor: 1
num_encs: 1
...
@@ -84,7 +83,7 @@ global_grad_clip: 3.0
optim: adam
optim_conf:
  lr: 0.004
-  weight_decay: 1e-06
+  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
...
examples/librispeech/asr1/conf/transformer.yaml
...
@@ -49,7 +49,7 @@ vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_5000'
mean_std_filepath: ""
-augmentation_config: conf/preprocess.yaml
+preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
...
@@ -63,7 +63,6 @@ batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
-augmentation_config: conf/preprocess.yaml
num_workers: 0
subsampling_factor: 1
num_encs: 1
...
@@ -78,7 +77,7 @@ global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.004
-  weight_decay: 1e-06
+  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
...
examples/librispeech/asr1/local/align.sh
...
@@ -21,7 +21,7 @@ mkdir -p ${output_dir}
python3 -u ${BIN_DIR}/alignment.py \
--ngpu ${ngpu} \
--config ${config_path} \
---decode_config ${decode_config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${output_dir}/${type}.align \
--checkpoint_path ${ckpt_prefix} \
--opts decode.decode_batch_size ${batch_size}
...
examples/librispeech/asr1/local/test.sh
...
@@ -53,7 +53,7 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${ckpt_prefix}.${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decode.decoding_method ${type} \
...
@@ -78,7 +78,7 @@ for type in ctc_greedy_search; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${ckpt_prefix}.${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decode.decoding_method ${type} \
...
@@ -99,7 +99,7 @@ for type in ctc_prefix_beam_search attention_rescoring; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${ckpt_prefix}.${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decode.decoding_method ${type} \
...
examples/librispeech/asr1/local/test_wav.sh
...
@@ -50,7 +50,7 @@ for type in attention_rescoring; do
    python3 -u ${BIN_DIR}/test_wav.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
-    --decode_config ${decode_config_path} \
+    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decode.decoding_method ${type} \
...
examples/librispeech/asr2/conf/decode/decode_base.yaml
0 → 100644
decode_batch_size: 1
error_rate_type: wer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False  # simulate streaming inference. Defaults to False.
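The `decoding_chunk_size` comments above encode a three-way convention. A small sketch of the implied chunking rule (hypothetical helper, not the actual streaming decoder):

```python
# Hypothetical illustration of the decoding_chunk_size convention:
# <0 decodes the full utterance as one chunk, >0 uses fixed-size chunks,
# and 0 is reserved for training (prohibited at decode time).
def split_into_chunks(num_frames, decoding_chunk_size):
    if decoding_chunk_size == 0:
        raise ValueError("decoding_chunk_size=0 is only used for training")
    if decoding_chunk_size < 0:
        return [(0, num_frames)]  # full-chunk (offline) decoding
    return [(start, min(start + decoding_chunk_size, num_frames))
            for start in range(0, num_frames, decoding_chunk_size)]

print(split_into_chunks(100, -1))  # [(0, 100)]
print(split_into_chunks(100, 40))  # [(0, 40), (40, 80), (80, 100)]
```

`num_decoding_left_chunks` then bounds how many of the earlier chunks each chunk may attend to; -1 means no limit.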
examples/librispeech/asr2/conf/transformer.yaml
# https://yaml.org/type/float.html
-# network architecture
-model:
-  cmvn_file:
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file:
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
    output_size: 256    # dimension of attention
    attention_heads: 4
    linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -16,9 +17,9 @@ model:
    input_layer: conv2d  # encoder input type, you can chose conv2d, conv2d6 and conv2d8
    normalize_before: true
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
...
@@ -27,45 +28,51 @@ model:
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false

-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test-clean
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test-clean

-collator:
-  vocab_filepath: data/lang_char/train_960_unigram5000_units.txt
-  unit_type: spm
-  spm_model_prefix: data/lang_char/train_960_unigram5000
-  feat_dim: 83
-  stride_ms: 10.0
-  window_ms: 25.0
-  sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
-  batch_size: 30
-  maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
-  maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
-  minibatches: 0  # for debug
-  batch_count: auto
-  batch_bins: 0
-  batch_frames_in: 0
-  batch_frames_out: 0
-  batch_frames_inout: 0
-  augmentation_config: conf/preprocess.yaml
-  num_workers: 0
-  subsampling_factor: 1
-  num_encs: 1
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/train_960_unigram5000_units.txt
+unit_type: spm
+spm_model_prefix: data/lang_char/train_960_unigram5000
+feat_dim: 83
+stride_ms: 10.0
+window_ms: 25.0
+sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+batch_size: 30
+maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0  # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+preprocess_config: conf/preprocess.yaml
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1

-training:
-  n_epoch: 120
-  accum_grad: 2
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 120
+accum_grad: 2
+log_interval: 1
+checkpoint:
    kbest_n: 50
    latest_n: 5
...
@@ -79,23 +86,5 @@ scheduler_conf:
    warmup_steps: 25000
    lr_decay: 1.0
-decoding:
-  batch_size: 1
-  error_rate_type: wer
-  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-      # <0: for decoding, use full chunk.
-      # >0: for decoding, use fixed chunk size as set.
-      # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/librispeech/asr2/local/align.sh
#!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path dict_path ckpt_path_prefix"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path dict_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-dict_path=$2
-ckpt_prefix=$3
+decode_config_path=$2
+dict_path=$3
+ckpt_prefix=$4

batch_size=1
output_dir=${ckpt_prefix}
...
@@ -24,9 +25,10 @@ python3 -u ${BIN_DIR}/test.py \
--dict-path ${dict_path} \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result-file ${output_dir}/${type}.align \
--checkpoint_path ${ckpt_prefix} \
---opts decoding.batch_size ${batch_size}
+--opts decode.decode_batch_size ${batch_size}

if [ $? -ne 0 ]; then
    echo "Failed in ctc alignment!"
...
examples/librispeech/asr2/local/test.sh
...
@@ -19,8 +19,9 @@ bpeprefix=data/lang_char/${train_set}_${bpemode}${nbpe}
bpemodel=${bpeprefix}.model

config_path=conf/transformer.yaml
+decode_config_path=conf/decode/decode_base.yaml
dict=data/lang_char/${train_set}_${bpemode}${nbpe}_units.txt
-ckpt_prefix=
+ckpt_prefix=exp/transformer/checkpoints/init

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
...
@@ -79,11 +80,12 @@ for dmethd in attention ctc_greedy_search ctc_prefix_beam_search attention_resco
        --ngpu ${ngpu} \
        --dict-path ${dict} \
        --config ${config_path} \
+        --decode_cfg ${decode_config_path} \
        --checkpoint_path ${ckpt_prefix} \
        --result-file ${decode_dir}/data.JOB.json \
-        --opts decoding.decoding_method ${dmethd} \
-        --opts decoding.batch_size ${batch_size} \
-        --opts data.test_manifest ${feat_recog_dir}/split${nj}/JOB/manifest.${rtask}
+        --opts decode.decoding_method ${dmethd} \
+        --opts decode.decode_batch_size ${batch_size} \
+        --opts test_manifest ${feat_recog_dir}/split${nj}/JOB/manifest.${rtask}

    score_sclite.sh --bpe ${nbpe} --bpemodel ${bpemodel} --wer false ${decode_dir} ${dict}
...
examples/librispeech/asr2/run.sh
...
@@ -9,12 +9,14 @@ gpus=0,1,2,3,4,5,6,7
stage=0
stop_stage=50
conf_path=conf/transformer.yaml
-dict_path=lang_char/train_960_unigram5000_units.txt
+decode_conf_path=conf/decode/decode_base.yaml
+dict_path=data/lang_char/train_960_unigram5000_units.txt
avg_num=10

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

-avg_ckpt=avg_${avg_num}
+avg_ckpt=init
ckpt=$(basename ${conf_path} | awk -F'.' '{print $1}')
echo "checkpoint name ${ckpt}"
...
@@ -35,7 +37,7 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # attetion resocre decoder
-    ./local/test.sh ${conf_path} ${dict_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    ./local/test.sh ${conf_path} ${decode_conf_path} ${dict_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
...
@@ -45,7 +47,7 @@ fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    # ctc alignment of test data
-    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${dict_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} ${dict_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
...
examples/other/1xt2x/aishell/conf/deepspeech2.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.0
-  max_input_len: 27.0 # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 0.0
+max_input_len: 27.0 # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 64 # one gpu
-  mean_std_filepath: data/mean_std.npz
-  unit_type: char
-  vocab_filepath: data/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 64 # one gpu
+mean_std_filepath: data/mean_std.npz
+unit_type: char
+vocab_filepath: data/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 1024
-  use_gru: True
-  share_rnn_weights: False
-  blank_id: 4333
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 1024
+use_gru: True
+share_rnn_weights: False
+blank_id: 4333

-training:
-  n_epoch: 80
-  accum_grad: 1
-  lr: 2e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 3.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 80
+accum_grad: 1
+lr: 2e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 3.0
+log_interval: 100
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 32
-  error_rate_type: cer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
-  alpha: 2.6
-  beta: 5.0
-  beam_size: 300
-  cutoff_prob: 0.99
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
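The `sortagrad: True` plus `shuffle_method: batch_shuffle` pair in these configs means the first epoch is ordered by utterance length (short, stable batches first) and later epochs shuffle whole batches rather than individual utterances. A minimal sketch of that sampler logic, under those assumed semantics (not the actual `SortagradBatchSampler` implementation):

```python
import random

def sortagrad_batches(durations, batch_size, epoch, seed=0):
    """First epoch: batches of length-sorted utterances; later epochs:
    batch_shuffle, i.e. shuffle batch order but keep batch contents."""
    order = sorted(range(len(durations)), key=durations.__getitem__)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if epoch > 0:
        random.Random(seed + epoch).shuffle(batches)
    return batches

durs = [3.0, 1.0, 2.0, 5.0, 4.0, 0.5]
first = sortagrad_batches(durs, batch_size=2, epoch=0)
assert first == [[5, 1], [2, 0], [4, 3]]  # shortest utterances come first
```

Because only the batch order is shuffled after epoch 0, each batch keeps utterances of similar length, which limits padding waste.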
examples/other/1xt2x/aishell/conf/tuning/decode.yaml
0 → 100644
+decode_batch_size: 32
+error_rate_type: cer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
+alpha: 2.6
+beta: 5.0
+beam_size: 300
+cutoff_prob: 0.99
+cutoff_top_n: 40
+num_proc_bsearch: 8
\ No newline at end of file
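The `alpha` and `beta` values in this decode config are the usual language-model fusion weights in CTC beam search: `alpha` scales the LM log-probability and `beta` rewards hypothesis length in words. A sketch of the scoring rule (function and variable names are illustrative, not the paddlespeech scorer itself):

```python
def fused_score(log_p_ctc: float, log_p_lm: float, word_count: int,
                alpha: float = 2.6, beta: float = 5.0) -> float:
    """Combine acoustic and LM scores: log P_ctc + alpha * log P_lm + beta * |words|."""
    return log_p_ctc + alpha * log_p_lm + beta * word_count

# A hypothesis the LM likes can overtake one with a better acoustic score.
a = fused_score(log_p_ctc=-4.0, log_p_lm=-2.0, word_count=3)
b = fused_score(log_p_ctc=-3.0, log_p_lm=-4.0, word_count=3)
assert a > b
```

Raising `alpha` leans harder on the language model; raising `beta` counteracts the LM's bias toward shorter outputs.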
examples/other/1xt2x/aishell/local/test.sh
#!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

# download language model
bash local/download_lm_ch.sh
...
@@ -21,6 +22,7 @@ fi
python3 -u ${BIN_DIR}/test.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type}
...
examples/other/1xt2x/aishell/run.sh
...
@@ -5,6 +5,7 @@ source path.sh
stage=0
stop_stage=100
conf_path=conf/deepspeech2.yaml
+decode_conf_path=conf/tuning/decode.yaml
avg_num=1
model_type=offline
gpus=2
...
@@ -23,6 +24,6 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
fi
examples/other/1xt2x/baidu_en8k/conf/deepspeech2.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test-clean
-  min_input_len: 0.0
-  max_input_len: .inf # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test-clean
+min_input_len: 0.0
+max_input_len: .inf # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 64 # one gpu
-  mean_std_filepath: data/mean_std.npz
-  unit_type: char
-  vocab_filepath: data/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 64 # one gpu
+mean_std_filepath: data/mean_std.npz
+unit_type: char
+vocab_filepath: data/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 1024
-  use_gru: True
-  share_rnn_weights: False
-  blank_id: 28
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 1024
+use_gru: True
+share_rnn_weights: False
+blank_id: 28

-training:
-  n_epoch: 80
-  accum_grad: 1
-  lr: 2e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 3.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 80
+accum_grad: 1
+lr: 2e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 3.0
+log_interval: 100
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 32
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 1.4
-  beta: 0.35
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
examples/other/1xt2x/baidu_en8k/conf/tuning/decode.yaml
0 → 100644
+decode_batch_size: 32
+error_rate_type: wer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
+alpha: 1.4
+beta: 0.35
+beam_size: 500
+cutoff_prob: 1.0
+cutoff_top_n: 40
+num_proc_bsearch: 8
\ No newline at end of file
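`error_rate_type` selects word-level (WER, used for English recipes like this one) versus character-level (CER, used for the aishell recipe) scoring. Both reduce to an edit distance over token sequences divided by the reference length; a minimal sketch of that metric (not the paddlespeech `error_rate` module itself):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def token_error_rate(ref: str, hyp: str, unit: str = "wer") -> float:
    """WER splits on whitespace; CER compares characters with spaces removed."""
    ref_toks = ref.split() if unit == "wer" else list(ref.replace(" ", ""))
    hyp_toks = hyp.split() if unit == "wer" else list(hyp.replace(" ", ""))
    return edit_distance(ref_toks, hyp_toks) / len(ref_toks)

assert token_error_rate("the cat sat", "the cat sat") == 0.0
assert abs(token_error_rate("the cat sat", "the bat sat") - 1 / 3) < 1e-9
```

One substitution out of three reference words gives WER 1/3; the same formula over characters gives CER.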
examples/other/1xt2x/baidu_en8k/local/test.sh
#!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

# download language model
bash local/download_lm_en.sh
...
@@ -21,6 +22,7 @@ fi
python3 -u ${BIN_DIR}/test.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type}
...
examples/other/1xt2x/baidu_en8k/run.sh
...
@@ -5,6 +5,7 @@ source path.sh
stage=0
stop_stage=100
conf_path=conf/deepspeech2.yaml
+decode_conf_path=conf/tuning/decode.yaml
avg_num=1
model_type=offline
gpus=0
...
@@ -23,6 +24,6 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
fi
examples/other/1xt2x/librispeech/conf/deepspeech2.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test-clean
-  min_input_len: 0.0
-  max_input_len: 1000.0 # second
-  min_output_len: 0.0
-  max_output_len: .inf
-  min_output_input_ratio: 0.00
-  max_output_input_ratio: .inf
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test-clean
+min_input_len: 0.0
+max_input_len: 1000.0 # second
+min_output_len: 0.0
+max_output_len: .inf
+min_output_input_ratio: 0.00
+max_output_input_ratio: .inf

-collator:
-  batch_size: 64 # one gpu
-  mean_std_filepath: data/mean_std.npz
-  unit_type: char
-  vocab_filepath: data/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+batch_size: 64 # one gpu
+mean_std_filepath: data/mean_std.npz
+unit_type: char
+vocab_filepath: data/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 2048
-  use_gru: False
-  share_rnn_weights: True
-  blank_id: 28
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 2048
+use_gru: False
+share_rnn_weights: True
+blank_id: 28

-training:
-  n_epoch: 80
-  accum_grad: 1
-  lr: 2e-3
-  lr_decay: 0.83
-  weight_decay: 1e-06
-  global_grad_clip: 3.0
-  log_interval: 100
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 80
+accum_grad: 1
+lr: 2e-3
+lr_decay: 0.83
+weight_decay: 1e-06
+global_grad_clip: 3.0
+log_interval: 100
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 32
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
examples/other/1xt2x/librispeech/conf/tuning/decode.yaml
0 → 100644
+decode_batch_size: 32
+error_rate_type: wer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
+alpha: 2.5
+beta: 0.3
+beam_size: 500
+cutoff_prob: 1.0
+cutoff_top_n: 40
+num_proc_bsearch: 8
\ No newline at end of file
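`cutoff_prob` and `cutoff_top_n` prune the per-frame vocabulary distribution before the beam is expanded: symbols are taken in descending probability until the cumulative mass reaches `cutoff_prob`, capped at `cutoff_top_n` entries. A sketch under those assumed semantics (not the actual beam-search kernel):

```python
def prune_vocab(probs, cutoff_prob=1.0, cutoff_top_n=40):
    """Indices of the most probable symbols, stopping once cumulative
    probability reaches cutoff_prob, and never more than cutoff_top_n."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order[:cutoff_top_n]:
        if cum >= cutoff_prob:
            break
        kept.append(i)
        cum += probs[i]
    return kept

probs = [0.5, 0.3, 0.15, 0.05]
assert prune_vocab(probs, cutoff_prob=0.8, cutoff_top_n=40) == [0, 1]
assert prune_vocab(probs, cutoff_prob=1.0, cutoff_top_n=2) == [0, 1]
```

With `cutoff_prob: 1.0` (as above) only the top-n cap is active; lowering it trades a little accuracy for faster search.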
examples/other/1xt2x/librispeech/local/test.sh
#!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

# download language model
bash local/download_lm_en.sh
...
@@ -21,6 +22,7 @@ fi
python3 -u ${BIN_DIR}/test.py \
--ngpu ${ngpu} \
--config ${config_path} \
+--decode_cfg ${decode_config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix} \
--model_type ${model_type}
...
examples/other/1xt2x/librispeech/run.sh
...
@@ -5,6 +5,7 @@ source path.sh
stage=0
stop_stage=100
conf_path=conf/deepspeech2.yaml
+decode_conf_path=conf/tuning/decode.yaml
avg_num=1
model_type=offline
gpus=1
...
@@ -23,5 +24,5 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${v18_ckpt} ${model_type} || exit -1
fi
examples/other/1xt2x/src_deepspeech2x/bin/test.py
...
@@ -13,6 +13,7 @@
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
from src_deepspeech2x.test_model import DeepSpeech2Tester as Tester
+from yacs.config import CfgNode

from paddlespeech.s2t.exps.deepspeech2.config import get_cfg_defaults
from paddlespeech.s2t.training.cli import default_argument_parser
...
@@ -44,6 +45,10 @@ if __name__ == "__main__":
    config = get_cfg_defaults(args.model_type)
    if args.config:
        config.merge_from_file(args.config)
+    if args.decode_cfg:
+        decode_confs = CfgNode(new_allowed=True)
+        decode_confs.merge_from_file(args.decode_cfg)
+        config.decode = decode_confs
    if args.opts:
        config.merge_from_list(args.opts)
    config.freeze()
...
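The merge above keeps decoding options in their own sub-config (`config.decode`) instead of flattening them into the root, so training keys and decode keys cannot collide. The same shape can be sketched with stdlib types (`SimpleNamespace` stands in for yacs' `CfgNode` here; names are illustrative):

```python
from types import SimpleNamespace

def attach_decode_config(config: SimpleNamespace, decode_opts: dict) -> SimpleNamespace:
    """Hang separately-loaded decode options under config.decode,
    mirroring the CfgNode(new_allowed=True) + merge_from_file pattern."""
    config.decode = SimpleNamespace(**decode_opts)
    return config

cfg = SimpleNamespace(batch_size=64, n_epoch=80)
attach_decode_config(cfg, {"decode_batch_size": 32,
                           "decoding_method": "ctc_beam_search"})
assert cfg.decode.decode_batch_size == 32
assert cfg.batch_size == 64  # training options are untouched
```

This is why the new decode files use `decode_batch_size`: the old root-level `batch_size` remains the training batch size.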
examples/other/1xt2x/src_deepspeech2x/models/ds2/deepspeech2.py
...
@@ -233,11 +233,11 @@ class DeepSpeech2Model(nn.Layer):
        """
        model = cls(
            feat_size=dataloader.collate_fn.feature_size,
            dict_size=len(dataloader.collate_fn.vocab_list),
-            num_conv_layers=config.model.num_conv_layers,
-            num_rnn_layers=config.model.num_rnn_layers,
-            rnn_size=config.model.rnn_layer_size,
-            use_gru=config.model.use_gru,
-            share_rnn_weights=config.model.share_rnn_weights)
+            num_conv_layers=config.num_conv_layers,
+            num_rnn_layers=config.num_rnn_layers,
+            rnn_size=config.rnn_layer_size,
+            use_gru=config.use_gru,
+            share_rnn_weights=config.share_rnn_weights)
        infos = Checkpoint().load_parameters(
            model, checkpoint_path=checkpoint_path)
        logger.info(f"checkpoint info: {infos}")
...
@@ -250,7 +250,7 @@ class DeepSpeech2Model(nn.Layer):
        Parameters
        config: yacs.config.CfgNode
-            config.model
+            config
        Returns
        -------
        DeepSpeech2Model
...
examples/other/1xt2x/src_deepspeech2x/test_model.py
...
@@ -64,7 +64,7 @@ class DeepSpeech2Trainer(Trainer):
        super().__init__(config, args)

    def train_batch(self, batch_index, batch_data, msg):
-        train_conf = self.config.training
+        train_conf = self.config
        start = time.time()

        # forward
...
@@ -98,7 +98,7 @@ class DeepSpeech2Trainer(Trainer):
            iteration_time = time.time() - start

            msg += "train time: {:>.3f}s, ".format(iteration_time)
-            msg += "batch size: {}, ".format(self.config.collator.batch_size)
+            msg += "batch size: {}, ".format(self.config.batch_size)
            msg += "accum: {}, ".format(train_conf.accum_grad)
            msg += ', '.join('{}: {:>.6f}'.format(k, v)
                             for k, v in losses_np.items())
...
@@ -126,7 +126,7 @@ class DeepSpeech2Trainer(Trainer):
                total_loss += float(loss) * num_utts
                valid_losses['val_loss'].append(float(loss))

-            if (i + 1) % self.config.training.log_interval == 0:
+            if (i + 1) % self.config.log_interval == 0:
                valid_dump = {k: np.mean(v) for k, v in valid_losses.items()}
                valid_dump['val_history_loss'] = total_loss / num_seen_utts
...
@@ -146,15 +146,15 @@ class DeepSpeech2Trainer(Trainer):
    def setup_model(self):
        config = self.config.clone()
        config.defrost()
-        config.model.feat_size = self.train_loader.collate_fn.feature_size
-        #config.model.dict_size = self.train_loader.collate_fn.vocab_size
-        config.model.dict_size = len(self.train_loader.collate_fn.vocab_list)
+        config.feat_size = self.train_loader.collate_fn.feature_size
+        #config.dict_size = self.train_loader.collate_fn.vocab_size
+        config.dict_size = len(self.train_loader.collate_fn.vocab_list)
        config.freeze()

        if self.args.model_type == 'offline':
-            model = DeepSpeech2Model.from_config(config.model)
+            model = DeepSpeech2Model.from_config(config)
        elif self.args.model_type == 'online':
-            model = DeepSpeech2ModelOnline.from_config(config.model)
+            model = DeepSpeech2ModelOnline.from_config(config)
        else:
            raise Exception("wrong model type")

        if self.parallel:
...
@@ -163,17 +163,13 @@ class DeepSpeech2Trainer(Trainer):
        logger.info(f"{model}")
        layer_tools.print_params(model, logger.info)

-        grad_clip = ClipGradByGlobalNormWithLog(config.training.global_grad_clip)
+        grad_clip = ClipGradByGlobalNormWithLog(config.global_grad_clip)
        lr_scheduler = paddle.optimizer.lr.ExponentialDecay(
-            learning_rate=config.training.lr,
-            gamma=config.training.lr_decay,
-            verbose=True)
+            learning_rate=config.lr, gamma=config.lr_decay, verbose=True)
        optimizer = paddle.optimizer.Adam(
            learning_rate=lr_scheduler,
            parameters=model.parameters(),
-            weight_decay=paddle.regularizer.L2Decay(config.training.weight_decay),
+            weight_decay=paddle.regularizer.L2Decay(config.weight_decay),
            grad_clip=grad_clip)

        self.model = model
...
@@ -184,59 +180,59 @@ class DeepSpeech2Trainer(Trainer):
    def setup_dataloader(self):
        config = self.config.clone()
        config.defrost()
-        config.collator.keep_transcription_text = False
+        config.keep_transcription_text = False

-        config.data.manifest = config.data.train_manifest
+        config.manifest = config.train_manifest
        train_dataset = ManifestDataset.from_config(config)

-        config.data.manifest = config.data.dev_manifest
+        config.manifest = config.dev_manifest
        dev_dataset = ManifestDataset.from_config(config)

-        config.data.manifest = config.data.test_manifest
+        config.manifest = config.test_manifest
        test_dataset = ManifestDataset.from_config(config)

        if self.parallel:
            batch_sampler = SortagradDistributedBatchSampler(
                train_dataset,
-                batch_size=config.collator.batch_size,
+                batch_size=config.batch_size,
                num_replicas=None,
                rank=None,
                shuffle=True,
                drop_last=True,
-                sortagrad=config.collator.sortagrad,
-                shuffle_method=config.collator.shuffle_method)
+                sortagrad=config.sortagrad,
+                shuffle_method=config.shuffle_method)
        else:
            batch_sampler = SortagradBatchSampler(
                train_dataset,
                shuffle=True,
-                batch_size=config.collator.batch_size,
+                batch_size=config.batch_size,
                drop_last=True,
-                sortagrad=config.collator.sortagrad,
-                shuffle_method=config.collator.shuffle_method)
+                sortagrad=config.sortagrad,
+                shuffle_method=config.shuffle_method)

        collate_fn_train = SpeechCollator.from_config(config)

-        config.collator.augmentation_config = ""
+        config.augmentation_config = ""
        collate_fn_dev = SpeechCollator.from_config(config)

-        config.collator.keep_transcription_text = True
-        config.collator.augmentation_config = ""
+        config.keep_transcription_text = True
+        config.augmentation_config = ""
        collate_fn_test = SpeechCollator.from_config(config)

        self.train_loader = DataLoader(
            train_dataset,
            batch_sampler=batch_sampler,
            collate_fn=collate_fn_train,
-            num_workers=config.collator.num_workers)
+            num_workers=config.num_workers)
        self.valid_loader = DataLoader(
            dev_dataset,
-            batch_size=config.collator.batch_size,
+            batch_size=config.batch_size,
            shuffle=False,
            drop_last=False,
            collate_fn=collate_fn_dev)
        self.test_loader = DataLoader(
            test_dataset,
-            batch_size=config.decoding.batch_size,
+            batch_size=config.decode.decode_batch_size,
            shuffle=False,
            drop_last=False,
            collate_fn=collate_fn_test)
...
@@ -274,7 +270,7 @@ class DeepSpeech2Tester(DeepSpeech2Trainer):
    def __init__(self, config, args):
        self._text_featurizer = TextFeaturizer(
-            unit_type=config.collator.unit_type, vocab_filepath=None)
+            unit_type=config.unit_type, vocab=None)
        super().__init__(config, args)

    def ordid2token(self, texts, texts_len):
...
@@ -293,7 +289,7 @@ class DeepSpeech2Tester(DeepSpeech2Trainer):
                texts, texts_len, fout=None):
-        cfg = self.config.decoding
+        cfg = self.config.decode
        errors_sum, len_refs, num_ins = 0.0, 0, 0
        errors_func = error_rate.char_errors if cfg.error_rate_type == 'cer' else error_rate.word_errors
        error_rate_func = error_rate.cer if cfg.error_rate_type == 'cer' else error_rate.wer
...
@@ -399,31 +395,3 @@ class DeepSpeech2Tester(DeepSpeech2Trainer):
            self.export()
        except KeyboardInterrupt:
            exit(-1)
-
-    def setup(self):
-        """Setup the experiment.
-        """
-        paddle.set_device('gpu' if self.args.ngpu > 0 else 'cpu')
-
-        self.setup_output_dir()
-        self.setup_checkpointer()
-
-        self.setup_dataloader()
-        self.setup_model()
-
-        self.iteration = 0
-        self.epoch = 0
-
-    def setup_output_dir(self):
-        """Create a directory used for output.
-        """
-        # output dir
-        if self.args.output:
-            output_dir = Path(self.args.output).expanduser()
-            output_dir.mkdir(parents=True, exist_ok=True)
-        else:
-            output_dir = Path(
-                self.args.checkpoint_path).expanduser().parent.parent
-            output_dir.mkdir(parents=True, exist_ok=True)
-
-        self.output_dir = output_dir
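The `ExponentialDecay` scheduler configured in `setup_model` multiplies the learning rate by `lr_decay` once per epoch (`lr: 2e-3`, `lr_decay: 0.83` in these recipes). The schedule itself is just a power law, sketched here without Paddle:

```python
def exponential_decay(base_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate at a given epoch: base_lr * gamma**epoch."""
    return base_lr * gamma ** epoch

assert exponential_decay(2e-3, 0.83, 0) == 2e-3
assert abs(exponential_decay(2e-3, 0.83, 2) - 2e-3 * 0.6889) < 1e-12
```

After 80 epochs (`n_epoch: 80`) the rate has decayed by 0.83**80, i.e. roughly four orders of magnitude.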
examples/ted_en_zh/st0/conf/transformer.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train.tiny
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.05 # second
-  max_input_len: 30.0 # second
-  min_output_len: 0.0 # tokens
-  max_output_len: 400.0 # tokens
-  min_output_input_ratio: 0.01
-  max_output_input_ratio: 20.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train.tiny
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 0.05 # second
+max_input_len: 30.0 # second
+min_output_len: 0.0 # tokens
+max_output_len: 400.0 # tokens
+min_output_input_ratio: 0.01
+max_output_input_ratio: 20.0

-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'spm'
-  spm_model_prefix: data/lang_char/bpe_unigram_8000
-  mean_std_filepath: ""
-  # augmentation_config: conf/augmentation.json
-  batch_size: 10
-  raw_wav: True # use raw_wav or kaldi feature
-  spectrum_type: fbank # linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'spm'
+spm_model_prefix: data/lang_char/bpe_unigram_8000
+mean_std_filepath: ""
+# augmentation_config: conf/augmentation.json
+batch_size: 10
+raw_wav: True # use raw_wav or kaldi feature
+spectrum_type: fbank # linear, mfcc, fbank
+feat_dim: 80
+delta_delta: False
+dither: 1.0
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 25.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-# network architecture
-model:
-  cmvn_file: "data/mean_std.json"
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file: "data/mean_std.json"
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
    output_size: 256 # dimension of attention
    attention_heads: 4
    linear_units: 2048 # the number of units of position-wise feed forward
...
@@ -53,9 +58,9 @@ model:
    input_layer: conv2d # encoder input type, you can choose conv2d, conv2d6 and conv2d8
    normalize_before: true

-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
...
@@ -64,46 +69,28 @@ model:
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
    asr_weight: 0.0
    ctc_weight: 0.0
    lsm_weight: 0.1 # label smoothing option
    length_normalized_loss: false

-training:
-  n_epoch: 120
-  accum_grad: 2
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 120
+accum_grad: 2
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
    lr: 0.004
-    weight_decay: 1e-06
-  scheduler: warmuplr
-  scheduler_conf:
+    weight_decay: 1.0e-06
+scheduler: warmuplr
+scheduler_conf:
    warmup_steps: 25000
    lr_decay: 1.0
-  log_interval: 5
-  checkpoint:
+log_interval: 5
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 5
-  error_rate_type: char-bleu
-  decoding_method: fullsentence # 'fullsentence', 'simultaneous'
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5 # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1 # decoding chunk size. Defaults to -1.
-      # <0: for decoding, use full chunk.
-      # >0: for decoding, use fixed chunk size as set.
-      # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1 # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False # simulate streaming inference. Defaults to False.
examples/ted_en_zh/st0/conf/transformer_mtl_noam.yaml
# https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.05 # second
-  max_input_len: 30.0 # second
-  min_output_len: 0.0 # tokens
-  max_output_len: 400.0 # tokens
-  min_output_input_ratio: 0.01
-  max_output_input_ratio: 20.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 0.05 # second
+max_input_len: 30.0 # second
+min_output_len: 0.0 # tokens
+max_output_len: 400.0 # tokens
+min_output_input_ratio: 0.01
+max_output_input_ratio: 20.0

-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'spm'
-  spm_model_prefix: data/lang_char/bpe_unigram_8000
-  mean_std_filepath: ""
-  # augmentation_config: conf/augmentation.json
-  batch_size: 10
-  raw_wav: True # use raw_wav or kaldi feature
-  spectrum_type: fbank # linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'spm'
+spm_model_prefix: data/lang_char/bpe_unigram_8000
+mean_std_filepath: ""
+# augmentation_config: conf/augmentation.json
+batch_size: 10
+raw_wav: True # use raw_wav or kaldi feature
+spectrum_type: fbank # linear, mfcc, fbank
+feat_dim: 80
+delta_delta: False
+dither: 1.0
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 25.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2

-# network architecture
-model:
-  cmvn_file: "data/mean_std.json"
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file: "data/mean_std.json"
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
    output_size: 256 # dimension of attention
    attention_heads: 4
    linear_units: 2048 # the number of units of position-wise feed forward
...
@@ -53,9 +58,9 @@ model:
    input_layer: conv2d # encoder input type, you can choose conv2d, conv2d6 and conv2d8
    normalize_before: true

-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
...
@@ -64,49 +69,32 @@ model:
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
    asr_weight: 0.5
    ctc_weight: 0.3
    lsm_weight: 0.1 # label smoothing option
    length_normalized_loss: false

-training:
-  n_epoch: 120
-  accum_grad: 2
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 120
+accum_grad: 2
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
    lr: 2.5
-    weight_decay: 1e-06
-  scheduler: noam
-  scheduler_conf:
+    weight_decay: 1.0e-06
+scheduler: noam
+scheduler_conf:
    warmup_steps: 25000
    lr_decay: 1.0
-  log_interval: 50
-  checkpoint:
+log_interval: 50
+checkpoint:
    kbest_n: 50
    latest_n: 5

-decoding:
-  batch_size: 5
-  error_rate_type: char-bleu
-  decoding_method: fullsentence # 'fullsentence', 'simultaneous'
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  word_reward: 0.7
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5 # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1 # decoding chunk size. Defaults to -1.
-      # <0: for decoding, use full chunk.
-      # >0: for decoding, use fixed chunk size as set.
-      # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1 # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False # simulate streaming inference. Defaults to False.
examples/ted_en_zh/st0/conf/tuning/decode.yaml
0 → 100644
+batch_size: 5
+error_rate_type: char-bleu
+decoding_method: fullsentence # 'fullsentence', 'simultaneous'
+beam_size: 10
+word_reward: 0.7
+decoding_chunk_size: -1 # decoding chunk size. Defaults to -1.
+    # <0: for decoding, use full chunk.
+    # >0: for decoding, use fixed chunk size as set.
+    # 0: used for training, it's prohibited here.
+num_decoding_left_chunks: -1 # number of left chunks for decoding. Defaults to -1.
+simulate_streaming: False # simulate streaming inference. Defaults to False.
\ No newline at end of file
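`decoding_chunk_size` and `num_decoding_left_chunks` bound how much of the utterance the encoder may attend to at each position: -1 means full context, while a positive chunk size limits attention to the current chunk plus a fixed number of left chunks. A sketch of that windowing over frame indices, under those assumed semantics:

```python
def chunk_window(t: int, chunk_size: int, num_left_chunks: int, total: int):
    """Half-open frame range [start, end) visible when decoding frame t."""
    if chunk_size <= 0:  # -1: full-context (non-streaming) decoding
        return 0, total
    cur = t // chunk_size  # index of the chunk containing frame t
    start = 0 if num_left_chunks < 0 else max(0, (cur - num_left_chunks) * chunk_size)
    end = min(total, (cur + 1) * chunk_size)
    return start, end

assert chunk_window(t=50, chunk_size=-1, num_left_chunks=-1, total=100) == (0, 100)
assert chunk_window(t=50, chunk_size=16, num_left_chunks=1, total=100) == (32, 64)
```

Smaller chunks and fewer left chunks lower latency for streaming at some cost in accuracy, which is why the offline defaults here are both -1.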
examples/ted_en_zh/st0/local/test.sh
#! /usr/bin/env bash
if
[
$#
!=
2
]
;
then
echo
"usage:
${
0
}
config_path ckpt_path_prefix"
if
[
$#
!=
3
]
;
then
echo
"usage:
${
0
}
config_path
decode_config_path
ckpt_path_prefix"
exit
-1
fi
...
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo
"using
$ngpu
gpus..."
config_path
=
$1
ckpt_prefix
=
$2
decode_config_path
=
$2
ckpt_prefix
=
$3
for
type
in
fullsentence
;
do
echo
"decoding
${
type
}
"
...
...
@@ -17,10 +18,11 @@ for type in fullsentence; do
python3
-u
${
BIN_DIR
}
/test.py
\
--ngpu
${
ngpu
}
\
--config
${
config_path
}
\
--decode_cfg
${
decode_config_path
}
\
--result_file
${
ckpt_prefix
}
.
${
type
}
.rsl
\
--checkpoint_path
${
ckpt_prefix
}
\
--opts
decod
ing
.decoding_method
${
type
}
\
--opts
decod
ing.
batch_size
${
batch_size
}
--opts
decod
e
.decoding_method
${
type
}
\
--opts
decod
e.decode_
batch_size
${
batch_size
}
if
[
$?
-ne
0
]
;
then
echo
"Failed in evaluation!"
...
...
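The test scripts above now pass decode options as dotted `--opts decode.<key> <value>` pairs layered over the YAML config. As an illustration only (PaddleSpeech applies these through its own config class, not this code), a dotted-key override on a nested dict can be sketched like this; `apply_opts` is a hypothetical helper:

```python
def apply_opts(config, opts):
    """Apply dotted-key overrides such as ("decode.decoding_method", "fullsentence")
    to a nested dict, creating intermediate dicts as needed."""
    for dotted_key, value in opts:
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating levels on demand
        node[leaf] = value
    return config

# Mirror of: --opts decode.decoding_method fullsentence --opts decode.decode_batch_size 5
config = {"decode": {"decoding_method": "attention", "decode_batch_size": 1}}
apply_opts(config, [("decode.decoding_method", "fullsentence"),
                    ("decode.decode_batch_size", 5)])
```

The command-line values win over whatever `conf/tuning/decode.yaml` supplied, which is why the scripts can reuse one decode config across decoding methods.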
examples/ted_en_zh/st0/run.sh
...
@@ -6,6 +6,7 @@ gpus=0,1,2,3
 stage=0
 stop_stage=50
 conf_path=conf/transformer_mtl_noam.yaml
+decode_conf_path=conf/tuning/decode.yaml
 avg_num=5
 data_path=./TED_EnZh  # path to unzipped data

 source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
...
@@ -32,7 +33,7 @@ fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
 fi

 if [ ${stage} -le 51 ] && [ ${stop_stage} -ge 51 ]; then
...
examples/ted_en_zh/st1/conf/transformer.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train.tiny
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 5.0  # frame
-  max_input_len: 3000.0  # frame
-  min_output_len: 0.0  # tokens
-  max_output_len: 400.0  # tokens
-  min_output_input_ratio: 0.01
-  max_output_input_ratio: 20.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train.tiny
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 5.0  # frame
+max_input_len: 3000.0  # frame
+min_output_len: 0.0  # tokens
+max_output_len: 400.0  # tokens
+min_output_input_ratio: 0.01
+max_output_input_ratio: 20.0
-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'spm'
-  spm_model_prefix: data/lang_char/bpe_unigram_8000
-  mean_std_filepath: ""
-  # augmentation_config: conf/augmentation.json
-  batch_size: 10
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  # linear, mfcc, fbank
-  feat_dim: 83
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'spm'
+spm_model_prefix: data/lang_char/bpe_unigram_8000
+mean_std_filepath: ""
+# augmentation_config: conf/augmentation.json
+batch_size: 10
+raw_wav: True  # use raw_wav or kaldi feature
+spectrum_type: fbank  # linear, mfcc, fbank
+feat_dim: 83
+delta_delta: False
+dither: 1.0
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 25.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2
-# network architecture
-model:
-  cmvn_file: None
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file: None
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
   output_size: 256  # dimension of attention
   attention_heads: 4
   linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -53,9 +58,9 @@ model:
   input_layer: conv2d  # encoder input type, you can chose conv2d, conv2d6 and conv2d8
   normalize_before: true
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
   attention_heads: 4
   linear_units: 2048
   num_blocks: 6
...
@@ -64,47 +69,29 @@ model:
   self_attention_dropout_rate: 0.0
   src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
   asr_weight: 0.0
   ctc_weight: 0.0
   lsm_weight: 0.1  # label smoothing option
   length_normalized_loss: false
-training:
-  n_epoch: 20
-  accum_grad: 2
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 20
+accum_grad: 2
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
-    lr: 0.004
-    weight_decay: 1e-06
-  scheduler: warmuplr
-  scheduler_conf:
+  lr: 0.004
+  weight_decay: 1.0e-06
+scheduler: warmuplr
+scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-  log_interval: 5
-  checkpoint:
+log_interval: 5
+checkpoint:
   kbest_n: 50
   latest_n: 5
-decoding:
-  batch_size: 5
-  error_rate_type: char-bleu
-  decoding_method: fullsentence  # 'fullsentence', 'simultaneous'
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  word_reward: 0.7
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-                           # <0: for decoding, use full chunk.
-                           # >0: for decoding, use fixed chunk size as set.
-                           # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/ted_en_zh/st1/conf/transformer_mtl_noam.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 5.0  # frame
-  max_input_len: 3000.0  # frame
-  min_output_len: 0.0  # tokens
-  max_output_len: 400.0  # tokens
-  min_output_input_ratio: 0.01
-  max_output_input_ratio: 20.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
+min_input_len: 5.0  # frame
+max_input_len: 3000.0  # frame
+min_output_len: 0.0  # tokens
+max_output_len: 400.0  # tokens
+min_output_input_ratio: 0.01
+max_output_input_ratio: 20.0
-collator:
-  vocab_filepath: data/lang_char/ted_en_zh_bpe8000.txt
-  unit_type: 'spm'
-  spm_model_prefix: data/lang_char/ted_en_zh_bpe8000
-  mean_std_filepath: ""
-  # augmentation_config: conf/augmentation.json
-  batch_size: 10
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  # linear, mfcc, fbank
-  feat_dim: 83
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/ted_en_zh_bpe8000.txt
+unit_type: 'spm'
+spm_model_prefix: data/lang_char/ted_en_zh_bpe8000
+mean_std_filepath: ""
+# augmentation_config: conf/augmentation.json
+batch_size: 10
+raw_wav: True  # use raw_wav or kaldi feature
+spectrum_type: fbank  # linear, mfcc, fbank
+feat_dim: 83
+delta_delta: False
+dither: 1.0
+target_sample_rate: 16000
+max_freq: None
+n_fft: None
+stride_ms: 10.0
+window_ms: 25.0
+use_dB_normalization: True
+target_dB: -20
+random_seed: 0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2
-# network architecture
-model:
-  cmvn_file: None
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file: None
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
   output_size: 256  # dimension of attention
   attention_heads: 4
   linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -53,9 +58,9 @@ model:
   input_layer: conv2d  # encoder input type, you can chose conv2d, conv2d6 and conv2d8
   normalize_before: true
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
   attention_heads: 4
   linear_units: 2048
   num_blocks: 6
...
@@ -64,47 +69,29 @@ model:
   self_attention_dropout_rate: 0.0
   src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
   asr_weight: 0.5
   ctc_weight: 0.3
   lsm_weight: 0.1  # label smoothing option
   length_normalized_loss: false
-training:
-  n_epoch: 20
-  accum_grad: 2
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 20
+accum_grad: 2
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
-    lr: 2.5
-    weight_decay: 1e-06
-  scheduler: noam
-  scheduler_conf:
+  lr: 2.5
+  weight_decay: 1.0e-06
+scheduler: noam
+scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-  log_interval: 5
-  checkpoint:
+log_interval: 5
+checkpoint:
   kbest_n: 50
   latest_n: 5
-decoding:
-  batch_size: 5
-  error_rate_type: char-bleu
-  decoding_method: fullsentence  # 'fullsentence', 'simultaneous'
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  word_reward: 0.7
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-                           # <0: for decoding, use full chunk.
-                           # >0: for decoding, use fixed chunk size as set.
-                           # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
\ No newline at end of file
examples/ted_en_zh/st1/conf/tuning/decode.yaml (new file, mode 100644)
+batch_size: 5
+error_rate_type: char-bleu
+decoding_method: fullsentence  # 'fullsentence', 'simultaneous'
+beam_size: 10
+word_reward: 0.7
+decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
+                         # <0: for decoding, use full chunk.
+                         # >0: for decoding, use fixed chunk size as set.
+                         # 0: used for training, it's prohibited here.
+num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
+simulate_streaming: False  # simulate streaming inference. Defaults to False.
\ No newline at end of file
examples/ted_en_zh/st1/local/test.sh
 #! /usr/bin/env bash

-if [ $# != 2 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix"
+if [ $# != 3 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
     exit -1
 fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."
 config_path=$1
-ckpt_prefix=$2
+decode_config_path=$2
+ckpt_prefix=$3

 for type in fullsentence; do
     echo "decoding ${type}"
...
@@ -17,10 +18,11 @@ for type in fullsentence; do
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}

     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
...
examples/ted_en_zh/st1/run.sh
...
@@ -7,6 +7,7 @@ gpus=0,1,2,3
 stage=1
 stop_stage=4
 conf_path=conf/transformer_mtl_noam.yaml
+decode_conf_path=conf/tuning/decode.yaml
 ckpt_path=  # paddle.98 # (finetune from FAT-ST pretrained model)
 avg_num=5
 data_path=./TED_EnZh  # path to unzipped data
...
@@ -38,5 +39,5 @@ fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
 fi
\ No newline at end of file
examples/timit/asr1/conf/transformer.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.train
-  dev_manifest: data/manifest.dev
-  test_manifest: data/manifest.test
-  min_input_len: 0.0  # second
-  max_input_len: 10.0  # second
-  min_output_len: 0.0  # tokens
-  max_output_len: 150.0  # tokens
-  min_output_input_ratio: 0.005
-  max_output_input_ratio: 1000.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.train
+dev_manifest: data/manifest.dev
+test_manifest: data/manifest.test
-collator:
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: "word"
-  mean_std_filepath: ""
-  augmentation_config: conf/preprocess.yaml
-  batch_size: 64
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  # linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
+###########################################
+# Dataloader                              #
+###########################################
+vocab_filepath: data/lang_char/vocab.txt
+spm_model_prefix: ''
+unit_type: "word"
+mean_std_filepath: ""
+preprocess_config: conf/preprocess.yaml
+feat_dim: 80
+stride_ms: 10.0
+window_ms: 25.0
+sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+batch_size: 64
+maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0  # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1
-# network architecture
-model:
-  cmvn_file:
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: transformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file:
+cmvn_file_type: "json"
+# encoder related
+encoder: transformer
+encoder_conf:
   output_size: 128  # dimension of attention
   attention_heads: 4
   linear_units: 1024  # the number of units of position-wise feed forward
...
@@ -52,9 +50,9 @@ model:
   input_layer: conv2d  # encoder input type, you can chose conv2d, conv2d6 and conv2d8
   normalize_before: true
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
   attention_heads: 4
   linear_units: 1024
   num_blocks: 6
...
@@ -63,48 +61,29 @@ model:
   self_attention_dropout_rate: 0.0
   src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
   ctc_weight: 0.5
   lsm_weight: 0.1  # label smoothing option
   length_normalized_loss: false
-training:
-  n_epoch: 50
-  accum_grad: 1
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 50
+accum_grad: 1
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
-    lr: 0.004
-    weight_decay: 1e-06
-  scheduler: warmuplr
-  scheduler_conf:
+  lr: 0.004
+  weight_decay: 1.0e-6
+scheduler: warmuplr
+scheduler_conf:
   warmup_steps: 1200
   lr_decay: 1.0
-  log_interval: 10
-  checkpoint:
+log_interval: 10
+checkpoint:
   kbest_n: 50
   latest_n: 5
-decoding:
-  batch_size: 64
-  error_rate_type: wer
-  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-                           # <0: for decoding, use full chunk.
-                           # >0: for decoding, use fixed chunk size as set.
-                           # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/timit/asr1/conf/tuning/decode.yaml (new file, mode 100644)
+decode_batch_size: 64
+error_rate_type: wer
+decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
+beam_size: 10
+ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
+decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
+                         # <0: for decoding, use full chunk.
+                         # >0: for decoding, use fixed chunk size as set.
+                         # 0: used for training, it's prohibited here.
+num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
+simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/timit/asr1/local/align.sh
 #!/bin/bash

-if [ $# != 2 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix"
+if [ $# != 3 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
     exit -1
 fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."
 config_path=$1
-ckpt_prefix=$2
+decode_config_path=$2
+ckpt_prefix=$3

 batch_size=1
 output_dir=${ckpt_prefix}
...
@@ -20,9 +21,10 @@ mkdir -p ${output_dir}
 python3 -u ${BIN_DIR}/alignment.py \
 --ngpu ${ngpu} \
 --config ${config_path} \
+--decode_cfg ${decode_config_path} \
 --result_file ${output_dir}/${type}.align \
 --checkpoint_path ${ckpt_prefix} \
---opts decoding.batch_size ${batch_size}
+--opts decode.decode_batch_size ${batch_size}

 if [ $? -ne 0 ]; then
     echo "Failed in ctc alignment!"
...
examples/timit/asr1/local/test.sh
...
@@ -7,8 +7,8 @@ stop_stage=50
 . ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

-if [ $# != 2 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix"
+if [ $# != 3 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
     exit -1
 fi
...
@@ -17,7 +17,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."
 config_path=$1
-ckpt_prefix=$2
+decode_config_path=$2
+ckpt_prefix=$3

 chunk_mode=false
 if [[ ${config_path} =~ ^.*chunk_.*yaml$ ]]; then
...
@@ -43,10 +44,11 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}
     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
...
@@ -63,10 +65,11 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}
     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
...
@@ -82,10 +85,11 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
+    --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
-    --opts decoding.decoding_method ${type} \
-    --opts decoding.batch_size ${batch_size}
+    --opts decode.decoding_method ${type} \
+    --opts decode.decode_batch_size ${batch_size}
     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
...
examples/timit/asr1/run.sh
...
@@ -7,6 +7,7 @@ gpus=0,1,2,3
 stage=0
 stop_stage=50
 conf_path=conf/transformer.yaml
+decode_conf_path=conf/tuning/decode.yaml
 avg_num=10
 TIMIT_path=/path/to/TIMIT
...
@@ -34,15 +35,15 @@ fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
 fi

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
     # ctc alignment of test data
-    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
 fi

-# if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
-#     # export ckpt avg_n
-#     CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
-# fi
+if [ ${stage} -le 51 ] && [ ${stop_stage} -ge 51 ]; then
+    # export ckpt avg_n
+    CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
+fi
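Every run.sh in this commit gates its stages with the same pair of shell tests, `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]`: stage N executes exactly when N falls in the inclusive range [stage, stop_stage]. A minimal sketch of that predicate (the function name `should_run` is illustrative, not part of the repo):

```python
def should_run(n, stage, stop_stage):
    """Mirror the shell gate `[ ${stage} -le n ] && [ ${stop_stage} -ge n ]`:
    stage n executes iff stage <= n <= stop_stage."""
    return stage <= n <= stop_stage

# With the defaults stage=0, stop_stage=50, stages 0..50 run while the
# export-only stage 51 is skipped unless requested explicitly.
```

This is why the export step above was renumbered to 51: with `stop_stage=50` as the default, a full `./run.sh` never triggers it, but `./run.sh --stage 51 --stop_stage 51` runs it alone.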
examples/tiny/asr0/conf/deepspeech2.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.tiny
-  dev_manifest: data/manifest.tiny
-  test_manifest: data/manifest.tiny
-  min_input_len: 0.0
-  max_input_len: 30.0
-  min_output_len: 0.0
-  max_output_len: 400.0
-  min_output_input_ratio: 0.05
-  max_output_input_ratio: 10.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.tiny
+dev_manifest: data/manifest.tiny
+test_manifest: data/manifest.tiny
+min_input_len: 0.0
+max_input_len: 30.0
+min_output_len: 0.0
+max_output_len: 400.0
+min_output_input_ratio: 0.05
+max_output_input_ratio: 10.0
-collator:
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
-  batch_size: 4
+###########################################
+# Dataloader                              #
+###########################################
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 2
+batch_size: 4
-model:
-  num_conv_layers: 2
-  num_rnn_layers: 3
-  rnn_layer_size: 2048
-  use_gru: False
-  share_rnn_weights: True
-  blank_id: 0
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 3
+rnn_layer_size: 2048
+use_gru: False
+share_rnn_weights: True
+blank_id: 0
-training:
-  n_epoch: 5
-  accum_grad: 1
-  lr: 1e-5
-  lr_decay: 0.8
-  weight_decay: 1e-06
-  global_grad_clip: 5.0
-  log_interval: 1
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 5
+accum_grad: 1
+lr: 1e-5
+lr_decay: 0.8
+weight_decay: 1e-06
+global_grad_clip: 5.0
+log_interval: 1
+checkpoint:
   kbest_n: 3
   latest_n: 2
-decoding:
-  batch_size: 128
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
examples/tiny/asr0/conf/deepspeech2_online.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.tiny
-  dev_manifest: data/manifest.tiny
-  test_manifest: data/manifest.tiny
-  min_input_len: 0.0
-  max_input_len: 30.0
-  min_output_len: 0.0
-  max_output_len: 400.0
-  min_output_input_ratio: 0.05
-  max_output_input_ratio: 10.0
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.tiny
+dev_manifest: data/manifest.tiny
+test_manifest: data/manifest.tiny
+min_input_len: 0.0
+max_input_len: 30.0
+min_output_len: 0.0
+max_output_len: 400.0
+min_output_input_ratio: 0.05
+max_output_input_ratio: 10.0
-collator:
-  mean_std_filepath: data/mean_std.json
-  unit_type: char
-  vocab_filepath: data/lang_char/vocab.txt
-  augmentation_config: conf/augmentation.json
-  random_seed: 0
-  spm_model_prefix:
-  spectrum_type: linear
-  feat_dim:
-  delta_delta: False
-  stride_ms: 10.0
-  window_ms: 20.0
-  n_fft: None
-  max_freq: None
-  target_sample_rate: 16000
-  use_dB_normalization: True
-  target_dB: -20
-  dither: 1.0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 0
-  batch_size: 4
+###########################################
+# Dataloader                              #
+###########################################
+mean_std_filepath: data/mean_std.json
+unit_type: char
+vocab_filepath: data/lang_char/vocab.txt
+augmentation_config: conf/augmentation.json
+random_seed: 0
+spm_model_prefix:
+spectrum_type: linear
+feat_dim:
+delta_delta: False
+stride_ms: 10.0
+window_ms: 20.0
+n_fft: None
+max_freq: None
+target_sample_rate: 16000
+use_dB_normalization: True
+target_dB: -20
+dither: 1.0
+keep_transcription_text: False
+sortagrad: True
+shuffle_method: batch_shuffle
+num_workers: 0
+batch_size: 4
-model:
-  num_conv_layers: 2
-  num_rnn_layers: 4
-  rnn_layer_size: 2048
-  rnn_direction: forward
-  num_fc_layers: 2
-  fc_layers_size_list: 512, 256
-  use_gru: True
-  blank_id: 0
+############################################
+# Network Architecture                     #
+############################################
+num_conv_layers: 2
+num_rnn_layers: 4
+rnn_layer_size: 2048
+rnn_direction: forward
+num_fc_layers: 2
+fc_layers_size_list: 512, 256
+use_gru: True
+blank_id: 0
-training:
-  n_epoch: 5
-  accum_grad: 1
-  lr: 1e-5
-  lr_decay: 1.0
-  weight_decay: 1e-06
-  global_grad_clip: 5.0
-  log_interval: 1
-  checkpoint:
+###########################################
+# Training                                #
+###########################################
+n_epoch: 5
+accum_grad: 1
+lr: 1e-5
+lr_decay: 1.0
+weight_decay: 1e-06
+global_grad_clip: 5.0
+log_interval: 1
+checkpoint:
   kbest_n: 3
   latest_n: 2
-decoding:
-  batch_size: 128
-  error_rate_type: wer
-  decoding_method: ctc_beam_search
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 500
-  cutoff_prob: 1.0
-  cutoff_top_n: 40
-  num_proc_bsearch: 8
examples/tiny/asr0/conf/tuning/chunk_decode.yaml (new file, mode 100644)
+decode_batch_size: 128
+error_rate_type: wer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
+alpha: 2.5
+beta: 0.3
+beam_size: 500
+cutoff_prob: 1.0
+cutoff_top_n: 40
+num_proc_bsearch: 8
examples/tiny/asr0/conf/tuning/decode.yaml (new file, mode 100644)
+decode_batch_size: 128
+error_rate_type: wer
+decoding_method: ctc_beam_search
+lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
+alpha: 2.5
+beta: 0.3
+beam_size: 500
+cutoff_prob: 1.0
+cutoff_top_n: 40
+num_proc_bsearch: 8
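The new tuning/decode.yaml files are flat `key: value` mappings with trailing `#` comments. In practice they are read with a YAML parser, but just to make the format concrete, here is a minimal stdlib-only sketch (the helper `parse_flat_config` is hypothetical, not part of PaddleSpeech, and handles only this flat subset of YAML):

```python
def parse_flat_config(text):
    """Parse flat 'key: value  # comment' lines into a dict,
    coercing bools, ints, and floats where possible."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw.lower() in ("true", "false"):
            value = raw.lower() == "true"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw  # keep strings like 'wer' as-is
        config[key.strip()] = value
    return config

sample = """\
decode_batch_size: 128
error_rate_type: wer
decoding_method: ctc_beam_search
alpha: 2.5
cutoff_prob: 1.0
"""
cfg = parse_flat_config(sample)
```

Note the type coercion mirrors what a YAML loader would do here: `128` becomes an int, `2.5` a float, `False` a bool, and bare words stay strings.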
examples/tiny/asr0/local/test.sh
 #!/bin/bash

-if [ $# != 3 ];then
-    echo "usage: ${0} config_path ckpt_path_prefix model_type"
+if [ $# != 4 ];then
+    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix model_type"
     exit -1
 fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."
 config_path=$1
-ckpt_prefix=$2
-model_type=$3
+decode_config_path=$2
+ckpt_prefix=$3
+model_type=$4

 # download language model
 bash local/download_lm_en.sh
...
@@ -21,6 +22,7 @@ fi
 python3 -u ${BIN_DIR}/test.py \
 --ngpu ${ngpu} \
 --config ${config_path} \
+--decode_cfg ${decode_config_path} \
 --result_file ${ckpt_prefix}.rsl \
 --checkpoint_path ${ckpt_prefix} \
 --model_type ${model_type}
...
examples/tiny/asr0/run.sh
...
@@ -6,6 +6,7 @@ gpus=0
 stage=0
 stop_stage=100
 conf_path=conf/deepspeech2.yaml
+decode_conf_path=conf/tuning/decode.yaml
 avg_num=1
 model_type=offline
...
@@ -32,7 +33,7 @@ fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # test ckpt avg_n
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
 fi

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
...
examples/tiny/asr1/conf/chunk_confermer.yaml
 # https://yaml.org/type/float.html
-data:
-  train_manifest: data/manifest.tiny
-  dev_manifest: data/manifest.tiny
-  test_manifest: data/manifest.tiny
-  min_input_len: 0.5  # second
-  max_input_len: 30.0  # second
-  min_output_len: 0.0  # tokens
-  max_output_len: 400.0  # tokens
-  min_output_input_ratio: 0.05
-  max_output_input_ratio: 10.0
-
-collator:
-  mean_std_filepath: ""
-  vocab_filepath: data/lang_char/vocab.txt
-  unit_type: 'spm'
-  spm_model_prefix: 'data/lang_char/bpe_unigram_200'
-  augmentation_config: conf/preprocess.yaml
-  batch_size: 4
-  raw_wav: True  # use raw_wav or kaldi feature
-  spectrum_type: fbank  # linear, mfcc, fbank
-  feat_dim: 80
-  delta_delta: False
-  dither: 1.0
-  target_sample_rate: 16000
-  max_freq: None
-  n_fft: None
-  stride_ms: 10.0
-  window_ms: 25.0
-  use_dB_normalization: True
-  target_dB: -20
-  random_seed: 0
-  keep_transcription_text: False
-  sortagrad: True
-  shuffle_method: batch_shuffle
-  num_workers: 2
-
-# network architecture
-model:
-  cmvn_file: "data/mean_std.json"
-  cmvn_file_type: "json"
-  # encoder related
-  encoder: conformer
-  encoder_conf:
+############################################
+# Network Architecture                     #
+############################################
+cmvn_file: "data/mean_std.json"
+cmvn_file_type: "json"
+# encoder related
+encoder: conformer
+encoder_conf:
   output_size: 256  # dimension of attention
   attention_heads: 4
   linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -62,9 +25,9 @@ model:
   cnn_module_norm: 'layer_norm'  # using nn.LayerNorm makes model converge faster
   use_dynamic_left_chunk: false
-  # decoder related
-  decoder: transformer
-  decoder_conf:
+# decoder related
+decoder: transformer
+decoder_conf:
   attention_heads: 4
   linear_units: 2048
   num_blocks: 6
...
@@ -73,48 +36,63 @@ model:
   self_attention_dropout_rate: 0.0
   src_attention_dropout_rate: 0.0
-  # hybrid CTC/attention
-  model_conf:
+# hybrid CTC/attention
+model_conf:
   ctc_weight: 0.3
   lsm_weight: 0.1  # label smoothing option
   length_normalized_loss: false
-training:
-  n_epoch: 5
-  accum_grad: 1
-  global_grad_clip: 5.0
-  optim: adam
-  optim_conf:
+###########################################
+# Data                                    #
+###########################################
+train_manifest: data/manifest.tiny
+dev_manifest: data/manifest.tiny
+test_manifest: data/manifest.tiny
+###########################################
+# Dataloader                              #
+###########################################
+mean_std_filepath: ""
+vocab_filepath: data/lang_char/vocab.txt
+unit_type: 'spm'
+spm_model_prefix: 'data/lang_char/bpe_unigram_200'
+preprocess_config: conf/preprocess.yaml
+feat_dim: 80
+stride_ms: 10.0
+window_ms: 25.0
+sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+batch_size: 4
+maxlen_in: 512  # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0  # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+augmentation_config: conf/preprocess.yaml
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1
+###########################################
+# Training                                #
+###########################################
+n_epoch: 5
+accum_grad: 1
+global_grad_clip: 5.0
+optim: adam
+optim_conf:
-    lr: 0.001
-    weight_decay: 1e-06
-  scheduler: warmuplr
-  scheduler_conf:
+  lr: 0.001
+  weight_decay: 1.0e-06
+scheduler: warmuplr
+scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-  log_interval: 1
-  checkpoint:
+log_interval: 1
+checkpoint:
   kbest_n: 10
   latest_n: 1
-decoding:
-  batch_size: 64
-  error_rate_type: wer
-  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
-  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
-  alpha: 2.5
-  beta: 0.3
-  beam_size: 10
-  cutoff_prob: 1.0
-  cutoff_top_n: 0
-  num_proc_bsearch: 8
-  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
-  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
-                           # <0: for decoding, use full chunk.
-                           # >0: for decoding, use fixed chunk size as set.
-                           # 0: used for training, it's prohibited here.
-  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
-  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/tiny/asr1/conf/chunk_transformer.yaml
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.tiny
  dev_manifest: data/manifest.tiny
  test_manifest: data/manifest.tiny
  min_input_len: 0.5    # second
  max_input_len: 20.0   # second
  min_output_len: 0.0   # tokens
  max_output_len: 400.0 # tokens
  min_output_input_ratio: 0.05
  max_output_input_ratio: 10.0

collator:
  mean_std_filepath: ""
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'spm'
  spm_model_prefix: 'data/lang_char/bpe_unigram_200'
  augmentation_config: conf/preprocess.yaml
  batch_size: 4
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  #linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2

# network architecture
model:
  cmvn_file: "data/mean_std.json"
  cmvn_file_type: "json"
  # encoder related
  encoder: transformer
  encoder_conf:
############################################
# Network Architecture                     #
############################################
cmvn_file: "data/mean_std.json"
cmvn_file_type: "json"
# encoder related
encoder: transformer
encoder_conf:
  output_size: 256    # dimension of attention
  attention_heads: 4
  linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -55,9 +18,9 @@ model:
  use_dynamic_chunk: true
  use_dynamic_left_chunk: false
  # decoder related
  decoder: transformer
  decoder_conf:
# decoder related
decoder: transformer
decoder_conf:
  attention_heads: 4
  linear_units: 2048
  num_blocks: 6
...
@@ -66,48 +29,63 @@ model:
  self_attention_dropout_rate: 0.0
  src_attention_dropout_rate: 0.0
  # hybrid CTC/attention
  model_conf:
# hybrid CTC/attention
model_conf:
  ctc_weight: 0.3
  lsm_weight: 0.1  # label smoothing option
  length_normalized_loss: false

training:
  n_epoch: 5
  accum_grad: 1
  global_grad_clip: 5.0
  optim: adam
  optim_conf:
# https://yaml.org/type/float.html
###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.tiny
dev_manifest: data/manifest.tiny
test_manifest: data/manifest.tiny

###########################################
# Dataloader                              #
###########################################
mean_std_filepath: ""
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_200'
preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batch_size: 4
maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
minibatches: 0   # for debug
batch_count: auto
batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
num_workers: 0
subsampling_factor: 1
num_encs: 1

###########################################
# Training                                #
###########################################
n_epoch: 5
accum_grad: 1
global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.002
  weight_decay: 1e-06
scheduler: warmuplr
scheduler_conf:
  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
  lr_decay: 1.0
log_interval: 1
checkpoint:
log_interval: 1
checkpoint:
  kbest_n: 10
  latest_n: 1

decoding:
  batch_size: 64
  error_rate_type: wer
  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
  alpha: 2.5
  beta: 0.3
  beam_size: 10
  cutoff_prob: 1.0
  cutoff_top_n: 0
  num_proc_bsearch: 8
  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
      # <0: for decoding, use full chunk.
      # >0: for decoding, use fixed chunk size as set.
      # 0: used for training, it's prohibited here.
  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
  simulate_streaming: False  # simulate streaming inference. Defaults to False.
examples/tiny/asr1/conf/conformer.yaml
# https://yaml.org/type/float.html
###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.tiny
dev_manifest: data/manifest.tiny
test_manifest: data/manifest.tiny
min_input_len: 0.5    # second
max_input_len: 20.0   # second
min_output_len: 0.0   # tokens
max_output_len: 400.0 # tokens
min_output_input_ratio: 0.05
max_output_input_ratio: 10.0

###########################################
# Dataloader                              #
###########################################
mean_std_filepath: ""
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_200'
augmentation_config: conf/preprocess.yaml
batch_size: 4
raw_wav: True  # use raw_wav or kaldi feature
spectrum_type: fbank  #linear, mfcc, fbank
feat_dim: 80
delta_delta: False
dither: 1.0
target_sample_rate: 16000
max_freq: None
n_fft: None
stride_ms: 10.0
window_ms: 25.0
use_dB_normalization: True
target_dB: -20
random_seed: 0
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2

############################################
# Network Architecture                     #
############################################
...
@@ -83,7 +41,41 @@ model_conf:

###########################################
# training                                #
# Data                                    #
###########################################
train_manifest: data/manifest.tiny
dev_manifest: data/manifest.tiny
test_manifest: data/manifest.tiny

###########################################
# Dataloader                              #
###########################################
mean_std_filepath: ""
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_200'
preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batch_size: 4
maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
minibatches: 0   # for debug
batch_count: auto
batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
num_workers: 0
subsampling_factor: 1
num_encs: 1

###########################################
# Training                                #
###########################################
n_epoch: 5
accum_grad: 4
...
@@ -91,7 +83,7 @@ global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.002
  weight_decay: 1e-06
  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
...
examples/tiny/asr1/conf/transformer.yaml
# https://yaml.org/type/float.html
###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.tiny
dev_manifest: data/manifest.tiny
test_manifest: data/manifest.tiny
min_input_len: 0.5    # second
max_input_len: 20.0   # second
min_output_len: 0.0   # tokens
max_output_len: 400.0 # tokens
min_output_input_ratio: 0.05
max_output_input_ratio: 10.0

###########################################
# Dataloader                              #
###########################################
mean_std_filepath: data/mean_std.json
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_200'
augmentation_config: conf/preprocess.yaml
batch_size: 4
raw_wav: True  # use raw_wav or kaldi feature
spectrum_type: fbank  #linear, mfcc, fbank
feat_dim: 80
delta_delta: False
dither: 1.0
target_sample_rate: 16000
max_freq: None
n_fft: None
stride_ms: 10.0
window_ms: 25.0
use_dB_normalization: True
target_dB: -20
random_seed: 0
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2

############################################
# Network Architecture                     #
############################################
...
@@ -74,9 +34,41 @@ model_conf:
  lsm_weight: 0.1  # label smoothing option
  length_normalized_loss: false

###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.tiny
dev_manifest: data/manifest.tiny
test_manifest: data/manifest.tiny

###########################################
# Dataloader                              #
###########################################
mean_std_filepath: data/mean_std.json
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/lang_char/bpe_unigram_200'
preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batch_size: 4
maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
minibatches: 0   # for debug
batch_count: auto
batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
num_workers: 0
subsampling_factor: 1
num_encs: 1

###########################################
# training                                #
# Training                                #
###########################################
n_epoch: 5
accum_grad: 1
...
@@ -84,7 +76,7 @@ global_grad_clip: 5.0
optim: adam
optim_conf:
  lr: 0.002
  weight_decay: 1e-06
  weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
...
examples/tiny/asr1/conf/tuning/chunk_decode.yaml
0 → 100644
decode_batch_size: 8  #64
error_rate_type: wer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False  # simulate streaming inference. Defaults to False.
\ No newline at end of file
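The chunk-decoding options above follow the three-way semantics spelled out in the config comments. A minimal Python sketch of that rule (a hypothetical helper, not code from this repository):

```python
def effective_chunk_size(decoding_chunk_size: int, num_frames: int) -> int:
    """Resolve the chunk size actually used at decode time.

    Semantics from the config comments:
      <0 : use the full utterance as one chunk
      >0 : use the fixed chunk size as set
       0 : reserved for training, prohibited at decode time
    """
    if decoding_chunk_size < 0:
        return num_frames
    if decoding_chunk_size > 0:
        return min(decoding_chunk_size, num_frames)
    raise ValueError("decoding_chunk_size == 0 is prohibited at decode time")

print(effective_chunk_size(-1, 1000))  # full-utterance decoding → 1000
```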
examples/tiny/asr1/conf/tuning/decode.yaml
0 → 100644
decode_batch_size: 8  #64
error_rate_type: wer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False  # simulate streaming inference. Defaults to False.
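The new tuning/decode.yaml files are attached to the main config under a `decode` namespace, replacing the old inline `decoding:` section. A dict-based stand-in for that wiring (the real code uses yacs `CfgNode` objects; key names are taken from the YAML above):

```python
# Excerpts standing in for the two YAML files:
train_conf = {"batch_size": 4, "n_epoch": 5}            # conf/transformer.yaml
decode_conf = {"decode_batch_size": 8,                  # conf/tuning/decode.yaml
               "decoding_method": "attention",
               "beam_size": 10}

config = dict(train_conf)
config["decode"] = dict(decode_conf)  # mirrors `config.decode = decode_confs`

# Decode options are now reached via the `decode` namespace, not top level.
print(config["decode"]["decoding_method"])  # prints "attention"
```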
examples/tiny/asr1/local/align.sh
#!/bin/bash

if [ $# != 2 ];then
    echo "usage: ${0} config_path ckpt_path_prefix"
if [ $# != 3 ];then
    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2
decode_config_path=$2
ckpt_prefix=$3

batch_size=1
output_dir=${ckpt_prefix}
...
@@ -20,9 +21,10 @@ mkdir -p ${output_dir}
python3 -u ${BIN_DIR}/alignment.py \
--ngpu ${ngpu} \
--config ${config_path} \
--decode_cfg ${decode_config_path} \
--result_file ${output_dir}/${type}.align \
--checkpoint_path ${ckpt_prefix} \
--opts decoding.batch_size ${batch_size}
--opts decode.decode_batch_size ${batch_size}

if [ $? -ne 0 ]; then
    echo "Failed in ctc alignment!"
...
examples/tiny/asr1/local/test.sh
#!/bin/bash

if [ $# != 2 ];then
    echo "usage: ${0} config_path ckpt_path_prefix"
if [ $# != 3 ];then
    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2
decode_config_path=$2
ckpt_prefix=$3

chunk_mode=false
if [[ ${config_path} =~ ^.*chunk_.*yaml$ ]];then
...
@@ -33,10 +34,11 @@ for type in attention ctc_greedy_search; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
    --decode_cfg ${decode_config_path} \
    --result_file ${ckpt_prefix}.${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decoding.decoding_method ${type} \
    --opts decoding.batch_size ${batch_size}
    --opts decode.decoding_method ${type} \
    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
@@ -50,10 +52,11 @@ for type in ctc_prefix_beam_search attention_rescoring; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
    --decode_cfg ${decode_config_path} \
    --result_file ${ckpt_prefix}.${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decoding.decoding_method ${type} \
    --opts decoding.batch_size ${batch_size}
    --opts decode.decoding_method ${type} \
    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
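The script change above boils down to a new positional-argument contract: the decode config path is now the second argument, shifting the checkpoint prefix to third. A hypothetical Python rendering of that contract (for illustration only; the real scripts are plain shell):

```python
def parse_test_args(argv):
    # Mirrors the updated usage string of local/test.sh (and align.sh):
    # three positionals instead of the former two.
    if len(argv) != 3:
        raise SystemExit(
            "usage: test.sh config_path decode_config_path ckpt_path_prefix")
    config_path, decode_config_path, ckpt_prefix = argv
    return {"config": config_path,
            "decode_cfg": decode_config_path,
            "ckpt": ckpt_prefix}

args = parse_test_args(["conf/transformer.yaml",
                        "conf/tuning/decode.yaml",
                        "exp/transformer/checkpoints/avg_1"])
print(args["decode_cfg"])  # prints "conf/tuning/decode.yaml"
```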
examples/tiny/asr1/run.sh
...
@@ -6,6 +6,7 @@ gpus=0
stage=0
stop_stage=50
conf_path=conf/transformer.yaml
decode_conf_path=conf/tuning/decode.yaml
avg_num=1

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
...
@@ -31,12 +32,12 @@ fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # test ckpt avg_n
    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
    CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    # ctc alignment of test data
    CUDA_VISIBLE_DEVICES=${gpus} ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
    CUDA_VISIBLE_DEVICES=${gpus} ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 51 ] && [ ${stop_stage} -ge 51 ]; then
...
examples/wenetspeech/asr1/conf/conformer.yaml
# network architecture
model:
  # encoder related
  encoder: conformer
  encoder_conf:
############################################
# Network Architecture                     #
############################################
cmvn_file:
cmvn_file_type: "json"
# encoder related
encoder: conformer
encoder_conf:
  output_size: 512    # dimension of attention
  attention_heads: 8
  linear_units: 2048  # the number of units of position-wise feed forward
...
@@ -19,9 +22,9 @@ model:
  pos_enc_layer_type: rel_pos
  selfattention_layer_type: rel_selfattn
  # decoder related
  decoder: transformer
  decoder_conf:
# decoder related
decoder: transformer
decoder_conf:
  attention_heads: 8
  linear_units: 2048
  num_blocks: 6
...
@@ -30,82 +33,60 @@ model:
  self_attention_dropout_rate: 0.0
  src_attention_dropout_rate: 0.0
  # hybrid CTC/attention
  model_conf:
# hybrid CTC/attention
model_conf:
  ctc_weight: 0.3
  lsm_weight: 0.1  # label smoothing option
  length_normalized_loss: false

# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test
  min_input_len: 0.1  # second
  max_input_len: 12.0 # second
  min_output_len: 1.0
  max_output_len: 400.0
  min_output_input_ratio: 0.05
  max_output_input_ratio: 10.0
###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.train
dev_manifest: data/manifest.dev
test_manifest: data/manifest.test

collator:
  vocab_filepath: data/lang_char/vocab.txt
  unit_type: 'char'
  spm_model_prefix: ''
  augmentation_config: conf/preprocess.yaml
  batch_size: 64
  raw_wav: True  # use raw_wav or kaldi feature
  spectrum_type: fbank  #linear, mfcc, fbank
  feat_dim: 80
  delta_delta: False
  dither: 1.0
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 25.0
  use_dB_normalization: True
  target_dB: -20
  random_seed: 0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2
###########################################
# Dataloader                              #
###########################################
vocab_filepath: data/lang_char/vocab.txt
unit_type: 'char'
preprocess_config: conf/preprocess.yaml
spm_model_prefix: ''
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
sortagrad: 0  # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batch_size: 64
maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
minibatches: 0   # for debug
batch_count: auto
batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
num_workers: 0
subsampling_factor: 1
num_encs: 1

training:
  n_epoch: 240
  accum_grad: 16
  global_grad_clip: 5.0
  log_interval: 100
  checkpoint:
###########################################
# Training                                #
###########################################
n_epoch: 240
accum_grad: 16
global_grad_clip: 5.0
log_interval: 100
checkpoint:
  kbest_n: 50
  latest_n: 5
  optim: adam
  optim_conf:
optim: adam
optim_conf:
  lr: 0.001
  weight_decay: 1e-6
scheduler: warmuplr
scheduler_conf:
  weight_decay: 1.0e-6
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 5000
  lr_decay: 1.0

decoding:
  batch_size: 128
  error_rate_type: cer
  decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
  alpha: 2.5
  beta: 0.3
  beam_size: 10
  cutoff_prob: 1.0
  cutoff_top_n: 0
  num_proc_bsearch: 8
  ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
  decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
      # <0: for decoding, use full chunk.
      # >0: for decoding, use fixed chunk size as set.
      # 0: used for training, it's prohibited here.
  num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
  simulate_streaming: False  # simulate streaming inference. Defaults to False.
\ No newline at end of file
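The `scheduler: warmuplr` entries above pair a base `lr` with `warmup_steps`. Assuming the standard Noam-style warmup schedule used by ESPnet/WeNet-lineage toolkits (a sketch, not the exact PaddleSpeech implementation): the rate ramps linearly up to `warmup_steps`, peaks at `lr`, then decays with the inverse square root of the step.

```python
def warmup_lr(base_lr: float, step: int, warmup_steps: int) -> float:
    # Linear ramp for step < warmup_steps, inverse-sqrt decay afterwards;
    # the two branches meet at exactly base_lr when step == warmup_steps.
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5,
                                               step * warmup_steps ** -1.5)

# With the wenetspeech values above (lr: 0.001, warmup_steps: 5000):
print(warmup_lr(0.001, 2500, 5000))   # halfway through warmup: ~0.0005
print(warmup_lr(0.001, 5000, 5000))   # peak: ~0.001
```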
examples/wenetspeech/asr1/conf/tuning/decode.yaml
0 → 100644
decode_batch_size: 128
error_rate_type: cer
decoding_method: attention  # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
beam_size: 10
ctc_weight: 0.5  # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1  # decoding chunk size. Defaults to -1.
    # <0: for decoding, use full chunk.
    # >0: for decoding, use fixed chunk size as set.
    # 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1  # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False  # simulate streaming inference. Defaults to False.
\ No newline at end of file
examples/wenetspeech/asr1/local/test.sh
#!/bin/bash

if [ $# != 2 ];then
    echo "usage: ${0} config_path ckpt_path_prefix"
if [ $# != 3 ];then
    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix"
    exit -1
fi
...
@@ -9,7 +9,8 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2
decode_config_path=$2
ckpt_prefix=$3

chunk_mode=false
if [[ ${config_path} =~ ^.*chunk_.*yaml$ ]];then
...
@@ -36,10 +37,11 @@ for type in attention ctc_greedy_search; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decoding.decoding_method ${type} \
    --opts decoding.batch_size ${batch_size}
    --opts decode.decoding_method ${type} \
    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
@@ -55,10 +57,11 @@ for type in ctc_prefix_beam_search attention_rescoring; do
    python3 -u ${BIN_DIR}/test.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decoding.decoding_method ${type} \
    --opts decoding.batch_size ${batch_size}
    --opts decode.decoding_method ${type} \
    --opts decode.decode_batch_size ${batch_size}

    if [ $? -ne 0 ]; then
        echo "Failed in evaluation!"
...
examples/wenetspeech/asr1/local/test_wav.sh
#!/bin/bash

if [ $# != 3 ];then
    echo "usage: ${0} config_path ckpt_path_prefix audio_file"
if [ $# != 4 ];then
    echo "usage: ${0} config_path decode_config_path ckpt_path_prefix audio_file"
    exit -1
fi
...
@@ -9,8 +9,9 @@ ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2
audio_file=$3
decode_config_path=$2
ckpt_prefix=$3
audio_file=$4

mkdir -p data
wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/demo_01_03.wav -P data/
...
@@ -43,10 +44,11 @@ for type in attention_rescoring; do
    python3 -u ${BIN_DIR}/test_wav.py \
    --ngpu ${ngpu} \
    --config ${config_path} \
    --decode_cfg ${decode_config_path} \
    --result_file ${output_dir}/${type}.rsl \
    --checkpoint_path ${ckpt_prefix} \
    --opts decoding.decoding_method ${type} \
    --opts decoding.batch_size ${batch_size} \
    --opts decode.decoding_method ${type} \
    --opts decode.decode_batch_size ${batch_size} \
    --audio_file ${audio_file}

    if [ $? -ne 0 ]; then
...
examples/wenetspeech/asr1/run.sh
...
@@ -7,7 +7,7 @@ gpus=0,1,2,3,4,5,6,7
stage=0
stop_stage=100
conf_path=conf/conformer.yaml
decode_conf_path=conf/tuning/decode.yaml
average_checkpoint=true
avg_num=10
...
@@ -36,12 +36,12 @@ fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # test ckpt avg_n
    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    # ctc alignment of test data
    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi

if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
...
@@ -51,5 +51,5 @@ fi

if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
    # test a single .wav file
    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
fi
paddlespeech/s2t/exps/deepspeech2/bin/deploy/runtime.py
...
@@ -80,13 +80,13 @@ def inference(config, args):

def start_server(config, args):
    """Start the ASR server"""
    config.defrost()
    config.data.manifest = config.data.test_manifest
    config.manifest = config.test_manifest
    dataset = ManifestDataset.from_config(config)
    config.collator.augmentation_config = ""
    config.collator.keep_transcription_text = True
    config.collator.batch_size = 1
    config.collator.num_workers = 0
    config.augmentation_config = ""
    config.keep_transcription_text = True
    config.batch_size = 1
    config.num_workers = 0
    collate_fn = SpeechCollator.from_config(config)
    test_loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=0)
...
@@ -105,14 +105,14 @@ def start_server(config, args):
            paddle.to_tensor(audio),
            paddle.to_tensor(audio_len),
            vocab_list=test_loader.collate_fn.vocab_list,
            decoding_method=config.decoding.decoding_method,
            lang_model_path=config.decoding.lang_model_path,
            beam_alpha=config.decoding.alpha,
            beam_beta=config.decoding.beta,
            beam_size=config.decoding.beam_size,
            cutoff_prob=config.decoding.cutoff_prob,
            cutoff_top_n=config.decoding.cutoff_top_n,
            num_processes=config.decoding.num_proc_bsearch)
            decoding_method=config.decode.decoding_method,
            lang_model_path=config.decode.lang_model_path,
            beam_alpha=config.decode.alpha,
            beam_beta=config.decode.beta,
            beam_size=config.decode.beam_size,
            cutoff_prob=config.decode.cutoff_prob,
            cutoff_top_n=config.decode.cutoff_top_n,
            num_processes=config.decode.num_proc_bsearch)
        return result_transcript[0]

    # warming up with utterances sampled from Librispeech
...
@@ -179,12 +179,16 @@ if __name__ == "__main__":
    config = get_cfg_defaults()
    if args.config:
        config.merge_from_file(args.config)
    if args.decode_cfg:
        decode_confs = CfgNode(new_allowed=True)
        decode_confs.merge_from_file(args.decode_cfg)
        config.decode = decode_confs
    if args.opts:
        config.merge_from_list(args.opts)
    config.freeze()
    print(config)
    args.warmup_manifest = config.data.test_manifest
    args.warmup_manifest = config.test_manifest
    print_arguments(args, globals())

    if args.dump_config:
...
paddlespeech/s2t/exps/deepspeech2/bin/deploy/server.py
...
@@ -33,13 +33,13 @@ from paddlespeech.s2t.utils.utility import print_arguments

def start_server(config, args):
    """Start the ASR server"""
    config.defrost()
    config.data.manifest = config.data.test_manifest
    config.manifest = config.test_manifest
    dataset = ManifestDataset.from_config(config)
    config.collator.augmentation_config = ""
    config.collator.keep_transcription_text = True
    config.collator.batch_size = 1
    config.collator.num_workers = 0
    config.augmentation_config = ""
    config.keep_transcription_text = True
    config.batch_size = 1
    config.num_workers = 0
    collate_fn = SpeechCollator.from_config(config)
    test_loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=0)
...
@@ -62,14 +62,14 @@ def start_server(config, args):
            paddle.to_tensor(audio),
            paddle.to_tensor(audio_len),
            vocab_list=test_loader.collate_fn.vocab_list,
            decoding_method=config.decoding.decoding_method,
            lang_model_path=config.decoding.lang_model_path,
            beam_alpha=config.decoding.alpha,
            beam_beta=config.decoding.beta,
            beam_size=config.decoding.beam_size,
            cutoff_prob=config.decoding.cutoff_prob,
            cutoff_top_n=config.decoding.cutoff_top_n,
            num_processes=config.decoding.num_proc_bsearch)
            decoding_method=config.decode.decoding_method,
            lang_model_path=config.decode.lang_model_path,
            beam_alpha=config.decode.alpha,
            beam_beta=config.decode.beta,
            beam_size=config.decode.beam_size,
            cutoff_prob=config.decode.cutoff_prob,
            cutoff_top_n=config.decode.cutoff_top_n,
            num_processes=config.decode.num_proc_bsearch)
        return result_transcript[0]

    # warming up with utterances sampled from Librispeech
...
@@ -114,12 +114,16 @@ if __name__ == "__main__":
    config = get_cfg_defaults()
    if args.config:
        config.merge_from_file(args.config)
    if args.decode_cfg:
        decode_confs = CfgNode(new_allowed=True)
        decode_confs.merge_from_file(args.decode_cfg)
        config.decode = decode_confs
    if args.opts:
        config.merge_from_list(args.opts)
    config.freeze()
    print(config)
    args.warmup_manifest = config.data.test_manifest
    args.warmup_manifest = config.test_manifest
    print_arguments(args, globals())

    if args.dump_config:
...
paddlespeech/s2t/exps/deepspeech2/bin/test.py
...
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
from yacs.config import CfgNode

from paddlespeech.s2t.exps.deepspeech2.config import get_cfg_defaults
from paddlespeech.s2t.exps.deepspeech2.model import DeepSpeech2Tester as Tester
from paddlespeech.s2t.training.cli import default_argument_parser
...
@@ -44,6 +46,10 @@ if __name__ == "__main__":
    config = get_cfg_defaults(args.model_type)
    if args.config:
        config.merge_from_file(args.config)
    if args.decode_cfg:
        decode_confs = CfgNode(new_allowed=True)
        decode_confs.merge_from_file(args.decode_cfg)
        config.decode = decode_confs
    if args.opts:
        config.merge_from_list(args.opts)
    config.freeze()
...
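On the CLI side, the entry points above grow a `--decode_cfg` flag next to the existing `--config`. A sketch of that parser shape (flag names as used in the diff; the real parser lives in `paddlespeech.s2t.training.cli` and has more options):

```python
import argparse

parser = argparse.ArgumentParser(description="decode-config CLI sketch")
parser.add_argument("--config", help="model/training config yaml")
parser.add_argument("--decode_cfg",
                    help="decoding config yaml, merged under config.decode")
parser.add_argument("--opts", nargs=argparse.REMAINDER, default=[],
                    help="key value overrides, e.g. decode.decoding_method attention")

args = parser.parse_args(
    ["--config", "conf/deepspeech2.yaml",
     "--decode_cfg", "conf/tuning/decode.yaml"])
print(args.decode_cfg)  # prints "conf/tuning/decode.yaml"
```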
paddlespeech/s2t/exps/deepspeech2/bin/test_export.py
...
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
from yacs.config import CfgNode

from paddlespeech.s2t.exps.deepspeech2.config import get_cfg_defaults
from paddlespeech.s2t.exps.deepspeech2.model import DeepSpeech2ExportTester as ExportTester
from paddlespeech.s2t.training.cli import default_argument_parser
...
@@ -49,6 +51,10 @@ if __name__ == "__main__":
    config = get_cfg_defaults(args.model_type)
    if args.config:
        config.merge_from_file(args.config)
    if args.decode_cfg:
        decode_confs = CfgNode(new_allowed=True)
        decode_confs.merge_from_file(args.decode_cfg)
        config.decode = decode_confs
    if args.opts:
        config.merge_from_list(args.opts)
    config.freeze()
...
paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py
...
@@ -18,6 +18,7 @@ from pathlib import Path
 import paddle
 import soundfile
+from yacs.config import CfgNode
 from paddlespeech.s2t.exps.deepspeech2.config import get_cfg_defaults
 from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
...
@@ -41,7 +42,7 @@ class DeepSpeech2Tester_hub():
         self.audio_file = args.audio_file
         self.collate_fn_test = SpeechCollator.from_config(config)
         self._text_featurizer = TextFeaturizer(
-            unit_type=config.collator.unit_type, vocab=None)
+            unit_type=config.unit_type, vocab=None)

     def compute_result_transcripts(self, audio, audio_len, vocab_list, cfg):
         result_transcripts = self.model.decode(
...
@@ -74,7 +75,7 @@ class DeepSpeech2Tester_hub():
         audio = paddle.unsqueeze(audio, axis=0)
         vocab_list = collate_fn_test.vocab_list
         result_transcripts = self.compute_result_transcripts(
-            audio, audio_len, vocab_list, cfg.decoding)
+            audio, audio_len, vocab_list, cfg.decode)
         logger.info("result_transcripts: " + result_transcripts[0])

     def run_test(self):
...
@@ -110,13 +111,13 @@ class DeepSpeech2Tester_hub():
     def setup_model(self):
         config = self.config.clone()
         with UpdateConfig(config):
-            config.model.input_dim = self.collate_fn_test.feature_size
-            config.model.output_dim = self.collate_fn_test.vocab_size
+            config.input_dim = self.collate_fn_test.feature_size
+            config.output_dim = self.collate_fn_test.vocab_size

         if self.args.model_type == 'offline':
-            model = DeepSpeech2Model.from_config(config.model)
+            model = DeepSpeech2Model.from_config(config)
         elif self.args.model_type == 'online':
-            model = DeepSpeech2ModelOnline.from_config(config.model)
+            model = DeepSpeech2ModelOnline.from_config(config)
         else:
             raise Exception("wrong model type")
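The `setup_model` change is part of flattening the config schema: fields that previously lived under `config.model.*` now sit at the top level, so `from_config` receives the whole config rather than a `model` sub-node. A hypothetical dict-based sketch of that flattening (names are illustrative, not the PaddleSpeech API):

```python
# Hypothetical illustration of flattening a nested config so model fields
# live at the top level (mirroring config.model.input_dim -> config.input_dim).

def flatten_model_section(config: dict) -> dict:
    # Copy everything except the 'model' sub-dict...
    flat = {k: v for k, v in config.items() if k != "model"}
    # ...then hoist every key from 'model' to the top level.
    flat.update(config.get("model", {}))
    return flat

nested = {"batch_size": 32, "model": {"input_dim": 161, "output_dim": 4233}}
flat = flatten_model_section(nested)
print(flat["input_dim"], flat["output_dim"])  # -> 161 4233
```

After flattening, a single node can be passed everywhere, which is why the diff drops the `.model` (and below, `.training`) path components.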
...
@@ -134,8 +135,8 @@ class DeepSpeech2Tester_hub():
         self.checkpoint_dir = checkpoint_dir
         self.checkpoint = Checkpoint(
-            kbest_n=self.config.training.checkpoint.kbest_n,
-            latest_n=self.config.training.checkpoint.latest_n)
+            kbest_n=self.config.checkpoint.kbest_n,
+            latest_n=self.config.checkpoint.latest_n)

     def resume(self):
         """Resume from the checkpoint at checkpoints in the output
...
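`kbest_n`/`latest_n` describe a common checkpoint-retention policy: keep the n best checkpoints by validation metric plus the n most recent. A minimal sketch of that policy, assuming lower loss is better (hypothetical helper, not PaddleSpeech's `Checkpoint` class):

```python
# Hypothetical sketch of a kbest_n/latest_n retention policy: keep the
# kbest_n checkpoints with the lowest loss plus the latest_n most recent
# ones; everything else is eligible for deletion.

def checkpoints_to_keep(records, kbest_n, latest_n):
    # records: list of (step, val_loss) tuples, in training order.
    by_loss = sorted(records, key=lambda r: r[1])[:kbest_n]
    by_recency = records[-latest_n:] if latest_n > 0 else []
    keep = {step for step, _ in by_loss} | {step for step, _ in by_recency}
    return sorted(keep)

records = [(100, 2.1), (200, 1.7), (300, 1.9), (400, 1.5), (500, 1.8)]
print(checkpoints_to_keep(records, kbest_n=2, latest_n=1))  # -> [200, 400, 500]
```

The union matters: the latest checkpoint may also be among the best, so the number kept is at most, not exactly, `kbest_n + latest_n`.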
@@ -190,6 +191,10 @@ if __name__ == "__main__":
     config = get_cfg_defaults(args.model_type)
     if args.config:
         config.merge_from_file(args.config)
+    if args.decode_cfg:
+        decode_confs = CfgNode(new_allowed=True)
+        decode_confs.merge_from_file(args.decode_cfg)
+        config.decode = decode_confs
     if args.opts:
         config.merge_from_list(args.opts)
     config.freeze()
...
The following changed files are collapsed in this view (click to expand in the original):

paddlespeech/s2t/exps/deepspeech2/config.py
paddlespeech/s2t/exps/deepspeech2/model.py
paddlespeech/s2t/exps/u2/bin/alignment.py
paddlespeech/s2t/exps/u2/bin/test.py
paddlespeech/s2t/exps/u2/bin/test_wav.py
paddlespeech/s2t/exps/u2/config.py
paddlespeech/s2t/exps/u2/model.py
paddlespeech/s2t/exps/u2/trainer.py
paddlespeech/s2t/exps/u2_kaldi/bin/test.py
paddlespeech/s2t/exps/u2_kaldi/model.py
paddlespeech/s2t/exps/u2_st/bin/test.py
paddlespeech/s2t/exps/u2_st/config.py
paddlespeech/s2t/exps/u2_st/model.py
paddlespeech/s2t/io/collator.py
paddlespeech/s2t/io/dataset.py
paddlespeech/s2t/models/ds2/deepspeech2.py
paddlespeech/s2t/models/ds2_online/deepspeech2.py
paddlespeech/s2t/training/cli.py
tests/benchmark/conformer/run.sh
tests/benchmark/conformer/run_benchmark.sh
tests/chains/ds2/ds2_params_lite_train_infer.txt
tests/chains/ds2/ds2_params_whole_train_infer.txt
tests/chains/ds2/lite_train_infer.sh
tests/chains/ds2/prepare.sh
tests/chains/ds2/test.sh