PaddlePaddle / DeepSpeech
Commit 7bee9d80
Authored Oct 10, 2022 by tianhao zhang
format wav2vec2 demo
Parent: 19180d35
Showing 1 changed file with 110 additions and 0 deletions
examples/librispeech/asr3/local/data.sh (new file, mode 0 → 100644)
#!/bin/bash

stage=-1
stop_stage=100

unit_type=char
dict_dir=data/lang_char

source ${MAIN_ROOT}/utils/parse_options.sh

mkdir -p data
mkdir -p ${dict_dir}
TARGET_DIR=${MAIN_ROOT}/dataset
mkdir -p ${TARGET_DIR}

if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
    # download data, generate manifests
    python3 ${TARGET_DIR}/librispeech/librispeech.py \
    --manifest_prefix="data/manifest" \
    --target_dir="${TARGET_DIR}/librispeech" \
    --full_download="True"

    if [ $? -ne 0 ]; then
        echo "Prepare LibriSpeech failed. Terminated."
        exit 1
    fi

    for set in train-clean-100 train-clean-360 train-other-500 dev-clean dev-other test-clean test-other; do
        mv data/manifest.${set} data/manifest.${set}.raw
    done

    rm -rf data/manifest.train.raw data/manifest.dev.raw data/manifest.test.raw
    for set in train-clean-100 train-clean-360 train-other-500; do
        cat data/manifest.${set}.raw >> data/manifest.train.raw
    done

    for set in dev-clean dev-other; do
        cat data/manifest.${set}.raw >> data/manifest.dev.raw
    done

    for set in test-clean test-other; do
        cat data/manifest.${set}.raw >> data/manifest.test.raw
    done
fi

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    # compute mean and stddev for normalizer
    num_workers=$(nproc)
    python3 ${MAIN_ROOT}/utils/compute_mean_std.py \
    --manifest_path="data/manifest.train.raw" \
    --num_samples=2000 \
    --spectrum_type="fbank" \
    --feat_dim=161 \
    --delta_delta=false \
    --sample_rate=16000 \
    --stride_ms=10 \
    --window_ms=25 \
    --use_dB_normalization=False \
    --num_workers=${num_workers} \
    --output_path="data/mean_std.json"

    if [ $? -ne 0 ]; then
        echo "Compute mean and stddev failed. Terminated."
        exit 1
    fi
fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # build vocabulary
    python3 ${MAIN_ROOT}/utils/build_vocab.py \
    --unit_type ${unit_type} \
    --count_threshold=0 \
    --vocab_path="${dict_dir}/vocab.txt" \
    --manifest_paths="data/manifest.train.raw"

    if [ $? -ne 0 ]; then
        echo "Build vocabulary failed. Terminated."
        exit 1
    fi
fi

if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # format manifest with tokenids, vocab size
    for set in train dev test dev-clean dev-other test-clean test-other; do
    {
        python3 ${MAIN_ROOT}/utils/format_data.py \
        --cmvn_path "data/mean_std.json" \
        --unit_type ${unit_type} \
        --vocab_path="${dict_dir}/vocab.txt" \
        --manifest_path="data/manifest.${set}.raw" \
        --output_path="data/manifest.${set}"

        if [ $? -ne 0 ]; then
            echo "Format manifest.${set} failed. Terminated."
            exit 1
        fi
    }&
    done
    wait
fi

echo "LibriSpeech Data preparation done."

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    mkdir -p exp/wav2vec2
    echo "Pretrained wav2vec2 model download"
    wget -P exp/wav2vec2 https://paddlespeech.bj.bcebos.com/wav2vec/wav2vec2-large-960h-lv60-self.pdparams
fi

exit 0
\ No newline at end of file
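
Note on the manifest-formatting stage (stage 2): each `format_data.py` call runs inside `{ ... } &`, so the `exit 1` in that block ends only the background subshell, and a bare `wait` returns 0 whatever the jobs returned. The script would therefore go on to print "LibriSpeech Data preparation done." even if a set failed to format. The following is a minimal sketch of one way to surface such failures, not the commit's code: `format_one` and `run_all` are hypothetical names, and `format_one` is a stand-in that deliberately fails for the `dev` set so the failure path is visible.

```shell
# Hypothetical stand-in for the real format_data.py invocation.
# It fails for the "dev" set only, to exercise the failure path.
format_one() {
    [ "$1" != "dev" ]
}

# Start one background job per set, then wait on each PID individually.
run_all() {
    jobs_list=""
    for set_name in "$@"; do
        format_one "${set_name}" &
        # Remember "<pid>:<set name>" so a failure can be attributed.
        jobs_list="${jobs_list} $!:${set_name}"
    done

    failed=0
    for job in ${jobs_list}; do
        pid=${job%%:*}
        name=${job#*:}
        # `wait <pid>` returns that job's exit status, unlike a bare `wait`.
        if ! wait "${pid}"; then
            echo "Format manifest.${name} failed."
            failed=1
        fi
    done
    return ${failed}
}

output=$(run_all train dev test)
status=$?
echo "${output}"
echo "status=${status}"
```

In the real script, the caller would test the return value of the loop and `exit 1` from the main shell, so later stages do not run on incomplete manifests.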