PaddlePaddle / DeepSpeech
Commit 54376f5d
Authored April 20, 2022 by Hui Zhang

speedup ngram build

Parent: 60f2b5e5
Showing 3 changed files with 51 additions and 8 deletions (+51 -8)
speechx/examples/ds2_ol/aishell/local/split_data.sh (+11 -5)
speechx/examples/ngram/zh/local/aishell_train_lms.sh (+10 -3)
speechx/examples/ngram/zh/local/split_data.sh (+30 -0)

speechx/examples/ds2_ol/aishell/local/split_data.sh
 #!/usr/bin/env bash

 set -eo pipefail

 data=$1
-feat_scp=$2
-split_feat_name=$3
+scp=$2
+split_name=$3
 numsplit=$4
+
+# save in $data/split{n}
+# $scp to split
+#
+
 if [[ ! $numsplit -gt 0 ]]; then
   echo "Invalid num-split argument";
...
...
@@ -12,8 +17,8 @@ if [[ ! $numsplit -gt 0 ]]; then
 fi

 directories=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n; done)
-feat_split_scp=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_feat_name}; done)
-echo $feat_split_scp
+scp_splits=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_name}; done)
+
 # if this mkdir fails due to argument-list being too long, iterate.
 if ! mkdir -p $directories >&/dev/null; then
   for n in `seq $numsplit`; do
...
...
@@ -21,4 +26,5 @@ if ! mkdir -p $directories >&/dev/null; then
   done
 fi

-utils/split_scp.pl $feat_scp $feat_split_scp
+echo "utils/split_scp.pl $scp $scp_splits"
+utils/split_scp.pl $scp $scp_splits

speechx/examples/ngram/zh/local/aishell_train_lms.sh
...
...
@@ -3,6 +3,7 @@
 # To be run from one directory above this script.
 . ./path.sh

+nj=40
 text=data/local/lm/text
 lexicon=data/local/dict/lexicon.txt
...
...
@@ -31,9 +32,15 @@ cleantext=$dir/text.no_oov
 # oov to <SPOKEN_NOISE>
 # lexicon line: word char0 ... charn
 # text line: utt word0 ... wordn -> line: <SPOKEN_NOISE> word0 ... wordn
-cat $text | awk -v lex=$lexicon 'BEGIN{while((getline<lex) >0){ seen[$1]=1; } }
+text_dir=$(dirname $text)
+split_name=$(basename $text)
+./local/split_data.sh $text_dir $text $split_name $nj
+
+utils/run.pl JOB=1:$nj $text_dir/split${nj}/JOB/${split_name}.no_oov.log \
+  cat ${text_dir}/split${nj}/JOB/${split_name} \| awk -v lex=$lexicon 'BEGIN{while((getline<lex) >0){ seen[$1]=1; } }
   {for(n=1; n<=NF;n++) { if (seen[$n]) { printf("%s ", $n); } else {printf("<SPOKEN_NOISE> ");} } printf("\n");}' \
-  > $cleantext || exit 1;
+  \> ${text_dir}/split${nj}/JOB/${split_name}.no_oov || exit 1;
+cat ${text_dir}/split${nj}/*/${split_name}.no_oov > $cleantext

 # compute word counts, sort in descending order
 # line: count word
...
...
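
The hunk above replaces a single-process awk pass with a split, per-chunk filter, and merge. A minimal self-contained sketch of that pattern follows, assuming plain background jobs in place of utils/run.pl and an awk-based chunker in place of utils/split_scp.pl. The toy lexicon, the text, the part.N file names and nj=2 are all hypothetical, and the input is assumed to have at least nj lines.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-ins for the real inputs (not taken from the commit).
workdir=$(mktemp -d)
lexicon=$workdir/lexicon.txt
text=$workdir/text
nj=2

printf 'hello h e l l o\nworld w o r l d\n' > "$lexicon"
printf 'hello world\nfoo hello\nworld bar\nhello hello\n' > "$text"

# Split $text into $nj contiguous chunks so that concatenating them later
# preserves the original line order.
total=$(wc -l < "$text")
awk -v nj="$nj" -v total="$total" -v dir="$workdir" '
  { idx = int((NR - 1) * nj / total) + 1; print > (dir "/part." idx) }' "$text"

# Replace out-of-vocabulary words with <SPOKEN_NOISE>, one background job per chunk.
for n in $(seq "$nj"); do
  awk -v lex="$lexicon" '
    BEGIN { while ((getline line < lex) > 0) { split(line, f, " "); seen[f[1]] = 1 } }
    { for (i = 1; i <= NF; i++) printf("%s ", ($i in seen) ? $i : "<SPOKEN_NOISE>"); printf("\n") }' \
    "$workdir/part.$n" > "$workdir/part.$n.no_oov" &
done
wait

# Merge the filtered chunks in chunk order.
for n in $(seq "$nj"); do cat "$workdir/part.$n.no_oov"; done > "$workdir/text.no_oov"
cat "$workdir/text.no_oov"
```

The sketch merges with an explicit numeric loop, whereas the commit uses a `*` glob. A glob expands in lexical order (1, 10, 11, ..., 2), so its output is not guaranteed to keep the original line order; that is likely harmless when the merged text only feeds word counting.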
speechx/examples/ngram/zh/local/split_data.sh
new file mode 100644
+#!/usr/bin/env bash
+
+set -eo pipefail
+
+data=$1
+scp=$2
+split_name=$3
+numsplit=$4
+
+# save in $data/split{n}
+# $scp to split
+#
+
+if [[ ! $numsplit -gt 0 ]]; then
+  echo "Invalid num-split argument";
+  exit 1;
+fi
+
+directories=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n; done)
+scp_splits=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_name}; done)
+
+# if this mkdir fails due to argument-list being too long, iterate.
+if ! mkdir -p $directories >&/dev/null; then
+  for n in `seq $numsplit`; do
+    mkdir -p $data/split${numsplit}/$n
+  done
+fi
+
+echo "utils/split_scp.pl $scp $scp_splits"
+utils/split_scp.pl $scp $scp_splits
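
To make the directory layout produced by split_data.sh concrete, the following sketch builds the same list of per-split output paths and creates their directories, without calling utils/split_scp.pl. The temporary data directory, the file name `text`, and numsplit=3 are hypothetical values chosen for illustration.

```shell
#!/usr/bin/env bash
set -eo pipefail

data=$(mktemp -d)   # hypothetical data directory
split_name=text     # hypothetical name of the file being split
numsplit=3

# Same path construction as in split_data.sh: $data/split<N>/<n>/<split_name>
scp_splits=$(for n in $(seq $numsplit); do echo $data/split${numsplit}/$n/${split_name}; done)

for p in $scp_splits; do
  mkdir -p "$(dirname "$p")"
  echo "${p#$data/}"   # print the path relative to $data
done
```

Each of the numsplit jobs then reads its own `$data/split3/<n>/text`, which is the path pattern that aishell_train_lms.sh passes to utils/run.pl via `JOB`.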