PaddlePaddle / DeepSpeech
Commit b648f0c2
Commit b648f0c2
Authored Aug 10, 2017 by wanghaoshuang
Implement uploading data in submit scripts and fix issues
Parent: ef5f0436
Showing 6 changed files with 80 additions and 52 deletions (+80 -52)
cloud/README.md         +1  -14
cloud/pcloud_submit.sh  +50 -5
cloud/pcloud_train.sh   +12 -14
cloud/prepare_data.py   +2  -2
cloud/split_data.py     +3  -3
pcloud_train.sh         +12 -14
cloud/README.md
@@ -21,21 +21,8 @@ Then we can get job name 'deepspeech20170727130129' at last line
 ```
 $ paddlecloud logs -n 10000 deepspeech20170727130129
-$ ==========================deepspeech20170727130129-trainer-6vk3m==========================
-label selector: paddle-job-pserver=deepspeech20170727130129, desired: 1
-running pod list: [('Running', '10.1.3.6')]
-label selector: paddle-job=deepspeech20170727130129, desired: 1
-running pod list: [('Running', '10.1.83.14')]
-Starting training job: /pfs/dlnel/home/****@baidu.com/jobs/deepspeech20170727130129, num_gradient_servers: 1, trainer_id: 0, version: v2
-I0727 05:01:42.969719    25 Util.cpp:166] commandline: --num_gradient_servers=1 --ports_num_for_sparse=1 --use_gpu=1 --trainer_id=0 --pservers=10.1.3.6 --trainer_count=4 --num_passes=1 --ports_num=1 --port=7164
-[INFO 2017-07-27 05:01:50,279 layers.py:2430] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
-[WARNING 2017-07-27 05:01:50,280 layers.py:2789] brelu is not recommend for batch normalization's activation, maybe the relu is better
-[INFO 2017-07-27 05:01:50,283 layers.py:2430] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
-I0727 05:01:50.316176    25 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=4 numDevices=4
-I0727 05:01:50.454787    25 GradientMachine.cpp:85] Initing parameters..
-I0727 05:01:50.690007    25 GradientMachine.cpp:92] Init parameters done.
 ```
-[More optins and cmd aoubt paddle cloud](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md)
+[More options and cmd about paddle cloud](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md)
 ## Run DS2 by customize data
 TODO
cloud/pcloud_submit.sh
-DS2_PATH=../
-# tar -czf deepspeech.tar.gz ${DS2_PATH}
+TRAIN_MANIFEST="/home/work/wanghaoshuang/ds2/pcloud/models/deep_speech_2/datasets/manifest.dev"
+TEST_MANIFEST="/home/work/wanghaoshuang/ds2/pcloud/models/deep_speech_2/datasets/manifest.dev"
+VOCAB_PATH="/home/work/wanghaoshuang/ds2/pcloud/models/deep_speech_2/datasets/vocab/eng_vocab.txt"
+MEAN_STD_PATH="/home/work/wanghaoshuang/ds2/pcloud/models/deep_speech_2/compute_mean_std.py"
+CLOUD_DATA_DIR="/pfs/dlnel/home/wanghaoshuang@baidu.com/deepspeech2/data"
+CLOUD_MODEL_DIR="/pfs/dlnel/home/wanghaoshuang@baidu.com/deepspeech2/model"
+DS2_PATH=${PWD%/*}
+
+rm -rf ./tmp
+mkdir ./tmp
+
+paddlecloud ls ${CLOUD_DATA_DIR}/mean_std.npz
+if [ $? -ne 0 ]; then
+    cp -f ${MEAN_STD_PATH} ./tmp/mean_std.npz
+    paddlecloud file put ./tmp/mean_std.npz ${CLOUD_DATA_DIR}/
+fi
+
+paddlecloud ls ${CLOUD_DATA_DIR}/vocab.txt
+if [ $? -ne 0 ]; then
+    cp -f ${VOCAB_PATH} ./tmp/vocab.txt
+    paddlecloud file put ./tmp/vocab.txt ${CLOUD_DATA_DIR}/
+fi
+
+paddlecloud ls ${CLOUD_DATA_DIR}/cloud.train.manifest
+if [ $? -ne 0 ]; then
+    python prepare_data.py \
+        --manifest_path=${TRAIN_MANIFEST} \
+        --out_tar_path="./tmp/cloud.train.tar" \
+        --out_manifest_path="tmp/cloud.train.manifest"
+    paddlecloud file put ./tmp/cloud.train.tar ${CLOUD_DATA_DIR}/
+    paddlecloud file put ./tmp/cloud.train.manifest ${CLOUD_DATA_DIR}/
+fi
+
+paddlecloud ls ${CLOUD_DATA_DIR}/cloud.test.manifest
+if [ $? -ne 0 ]; then
+    python prepare_data.py \
+        --manifest_path=${TEST_MANIFEST} \
+        --out_tar_path="./tmp/cloud.test.tar" \
+        --out_manifest_path="tmp/cloud.test.manifest"
+    paddlecloud file put ./tmp/cloud.test.tar ${CLOUD_DATA_DIR}/
+    paddlecloud file put ./tmp/cloud.test.manifest ${CLOUD_DATA_DIR}/
+fi
+
+rm -rf ./tmp
 JOB_NAME=deepspeech`date +%Y%m%d%H%M%S`
 cp pcloud_train.sh ${DS2_PATH}
 paddlecloud submit \
--image wanghaoshuang/pcloud_ds2:latest-gpu-cudnn \
+-image bootstrapper:5000/wanghaoshuang/pcloud_ds2:latest-gpu-cudnn \
 -jobname ${JOB_NAME} \
 -cpu 4 \
 -gpu 4 \
@@ -13,5 +58,5 @@ paddlecloud submit \
 -pservers 1 \
 -psmemory 10Gi \
 -passes 1 \
--entry "sh pcloud_train.sh" \
+-entry "sh pcloud_train.sh ${CLOUD_DATA_DIR} ${CLOUD_MODEL_DIR}" \
-.
+${DS2_PATH}
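The new upload logic in pcloud_submit.sh is idempotent: each artifact (mean_std.npz, vocab.txt, the packed train/test tars and manifests) is pushed only when `paddlecloud ls` exits non-zero. The check-then-put pattern can be sketched in Python; `remote_ls` and `remote_put` below are hypothetical stand-ins for the `paddlecloud` CLI, not a real API:

```python
def upload_if_missing(local_path, remote_path, remote_ls, remote_put):
    """Upload only when the remote copy is absent, mirroring the
    `paddlecloud ls || paddlecloud file put` pattern in the script.

    remote_ls(path) -> bool and remote_put(src, dst) are hypothetical
    callables standing in for the paddlecloud CLI.
    """
    if remote_ls(remote_path):
        return False  # already on the cloud FS; skip the re-upload
    remote_put(local_path, remote_path)
    return True

# Fake cloud filesystem for demonstration.
cloud_fs = {"/pfs/demo/data/vocab.txt"}
remote_ls = lambda path: path in cloud_fs
remote_put = lambda src, dst: cloud_fs.add(dst)

print(upload_if_missing("./tmp/vocab.txt", "/pfs/demo/data/vocab.txt",
                        remote_ls, remote_put))        # False: skipped
print(upload_if_missing("./tmp/mean_std.npz", "/pfs/demo/data/mean_std.npz",
                        remote_ls, remote_put))        # True: uploaded
```

Re-running the submit script therefore costs one `ls` round trip per artifact instead of a full re-upload.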
cloud/pcloud_train.sh
-DATA_PATH=/pfs/dlnel/public/dataset/speech/libri
+DATA_PATH=$1
+MODEL_PATH=$2
 #setted by user
-TRAIN_MANI=${DATA_PATH}/manifest_pcloud.train
+TRAIN_MANI=${DATA_PATH}/cloud.train.manifest
 #setted by user
-DEV_MANI=${DATA_PATH}/manifest_pcloud.dev
+DEV_MANI=${DATA_PATH}/cloud.test.manifest
 #setted by user
-TRAIN_TAR=${DATA_PATH}/data.train.tar
+TRAIN_TAR=${DATA_PATH}/cloud.train.tar
 #setted by user
-DEV_TAR=${DATA_PATH}/data.dev.tar
+DEV_TAR=${DATA_PATH}/cloud.test.tar
 #setted by user
 VOCAB_PATH=${DATA_PATH}/eng_vocab.txt
 #setted by user
 MEAN_STD_FILE=${DATA_PATH}/mean_std.npz
-tar -xzf deepspeech.tar.gz
-rm -rf ./cloud/data/*
 # split train data for each pcloud node
-python ./cloud/pcloud_split_data.py \
+python ./cloud/split_data.py \
 --in_manifest_path=$TRAIN_MANI \
 --data_tar_path=$TRAIN_TAR \
---out_manifest_path='./cloud/data/train.mani'
+--out_manifest_path='./local.train.manifest'
 # split dev data for each pcloud node
-python pcloud_split_data.py \
+python ./cloud/split_data.py \
 --in_manifest_path=$DEV_MANI \
 --data_tar_path=$DEV_TAR \
---out_manifest_path='./cloud/data/dev.mani'
+--out_manifest_path='./local.test.manifest'
 python train.py \
 --use_gpu=1 \
 --trainer_count=4 \
 --batch_size=256 \
 --mean_std_filepath=$MEAN_STD_FILE \
---train_manifest_path='./cloud/data/train.mani' \
+--train_manifest_path='./local.train.manifest' \
---dev_manifest_path='./cloud/data/dev.mani' \
+--dev_manifest_path='./local.test.manifest' \
 --vocab_filepath=$VOCAB_PATH \
cloud/pcloud_prepare_data.py → cloud/prepare_data.py
@@ -25,12 +25,12 @@ parser.add_argument(
     help="Manifest of target data. (default: %(default)s)")
 parser.add_argument(
     "--out_tar_path",
-    default="./data/dev.tar",
+    default="./tmp/cloud.train.tar",
     type=str,
     help="Output tar file path. (default: %(default)s)")
 parser.add_argument(
     "--out_manifest_path",
-    default="./data/dev.mani",
+    default="./tmp/cloud.train.manifest",
     type=str,
     help="Manifest of output data. (default: %(default)s)")
 args = parser.parse_args()
...
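prepare_data.py packs the audio referenced by a manifest into a single tar and emits a rewritten manifest, so the upload is one large file rather than thousands of small ones. A minimal sketch of that idea, assuming a JSON-lines manifest with an `audio_filepath` field and a `tar:<archive>#<member>` addressing scheme (both are assumptions for illustration; the real script's schema may differ):

```python
import io
import json
import os
import tarfile
import tempfile

def pack_manifest(entries, out_tar_path, out_manifest_path):
    """Write each entry's audio bytes into one tar archive and rewrite
    audio_filepath to point at the member inside that tar."""
    lines = []
    with tarfile.open(out_tar_path, "w") as tar:
        for i, entry in enumerate(entries):
            audio = entry.pop("audio_bytes")  # in-memory stand-in for a wav file
            member = "%d.wav" % i
            info = tarfile.TarInfo(name=member)
            info.size = len(audio)
            tar.addfile(info, io.BytesIO(audio))
            entry["audio_filepath"] = "tar:%s#%s" % (out_tar_path, member)
            lines.append(json.dumps(entry))
    with open(out_manifest_path, "w") as f:
        f.write("\n".join(lines) + "\n")

tmp = tempfile.mkdtemp()
tar_path = os.path.join(tmp, "cloud.train.tar")
manifest_path = os.path.join(tmp, "cloud.train.manifest")
pack_manifest(
    [{"audio_bytes": b"RIFFdemo1", "text": "hello"},
     {"audio_bytes": b"RIFFdemo2", "text": "world"}],
    tar_path, manifest_path)
print(open(manifest_path).read())
```

The submit script then needs only two `paddlecloud file put` calls per dataset: one for the tar, one for the manifest.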
cloud/pcloud_split_data.py → cloud/split_data.py
@@ -11,17 +11,17 @@ import argparse
 parser = argparse.ArgumentParser(description=__doc__)
 parser.add_argument(
     "--in_manifest_path",
-    default='./cloud/data/dev.mani',
+    default='./cloud.train.manifest',
     type=str,
     help="Input manifest path. (default: %(default)s)")
 parser.add_argument(
     "--data_tar_path",
-    default='./cloud/data/dev.tar',
+    default='./cloud.train.tar',
     type=str,
     help="Data tar file path. (default: %(default)s)")
 parser.add_argument(
     "--out_manifest_path",
-    default='./cloud/data/dev.mani.split',
+    default='./local.train.manifest',
     type=str,
     help="Out manifest file path. (default: %(default)s)")
 args = parser.parse_args()
...
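split_data.py exists because every pcloud node downloads the same shared manifest; each trainer must carve out its own disjoint slice so the nodes do not train on identical samples. A round-robin split keyed by trainer id is one plausible scheme (an assumption here, not taken from the script itself):

```python
def shard_manifest(lines, trainer_id, trainer_count):
    """Node k keeps lines k, k+n, k+2n, ... of the shared manifest
    (round-robin by trainer id; an assumed scheme for illustration)."""
    if not 0 <= trainer_id < trainer_count:
        raise ValueError("trainer_id must be in [0, trainer_count)")
    return lines[trainer_id::trainer_count]

manifest = ["utt-%02d" % i for i in range(10)]
shards = [shard_manifest(manifest, k, 3) for k in range(3)]
print(shards[0])  # ['utt-00', 'utt-03', 'utt-06', 'utt-09']

# The shards partition the manifest: nothing lost, nothing duplicated.
assert sorted(sum(shards, [])) == manifest
```

Each node then writes its slice to the local manifest (`./local.train.manifest`) that train.py reads.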
pcloud_train.sh
-DATA_PATH=/pfs/dlnel/public/dataset/speech/libri
+DATA_PATH=$1
+MODEL_PATH=$2
 #setted by user
-TRAIN_MANI=${DATA_PATH}/manifest_pcloud.train
+TRAIN_MANI=${DATA_PATH}/cloud.train.manifest
 #setted by user
-DEV_MANI=${DATA_PATH}/manifest_pcloud.dev
+DEV_MANI=${DATA_PATH}/cloud.test.manifest
 #setted by user
-TRAIN_TAR=${DATA_PATH}/data.train.tar
+TRAIN_TAR=${DATA_PATH}/cloud.train.tar
 #setted by user
-DEV_TAR=${DATA_PATH}/data.dev.tar
+DEV_TAR=${DATA_PATH}/cloud.test.tar
 #setted by user
 VOCAB_PATH=${DATA_PATH}/eng_vocab.txt
 #setted by user
 MEAN_STD_FILE=${DATA_PATH}/mean_std.npz
-tar -xzvf deepspeech.tar.gz
-rm -rf ./cloud/data/*
 # split train data for each pcloud node
-python ./cloud/pcloud_split_data.py \
+python ./cloud/split_data.py \
 --in_manifest_path=$TRAIN_MANI \
 --data_tar_path=$TRAIN_TAR \
---out_manifest_path='./cloud/data/train.mani'
+--out_manifest_path='./local.train.manifest'
 # split dev data for each pcloud node
-python pcloud_split_data.py \
+python ./cloud/split_data.py \
 --in_manifest_path=$DEV_MANI \
 --data_tar_path=$DEV_TAR \
---out_manifest_path='./cloud/data/dev.mani'
+--out_manifest_path='./local.test.manifest'
 python train.py \
 --use_gpu=1 \
 --trainer_count=4 \
 --batch_size=256 \
 --mean_std_filepath=$MEAN_STD_FILE \
---train_manifest_path='./cloud/data/train.mani' \
+--train_manifest_path='./local.train.manifest' \
---dev_manifest_path='./cloud/data/dev.mani' \
+--dev_manifest_path='./local.test.manifest' \
 --vocab_filepath=$VOCAB_PATH \