Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
BaiXuePrincess
PaddleRec
提交
7d585eaf
P
PaddleRec
项目概览
BaiXuePrincess
/
PaddleRec
与 Fork 源项目一致
Fork自
PaddlePaddle / PaddleRec
通知
1
Star
0
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
PaddleRec
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
7d585eaf
编写于
5月 13, 2020
作者:
T
tangwei
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
add cluster run
上级
1fb63969
变更
4
隐藏空白更改
内联
并排
Showing
4 changed file
with
2 addition
and
206 deletion
+2
-206
core/utils/envs.py
core/utils/envs.py
+2
-6
models/rank/dnn/backend.yaml
models/rank/dnn/backend.yaml
+0
-35
models/rank/dnn/submit.sh
models/rank/dnn/submit.sh
+0
-90
models/rank/dnn/worker.sh
models/rank/dnn/worker.sh
+0
-75
未找到文件。
core/utils/envs.py
浏览文件 @
7d585eaf
...
...
@@ -17,6 +17,7 @@ import copy
import
sys
import
socket
from
contextlib
import
closing
global_envs
=
{}
...
...
@@ -176,17 +177,12 @@ def get_platform():
return
"WINDOWS"
<<
<<
<<
<
HEAD
:
fleet_rec
/
core
/
utils
/
envs
.
py
==
==
==
=
>>>>>>
>
upstream
/
develop
:
core
/
utils
/
envs
.
py
def
find_free_port
():
def
__free_port
():
with
closing
(
socket
.
socket
(
socket
.
AF_INET
,
socket
.
SOCK_STREAM
))
as
s
:
s
.
bind
((
''
,
0
))
return
s
.
getsockname
()[
1
]
new_port
=
__free_port
()
return
new_port
models/rank/dnn/backend.yaml
已删除
100755 → 0
浏览文件 @
1fb63969
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
engine
:
workspace
:
"
paddlerec.models.rank.dnn"
backend
:
"
MPI"
hdfs
:
name
:
"
hdfs://nmg01-taihang-hdfs.dmop.baidu.com:54310"
ugi
:
"
fcr,SaK2VqfEDeXzKPor"
output
:
"
/app/ecom/fcr/fanyabo/wadstyleimageq/tangwei12/output_1/"
package
:
build_script
:
"
{workspace}/package.sh"
python
:
"
/home/tangwei/fleet_rec_env/cpython-2.7.11-ucs4"
paddlerec
:
"
/home/tangwei/fleet_rec_env/PaddleRec"
submit
:
hpc
:
"
/home/tangwei/Plines/client/smart_client_khan/"
qconf
:
"
/home/tangwei/Plines/imageq/qsub_f.conf"
nodes
:
10
submit_scrpit
:
"
{workspace}/submit.sh"
job_scrpit
:
"
{workspace}/worker.sh"
models/rank/dnn/submit.sh
已删除
100644 → 0
浏览文件 @
1fb63969
#!/bin/bash
###################################################
# Usage: submit.sh
# Description: run mpi submit clinet implement
###################################################
#-----------------------------------------------------------------------------------------------------------------
#fun : get argument from env, set it into variables
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
vars_get_from_env
()
{
echo
"xx"
}
#-----------------------------------------------------------------------------------------------------------------
#fun : package
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
package
()
{
g_run_stage
=
"package"
temp
=
${
engine_temp_path
}
echo
"package temp dir: "
${
temp
}
cp
${
engine_job_scrpit
}
${
temp
}
cp
${
engine_submit_qconf
}
${
temp
}
echo
"copy job.sh from "
${
engine_worker
}
" to "
${
temp
}
mkdir
-p
${
temp
}
/package
cp
-r
${
engine_package_python
}
${
temp
}
/package/
echo
"copy python from "
${
engine_package_python
}
" to "
${
temp
}
mkdir
${
temp
}
/package/whl
cp
${
engine_package_paddlerec
}
${
temp
}
/package/whl/
echo
"copy "
${
engine_package_paddlerec
}
" to "
${
temp
}
"/whl/"
}
#-----------------------------------------------------------------------------------------------------------------
#fun : before hook submit to cluster
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
before_submit
()
{
echo
"before_submit"
}
#-----------------------------------------------------------------------------------------------------------------
#fun : after hook submit to cluster
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
after_submit
()
{
echo
"after_submit"
}
#-----------------------------------------------------------------------------------------------------------------
#fun : submit to cluster
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
submit
()
{
g_run_stage
=
"submit"
g_job_name
=
"paddle_rec_mpi"
g_hdfs_path
=
$g_hdfs_path
g_job_entry
=
"worker.sh"
engine_hdfs_output
=
${
engine_hdfs_output
}
/
$(
date
+%Y%m%d%H%M%S
)
cd
${
engine_temp_path
}
${
engine_submit_hpc
}
/bin/qsub_f
\
-N
${
g_job_name
}
\
--conf
${
engine_submit_qconf
}
\
--hdfs
${
engine_hdfs_name
}
\
--ugi
${
engine_hdfs_ugi
}
\
--hout
${
engine_hdfs_output
}
\
--files
./package
\
-l
nodes
=
${
engine_submit_nodes
}
,walltime
=
1000:00:00,resource
=
full
${
g_job_entry
}
}
function
main
()
{
package
before_submit
submit
after_submit
}
models/rank/dnn/worker.sh
已删除
100644 → 0
浏览文件 @
1fb63969
#!/bin/bash
###################################################
# Usage: job.sh
# Description: run job on mpi per node
###################################################
# ---------------------------------------------------------------------------- #
# variable define #
# ---------------------------------------------------------------------------- #
declare
g_curPath
=
""
declare
g_scriptName
=
""
declare
g_workPath
=
""
declare
g_run_stage
=
""
# ---------------------------------------------------------------------------- #
# const define #
# ---------------------------------------------------------------------------- #
export
FLAGS_communicator_thread_pool_size
=
5
export
FLAGS_communicator_send_queue_size
=
18
export
FLAGS_communicator_thread_pool_size
=
20
export
FLAGS_communicator_max_merge_var_num
=
18
################################################################################
#-----------------------------------------------------------------------------------------------------------------
#fun : check function return code
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
check_error
()
{
if
[
${
?
}
-ne
0
]
;
then
echo
"execute "
+
$g_run_stage
+
" raise exception! please check ..."
exit
1
fi
}
#-----------------------------------------------------------------------------------------------------------------
#fun : check function return code
#param : N/A
#return : 0 -- success; not 0 -- failure
#-----------------------------------------------------------------------------------------------------------------
function
env_prepare
()
{
g_run_stage
=
"env_prepare"
WORKDIR
=
$(
pwd
)
mpirun
-npernode
1
mv
package/
*
./
echo
"current:"
$WORKDIR
mpirun
-npernode
1
tar
-zxvf
python.tar.gz
>
/dev/null
export
PYTHONPATH
=
$WORKDIR
/python/
export
PYTHONROOT
=
$WORKDIR
/python/
export
LIBRARY_PATH
=
$PYTHONPATH
/lib:
$LIBRARY_PATH
export
LD_LIBRARY_PATH
=
$PYTHONPATH
/lib:
$LD_LIBRARY_PATH
export
PATH
=
$PYTHONPATH
/bin:
$PATH
export
LIBRARY_PATH
=
$PYTHONROOT
/lib:
$LIBRARY_PATH
python
-c
"print('heheda')"
mpirun
-npernode
1 python/bin/python
-m
pip uninstall
-y
paddle-rec
mpirun
-npernode
1 python/bin/python
-m
pip
install
whl/fleet_rec-0.0.2-py2-none-any.whl
--index-url
=
http://pip.baidu.com/pypi/simple
--trusted-host
pip.baidu.com
check_error
}
function
run
()
{
echo
"run"
g_run_stage
=
"run"
mpirun
-npernode
2
-timestamp-output
-tag-output
-machinefile
${
PBS_NODEFILE
}
python/bin/python
-u
-m
paddlerec.run
-m
paddlerec.models.rank.dnn
--engine
cluster
--role
worker
}
function
main
()
{
env_prepare
run
}
main
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录