Oneflow-Inc / OneFlow-Benchmark
Commit 11f6e4cb
Authored Oct 15, 2021 by ouyangyu
refine
Parent: 646149d9
Showing 1 changed file with 3 additions and 9 deletions
Classification/cnns/args_train.sh (+3 -9)
rm -rf core.*
rm -rf ./output/logs/$HOSTNAME ./output/$HOSTNAME ./initial_model
# bash args_train.sh ${NUM_NODES} ${NUM_GPUS_PER_NODE} ${BATCH_SIZE} ${USE_FP16} ${NUM_EPOCH} ${LOSS_PRINT_ITER} ${TRAIN_DATA_PATH} ${VAL_DATA_PATH} ${PYTHON_BIN} ${NODE_IPS} ${DEBUG_AND_NCCL} ${NSYS_BIN} ${ITER_NUM}
...
@@ -16,7 +15,7 @@ PYTHON_BIN=${9:-"python3"}
 NODE_IPS=${10:-"10.11.0.2,10.11.0.3,10.11.0.4,10.11.0.5"}
 DEBUG_AND_NCCL=${11:-false}
 NSYS_BIN=${12:-""}
-ITER_NUM=${13:-1}
+RUN_COMMIT=${13:-"master"}
 # if [ $NUM_GPUS_PER_NODE -eq 1 ]; then
 # export CUDA_VISIBLE_DEVICES=$(($ITER_NUM-1))
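
The hunk above reuses the 13th positional argument for `RUN_COMMIT` and falls back to `"master"` when it is missing. A minimal sketch of that `${N:-default}` expansion follows; `get_run_commit` and the sample argument values are hypothetical and not part of args_train.sh.

```shell
# Hypothetical helper (not in args_train.sh): inside a function the
# positional parameters are the function's own arguments, so ${13}
# behaves the same way it does at the top level of the script.
get_run_commit() {
    # Expands to "master" when the 13th argument is unset or empty.
    echo "${13:-master}"
}

# 12 arguments: the default applies.
get_run_commit 1 8 192 false 50 100 /train /val python3 10.11.0.2 false ""

# 13 arguments: the explicit value is used.
get_run_commit 1 8 192 false 50 100 /train /val python3 10.11.0.2 false "" 11f6e4cb
```

Because arguments are positional, a caller that wants to set the 13th value still has to pass something (possibly an empty string) for each earlier slot.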
...
@@ -26,7 +25,7 @@ TRAN_MODEL="resnet50"
 RUN_TIME=$(date "+%Y%m%d_%H%M%S%N")
 LOG_FOLDER=./output/logs/$HOSTNAME/${NUM_NODES}n${NUM_GPUS_PER_NODE}g
 mkdir -p $LOG_FOLDER
-LOG_FILENAME=$LOG_FOLDER/${TRAN_MODEL}_lazy_${NUM_NODES}n${NUM_GPUS_PER_NODE}g_b${BATCH_SIZE}_fp16${USE_FP16}_${RUN_TIME}_iter${ITER_NUM}.log
+LOG_FILENAME=$LOG_FOLDER/${TRAN_MODEL}_lazy_${NUM_NODES}n${NUM_GPUS_PER_NODE}g_b${BATCH_SIZE}_fp16${USE_FP16}_${RUN_TIME}_${RUN_COMMIT}.log
 export PYTHONUNBUFFERED=1
 echo PYTHONUNBUFFERED=$PYTHONUNBUFFERED
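
To see what the new log path looks like, the sketch below composes `LOG_FOLDER` and `LOG_FILENAME` the same way as the added line. All values are illustrative assumptions (in the script they come from the positional arguments), and `HOST` is a hypothetical stand-in for `$HOSTNAME` so the snippet also runs in shells that do not set that variable.

```shell
# Illustrative values only; args_train.sh reads these from its arguments.
NUM_NODES=1
NUM_GPUS_PER_NODE=8
BATCH_SIZE=192
USE_FP16=false
TRAN_MODEL="resnet50"
RUN_COMMIT="master"

# Assumption: fall back to "localhost" when HOSTNAME is not set.
HOST=${HOSTNAME:-localhost}

# Same timestamp format as the script (%N needs GNU date).
RUN_TIME=$(date "+%Y%m%d_%H%M%S%N")

LOG_FOLDER=./output/logs/${HOST}/${NUM_NODES}n${NUM_GPUS_PER_NODE}g
LOG_FILENAME=${LOG_FOLDER}/${TRAN_MODEL}_lazy_${NUM_NODES}n${NUM_GPUS_PER_NODE}g_b${BATCH_SIZE}_fp16${USE_FP16}_${RUN_TIME}_${RUN_COMMIT}.log

# Print the path without creating any directories.
echo "$LOG_FILENAME"
```

With this naming, logs from different commits land in the same folder and differ by the trailing commit label, where the old name carried an iteration number instead.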
...
@@ -43,7 +42,7 @@ fi
 CMD=""
 if [[ ! -z "${NSYS_BIN}" ]]; then
-CMD+="${NSYS_BIN} profile --stats true --output ${TRAN_MODEL}_v0.4.0_${NUM_NODES}_${NUM_GPUS_PER_NODE}_%h_%p "
+CMD+="${NSYS_BIN} profile --stats true --output ${LOG_FOLDER}/${TRAN_MODEL}_lazy_${NUM_NODES}n${NUM_GPUS_PER_NODE}g_b${BATCH_SIZE}_fp16${USE_FP16}_${RUN_COMMIT}_%h_%p "
 fi
 CMD+="${PYTHON_BIN} of_cnn_train_val.py "
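
The hunk above prepends an `nsys profile` prefix to the training command only when `NSYS_BIN` is non-empty. A rough sketch of that conditional prefixing is shown here; `build_cmd` and the `profile_%h_%p` output name are hypothetical, and the sketch uses POSIX `[ -n ... ]` and plain concatenation where the script uses `[[ ! -z ... ]]` and `+=`.

```shell
# Hypothetical helper (not in args_train.sh).
# $1: path to the nsys binary, or "" to disable profiling
# $2: Python interpreter
build_cmd() {
    nsys_bin=$1
    python_bin=$2
    cmd=""
    if [ -n "$nsys_bin" ]; then
        # Profiler prefix; the output name here is a placeholder.
        cmd="${nsys_bin} profile --stats true --output profile_%h_%p "
    fi
    # The training entry point is always appended.
    cmd="${cmd}${python_bin} of_cnn_train_val.py "
    printf '%s\n' "$cmd"
}

build_cmd "" python3      # profiling disabled
build_cmd nsys python3    # profiling enabled
```

Keeping the profiler as an optional prefix means the rest of the command line is built once and stays identical for profiled and unprofiled runs.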
...
@@ -87,8 +86,3 @@ echo "Rum cmd ${CMD}"
 $CMD 2>&1 | tee ${LOG_FILENAME}
 echo "Writting log to ${LOG_FILENAME}"
-if [ ! -d "./test_result" ]; then
-mkdir ./test_result
-fi
-cp -r $LOG_FOLDER ./test_result/
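
After this change the script only streams the command output through `tee` into `LOG_FILENAME` and no longer copies the log folder to `./test_result`. The sketch below shows the `2>&1 | tee` pattern in isolation; the temporary log file and the stand-in `echo` command are assumptions made so the snippet can run anywhere.

```shell
# Assumption: a temporary file stands in for the script's LOG_FILENAME.
LOG_FILENAME=$(mktemp)

# Stand-in for the real training command held in CMD.
CMD="echo training step done"

# Merge stderr into stdout, show it on the terminal, and save a copy.
$CMD 2>&1 | tee "${LOG_FILENAME}"
echo "Writing log to ${LOG_FILENAME}"
```

Note that the exit status of such a pipeline is normally that of `tee`, so a failing training command is not reported through `$?` unless the shell is configured otherwise.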