Commit fe018da2, authored by 武毅, committed by GitHub

Merge pull request #1522 from typhoonzero/k8s_cluster_rerun

Fix k8s cluster job rerunable
...
@@ -132,6 +132,7 @@ def startPaddle(idMap={}, train_args_dict=None):
     logDir = JOB_PATH_OUTPUT + "/node_" + str(trainerId)
     if not os.path.exists(JOB_PATH_OUTPUT):
         os.makedirs(JOB_PATH_OUTPUT)
+    if not os.path.exists(logDir):
         os.mkdir(logDir)
     copyCommand = 'cp -rf ' + JOB_PATH + \
         "/" + str(trainerId) + "/data/*" + " ./data/"
......
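
The hunk above adds a separate existence check for logDir before os.mkdir(logDir). Judging from the hunk's line counts, os.mkdir(logDir) was previously reached only when JOB_PATH_OUTPUT did not exist yet. The following is a minimal standalone sketch of the patched logic, not the project's code: the helper name ensure_log_dir and its parameters are hypothetical, and a temporary directory stands in for JOB_PATH_OUTPUT.

```python
import os
import tempfile


def ensure_log_dir(job_path_output, trainer_id):
    """Create <job_path_output>/node_<trainer_id> if it is missing.

    Hypothetical helper mirroring the patched lines: each directory gets
    its own existence check, so calling it again on a rerun is harmless.
    """
    log_dir = job_path_output + "/node_" + str(trainer_id)
    if not os.path.exists(job_path_output):
        os.makedirs(job_path_output)
    # The check added by the commit: guard mkdir on logDir itself.
    if not os.path.exists(log_dir):
        os.mkdir(log_dir)
    return log_dir


if __name__ == "__main__":
    # A temporary directory stands in for JOB_PATH_OUTPUT (assumption).
    base = os.path.join(tempfile.mkdtemp(), "output")
    first = ensure_log_dir(base, 0)   # first run: creates both directories
    again = ensure_log_dir(base, 0)   # simulated rerun: nothing to create
    other = ensure_log_dir(base, 1)   # output dir exists, node dir does not
    print(first == again, os.path.isdir(other))
```

On Python 3.2 and later, os.makedirs(log_dir, exist_ok=True) would collapse both checks into one call. The explicit checks are kept here because they match the style of the diff.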