未验证 提交 f795d6f0 编写于 作者: S sneaxiy 提交者: GitHub

add barrier (#2309)

上级 7cc1d668
......@@ -43,6 +43,7 @@ function _train(){
log_parse_file="mylog/workerlog.0" ;;
*) echo "choose run_mode(sp or mp)"; exit 1;
esac
bash tests/test_tipc/barrier.sh
# 以下不用修改
timeout 15m ${train_cmd} > ${log_file} 2>&1
if [ $? -ne 0 ];then
......
set -ex
NNODES=${PADDLE_TRAINERS_NUM:-"1"}
PYTHON=${PYTHON:-"python"}
TIMEOUT=${1:-"10m"}
if [[ "$NNODES" -gt 1 ]]; then
while ! timeout "$TIMEOUT" "$PYTHON" -m paddle.distributed.launch run_check; do
echo "Retry barrier ......"
done
fi
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册