Unverified commit 0f649b32, author: LiYuRio, committer: GitHub

remove tcp store barrier (#47184)

Parent e48b6dcf
@@ -409,11 +409,8 @@ def new_group(ranks=None, backend=None, timeout=_default_timeout):
 # TODO(shenliang03): This is a temporary solution to solve the problem of
 # hang caused by tcp
 paddle.distributed.barrier(group=group)
-# NOTE(liyurui): All processors should hang and wait using tcp store, in case master exit before sub-group is created.
-if backend != 'heter':
-    _barrier_by_tcp_store(group_name, _default_store, timeout)
-else:
-    print("Warning: store barrier is not supported for heter backend.")
+if paddle.distributed.get_world_size() > 1:
+    paddle.distributed.barrier()
 return group
 if not backend:
......
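
For context on what the removed call did conceptually: a store-based barrier is commonly built by having each rank increment a shared counter in a key-value store and then poll until the counter reaches the number of participants. The sketch below imitates that idea with an in-memory stand-in and threads. `InMemoryStore`, `barrier_by_store`, `run_demo`, and their parameters are hypothetical names for illustration only; this is not Paddle's `_barrier_by_tcp_store` implementation, whose body is not shown in this diff.

```python
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for a TCP key-value store (not Paddle's store)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        # Atomically add `amount` to the counter at `key` and return the new value.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]


def barrier_by_store(store, key, world_size, timeout=5.0, poll_interval=0.01):
    """Block until `world_size` callers have reached the barrier on `key`."""
    store.add(key, 1)  # announce this caller's arrival
    deadline = time.monotonic() + timeout
    # add(key, 0) reads the counter without changing it.
    while store.add(key, 0) < world_size:
        if time.monotonic() > deadline:
            raise TimeoutError(f"barrier {key!r} timed out")
        time.sleep(poll_interval)


def run_demo(world_size=4):
    """Run `world_size` threads through one barrier; return the ranks that passed."""
    store = InMemoryStore()
    passed = []
    passed_lock = threading.Lock()

    def worker(rank):
        time.sleep(0.01 * rank)  # ranks arrive at different times
        barrier_by_store(store, "new_group/0", world_size)
        with passed_lock:
            passed.append(rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(passed)
```

In this commit, that kind of store-side wait is dropped in favor of a plain `paddle.distributed.barrier()` that runs only when `paddle.distributed.get_world_size() > 1`.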