MegEngine 天元
MegEngine

Commit 3aef5224
Authored January 13, 2023 by Megvii Engine Team

refactor(distributed): remove the shm backend for distributed training
GitOrigin-RevId: ab76f23f9dc6a4452fcde58fac6078f4c24af352

Parent: 21849d79

Showing 4 changed files with 5 additions and 10 deletions (+5 -10)

imperative/python/megengine/distributed/__init__.py  +1 -1
imperative/python/megengine/distributed/group.py  +2 -2
imperative/python/megengine/distributed/helper.py  +1 -4
src/opr-mm/impl/megray_helper.cpp  +1 -3

imperative/python/megengine/distributed/__init__.py @ 3aef5224
@@ -26,7 +26,7 @@ from .server import Client, Server
 @mproperty
 def backend(mod):
     r"""Get or set backend of collective communication.
-    Available backends are ['nccl', 'shm', 'rccl']
+    Available backends are ['nccl', 'rccl']

     Examples:

imperative/python/megengine/distributed/group.py @ 3aef5224
@@ -95,7 +95,7 @@ class Group:
 WORLD = Group([])

 _devices = {"gpu", "cuda", "rocm"}
-_backends = {"nccl", "rccl", "shm", "auto"}
+_backends = {"nccl", "rccl", "auto"}


 def init_process_group(
@@ -115,7 +115,7 @@ def init_process_group(
         world_size: total number of processes participating in the job.
         rank: rank of the current process.
         device: the GPU device id to bind this process to.
-        backend: communicator backend, currently support 'nccl' and 'shm'.
+        backend: communicator backend, currently support 'nccl' and 'rccl'.
     """
     physical_device_type = what_is_xpu() if device_type == "xpu" else device_type
     if not isinstance(master_ip, str):
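
Note on the `_backends` change: the sketch below illustrates what narrowing the set means for callers, assuming the set is used for a simple membership check on the requested backend name. The diff does not show that check, so `check_backend` and the two constant names are hypothetical and this is not MegEngine's code.

```python
# Standalone sketch. The two sets are copied from the diff above;
# check_backend is a hypothetical helper, not a MegEngine function.
BACKENDS_BEFORE = {"nccl", "rccl", "shm", "auto"}
BACKENDS_AFTER = {"nccl", "rccl", "auto"}


def check_backend(name, allowed):
    """Return name if it is in the allowed set, otherwise raise ValueError."""
    if name not in allowed:
        raise ValueError(
            "backend should be one of {} but got {!r}".format(sorted(allowed), name)
        )
    return name


if __name__ == "__main__":
    print(check_backend("shm", BACKENDS_BEFORE))  # accepted before this commit
    try:
        check_backend("shm", BACKENDS_AFTER)  # rejected after this commit
    except ValueError as err:
        print("rejected:", err)
```

Under that assumption, code that still passes `backend="shm"` would need to switch to `"nccl"`, `"rccl"`, or `"auto"`.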

imperative/python/megengine/distributed/helper.py @ 3aef5224
@@ -205,10 +205,7 @@ class AllreduceCallback:
         assert _group._sd, "please call init_process_group first"
         backend = _group._sd.backend
         if backend == "auto":
-            if group.is_single_machine and not _check_enable_p2p():
-                backend = "shm"
-            else:
-                backend = "nccl"
+            backend = "nccl"
         self._backend = backend

     def _reset(self):
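
Note on the `AllreduceCallback` hunk: the behavioral change is easiest to see with the "auto" resolution pulled out into two pure functions. This is a minimal sketch; the function names and the boolean parameters (standing in for `group.is_single_machine` and `_check_enable_p2p()`) are hypothetical and do not exist in MegEngine.

```python
# Standalone sketch of the "auto" backend resolution before and after
# this commit. Names and parameters are illustrative only.
def resolve_backend_before(backend, is_single_machine, p2p_enabled):
    """Old logic: "auto" picks shm on a single machine without P2P."""
    if backend == "auto":
        if is_single_machine and not p2p_enabled:
            return "shm"
        return "nccl"
    return backend


def resolve_backend_after(backend):
    """New logic: "auto" always resolves to nccl."""
    if backend == "auto":
        return "nccl"
    return backend


if __name__ == "__main__":
    # Single machine without P2P: the only case whose result changes.
    print(resolve_backend_before("auto", True, False))
    print(resolve_backend_after("auto"))
```

An explicitly requested backend such as `"rccl"` passes through unchanged in both versions; only the single-machine, no-P2P "auto" case is affected.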

src/opr-mm/impl/megray_helper.cpp @ 3aef5224
@@ -31,10 +31,8 @@ MegRay::Backend mgb::opr::get_megray_backend(const std::string& backend) {
         return MegRay::MEGRAY_RCCL;
     } else if (backend == "ucx") {
         return MegRay::MEGRAY_UCX;
-    } else if (backend == "shm") {
-        return MegRay::MEGRAY_SHM;
     } else {
-        mgb_throw(MegBrainError, "back CollectiveComm backend");
+        mgb_throw(MegBrainError, "bad CollectiveComm backend");
     }
 }