Commit e507228e
Authored Sep 10, 2020 by Megvii Engine Team

feat(mge/examples): add distributed training examples using launcher
GitOrigin-RevId: 5db26f58eb2f81825693c423f9466bbd830fec6c

Parent: 23437864
Showing 1 changed file with 20 additions and 39 deletions
imperative/python/megengine/distributed/launcher.py
@@ -8,26 +8,12 @@
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 import multiprocessing as mp
 
-from ..device import get_device_count
 from .group import init_process_group
+from .helper import get_device_count_by_fork
 from .server import Server
 from .util import get_free_ports
 
 
-def _get_device_count():
-    """use subprocess to avoid cuda environment initialization in the main process"""
-
-    def run(q):
-        count = get_device_count("gpu")
-        q.put(count)
-
-    q = mp.Queue()
-    p = mp.Process(target=run, args=(q,))
-    p.start()
-    p.join()
-    return q.get()
-
-
 def _run_wrapped(func, master_ip, port, world_size, rank, dev, args, kwargs):
     """init distributed process group and run wrapped function"""
     init_process_group(
@@ -36,33 +22,28 @@ def _run_wrapped(func, master_ip, port, world_size, rank, dev, args, kwargs):
     func(*args, **kwargs)
 
 
-def launcher(n_gpus):
+def launcher(func):
     """decorator for launching multiple processes in single-machine multi-gpu training"""
-    count = _get_device_count()
-    assert isinstance(n_gpus, int) and n_gpus > 1, "invalid n_gpus"
-    assert n_gpus <= count, "{} gpus required, {} gpus provided".format(n_gpus, count)
-
-    def decorator(func):
-        def wrapper(*args, **kwargs):
-            master_ip = "localhost"
-            port = get_free_ports(1)[0]
-            server = Server(port)
+    n_gpus = get_device_count_by_fork("gpu")
 
-            procs = []
-            for rank in range(n_gpus):
-                p = mp.Process(
-                    target=_run_wrapped,
-                    args=(func, master_ip, port, n_gpus, rank, rank, args, kwargs),
-                )
-                p.start()
-                procs.append(p)
+    def wrapper(*args, **kwargs):
+        master_ip = "localhost"
+        port = get_free_ports(1)[0]
+        server = Server(port)
 
-            for rank in range(n_gpus):
-                procs[rank].join()
-                code = procs[rank].exitcode
-                assert code == 0, "subprocess {} exit with code {}".format(rank, code)
+        procs = []
+        for rank in range(n_gpus):
+            p = mp.Process(
+                target=_run_wrapped,
+                args=(func, master_ip, port, n_gpus, rank, rank, args, kwargs),
+            )
+            p.start()
+            procs.append(p)
 
-        return wrapper
+        for rank in range(n_gpus):
+            procs[rank].join()
+            code = procs[rank].exitcode
+            assert code == 0, "subprocess {} exit with code {}".format(rank, code)
 
-    return decorator
+    return wrapper
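
The diff turns `launcher` from a decorator factory (`launcher(n_gpus)`) into a plain decorator (`launcher(func)`) that starts one process per rank, joins them, and asserts that each exit code is zero. The following is a rough stdlib-only sketch of that shape, not the MegEngine implementation. `local_launcher`, `WORLD_SIZE`, the `RANK` environment variable, and the two example functions are hypothetical. The process-group setup is replaced by a placeholder, and the POSIX `fork` start method is assumed so that the decorated function does not have to be picklable.

```python
import functools
import multiprocessing as mp
import os
import sys

WORLD_SIZE = 2  # hypothetical stand-in for the detected GPU count
_ctx = mp.get_context("fork")  # assumption: POSIX fork is available


def _run_wrapped(func, rank, args, kwargs):
    # Placeholder for init_process_group(...): only expose the rank to func.
    os.environ["RANK"] = str(rank)
    func(*args, **kwargs)


def local_launcher(func):
    """Plain decorator: run func once per rank, each in its own process."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        procs = []
        for rank in range(WORLD_SIZE):
            p = _ctx.Process(target=_run_wrapped, args=(func, rank, args, kwargs))
            p.start()
            procs.append(p)

        # Join in rank order and fail on the first non-zero exit code.
        for rank, p in enumerate(procs):
            p.join()
            code = p.exitcode
            assert code == 0, "subprocess {} exit with code {}".format(rank, code)

    return wrapper


@local_launcher
def train(expected):
    # Hypothetical per-rank work; an uncaught exception gives exit code 1.
    assert expected == 5


@local_launcher
def exit_with_rank():
    # Rank 0 exits with 0, rank 1 exits with 1, so the launcher should complain.
    sys.exit(int(os.environ["RANK"]))
```

With this shape the call site is `@local_launcher` with no arguments, which mirrors the move from `launcher(n_gpus)` to `launcher(func)` in the diff. Calling `exit_with_rank()` raises an `AssertionError` naming rank 1, because that child exits with code 1.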