magicwindyyd / mindspore
Forked from MindSpore / mindspore (in sync with the fork source)
Commit 64a1560f
Authored Aug 11, 2020 by yuchaojie

add allreduce group for resnet gpu version

Parent: 63ac1f35
Showing 3 changed files with 8 additions and 6 deletions
mindspore/parallel/_auto_parallel_context.py  +4 -4
model_zoo/official/cv/resnet/README.md  +1 -1
model_zoo/official/cv/resnet/train.py  +3 -1

mindspore/parallel/_auto_parallel_context.py

@@ -275,7 +275,7 @@ class _AutoParallelContext:
         Args:
             indices (list): Indices list.
-            group (str): The hccl communication group.
+            group (str): The communication group of hccl/nccl.
         Raises:
             TypeError: If type of indices item is not int.
@@ -311,7 +311,7 @@ class _AutoParallelContext:
         Get allreduce fusion split indices.
         Args:
-            group (str): The hccl communication group.
+            group (str): The communication group of hccl/nccl.
         Returns:
             Return split sizes list according to the group.
@@ -340,7 +340,7 @@ class _AutoParallelContext:
         Args:
             sizes (list): Sizes list.
-            group (str): The hccl communication group.
+            group (str): The communication group of hccl/nccl.
         Raises:
             TypeError: If type of sizes item is not int.
@@ -376,7 +376,7 @@ class _AutoParallelContext:
         Get allreduce fusion split sizes.
         Args:
-            group (str): The hccl communication group.
+            group (str): The communication group of hccl/nccl.
         Returns:
             Return split sizes list according to the group.

model_zoo/official/cv/resnet/README.md

@@ -44,7 +44,7 @@ ImageNet2012
   ├── run_distribute_train.sh            # launch distributed training(8 pcs)
   ├── run_parameter_server_train.sh      # launch Ascend parameter server training(8 pcs)
   ├── run_eval.sh                        # launch evaluation
-  └── run_standalone_train.sh            # launch standalone training(1 pcs)
+  ├── run_standalone_train.sh            # launch standalone training(1 pcs)
   ├── run_distribute_train_gpu.sh        # launch gpu distributed training(8 pcs)
   ├── run_parameter_server_train_gpu.sh  # launch gpu parameter server training(8 pcs)
   ├── run_eval_gpu.sh                    # launch gpu evaluation

model_zoo/official/cv/resnet/train.py

@@ -81,9 +81,11 @@ if __name__ == '__main__':
             init()
         # GPU target
         else:
-            init("nccl")
             context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
                                               mirror_mean=True)
+            if args_opt.net == "resnet50":
+                auto_parallel_context().set_all_reduce_fusion_split_indices([85, 160])
+            init("nccl")
         ckpt_save_dir = config.save_checkpoint_path + "ckpt_" + str(get_rank()) + "/"
 
     # create dataset
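
Note on the split indices used in train.py: the call set_all_reduce_fusion_split_indices([85, 160]) passes a list of integer positions that divides the gradient allreduce operations into fusion groups. The sketch below is a rough illustration of how such a list could partition a sequence of gradients into contiguous groups. It is not MindSpore code. The function name split_into_fusion_groups, the gradient count of 200, and the choice to treat each index as an exclusive group boundary are all assumptions made for illustration; the exact boundary convention should be checked against the MindSpore documentation.

```python
def split_into_fusion_groups(num_gradients, split_indices):
    """Partition gradient positions 0..num_gradients-1 into contiguous groups.

    Hypothetical helper, not part of MindSpore. Assumes each split index is
    the exclusive end of one group and that the remainder forms a last group.
    """
    groups = []
    start = 0
    for end in split_indices:
        # Mirrors the documented TypeError for non-int index items.
        if not isinstance(end, int):
            raise TypeError("split indices must be int, got %r" % (end,))
        if end < start:
            raise ValueError("split indices must be non-decreasing")
        end = min(end, num_gradients)
        groups.append(list(range(start, end)))
        start = end
    # Everything after the last split index goes into a final group.
    groups.append(list(range(start, num_gradients)))
    return groups


if __name__ == "__main__":
    # 200 is an assumed gradient count, chosen only for the example.
    fusion_groups = split_into_fusion_groups(200, [85, 160])
    print([len(g) for g in fusion_groups])
```

Under these assumptions, [85, 160] yields three groups, so three fused allreduce calls would replace one call per gradient.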