BaiXuePrincess / PaddleRec (forked from PaddlePaddle / PaddleRec; in sync with the fork source)
Commit 20fb78d3
Authored Sep 02, 2020 by liuyuhui
Parent: 8c7d113e

fix bugs for files partition running in collective mode
Showing 2 changed files with 32 additions and 1 deletion

core/engine/local_cluster.py        +2  -1
core/utils/dataloader_instance.py   +30 -0
core/engine/local_cluster.py @ 20fb78d3

@@ -119,7 +119,8 @@ class LocalClusterEngine(Engine):
                     "PADDLE_TRAINERS_NUM": str(worker_num),
                     "TRAINING_ROLE": "TRAINER",
                     "PADDLE_TRAINER_ID": str(i),
-                    "FLAGS_selected_gpus": str(selected_gpus[i])
+                    "FLAGS_selected_gpus": str(selected_gpus[i]),
+                    "PADDLEREC_GPU_NUMS": str(selected_gpus_num)
                 })
 
                 os.system("mkdir -p {}".format(logs_dir))
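The change to core/engine/local_cluster.py exports the GPU count to every trainer process through the PADDLEREC_GPU_NUMS environment variable, which the data loader then reads. A minimal sketch of that hand-off follows. The helper name build_trainer_envs is hypothetical, and deriving the count as len(selected_gpus) is an assumption, since the diff does not show how selected_gpus_num is computed.

```python
import os


def build_trainer_envs(selected_gpus, base_env=None):
    """Return one environment dict per trainer process.

    Hypothetical helper, not part of PaddleRec. It mirrors the keys
    visible in the diff; the GPU count is assumed to be len(selected_gpus).
    """
    base_env = dict(os.environ if base_env is None else base_env)
    worker_num = len(selected_gpus)
    envs = []
    for i, gpu in enumerate(selected_gpus):
        env = dict(base_env)  # copy, so trainers do not share state
        env.update({
            "PADDLE_TRAINERS_NUM": str(worker_num),
            "TRAINING_ROLE": "TRAINER",
            "PADDLE_TRAINER_ID": str(i),
            "FLAGS_selected_gpus": str(gpu),
            # New in this commit: lets the data loader know the GPU count.
            "PADDLEREC_GPU_NUMS": str(worker_num),
        })
        envs.append(env)
    return envs


if __name__ == "__main__":
    for env in build_trainer_envs(["0", "1"], base_env={}):
        print(env["PADDLE_TRAINER_ID"], env["PADDLEREC_GPU_NUMS"])
```

Environment values must be strings, which is why every value is wrapped in str(), as in the original code.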
core/utils/dataloader_instance.py @ 20fb78d3

@@ -47,6 +47,16 @@ def dataloader_by_name(readerclass,
 
     files.sort()
 
+    # for local cluster: discard some files if files cannot be divided equally between GPUs
+    if (context["device"] == "GPU"):
+        selected_gpu_nums = int(os.getenv("PADDLEREC_GPU_NUMS"))
+        discard_file_nums = len(files) % selected_gpu_nums
+        if (discard_file_nums != 0):
+            print(
+                "Warning: beacause files cannot be divided equally between GPUs,discard these files:{}".
+                format(files[-discard_file_nums:]))
+            files = files[:len(files) - discard_file_nums]
+
     need_split_files = False
     if context["engine"] == EngineMode.LOCAL_CLUSTER:
         # for local cluster: split files for multi process
@@ -109,6 +119,16 @@ def slotdataloader_by_name(readerclass, dataset_name, yaml_file, context):
 
     files.sort()
 
+    # for local cluster: discard some files if files cannot be divided equally between GPUs
+    if (context["device"] == "GPU"):
+        selected_gpu_nums = int(os.getenv("PADDLEREC_GPU_NUMS"))
+        discard_file_nums = len(files) % selected_gpu_nums
+        if (discard_file_nums != 0):
+            print(
+                "Warning: beacause files cannot be divided equally between GPUs, discard these files:{}".
+                format(files[-discard_file_nums:]))
+            files = files[:len(files) - discard_file_nums]
+
     need_split_files = False
     if context["engine"] == EngineMode.LOCAL_CLUSTER:
         # for local cluster: split files for multi process
@@ -179,6 +199,16 @@ def slotdataloader(readerclass, train, yaml_file, context):
 
     files.sort()
 
+    # for local cluster: discard some files if files cannot be divided equally between GPUs
+    if (context["device"] == "GPU"):
+        selected_gpu_nums = int(os.getenv("PADDLEREC_GPU_NUMS"))
+        discard_file_nums = len(files) % selected_gpu_nums
+        if (discard_file_nums != 0):
+            print(
+                "Warning: beacause files cannot be divided equally between GPUs,discard these files:{}".
+                format(files[-discard_file_nums:]))
+            files = files[:len(files) - discard_file_nums]
+
     need_split_files = False
     if context["engine"] == EngineMode.LOCAL_CLUSTER:
         # for local cluster: split files for multi process
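The same ten-line block is added to three loader functions. One possible way to express it once is sketched below. The function drop_remainder_files is hypothetical and not part of this commit. It also adds a guard that the committed code does not have: int(os.getenv("PADDLEREC_GPU_NUMS")) raises a TypeError if the variable is unset, so the sketch skips the trimming in that case. Whether that situation arises in practice depends on how the loader is launched, which the diff does not show.

```python
import os


def drop_remainder_files(files, device, env=None):
    """Trim a sorted file list so it divides evenly across GPUs.

    Hypothetical helper illustrating the logic in the diff.
    Returns (kept_files, discarded_files).
    """
    env = os.environ if env is None else env
    gpu_nums = env.get("PADDLEREC_GPU_NUMS")
    # Assumption: with no GPU count available, leave the list untouched
    # instead of failing on int(None).
    if device != "GPU" or gpu_nums is None:
        return list(files), []
    gpu_nums = int(gpu_nums)
    if gpu_nums <= 0:
        return list(files), []
    remainder = len(files) % gpu_nums
    if remainder == 0:
        return list(files), []
    # Same slicing as the commit: the trailing `remainder` files are dropped.
    return list(files[:len(files) - remainder]), list(files[-remainder:])


if __name__ == "__main__":
    files = sorted("part-{}".format(i) for i in range(7))
    kept, dropped = drop_remainder_files(
        files, "GPU", env={"PADDLEREC_GPU_NUMS": "3"})
    print(len(kept))  # 6
    print(dropped)    # ['part-6']
```

Note one edge case that the sketch shares with the committed code: when there are fewer files than GPUs, the remainder equals the file count and every file is discarded, leaving an empty list.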