Crayon鑫 / Paddle (forked from PaddlePaddle / Paddle)
Commit 66d46221
Authored April 17, 2021 by ShenLiang
Committed April 17, 2021 by GitHub
[Hybrid Parallel] Add model parallel support in dygraph (#32248)
* add model parallel support in dygraph
Parent: 03c9ecd9
Showing 18 changed files with 917 additions and 10 deletions
paddle/fluid/framework/distributed_strategy.proto  +7 -0
python/paddle/distributed/fleet/__init__.py  +2 -0
python/paddle/distributed/fleet/base/distributed_strategy.py  +34 -0
python/paddle/distributed/fleet/base/fleet_base.py  +43 -0
python/paddle/distributed/fleet/base/topology.py  +18 -9
python/paddle/distributed/fleet/meta_parallel/__init__.py  +15 -0
python/paddle/distributed/fleet/meta_parallel/mp_utils/__init__.py  +16 -0
python/paddle/distributed/fleet/meta_parallel/mp_utils/layers.py  +190 -0
python/paddle/distributed/fleet/meta_parallel/mp_utils/layers_help.py  +116 -0
python/paddle/distributed/fleet/meta_parallel/mp_utils/random.py  +79 -0
python/paddle/fluid/tests/unittests/CMakeLists.txt  +3 -0
python/paddle/fluid/tests/unittests/hybrid_parallel_communicate_group.py  +0 -0
python/paddle/fluid/tests/unittests/hybrid_parallel_mp_layers.py  +273 -0
python/paddle/fluid/tests/unittests/hybrid_parallel_mp_random.py  +74 -0
python/paddle/fluid/tests/unittests/test_fleet_distributed_strategy.py  +11 -0
python/paddle/fluid/tests/unittests/test_new_group.sh  +1 -1
python/paddle/fluid/tests/unittests/test_parallel_dygraph_hybrid_parallel.py  +33 -0
python/setup.py.in  +2 -0
paddle/fluid/framework/distributed_strategy.proto
@@ -43,6 +43,12 @@ message ShardingConfig {
   optional int32 pp_degree = 11 [ default = 1 ];
 }

+message HybridConfig {
+  optional int32 dp_degree = 1 [ default = -1 ];
+  optional int32 mp_degree = 2 [ default = 1 ];
+  optional int32 pp_degree = 3 [ default = 1 ];
+}
+
 message AMPConfig {
   optional float init_loss_scaling = 1 [ default = 32768.0 ];
   optional int32 incr_every_n_steps = 2 [ default = 1000 ];
@@ -175,6 +181,7 @@ message DistributedStrategy {
   optional LambConfig lamb_configs = 109;
   optional AdaptiveLocalSGDConfig adaptive_localsgd_configs = 110;
   optional ShardingConfig sharding_configs = 111;
+  optional HybridConfig hybrid_configs = 112;
   optional BuildStrategy build_strategy = 201;
   optional ExecutionStrategy execution_strategy = 202;
 }
python/paddle/distributed/fleet/__init__.py
@@ -21,6 +21,7 @@ from .dataset import *
 from .data_generator import MultiSlotDataGenerator, MultiSlotStringDataGenerator
 from . import metrics
 from .base.topology import CommunicateTopology, HybridCommunicateGroup
+from .meta_parallel import random, layers

 __all__ = [
     "DistributedStrategy", "UtilBase", "UserDefinedRoleMaker",
@@ -72,3 +73,4 @@ get_lr = fleet.get_lr
 state_dict = fleet.state_dict
 set_state_dict = fleet.set_state_dict
 shrink = fleet.shrink
+get_hybrid_communicate_group = fleet.get_hybrid_communicate_group
python/paddle/distributed/fleet/base/distributed_strategy.py
@@ -867,6 +867,40 @@ class DistributedStrategy(object):
                           "pipeline_configs")
         assign_configs_value(self.strategy.pipeline_configs, configs)

+    @property
+    def hybrid_configs(self):
+        """
+        Dynamic graph hybrid parallel strategy configuration. Three-way hybrid parallelism
+        needs to meet the following relationships
+
+        total_number_GPUs = dp_degree * mp_degree * pp_degree
+
+        **Note**:
+            dp_degree(int): set number of GPUs in a data parallel group. Default -1.
+                                    This value should be an integer greater than 0.
+                                    If it is not set, or set to -1, its value will be inferred
+                                    based on the total number of cards.
+            mp_degree(int): set number of GPUs in a model parallel group. Default 1
+            pp_degree(int): set number of GPUs in a pipeline parallel group. Default 1
+
+        Examples:
+          .. code-block:: python
+
+            import paddle.distributed.fleet as fleet
+            strategy = fleet.DistributedStrategy()
+            strategy.hybrid_configs = {
+                "dp_degree": 1,
+                "mp_degree": 2,
+                "pp_degree": 1}
+        """
+        return get_msg_dict(self.strategy.hybrid_configs)
+
+    @hybrid_configs.setter
+    def hybrid_configs(self, configs):
+        check_configs_key(self.strategy.hybrid_configs, configs,
+                          "hybrid_configs")
+        assign_configs_value(self.strategy.hybrid_configs, configs)
+
     @property
     def localsgd(self):
         """
python/paddle/distributed/fleet/base/fleet_base.py
@@ -26,6 +26,7 @@ from .meta_optimizer_factory import MetaOptimizerFactory
 from .runtime_factory import RuntimeFactory
 from paddle.fluid.wrapped_decorator import wrap_decorator
 from paddle.fluid.dygraph import parallel_helper
+from . import topology as tp


 def _inited_runtime_handler_(func):
@@ -234,6 +235,48 @@ class Fleet(object):
                             self._user_defined_strategy.nccl_comm_num)
                     paddle.distributed.init_parallel_env()

+                # init hybrid parallel environment in dygraph
+                if tp._HYBRID_PARALLEL_GROUP is None:
+                    self._init_hybrid_parallel_env()
+                else:
+                    warnings.warn(
+                        "The dygraph hybrid parallel environment has been initialized."
+                    )
+
+    def _init_hybrid_parallel_env(self):
+        """initialize the hybrid environment
+        """
+        self.hybrid_configs = self._user_defined_strategy.hybrid_configs
+        self.dp_degree = self.hybrid_configs["dp_degree"]
+        self.mp_degree = self.hybrid_configs["mp_degree"]
+        self.pp_degree = self.hybrid_configs["pp_degree"]
+
+        assert self.mp_degree >= 0, "mp_degree should be greater or equal to 0"
+        assert self.pp_degree >= 0, "pp_degree should be greater or equal to 0"
+
+        self.mp_degree = max(self.mp_degree, 1)
+        self.pp_degree = max(self.pp_degree, 1)
+
+        if self.dp_degree < 0:
+            nranks = paddle.distributed.get_world_size()
+            self.dp_degree = nranks // (self.mp_degree * self.pp_degree)
+
+        self.dp_degree = max(self.dp_degree, 1)
+
+        self._topology = tp.CommunicateTopology(
+            hybrid_group_names=["data", "pipe", "model"],
+            dims=[self.dp_degree, self.pp_degree, self.mp_degree])
+
+        self._hcg = tp.HybridCommunicateGroup(self._topology)
+
+    def get_hybrid_communicate_group(self):
+        assert self._hcg is not None
+        return self._hcg
+
+    def get_hybrid_parallel_topology(self):
+        assert self._topology is not None
+        return self._topology
+
     def is_first_worker(self):
         """
         Check whether the node is the first instance of worker.
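The degree handling in `_init_hybrid_parallel_env` can be sketched in plain Python, without Paddle. The helper names `resolve_hybrid_degrees` and `is_valid_topology` are hypothetical and exist only for this illustration; the second mirrors the `dp * mp * pp == world_size` check that `HybridCommunicateGroup` asserts in topology.py.

```python
def resolve_hybrid_degrees(world_size, dp_degree=-1, mp_degree=1, pp_degree=1):
    """Illustrative re-statement of the degree handling in the hunk above."""
    assert mp_degree >= 0, "mp_degree should be greater or equal to 0"
    assert pp_degree >= 0, "pp_degree should be greater or equal to 0"

    # A degree of 0 is treated as "not used" and clamped to 1.
    mp_degree = max(mp_degree, 1)
    pp_degree = max(pp_degree, 1)

    # A negative dp_degree means "infer it from the number of ranks".
    if dp_degree < 0:
        dp_degree = world_size // (mp_degree * pp_degree)
    dp_degree = max(dp_degree, 1)
    return dp_degree, mp_degree, pp_degree


def is_valid_topology(world_size, dp_degree, mp_degree, pp_degree):
    """The three degrees must multiply to the total number of ranks."""
    return dp_degree * mp_degree * pp_degree == world_size


degrees = resolve_hybrid_degrees(world_size=8, mp_degree=2)
print(degrees)  # (4, 2, 1)
print(is_valid_topology(8, *degrees))  # True
```

Note that integer division can leave ranks unaccounted for: with 6 ranks and `mp_degree=4` the inferred `dp_degree` is 1, and the validity check then fails.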
python/paddle/distributed/fleet/base/topology.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from __future__ import print_function
+import sys
 import paddle
 import collections
 import numpy as np
@@ -19,6 +21,8 @@ from itertools import product
 from functools import reduce

 __all__ = ['CommunicateTopology', 'HybridCommunicateGroup']

+_HYBRID_PARALLEL_GROUP = None
+
 class CommunicateTopology(object):
     def __init__(self, hybrid_group_names, dims):
@@ -100,26 +104,31 @@ class HybridCommunicateGroup(object):
         self.global_rank = paddle.distributed.get_rank()
         self._topo = topology

-        self._num_data_parallel = self._topo.get_dim('data')
-        self._num_model_parallel = self._topo.get_dim('model')
-        self._num_pipe_parallel = self._topo.get_dim('pipe')
+        self._dp_degree = self._topo.get_dim('data')
+        self._mp_degree = self._topo.get_dim('model')
+        self._pp_degree = self._topo.get_dim('pipe')

         self._data_parallel_id = self._get_data_parallel_id()
         self._model_parallel_id = self._get_model_parallel_id()

         assert self._check_vaild_topo(
-        ), "Here is an unreasonable topogy setting"
+        ), "Here is an unreasonable topogy setting. world_size: {}, but" \
+            "dp_num: {}, mp_num: {}, pp_num: {}".format(self.nranks,
+            self._dp_degree, self._mp_degree, self._pp_degree)

         # create comm group for data parallel
         self._dp_group, self._dp_comm_group = self._set_comm_group("data")
-        print("data parallel group", self._dp_group)
+        print("data parallel group", self._dp_group, file=sys.stderr)

         # create comm group for model parallel
         self._mp_group, self._mp_comm_group = self._set_comm_group("model")
-        print("model parallel group", self._mp_group)
+        print("data parallel group", self._mp_group, file=sys.stderr)
+
+        global _HYBRID_PARALLEL_GROUP
+        _HYBRID_PARALLEL_GROUP = self

     def _check_vaild_topo(self):
-        return self._num_data_parallel * self._num_model_parallel * self._num_pipe_parallel == self.nranks
+        return self._dp_degree * self._mp_degree * self._pp_degree == self.nranks

     def _set_comm_group(self, parallel_method="data"):
         parallel_group = []
@@ -151,7 +160,7 @@ class HybridCommunicateGroup(object):
         return self._data_parallel_id

     def get_data_parallel_world_size(self):
-        return self._num_data_parallel
+        return self._dp_degree

     def get_data_parallel_group(self):
         return self._dp_comm_group
@@ -167,7 +176,7 @@ class HybridCommunicateGroup(object):
         return self._model_parallel_id

     def get_model_parallel_world_size(self):
-        return self._num_model_parallel
+        return self._mp_degree

     def get_model_parallel_group(self):
         return self._mp_comm_group
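To make the `["data", "pipe", "model"]` topology concrete, here is a small Paddle-free sketch of how ranks could fall into communication groups. It assumes ranks are numbered in row-major order over `dims`, which is an assumption of this sketch rather than something the hunks above show; `build_groups` is a hypothetical helper, not part of `CommunicateTopology`.

```python
from itertools import product


def build_groups(names, dims, axis_name):
    """Group ranks whose coordinates differ only along `axis_name`.

    Assumption for this sketch: rank i is the i-th coordinate in
    row-major order over `dims`.
    """
    coords = list(product(*(range(d) for d in dims)))
    axis = names.index(axis_name)
    groups = {}
    for rank, coord in enumerate(coords):
        # Drop the chosen axis; ranks sharing the remaining coordinates
        # belong to the same communication group.
        key = coord[:axis] + coord[axis + 1:]
        groups.setdefault(key, []).append(rank)
    return sorted(groups.values())


names = ["data", "pipe", "model"]
dims = [2, 1, 2]  # dp_degree=2, pp_degree=1, mp_degree=2 on 4 ranks

print(build_groups(names, dims, "model"))  # [[0, 1], [2, 3]]
print(build_groups(names, dims, "data"))   # [[0, 2], [1, 3]]
```

Under this numbering, placing "model" last keeps the ranks of one model-parallel group adjacent.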
python/paddle/distributed/fleet/meta_parallel/__init__.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .mp_utils import *
python/paddle/distributed/fleet/meta_parallel/mp_utils/__init__.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .layers import *
from .random import *
python/paddle/distributed/fleet/meta_parallel/mp_utils/layers.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle
from paddle.fluid.dygraph.layers import Layer
from .random import get_rng_state_tracker
from paddle.nn import functional as F
from paddle import framework
from ...base import topology as tp
from .layers_help import identity_in_model_parallel, gather_in_model_parallel, reduce_in_model_parallel, scatter_in_model_parallel

__all__ = [
    'VocabParallelEmbedding', 'ColumnParallelLinear', 'RowParallelLinear'
]

# Follow this paper to achieve the file:
# Shoeybi M, Patwary M, Puri R, et al. Megatron-lm: Training multi-billion parameter
# language models using model parallelism[J]. arXiv preprint arXiv:1909.08053, 2019. (https://arxiv.org/abs/1909.08053)


class VocabParallelEmbedding(Layer):
    def __init__(self,
                 num_embeddings,
                 embedding_dim,
                 weight_attr=None,
                 name=None):
        super(VocabParallelEmbedding, self).__init__()
        self.model_parallel_group = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_group()
        self.world_size = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size()
        self.rank = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_rank()

        self.origin_num_embeddings = num_embeddings

        per_part_size = (
            num_embeddings + self.world_size - 1) // self.world_size
        last_part_size = num_embeddings - per_part_size * (self.world_size - 1)
        if self.rank == self.world_size - 1:
            per_part_size = last_part_size
        per_part_size += 1  # make the last row as the padding index
        self.per_part_size = per_part_size

        self.embedding = paddle.nn.Embedding(
            per_part_size,
            embedding_dim,
            padding_idx=per_part_size - 1,
            sparse=False,
            weight_attr=weight_attr,
            name=name)
        self.embedding.weight.is_distributed = True

    def forward(self, x):
        origin_input_shape = x.shape
        if len(origin_input_shape) == 2:
            x = paddle.unsqueeze(x, axis=-1)
        else:
            assert origin_input_shape[-1] == 1, (
                "The last dimension size of x must be 1.")
        x_shard = paddle.shard_index(x, self.origin_num_embeddings,
                                     self.world_size, self.rank,
                                     self.per_part_size - 1)
        if len(origin_input_shape) == 2:
            x_shard = paddle.squeeze(x_shard, axis=-1)

        emb_out_ = self.embedding(x_shard)
        emb_out = reduce_in_model_parallel(emb_out_)
        return emb_out

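The vocabulary partitioning in `VocabParallelEmbedding` can be traced with a short pure-Python sketch. `part_sizes` and `shard_index` below are hypothetical stand-ins: the first repeats the `per_part_size` arithmetic from `__init__`, and the second assumes that `paddle.shard_index` uses a shard size of `ceil(index_num / nshards)`, as its documentation describes.

```python
def part_sizes(num_embeddings, world_size):
    """Rows held by each rank, including the extra trailing padding row."""
    per_part = (num_embeddings + world_size - 1) // world_size
    last_part = num_embeddings - per_part * (world_size - 1)
    sizes = []
    for rank in range(world_size):
        size = last_part if rank == world_size - 1 else per_part
        sizes.append(size + 1)  # +1: the last row serves as the padding index
    return sizes


def shard_index(token_id, index_num, nshards, shard_id, ignore_value):
    """Assumed behaviour of paddle.shard_index for a single id."""
    shard_size = (index_num + nshards - 1) // nshards
    if token_id // shard_size == shard_id:
        return token_id % shard_size  # id owned by this shard: local offset
    return ignore_value               # id owned elsewhere: map to padding row


sizes = part_sizes(num_embeddings=10, world_size=4)
print(sizes)  # [4, 4, 4, 2]

# Rank 0 owns ids 0..2; its padding row is index 3.
print([shard_index(t, 10, 4, 0, sizes[0] - 1) for t in [0, 2, 3, 9]])  # [0, 2, 3, 3]
```

Because ids owned by other ranks map to the padding row, each rank contributes a zero vector for them, and the all-reduce in `forward` leaves exactly one non-zero contribution per token.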

class ColumnParallelLinear(Layer):
    def __init__(self,
                 in_features,
                 out_features,
                 weight_attr=None,
                 has_bias=None,
                 gather_output=True,
                 name=None):
        super(ColumnParallelLinear, self).__init__()

        self.model_parallel_group = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_group()
        self.world_size = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size()

        self.name = name
        self.gather_output = gather_output
        assert out_features % self.world_size == 0, (
            "Number of column of the weight for linear ({}) must be"
            " divisible by model parallel size ({})".format(out_features,
                                                            self.world_size))
        self.output_size_per_partition = out_features // self.world_size

        self._weight_attr = weight_attr
        self._dtype = self._helper.get_default_dtype()

        self.weight = self.create_parameter(
            shape=[in_features, self.output_size_per_partition],
            attr=self._weight_attr,
            dtype=self._dtype)
        self.weight.is_distributed = True

        if has_bias:
            # initialize bias to zero like Megatron
            self.bias = self.create_parameter(
                shape=[self.output_size_per_partition],
                attr=paddle.nn.initializer.Constant(value=0.0),
                dtype=self._dtype)
            self.bias.is_distributed = True
        else:
            self.bias = None

    def forward(self, x):
        input_parallel = identity_in_model_parallel(x)
        output_parallel = F.linear(
            input_parallel, self.weight, self.bias, name=self.name)
        if self.gather_output:
            output = gather_in_model_parallel(output_parallel)
        else:
            output = output_parallel
        return output

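The idea behind `ColumnParallelLinear` is that splitting the weight by columns and concatenating the per-rank outputs reproduces the unsplit product. The following pure-Python sketch checks this on a tiny example; all helper names are hypothetical, and the "ranks" are simulated in a single process with plain lists rather than Paddle tensors.

```python
def matmul(a, b):
    """Plain-list matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(w, parts):
    """Split w into `parts` equally wide column blocks (one per rank)."""
    width = len(w[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in w]
            for p in range(parts)]


def concat_columns(blocks):
    """Concatenate per-rank outputs along the last dimension (the gather step)."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]


x = [[1, 2], [3, 4]]              # every rank sees the full input
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # full weight, in_features=2, out_features=4

partial_outputs = [matmul(x, shard) for shard in split_columns(w, 2)]
gathered = concat_columns(partial_outputs)

print(gathered)                   # [[1, 2, 4, 7], [3, 4, 10, 15]]
print(gathered == matmul(x, w))   # True
```

With `gather_output=False` the layer would skip the concatenation and hand each rank its own column block, which is the input layout a following `RowParallelLinear` with `input_is_parallel=True` expects.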

class RowParallelLinear(Layer):
    def __init__(self,
                 in_features,
                 out_features,
                 weight_attr=None,
                 has_bias=True,
                 input_is_parallel=False,
                 name=None):
        super(RowParallelLinear, self).__init__()

        self.in_features = in_features
        self.out_features = out_features
        self.input_is_parallel = input_is_parallel
        self._weight_attr = weight_attr
        self._dtype = self._helper.get_default_dtype()
        self.name = name

        self.model_parallel_group = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_group()
        self.world_size = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size()
        self.rank = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_rank()

        assert in_features % self.world_size == 0, (
            "Number of row of the weight for linear ({}) must be"
            " divisible by model parallel size ({})".format(in_features,
                                                            self.world_size))

        self.input_size_per_partition = in_features // self.world_size

        self.weight = self.create_parameter(
            shape=[self.input_size_per_partition, self.out_features],
            attr=self._weight_attr,
            dtype=self._dtype)
        self.weight.is_distributed = True

        if has_bias:
            self.bias = self.create_parameter(
                shape=[self.out_features],
                attr=paddle.nn.initializer.Constant(value=0.0),
                dtype=self._dtype)
        else:
            self.bias = None

    def forward(self, x):
        if self.input_is_parallel:
            input_parallel = x
        else:
            # split last dim
            input_parallel = scatter_in_model_parallel(x)

        output_parallel = F.linear(input_parallel, self.weight, name=self.name)
        output_ = reduce_in_model_parallel(output_parallel)
        output = output_ + self.bias if self.bias is not None else output_
        return output
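`RowParallelLinear` relies on the complementary identity: splitting the weight by rows and the input by its last dimension gives partial products whose sum equals the full product, so the bias is added once after the reduction. A minimal single-process sketch with plain lists follows; the helper names are hypothetical and the element-wise sum stands in for the all-reduce.

```python
def matmul(a, b):
    """Plain-list matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_rows(w, parts):
    """Split w into `parts` equally tall row blocks (one per rank)."""
    height = len(w) // parts
    return [w[p * height:(p + 1) * height] for p in range(parts)]


def split_last_dim(x, parts):
    """Split the input along its last dimension (the scatter step)."""
    width = len(x[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in x]
            for p in range(parts)]


def elementwise_sum(mats):
    """Sum same-shaped matrices element by element (stands in for all-reduce)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0], [0, 1], [1, 1], [2, 0]]  # in_features=4, out_features=2
bias = [1, 1]

partials = [matmul(xs, ws)
            for xs, ws in zip(split_last_dim(x, 2), split_rows(w, 2))]
reduced = elementwise_sum(partials)
output = [[v + b for v, b in zip(row, bias)] for row in reduced]

print(reduced)  # [[12, 5], [28, 13]]
print(output)   # [[13, 6], [29, 14]]
```

Adding the bias before the reduction would count it once per rank, which is why the layer keeps the bias unsplit and applies it to the reduced output.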
python/paddle/distributed/fleet/meta_parallel/mp_utils/layers_help.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddle.autograd import PyLayer
from ...base import topology as tp
import paddle

# Follow this paper to achieve the file:
# Shoeybi M, Patwary M, Puri R, et al. Megatron-lm: Training multi-billion parameter
# language models using model parallelism[J]. arXiv preprint arXiv:1909.08053, 2019. (https://arxiv.org/abs/1909.08053)


def mp_reduce(x):
    if tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size() == 1:
        return x

    paddle.distributed.all_reduce(
        x, group=tp._HYBRID_PARALLEL_GROUP.get_model_parallel_group())

    return x


def mp_split(x):
    world_size = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size()

    if world_size == 1:
        return x

    rank = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_rank()
    last_dim = len(x.shape) - 1
    input_list = paddle.split(x, num_or_sections=world_size, axis=last_dim)
    output = input_list[rank]

    return output


def mp_gather(x):
    world_size = tp._HYBRID_PARALLEL_GROUP.get_model_parallel_world_size()

    if world_size == 1:
        return x

    output = []
    paddle.distributed.all_gather(
        output, x, group=tp._HYBRID_PARALLEL_GROUP.get_model_parallel_group())

    output = paddle.concat(output, axis=len(x.shape) - 1)

    return output


class _IdentityInModelParallel(PyLayer):
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, dx):
        return mp_reduce(dx)


class _ReduceInModelParallel(PyLayer):
    @staticmethod
    def forward(ctx, x):
        return mp_reduce(x)

    @staticmethod
    def backward(ctx, dx):
        return dx


class _ScatterInModelParallel(PyLayer):
    @staticmethod
    def forward(ctx, x):
        return mp_split(x)

    @staticmethod
    def backward(ctx, dx):
        return mp_gather(dx)


class _GatherInModelParallel(PyLayer):
    @staticmethod
    def forward(ctx, x):
        return mp_gather(x)

    @staticmethod
    def backward(ctx, dx):
        return mp_split(dx)


def identity_in_model_parallel(x):
    return _IdentityInModelParallel.apply(x)


def reduce_in_model_parallel(x):
    return _ReduceInModelParallel.apply(x)


def scatter_in_model_parallel(x):
    return _ScatterInModelParallel.apply(x)


def gather_in_model_parallel(x):
    return _GatherInModelParallel.apply(x)
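The four `PyLayer` classes pair each forward collective with its conjugate in the backward pass. The sketch below restates those pairings with all "ranks" simulated in one process: each function takes a list holding one flat vector per rank and returns the same structure. The function names and the `FORWARD_BACKWARD` table are hypothetical; no Paddle call is involved.

```python
def identity(per_rank):
    return [list(x) for x in per_rank]


def all_reduce(per_rank):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def split_last_dim(per_rank):
    """Rank r keeps the r-th equal slice of its own vector."""
    n = len(per_rank)
    out = []
    for rank, x in enumerate(per_rank):
        width = len(x) // n
        out.append(x[rank * width:(rank + 1) * width])
    return out


def all_gather_concat(per_rank):
    """Every rank ends up with the concatenation of all ranks' vectors."""
    full = [v for x in per_rank for v in x]
    return [list(full) for _ in per_rank]


# name -> (forward, backward), following the four PyLayer classes above
FORWARD_BACKWARD = {
    "identity": (identity, all_reduce),
    "reduce": (all_reduce, identity),
    "scatter": (split_last_dim, all_gather_concat),
    "gather": (all_gather_concat, split_last_dim),
}

x = [[1, 2, 3, 4], [1, 2, 3, 4]]  # two ranks holding the same input
scatter_fwd, scatter_bwd = FORWARD_BACKWARD["scatter"]
shards = scatter_fwd(x)
print(shards)               # [[1, 2], [3, 4]]
print(scatter_bwd(shards))  # [[1, 2, 3, 4], [1, 2, 3, 4]]
```

Each backward entry is the forward entry of its partner (identity with reduce, scatter with gather), which is why only three communication helpers (`mp_reduce`, `mp_split`, `mp_gather`) are needed.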
python/paddle/distributed/fleet/meta_parallel/mp_utils/random.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle
import contextlib

__all__ = [
    'RNGStatesTracker', 'model_parallel_random_seed', 'get_rng_state_tracker'
]

MODEL_PARALLEL_RNG = 'model_parallel_rng'


class RNGStatesTracker:
    """
    Tracker the RNG states.
    """

    def __init__(self):
        # Map from name to the rng state.
        self.states_ = {}
        self.seeds_ = set()

    def reset(self):
        self.states_ = {}
        self.seeds_ = set()

    def add(self, name, seed):
        if seed in self.seeds_:
            raise ValueError('seed {} already exists'.format(seed))
        self.seeds_.add(seed)
        if name in self.states_:
            raise ValueError('state {} already exists'.format(name))
        orig_rng_state = paddle.get_cuda_rng_state()
        paddle.seed(seed)
        self.states_[name] = paddle.get_cuda_rng_state()
        paddle.set_cuda_rng_state(orig_rng_state)

    @contextlib.contextmanager
    def rng_state(self, name=MODEL_PARALLEL_RNG):
        if name not in self.states_:
            raise ValueError('state {} does not exist'.format(name))
        orig_cuda_rng_state = paddle.get_cuda_rng_state()
        paddle.set_cuda_rng_state(self.states_[name])
        try:
            yield
        finally:
            self.states_[name] = paddle.get_cuda_rng_state()
            paddle.set_cuda_rng_state(orig_cuda_rng_state)


RNG_STATE_TRACKER = RNGStatesTracker()


def get_rng_state_tracker():
    return RNG_STATE_TRACKER


def model_parallel_random_seed(seed=2048):
    import paddle.distributed.fleet as fleet
    hcg = fleet.get_hybrid_communicate_group()
    rank = hcg.get_model_parallel_rank()

    local_seed = seed + 1024 + rank
    global_seed = seed

    RNG_STATE_TRACKER.reset()
    paddle.seed(global_seed)
    RNG_STATE_TRACKER.add(MODEL_PARALLEL_RNG, local_seed)
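The save/swap/restore pattern of `RNGStatesTracker` and the seed derivation in `model_parallel_random_seed` can be tried out without a GPU by substituting Python's `random` module for Paddle's CUDA generator. `SimpleRNGTracker` and `derive_seeds` are hypothetical stand-ins written for this sketch; only the `seed + 1024 + rank` rule is taken from the file above.

```python
import contextlib
import random

MODEL_PARALLEL_RNG = "model_parallel_rng"


class SimpleRNGTracker:
    """Stand-in for RNGStatesTracker using Python's `random` state."""

    def __init__(self):
        self.states = {}

    def add(self, name, seed):
        if name in self.states:
            raise ValueError("state {} already exists".format(name))
        orig = random.getstate()
        random.seed(seed)                     # build the named stream...
        self.states[name] = random.getstate()
        random.setstate(orig)                 # ...without disturbing the global one

    @contextlib.contextmanager
    def rng_state(self, name=MODEL_PARALLEL_RNG):
        orig = random.getstate()
        random.setstate(self.states[name])
        try:
            yield
        finally:
            # Remember how far the named stream advanced, then restore.
            self.states[name] = random.getstate()
            random.setstate(orig)


def derive_seeds(seed, mp_rank):
    """Global seed shared by all ranks; local seed distinct per model-parallel rank."""
    return seed, seed + 1024 + mp_rank


global_seed, local_seed = derive_seeds(2048, mp_rank=1)
random.seed(global_seed)
tracker = SimpleRNGTracker()
tracker.add(MODEL_PARALLEL_RNG, local_seed)

before = random.getstate()
with tracker.rng_state():
    local_draw = random.random()       # drawn from the per-rank stream
print(random.getstate() == before)     # True: the global stream is untouched
```

The point of the two streams is that operations which must agree across model-parallel ranks draw from the shared global seed, while operations on rank-local shards can draw from the per-rank stream inside `rng_state()`.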
python/paddle/fluid/tests/unittests/CMakeLists.txt
@@ -21,6 +21,7 @@ list(APPEND DIST_TEST_OPS test_gen_nccl_id_op)
 list(APPEND DIST_TEST_OPS test_parallel_dygraph_unused_variables)
 list(APPEND DIST_TEST_OPS test_parallel_dygraph_control_flow)
 list(APPEND DIST_TEST_OPS test_parallel_dygraph_dataparallel)
+list(APPEND DIST_TEST_OPS test_parallel_dygraph_hybrid_parallel)
 set(MIXED_DIST_TEST_OPS ${DIST_TEST_OPS})
 #remove distribute unittests.
 list(APPEND MIXED_DIST_TEST_OPS test_dgc_op)
@@ -166,6 +167,7 @@ if ((NOT WITH_GPU) AND (NOT WITH_ROCM))
     LIST(REMOVE_ITEM TEST_OPS test_parallel_dygraph_sync_batch_norm)
     list(REMOVE_ITEM TEST_OPS test_parallel_dygraph_control_flow)
     list(REMOVE_ITEM TEST_OPS test_parallel_dygraph_dataparallel)
+    list(REMOVE_ITEM TEST_OPS test_parallel_dygraph_hybrid_parallel)
     LIST(REMOVE_ITEM TEST_OPS test_imperative_auto_mixed_precision)
     LIST(REMOVE_ITEM TEST_OPS test_fleet_base_single)
 elseif(WITH_GPU)
@@ -843,6 +845,7 @@ if(WITH_DISTRIBUTE AND WITH_GPU AND WITH_NCCL)
     set_tests_properties(test_parallel_dygraph_dataparallel PROPERTIES TIMEOUT 120)
     set_tests_properties(test_parallel_dygraph_unused_variables PROPERTIES TIMEOUT 120)
     set_tests_properties(test_parallel_dygraph_control_flow PROPERTIES TIMEOUT 120)
+    set_tests_properties(test_parallel_dygraph_hybrid_parallel PROPERTIES TIMEOUT 120 LABELS "RUN_TYPE=DIST")
     if(${NCCL_VERSION} VERSION_GREATER_EQUAL 2212)
         set_tests_properties(test_parallel_dygraph_sparse_embedding PROPERTIES TIMEOUT 120)
         set_tests_properties(test_parallel_dygraph_transformer PROPERTIES TIMEOUT 120)
python/paddle/fluid/tests/unittests/hybrid_communicate_group.py → python/paddle/fluid/tests/unittests/hybrid_parallel_communicate_group.py
File moved
python/paddle/fluid/tests/unittests/hybrid_parallel_mp_layers.py
New file (mode 0 → 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import division
from __future__ import print_function

import unittest

import paddle
import numpy as np
import random
import paddle.distributed as dist
import paddle.fluid as fluid
import paddle.distributed.fleet as fleet
from paddle import framework


def set_random_seed(seed):
    """Set random seed for reproducability."""
    random.seed(seed)
    np.random.seed(seed)
    paddle.seed(seed)
    fleet.meta_parallel.model_parallel_random_seed(seed)


class ColumnLinearNet(fluid.dygraph.Layer):
    def __init__(self, input_size, output_size, global_dtype):
        super(ColumnLinearNet, self).__init__()
        self.parallel_linear = fleet.meta_parallel.ColumnParallelLinear(
            in_features=input_size,
            out_features=output_size,
            weight_attr=None,
            has_bias=True,
            gather_output=True,
            name="test_column_linear")

    def forward(self, x):
        output = self.parallel_linear(x)
        return output


class RowLinearNet(fluid.dygraph.Layer):
    def __init__(self, input_size, output_size):
        super(RowLinearNet, self).__init__()
        self.parallel_linear = fleet.meta_parallel.RowParallelLinear(
            in_features=input_size,
            out_features=output_size,
            has_bias=True,
            input_is_parallel=False,
            name="test_row_linear")

    def forward(self, x):
        output = self.parallel_linear(x)
        return output


class EmbeddingNet(fluid.dygraph.Layer):
    def __init__(self, vocab_size, hidden_size):
        super(EmbeddingNet, self).__init__()
        self.embedding = fleet.meta_parallel.VocabParallelEmbedding(vocab_size,
                                                                    hidden_size)

    def forward(self, x):
        output = self.embedding(x)
        return output


class SimpleMatmul(fluid.dygraph.Layer):
    def __init__(self, weight, output_size, global_dtype):
        super(SimpleMatmul, self).__init__()
        self.weight = paddle.create_parameter(
            shape=weight.shape,
            dtype=global_dtype,
            attr=paddle.ParamAttr(
                initializer=paddle.nn.initializer.Assign(weight)))
        self.bias = self.create_parameter(
            shape=[output_size],
            dtype=global_dtype,
            attr=paddle.ParamAttr(
                initializer=paddle.nn.initializer.Constant(0.0)))

    def forward(self, x):
        output = paddle.matmul(x, self.weight) + self.bias
        return output


class SimpleEmbedding(fluid.dygraph.Layer):
    def __init__(self, vocab_size, hidden_size, weight):
        super(SimpleEmbedding, self).__init__()
        self.embedding = paddle.nn.Embedding(
            vocab_size,
            hidden_size,
            weight_attr=paddle.framework.ParamAttr(
                name="origin_embedding",
                initializer=paddle.nn.initializer.Assign(weight)))

    def forward(self, x):
        output = self.embedding(x)
        return output


class TestDistTraning(unittest.TestCase):
    def setUp(self):
        strategy = fleet.DistributedStrategy()
        self.model_parallel_size = 2
        strategy.hybrid_configs = {
            "dp_degree": 1,
            "mp_degree": self.model_parallel_size,
            "pp_degree": 1
        }
        fleet.init(is_collective=True, strategy=strategy)

    def test_column_parallel_layer(self):
        set_random_seed(1024)
        global_dtype = "float32"

        input_size_per_card = 17
        input_size = input_size_per_card * self.model_parallel_size
        output_size_per_card = 13
        output_size = output_size_per_card * self.model_parallel_size
        batch_size = 4

        model_a = ColumnLinearNet(input_size, output_size, global_dtype)

        # get w
        check_group = dist.new_group(list(range(self.model_parallel_size)))
        integral_w = []
        partial_w = model_a.parallel_linear.weight.clone().detach()
        paddle.distributed.all_gather(integral_w, partial_w, group=check_group)
        integral_w = paddle.concat(integral_w, axis=1)

        model_b = SimpleMatmul(integral_w, output_size, global_dtype)

        optimizer_a = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_a.parameters())
        optimizer_b = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_b.parameters())
        for idx in range(5):
            input = paddle.randn([batch_size, input_size], global_dtype)
            input.stop_gradient = True

            output_a = model_a(input)
            loss_a = output_a.mean()
            loss_a.backward()

            output_b = model_b(input)
            loss_b = output_b.mean()
            loss_b.backward()

            optimizer_a.step()
            optimizer_b.step()

            np.testing.assert_allclose(loss_a.numpy(), loss_b.numpy())

    def test_row_parallel_layer(self):
        global_dtype = "float32"
        paddle.set_default_dtype(global_dtype)
        set_random_seed(1024)

        self.hcg = fleet.get_hybrid_communicate_group()
        self.word_size = self.hcg.get_model_parallel_world_size()
        self.rank_id = self.hcg.get_model_parallel_rank()

        input_size_per_card = 17
        input_size = input_size_per_card * self.model_parallel_size
        output_size_per_card = 13
        output_size = output_size_per_card * self.model_parallel_size
        batch_size = 4

        model_a = RowLinearNet(input_size, output_size)

        # get w
        check_group = dist.new_group(list(range(self.model_parallel_size)))
        integral_w = []
        partial_w = model_a.parallel_linear.weight.clone().detach()
        paddle.distributed.all_gather(integral_w, partial_w, group=check_group)
        integral_w = paddle.concat(integral_w, axis=0)

        model_b = SimpleMatmul(integral_w, output_size, global_dtype)

        optimizer_a = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_a.parameters())
        optimizer_b = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_b.parameters())
        for idx in range(5):
            input = paddle.randn([batch_size, input_size], global_dtype)
            input.stop_gradient = True

            output_a = model_a(input)
            loss_a = output_a.mean()
            loss_a.backward()

            output_b = model_b(input)
            loss_b = output_b.mean()
            loss_b.backward()

            optimizer_a.step()
            optimizer_b.step()

            np.testing.assert_allclose(
                loss_a.numpy(), loss_b.numpy(), rtol=1e-5)


    def test_parallel_embedding(self):
        batch_size = 17
        seq_length = 23
        vocab_size_per_card = 2
        vocab_size = vocab_size_per_card * self.model_parallel_size
        hidden_size = 2
        seed = 1236

        set_random_seed(seed)
        rank_id = dist.get_rank()

        # model_a
        model_a = EmbeddingNet(vocab_size, hidden_size)

        # model_b
        check_group = dist.new_group(list(range(self.model_parallel_size)))
        integral_w = []
        partial_w = model_a.embedding.embedding.weight.clone().detach()
        paddle.distributed.all_gather(integral_w, partial_w, group=check_group)
        result_w = []
        for idx in range(len(integral_w)):
            tmp = paddle.gather(
                integral_w[idx],
                paddle.to_tensor(list(range(vocab_size_per_card))))
            result_w.append(tmp)
        integral_w = paddle.concat(result_w, axis=0)

        model_b = SimpleEmbedding(vocab_size, hidden_size, integral_w)

        optimizer_a = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_a.parameters())
        optimizer_b = paddle.optimizer.SGD(learning_rate=0.001,
                                           parameters=model_b.parameters())

        for _ in range(5):
            np_input_data = np.random.randint(0, vocab_size,
                                              (batch_size, seq_length))
            input_data = paddle.to_tensor(np_input_data, dtype="int32")

            output_a = model_a(input_data)
            loss_a = output_a.mean()

            output_b = model_b(input_data)
            loss_b = output_b.mean()

            loss_a.backward()
            loss_b.backward()

            optimizer_a.step()
            optimizer_b.step()

            np.testing.assert_allclose(
                loss_a.numpy(), loss_b.numpy(), rtol=1e-6)


if __name__ == '__main__':
    unittest.main()
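
The row-parallel test above gathers each rank's weight shard, concatenates the shards along axis 0, and checks that a plain matmul with the full weight gives the same loss. The following is a minimal single-process sketch of why that equivalence holds. It uses pure Python lists instead of Paddle tensors, the helper names `matmul` and `row_parallel_forward` are hypothetical, and the cross-rank all-reduce is only simulated by an elementwise sum. It is an illustration under those assumptions, not the code in `mp_utils/layers.py`.

```python
# Single-process illustration; no Paddle and no real communication involved.


def matmul(x, w):
    """Multiply a [batch, in] matrix by an [in, out] matrix (lists of lists)."""
    return [[sum(x_row[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for x_row in x]


def row_parallel_forward(x, w_shards):
    """Simulate a row-parallel linear layer.

    Each shard holds a contiguous block of rows of the full weight, so the
    input is split along its last axis to match, and the partial products
    are summed (the role an all-reduce would play across ranks).
    """
    out = None
    start = 0
    for shard in w_shards:
        rows = len(shard)
        # Slice of the input features that this "rank" is responsible for.
        x_part = [x_row[start:start + rows] for x_row in x]
        partial = matmul(x_part, shard)
        if out is None:
            out = partial
        else:
            out = [[a + b for a, b in zip(r1, r2)]
                   for r1, r2 in zip(out, partial)]
        start += rows
    return out


x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
full_w = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [1.0, 1.0]]
w_shards = [full_w[:2], full_w[2:]]  # two simulated ranks, two rows each

reference = matmul(x, full_w)
parallel = row_parallel_forward(x, w_shards)
```

Concatenating `w_shards` along axis 0 reproduces `full_w`, which mirrors the `paddle.concat(integral_w, axis=0)` step in the test.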
python/paddle/fluid/tests/unittests/hybrid_parallel_mp_random.py
0 → 100644
View file @ 66d46221
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import division
from __future__ import print_function

import unittest

import paddle
import numpy as np
import paddle.distributed as dist
import paddle.fluid as fluid
import paddle.distributed.fleet as fleet
import random


class TestDistTraning(unittest.TestCase):
    def setUp(self):
        strategy = fleet.DistributedStrategy()
        self.model_parallel_size = 2
        strategy.hybrid_configs = {
            "dp_degree": 1,
            "mp_degree": self.model_parallel_size,
            "pp_degree": 1
        }
        fleet.init(is_collective=True, strategy=strategy)

    def test_cuda_rng_tracker(self):
        seed_1 = 2021
        seed_2 = 1024

        size = [20, 15]

        paddle.seed(seed_1)
        target_11 = paddle.randn(size, "float32")
        target_12 = paddle.randn(size, "float32")

        paddle.seed(seed_2)
        target_21 = paddle.randn(size, "float32")
        target_22 = paddle.randn(size, "float32")

        paddle.seed(seed_1)

        fleet.meta_parallel.get_rng_state_tracker().add("test", seed_2)

        result_11 = paddle.randn(size, "float32")

        with fleet.meta_parallel.get_rng_state_tracker().rng_state("test"):
            result_21 = paddle.randn(size, "float32")

        result_12 = paddle.randn(size, "float32")

        with fleet.meta_parallel.get_rng_state_tracker().rng_state("test"):
            result_22 = paddle.randn(size, "float32")

        np.testing.assert_allclose(result_11.numpy(), target_11.numpy())
        np.testing.assert_allclose(result_12.numpy(), target_12.numpy())
        np.testing.assert_allclose(result_21.numpy(), target_21.numpy())
        np.testing.assert_allclose(result_22.numpy(), target_22.numpy())


if __name__ == '__main__':
    unittest.main()
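
The test above checks that draws made inside `rng_state("test")` follow the stream seeded with `seed_2`, while draws outside it continue the global stream seeded with `seed_1`. The sketch below reproduces that behavior with Python's standard `random` module as a stand-in for the CUDA generator. The class name `RngStateTracker` and its internals are assumptions made for illustration; the real tracker in `mp_utils/random.py` may be implemented differently.

```python
import contextlib
import random


class RngStateTracker:
    """Toy tracker that keeps named RNG states separate from the global one."""

    def __init__(self):
        self._states = {}

    def add(self, name, seed):
        if name in self._states:
            raise ValueError("state {} already exists".format(name))
        saved = random.getstate()   # remember the global stream
        random.seed(seed)           # build the named stream from its seed
        self._states[name] = random.getstate()
        random.setstate(saved)      # leave the global stream untouched

    @contextlib.contextmanager
    def rng_state(self, name):
        saved = random.getstate()
        random.setstate(self._states[name])
        try:
            yield
        finally:
            # Persist the progress of the named stream, then restore the
            # global stream exactly where it was left.
            self._states[name] = random.getstate()
            random.setstate(saved)


# Reference draws from each seed, taken without any tracker.
random.seed(2021)
target_11, target_12 = random.random(), random.random()
random.seed(1024)
target_21, target_22 = random.random(), random.random()

# Interleaved draws using the tracker, in the same order as the test.
random.seed(2021)
tracker = RngStateTracker()
tracker.add("test", 1024)

result_11 = random.random()
with tracker.rng_state("test"):
    result_21 = random.random()
result_12 = random.random()
with tracker.rng_state("test"):
    result_22 = random.random()
```

Saving the named state on exit is what lets the second `with` block continue the `"test"` stream instead of restarting it, which is the property the `result_22` assertion relies on.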
python/paddle/fluid/tests/unittests/test_fleet_distributed_strategy.py
View file @ 66d46221
...
...
@@ -73,6 +73,17 @@ class TestStrategyConfig(unittest.TestCase):
        strategy.pipeline_configs = configs
        self.assertEqual(strategy.pipeline_configs["accumulate_steps"], 2)

    def test_hybrid_parallel_configs(self):
        strategy = paddle.distributed.fleet.DistributedStrategy()
        strategy.hybrid_configs = {
            "dp_degree": 1,
            "mp_degree": 2,
            "pp_degree": 4
        }
        self.assertEqual(strategy.hybrid_configs["dp_degree"], 1)
        self.assertEqual(strategy.hybrid_configs["mp_degree"], 2)
        self.assertEqual(strategy.hybrid_configs["pp_degree"], 4)

    def test_localsgd(self):
        strategy = paddle.distributed.fleet.DistributedStrategy()
        strategy.localsgd = True
...
...
python/paddle/fluid/tests/unittests/test_new_group.sh
View file @ 66d46221
...
...
@@ -17,4 +17,4 @@
set -e
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch --gpus=0,1 new_group.py
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch --gpus=0,1 hybrid_communicate_group.py
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch --gpus=0,1 hybrid_parallel_communicate_group.py
python/paddle/fluid/tests/unittests/test_parallel_dygraph_hybrid_parallel.py
0 → 100644
View file @ 66d46221
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function

import unittest
import time
import paddle.fluid as fluid

from test_parallel_dygraph_dataparallel import TestMultipleGpus


class TestHybridParallel(TestMultipleGpus):
    def test_hybrid_parallel_mp_layers(self):
        self.run_mnist_2gpu('hybrid_parallel_mp_layers.py')

    def test_hybrid_parallel_mp_random(self):
        self.run_mnist_2gpu('hybrid_parallel_mp_random.py')


if __name__ == "__main__":
    unittest.main()
python/setup.py.in
View file @ 66d46221
...
...
@@ -156,6 +156,8 @@ packages=['paddle',
'paddle.distributed.fleet.metrics',
'paddle.distributed.fleet.proto',
'paddle.distributed.fleet.utils',
'paddle.distributed.fleet.meta_parallel',
'paddle.distributed.fleet.meta_parallel.mp_utils',
'paddle.framework',
'paddle.jit',
'paddle.jit.dy2static',
...
...