Commit cc9c6196 (unverified)
机器未来 / Paddle, forked from PaddlePaddle / Paddle

Authored Dec 01, 2020 by 123malin; committed by GitHub, Dec 01, 2020
Parent commit: c0a991c8

test=develop, fix doc (#29200)

* fix fleet api doc

Showing 2 changed files, with 131 additions and 76 deletions (+131, -76):

    python/paddle/distributed/fleet/base/distributed_strategy.py  (+61, -17)
    python/paddle/distributed/fleet/base/fleet_base.py  (+70, -59)
python/paddle/distributed/fleet/base/distributed_strategy.py
@@ -107,7 +107,7 @@ class DistributedStrategy(object):
All of the distributed training configurations can be configured in DistributedStrategy,
such as automatic mixed precision (AMP), Layer-wise Adaptive Rate Scaling (LARS),
asynchronous update parameter server(ASGD), etc.
DistributedStrategy can be serialized into protobuf file or deserialized from protobuf file
Users who run local training usually configure BuildStrategy and ExecutionStrategy, and
@@ -128,8 +128,9 @@ class DistributedStrategy(object):
Serialize current DistributedStrategy to string and save to output file
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.dgc = True
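For context, the serialize/deserialize pair documented in this hunk and the next can be exercised as a round trip. A minimal sketch, assuming the 2.0-era fleet API shown in this commit and a writable working directory (the file name is illustrative):

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.dgc = True
    strategy.save_to_prototxt("dist_strategy.prototxt")    # serialize to a protobuf text file

    restored = fleet.DistributedStrategy()
    restored.load_from_prototxt("dist_strategy.prototxt")  # deserialize into a fresh instance
    assert restored.dgc is True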
@@ -145,6 +146,7 @@ class DistributedStrategy(object):
Load from prototxt file for DistributedStrategy initialization
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -161,10 +163,11 @@ class DistributedStrategy(object):
Configure ExecutionStrategy for DistributedStrategy
Examples:
.. code-block:: python
import paddle
-exe_strategy = paddle.fluid.ExecutionStrategy()
+exe_strategy = paddle.static.ExecutionStrategy()
exe_strategy.num_threads = 10
exe_strategy.num_iteration_per_drop_scope = 10
exe_strategy.num_iteration_per_run = 10
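The hunk above migrates the example from paddle.fluid to paddle.static; the configured ExecutionStrategy is then attached to the DistributedStrategy through its execution_strategy property. A minimal sketch under that 2.0 API:

    import paddle
    import paddle.distributed.fleet as fleet

    exe_strategy = paddle.static.ExecutionStrategy()
    exe_strategy.num_threads = 10
    exe_strategy.num_iteration_per_drop_scope = 10

    strategy = fleet.DistributedStrategy()
    strategy.execution_strategy = exe_strategy  # attach execution settings to the distributed strategy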
@@ -195,10 +198,11 @@ class DistributedStrategy(object):
only if the property is non-distributed strategy.
Examples:
.. code-block:: python
import paddle
-build_strategy = paddle.fluid.BuildStrategy()
+build_strategy = paddle.static.BuildStrategy()
build_strategy.enable_sequential_execution = True
build_strategy.fuse_elewise_add_act_ops = True
build_strategy.fuse_bn_act_ops = True
@@ -207,7 +211,7 @@ class DistributedStrategy(object):
build_strategy.fuse_broadcast_ops = True
build_strategy.fuse_all_optimizer_ops = True
build_strategy.enable_inplace = True
strategy = paddle.distributed.fleet.DistributedStrategy()
strategy.build_strategy = build_strategy
"""
@@ -240,6 +244,7 @@ class DistributedStrategy(object):
Default value: True
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -248,7 +253,7 @@ class DistributedStrategy(object):
strategy = fleet.DistributedStrategy()
strategy.a_sync = True # by default this is True
# code block for defining loss and local optimizer
# sgd = fleet.distributed_optimizer(optimizer, strategy)
"""
@@ -288,6 +293,7 @@ class DistributedStrategy(object):
runtime_split_send_recv(bool): if we are using Tensor split for send and recv during runtime
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -319,6 +325,7 @@ class DistributedStrategy(object):
Default Value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -360,6 +367,7 @@ class DistributedStrategy(object):
custom_black_list(list[str]): Users' custom black list which forbidden execution fp16.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
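The custom black list documented in this hunk is supplied through an amp_configs dict alongside the amp switch. A sketch, assuming the config keys named in this docstring (init_loss_scaling, custom_black_list); the op name is an illustrative choice:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.amp = True
    strategy.amp_configs = {
        "init_loss_scaling": 32768,       # initial loss-scaling factor
        "custom_black_list": ["pool2d"],  # ops forced to stay in fp32; 'pool2d' is illustrative
    }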
@@ -384,6 +392,7 @@ class DistributedStrategy(object):
Default value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -401,6 +410,7 @@ class DistributedStrategy(object):
We note that system overhead is usually lower when sync_nccl_allreduce = True
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -425,6 +435,7 @@ class DistributedStrategy(object):
allreduce among the leaders of each group
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -450,6 +461,7 @@ class DistributedStrategy(object):
Default value: number of GPU cards on each single GPU machine
Example:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -472,10 +484,11 @@ class DistributedStrategy(object):
def sync_batch_norm(self):
    """
Indicating whether we are using sync_batch_norm to do synchronous batch normalization among all training nodes.
Default value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -500,6 +513,7 @@ class DistributedStrategy(object):
Default value: True
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -524,8 +538,9 @@ class DistributedStrategy(object):
Default value: 32
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.fuse_grad_size_in_MB = 50
@@ -562,8 +577,9 @@ class DistributedStrategy(object):
Default value: 1
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.nccl_comm_num = 2
@@ -594,8 +610,9 @@ class DistributedStrategy(object):
implementation should have some manually assign checkpoints
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.recompute = True
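Recompute needs manually assigned checkpoints, as the hunk notes; they are passed through the companion recompute_configs dict. A sketch with hypothetical activation names for the checkpoints:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.recompute = True
    # "fc_0.tmp_0"/"fc_1.tmp_0" are placeholder names; use tensor names from your own program
    strategy.recompute_configs = {"checkpoints": ["fc_0.tmp_0", "fc_1.tmp_0"]}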
@@ -622,8 +639,9 @@ class DistributedStrategy(object):
Default value: False
Examples:
.. code-block:: python
import paddle.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.sharding = True
@@ -649,8 +667,9 @@ class DistributedStrategy(object):
and should be an empirical value decided by your model size and network topology.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.sharding = True
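The empirical fused-broadcast size discussed above is supplied via sharding_configs. A sketch, assuming the fuse_broadcast_MB key described in this generation of the docstring:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.sharding = True
    strategy.sharding_configs = {"fuse_broadcast_MB": 32}  # empirical; tune to model size and network topology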
@@ -674,8 +693,9 @@ class DistributedStrategy(object):
device_guard information in user-defined program.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.pipeline = True
@@ -709,8 +729,9 @@ class DistributedStrategy(object):
**micro_batch**: the number of small batches in each user defined batch
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.pipeline = True
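The micro_batch knob described above lives in pipeline_configs. A minimal sketch under the same assumption:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.pipeline = True
    strategy.pipeline_configs = {"micro_batch": 12}  # number of small batches per user-defined batch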
@@ -736,6 +757,7 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -764,6 +786,7 @@ class DistributedStrategy(object):
begin_step(int) The step of begining training by localsgd. Default 1.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
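LocalSGD's begin_step, together with the synchronization interval, goes through localsgd_configs. A sketch assuming the k_steps/begin_step keys of this API generation:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.localsgd = True
    strategy.localsgd_configs = {"k_steps": 4, "begin_step": 30}  # sync every 4 steps, starting at step 30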
@@ -791,6 +814,7 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -821,6 +845,7 @@ class DistributedStrategy(object):
begin_step(int) The step of begining training by adaptive localsgd. Default 1.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -848,6 +873,7 @@ class DistributedStrategy(object):
Default Value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -884,6 +910,7 @@ class DistributedStrategy(object):
element will be transmitted.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -906,6 +933,7 @@ class DistributedStrategy(object):
Default Value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -935,6 +963,7 @@ class DistributedStrategy(object):
to model parameters.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -963,6 +992,7 @@ class DistributedStrategy(object):
avg(bool): whether to average the gradients of each mini-batch, the default value is `True`
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
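The avg flag documented here belongs to gradient_merge_configs. A sketch assuming the k_steps/avg keys:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.gradient_merge = True
    strategy.gradient_merge_configs = {"k_steps": 4, "avg": True}  # accumulate 4 mini-batches, apply the averaged gradient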
@@ -989,6 +1019,7 @@ class DistributedStrategy(object):
Default Value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -1019,6 +1050,7 @@ class DistributedStrategy(object):
will be exclude from weight decay in lars formula.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
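The weight-decay exclusion list for LARS is passed via lars_configs. A sketch assuming the keys named in this docstring; the name fragments are illustrative:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.lars = True
    strategy.lars_configs = {
        "lars_coeff": 0.001,
        "lars_weight_decay": 0.0005,
        "exclude_from_weight_decay": ["batch_norm", ".b"],  # parameter-name fragments exempt from weight decay
    }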
@@ -1048,8 +1080,9 @@ class DistributedStrategy(object):
[Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).
Default Value: False
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -1078,6 +1111,7 @@ class DistributedStrategy(object):
will be exclude from weight decay in lamb formula.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
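LAMB takes the analogous lamb_configs dict. A sketch under the same assumption:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()
    strategy.lamb = True
    strategy.lamb_configs = {"lamb_weight_decay": 0.01, "exclude_from_weight_decay": []}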
@@ -1123,11 +1157,12 @@ class DistributedStrategy(object):
Default Value: False
Examples:
.. code-block:: python
import paddle
-import paddle.distributed.fleet as fleet
+paddle.enable_static()
+import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.auto = True
@@ -1156,8 +1191,11 @@ class DistributedStrategy(object):
Default Value: True
Examples:
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.cudnn_exhaustive_search = False
@@ -1187,15 +1225,18 @@ class DistributedStrategy(object):
Default Value: 4000
Examples:
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.conv_workspace_size_limit = 1024
optimizer = paddle.optimizer.SGD(learning_rate=0.01)
optimizer = fleet.distributed_optimizer(optimizer, strategy)
"""
return self.strategy.conv_workspace_size_limit
@@ -1217,8 +1258,11 @@ class DistributedStrategy(object):
Default Value: True
Examples:
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.cudnn_batchnorm_spatial_persistent = True
python/paddle/distributed/fleet/base/fleet_base.py
@@ -69,8 +69,11 @@ class Fleet(object):
Fleet: A Fleet instance
Example for collective training:
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
fleet.init(is_collective=True)
@@ -86,6 +89,8 @@ class Fleet(object):
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
fleet.init()
@@ -159,7 +164,7 @@ class Fleet(object):
.. code-block:: python
import paddle.distributed.fleet as fleet
-role = fleet.PaddleCloudRoleMaker
+role = fleet.PaddleCloudRoleMaker()
fleet.init(role)
"""
@@ -233,6 +238,7 @@ class Fleet(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
fleet.init()
fleet.worker_index()
@@ -246,8 +252,9 @@ class Fleet(object):
Returns:
int: worker numbers
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -266,6 +273,7 @@ class Fleet(object):
False if not.
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -283,6 +291,7 @@ class Fleet(object):
list/string: server endpoints
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -303,10 +312,12 @@ class Fleet(object):
int: server number
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
fleet.init()
fleet.server_num()
"""
return len(self._role_maker._get_pserver_endpoints())
@@ -318,6 +329,7 @@ class Fleet(object):
int: node id
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -335,6 +347,7 @@ class Fleet(object):
list/string: server endpoints
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
@@ -359,6 +372,7 @@ class Fleet(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
fleet.init()
fleet.is_server()
@@ -510,21 +524,21 @@ class Fleet(object):
def save_persistables(self, executor, dirname, main_program=None, mode=1):
"""
-saves all persistable variables from :code:`main_program` to
+saves all persistable tensors from :code:`main_program` to
the folder :code:`dirname`. You can refer to
-The :code:`dirname` is used to specify the folder where persistable variables
-are going to be saved. If you would like to save variables in separate
+The :code:`dirname` is used to specify the folder where persistable tensors
+are going to be saved. If you would like to save tensors in separate
files, set :code:`filename` None.
Args:
-executor(Executor): The executor to run for saving persistable variables.
+executor(Executor): The executor to run for saving persistable tensors.
You can refer to :ref:`api_guide_executor_en` for
more details.
dirname(str, optional): The saving directory path.
When you need to save the parameter to the memory, set it to None.
-main_program(Program, optional): The program whose persistbale variables will
+main_program(Program, optional): The program whose persistbale tensors will
be saved. Default: None.
@@ -535,16 +549,17 @@ class Fleet(object):
.. code-block:: text
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
import paddle.fluid as fluid
fleet.init()
# build net
# fleet.distributed_optimizer(...)
-exe = fluid.Executor(fluid.CPUPlace())
-fleet.save_persistables(exe, "dirname", fluid.default_main_program())
+exe = paddle.static.Executor(paddle.CPUPlace())
+fleet.save_persistables(exe, "dirname", paddle.static.default_main_program())
"""
@@ -569,9 +584,9 @@ class Fleet(object):
.. code-block:: python
import paddle
import paddle.distributed.fleet as fleet
-role = fleet.role_maker.PaddleCloudRoleMaker(is_collective=True)
-fleet.init(role)
+fleet.init(is_collective=True)
strategy = fleet.DistributedStrategy()
optimizer = paddle.optimizer.SGD(learning_rate=0.001)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
@@ -621,23 +636,20 @@ class Fleet(object):
def forward(self, x):
return self._linear2(self._linear1(x))
-# 1. enable dynamic mode
-paddle.disable_static()
-# 2. initialize fleet environment
+# 1. initialize fleet environment
fleet.init(is_collective=True)
-# 3. create layer & optimizer
+# 2. create layer & optimizer
layer = LinearNet()
loss_fn = nn.MSELoss()
adam = paddle.optimizer.Adam(
learning_rate=0.001, parameters=layer.parameters())
-# 4. get data_parallel model using fleet
+# 3. get data_parallel model using fleet
adam = fleet.distributed_optimizer(adam)
dp_layer = fleet.distributed_model(layer)
-# 5. run layer
+# 4. run layer
inputs = paddle.randn([10, 10], 'float32')
outputs = dp_layer(inputs)
labels = paddle.randn([10, 1], 'float32')
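The example is cut off here by the collapsed view; a training-loop epilogue in the usual dygraph style would follow. A sketch of those remaining steps, not part of the diff itself:

    loss = loss_fn(outputs, labels)  # compute the loss on the data-parallel layer's outputs
    loss.backward()                  # gradient all-reduce happens during backward in data-parallel mode
    adam.step()
    adam.clear_grad()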
@@ -675,11 +687,10 @@ class Fleet(object):
import paddle
from paddle.distributed import fleet
-paddle.disable_static()
fleet.init(is_collective=True)
value = np.arange(26).reshape(2, 13).astype("float32")
-a = paddle.fluid.dygraph.to_variable(value)
+a = paddle.to_tensor(value)
layer = paddle.nn.Linear(13, 5)
adam = paddle.optimizer.Adam(learning_rate=0.01, parameters=layer.parameters())
@@ -710,11 +721,10 @@ class Fleet(object):
import paddle
from paddle.distributed import fleet
-paddle.disable_static()
fleet.init(is_collective=True)
value = np.arange(26).reshape(2, 13).astype("float32")
-a = paddle.fluid.dygraph.to_variable(value)
+a = paddle.to_tensor(value)
layer = paddle.nn.Linear(13, 5)
adam = paddle.optimizer.Adam(learning_rate=0.01, parameters=layer.parameters())
@@ -722,9 +732,9 @@ class Fleet(object):
adam = fleet.distributed_optimizer(adam)
dp_layer = fleet.distributed_model(layer)
state_dict = adam.state_dict()
-paddle.framework.save(state_dict, "paddle_dy")
-para_state_dict, opti_state_dict = paddle.framework.load("paddle_dy")
-adam.set_state_dict(opti_state_dict)
+paddle.save(state_dict, "paddle_dy")
+para_state_dict = paddle.load("paddle_dy")
+adam.set_state_dict(para_state_dict)
"""
# imitate target optimizer retrieval
return self.user_defined_optimizer.set_state_dict(state_dict)
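The new lines switch the example from paddle.framework.save/load to the flat paddle.save/paddle.load API. For reference, a sketch of a full save/restore round trip for both layer and optimizer state under that 2.0 API (file names are illustrative):

    import paddle

    paddle.save(layer.state_dict(), "model.pdparams")  # parameters
    paddle.save(adam.state_dict(), "opt.pdopt")        # optimizer state

    layer.set_state_dict(paddle.load("model.pdparams"))
    adam.set_state_dict(paddle.load("opt.pdopt"))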
@@ -748,11 +758,10 @@ class Fleet(object):
import paddle
from paddle.distributed import fleet
-paddle.disable_static()
fleet.init(is_collective=True)
value = np.arange(26).reshape(2, 13).astype("float32")
-a = paddle.fluid.dygraph.to_variable(value)
+a = paddle.to_tensor(value)
layer = paddle.nn.Linear(13, 5)
adam = paddle.optimizer.Adam(learning_rate=0.01, parameters=layer.parameters())
@@ -785,17 +794,17 @@ class Fleet(object):
float: The learning rate of the current step.
Examples:
.. code-block:: python
import numpy as np
import paddle
from paddle.distributed import fleet
-paddle.disable_static()
fleet.init(is_collective=True)
value = np.arange(26).reshape(2, 13).astype("float32")
-a = paddle.fluid.dygraph.to_variable(value)
+a = paddle.to_tensor(value)
layer = paddle.nn.Linear(13, 5)
adam = paddle.optimizer.Adam(learning_rate=0.01, parameters=layer.parameters())
@@ -819,6 +828,7 @@ class Fleet(object):
None
Examples:
.. code-block:: python
import paddle
@@ -834,23 +844,20 @@ class Fleet(object):
def forward(self, x):
return self._linear2(self._linear1(x))
-# 1. enable dynamic mode
-paddle.disable_static()
-# 2. initialize fleet environment
+# 1. initialize fleet environment
fleet.init(is_collective=True)
-# 3. create layer & optimizer
+# 2. create layer & optimizer
layer = LinearNet()
loss_fn = nn.MSELoss()
adam = paddle.optimizer.Adam(
learning_rate=0.001, parameters=layer.parameters())
-# 4. get data_parallel model using fleet
+# 3. get data_parallel model using fleet
adam = fleet.distributed_optimizer(adam)
dp_layer = fleet.distributed_model(layer)
-# 5. run layer
+# 4. run layer
inputs = paddle.randn([10, 10], 'float32')
outputs = dp_layer(inputs)
labels = paddle.randn([10, 1], 'float32')
@@ -878,6 +885,7 @@ class Fleet(object):
None
Examples:
.. code-block:: python
import paddle
@@ -893,23 +901,20 @@ class Fleet(object):
def forward(self, x):
return self._linear2(self._linear1(x))
-# 1. enable dynamic mode
-paddle.disable_static()
-# 2. initialize fleet environment
+# 1. initialize fleet environment
fleet.init(is_collective=True)
-# 3. create layer & optimizer
+# 2. create layer & optimizer
layer = LinearNet()
loss_fn = nn.MSELoss()
adam = paddle.optimizer.Adam(
learning_rate=0.001, parameters=layer.parameters())
-# 4. get data_parallel model using fleet
+# 3. get data_parallel model using fleet
adam = fleet.distributed_optimizer(adam)
dp_layer = fleet.distributed_model(layer)
-# 5. run layer
+# 4. run layer
inputs = paddle.randn([10, 10], 'float32')
outputs = dp_layer(inputs)
labels = paddle.randn([10, 1], 'float32')
@@ -962,38 +967,44 @@ class Fleet(object):
Add distributed operations to minimize ``loss`` by updating ``parameter_list``.
Args:
-loss (Variable): A ``Variable`` containing the value to minimize.
+loss (Tensor): A ``Tensor`` containing the value to minimize.
startup_program (Program, optional): :ref:`api_fluid_Program` for
initializing parameters in ``parameter_list``. The default value
is None, at this time :ref:`api_fluid_default_startup_program` will be used.
-parameter_list (Iterable, optional): Iterable of ``Variable`` or ``Variable.name`` to update
+parameter_list (Iterable, optional): Iterable of ``Tensor`` or ``Tensor.name`` to update
to minimize ``loss``. The default value is None, at this time all parameters
will be updated.
-no_grad_set (set, optional): Set of ``Variable`` or ``Variable.name`` that don't need
+no_grad_set (set, optional): Set of ``Tensor`` or ``Tensor.name`` that don't need
to be updated. The default value is None.
Returns:
tuple: tuple (optimize_ops, params_grads), A list of operators appended
-by minimize and a list of (param, grad) variable pairs, param is
+by minimize and a list of (param, grad) tensor pairs, param is
``Parameter``, grad is the gradient value corresponding to the parameter.
The returned tuple can be passed to ``fetch_list`` in ``Executor.run()`` to
indicate program pruning. If so, the program will be pruned by ``feed`` and
``fetch_list`` before run, see details in ``Executor``.
Examples:
.. code-block:: python
import paddle
paddle.enable_static()
import paddle.distributed.fleet as fleet
import paddle.nn.functional as F
hid_dim = 10
label_dim = 2
input_x = paddle.static.data(name='x', shape=[None, 13], dtype='float32')
input_y = paddle.static.data(name='y', shape=[None, 1], dtype='int64')
-fc_1 = paddle.fluid.layers.fc(input=input_x, size=hid_dim, act='tanh')
-fc_2 = paddle.fluid.layers.fc(input=fc_1, size=hid_dim, act='tanh')
-prediction = paddle.fluid.layers.fc(input=[fc_2], size=label_dim, act='softmax')
-cost = paddle.fluid.layers.cross_entropy(input=prediction, label=input_y)
-avg_cost = paddle.fluid.layers.mean(x=cost)
+fc_1 = paddle.static.nn.fc(x=input_x, size=hid_dim, activation='tanh')
+fc_2 = paddle.static.nn.fc(x=fc_1, size=hid_dim, activation='tanh')
+prediction = paddle.static.nn.fc(x=[fc_2], size=label_dim, activation='softmax')
+cost = F.cross_entropy(input=prediction, label=input_y)
+avg_cost = paddle.mean(x=cost)
-role = fleet.role_maker.PaddleCloudRoleMaker(is_collective=True)
-fleet.init(role)
+fleet.init(is_collective=True)
strategy = fleet.DistributedStrategy()
optimizer = paddle.optimizer.SGD(learning_rate=0.001)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
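To see where minimize fits, the returned tuple is typically followed by a standard static-graph run. A rough sketch continuing the example above, assuming a CPU place and random feed data; a real multi-node job would be started through a distributed launcher such as paddle.distributed.launch:

    import numpy as np

    optimize_ops, params_grads = optimizer.minimize(avg_cost)

    exe = paddle.static.Executor(paddle.CPUPlace())
    exe.run(paddle.static.default_startup_program())  # initialize parameters once
    x = np.random.random((4, 13)).astype("float32")
    y = np.random.randint(0, 2, size=(4, 1)).astype("int64")
    loss_val, = exe.run(feed={"x": x, "y": y}, fetch_list=[avg_cost])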