Commit 9b3b3b74

Authored on Oct 14, 2020 by Huihuang Zheng, committed by GitHub on Oct 14, 2020.

Refine ParallelExecutor English Doc for 2.0RC (#27862)

As the title

Parent: ae01801f
Showing 1 changed file with 65 additions and 61 deletions (+65 −61):

python/paddle/fluid/parallel_executor.py
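The change is documentation only: every cross-reference and example in the ParallelExecutor docstring is rewritten from the 1.x `fluid` spelling to the 2.0 `paddle.static` spelling, and "variable" becomes "Tensor". For readers who just want the end state, here is a consolidated sketch of the updated example, assembled from the added lines of this diff (the old `fluid` names are noted in comments; it assumes a PaddlePaddle 2.0rc install and is run here on CPU):

    # Assembled from the "+" lines of this diff; assumes PaddlePaddle 2.0rc.
    import paddle
    import numpy
    import os

    use_cuda = False                   # the docstring examples use use_cuda = True
    paddle.enable_static()             # new: the 2.0 examples enable static graph mode explicitly
    place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()   # was fluid.CUDAPlace / fluid.CPUPlace
    if not use_cuda:
        os.environ['CPU_NUM'] = str(2)

    exe = paddle.static.Executor(place)            # was fluid.Executor
    train_program = paddle.static.Program()        # was fluid.Program
    startup_program = paddle.static.Program()
    with paddle.static.program_guard(train_program, startup_program):          # was fluid.program_guard
        data = paddle.static.data(name='X', shape=[None, 1], dtype='float32')  # was fluid.data
        hidden = paddle.static.nn.fc(data, 10)     # was fluid.layers.fc(input=data, size=10)
        loss = paddle.mean(hidden)                 # was fluid.layers.mean
        paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)                # was fluid.optimizer.SGD
    exe.run(startup_program)

    train_exe = paddle.static.ParallelExecutor(use_cuda=use_cuda,              # was fluid.ParallelExecutor
                                               main_program=train_program,
                                               loss_name=loss.name)
    x = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = train_exe.run(feed={"X": x}, fetch_list=[loss.name])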
@@ -30,7 +30,7 @@ class ParallelExecutor(object):
  """
  :api_attr: Static Graph
- The ParallelExecutor is an upgraded version of :code:`fluid.Executor` that supports multi-node model
+ The ParallelExecutor is an upgraded version of :code:`paddle.static.Executor` that supports multi-node model
  training and testing based on the data-parallel mode. In data-parallel mode,
  ParallelExecutor will broadcast the parameters from Node0 to other nodes during
  construction and copy the input Program to other nodes from Node0 to make sure
@@ -50,12 +50,12 @@ class ParallelExecutor(object):
  Args:
      use_cuda (bool): Whether to use CUDA or not.
-     loss_name (str): This parameter is the name of the loss variable of the
+     loss_name (str): This parameter is the name of the loss Tensor of the
          model. **Note: If it is data-parallel model training, you must set loss_name,
          otherwise, the results may be wrong**. The default is None.
      main_program (Program): This parameter represents the Program to be executed.
          If this parameter is not provided, that parameter is None, the program will
-         be set to :code:`fluid.default_main_program()`. The default is None.
+         be set to :code:`paddle.static.default_main_program()`. The default is None.
      share_vars_from(ParallelExecutor): If share_vars_from is set, the current
          ParallelExecutor will share the parameters with the ParallelExecutor
          specified by share_vars_from. This parameter needs to be set when model testing
@@ -66,13 +66,13 @@ class ParallelExecutor(object):
          The default is None.
      exec_strategy(ExecutionStrategy): exec_strategy specifies the options that can
          be changed when running the current model, such as the thread pool size.
-         For more information about exec_strategy, please refer to :code:`fluid.ExecutionStrategy`.
+         For more information about exec_strategy, please refer to :code:`paddle.static.ExecutionStrategy`.
          The default is None.
      build_strategy(BuildStrategy): By configuring build_strategy, we can
          optimize the computational graph, such as operators' fusion in the
          computational graph and memory optimization during the execution
          of the computational graph. For more information about build_strategy,
-         please refer to :code:`fluid.BuildStrategy`. The default is None.
+         please refer to :code:`paddle.static.BuildStrategy`. The default is None.
      num_trainers(int): This parameter needs to be set in GPU distributed training.
          If the parameter value is greater than 1, NCCL will be initialized by multi-level
          nodes. Each node should have the same number of GPUs. The default is 1.
@@ -81,7 +81,7 @@ class ParallelExecutor(object):
          Trainer_id indicates the "rank" of the current node. The trainer_id starts
          counting from 0. The default is 0.
      scope(Scope): Specifies the scope in which the program is executed.
-         The default is fluid.global_scope().
+         The default is paddle.static.global_scope().
  Returns:
      ParallelExecutor: The initialized ParallelExecutor object.
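As a supplementary sketch (not taken from this commit) of the exec_strategy and build_strategy arguments documented in the hunk above, under the 2.0 spelling; the specific attributes set here, num_threads and fuse_elewise_add_act_ops, are just examples of options those classes expose:

    import paddle
    paddle.enable_static()

    exec_strategy = paddle.static.ExecutionStrategy()
    exec_strategy.num_threads = 4                    # e.g. the thread pool size mentioned above
    build_strategy = paddle.static.BuildStrategy()
    build_strategy.fuse_elewise_add_act_ops = True   # one example of operator fusion

    # With train_program / loss built as in the Examples section below:
    # pe = paddle.static.ParallelExecutor(use_cuda=False,
    #                                     main_program=train_program,
    #                                     loss_name=loss.name,
    #                                     exec_strategy=exec_strategy,
    #                                     build_strategy=build_strategy)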
@@ -101,15 +101,16 @@ class ParallelExecutor(object):
  Examples:
      .. code-block:: python
-         import paddle.fluid as fluid
+         import paddle
          import numpy
          import os
          use_cuda = True
-         place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+         paddle.enable_static()
+         place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
          # NOTE: If you use CPU to run the program, you need
-         # to specify the CPU_NUM, otherwise, fluid will use
+         # to specify the CPU_NUM, otherwise, PaddlePaddle will use
          # all the number of the logic core as the CPU_NUM,
          # in that case, the batch size of the input should be
          # greater than CPU_NUM, if not, the process will be
@@ -117,26 +118,26 @@ class ParallelExecutor(object):
          if not use_cuda:
              os.environ['CPU_NUM'] = str(2)
-         exe = fluid.Executor(place)
+         exe = paddle.static.Executor(place)
-         train_program = fluid.Program()
+         train_program = paddle.static.Program()
-         startup_program = fluid.Program()
+         startup_program = paddle.static.Program()
-         with fluid.program_guard(train_program, startup_program):
+         with paddle.static.program_guard(train_program, startup_program):
-             data = fluid.data(name='X', shape=[None, 1], dtype='float32')
+             data = paddle.static.data(name='X', shape=[None, 1], dtype='float32')
-             hidden = fluid.layers.fc(input=data, size=10)
+             hidden = paddle.static.nn.fc(data, 10)
-             loss = fluid.layers.mean(hidden)
+             loss = paddle.mean(hidden)
-             test_program = fluid.default_main_program().clone(for_test=True)
+             test_program = paddle.static.default_main_program().clone(for_test=True)
-             fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)
+             paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)
-         startup_program.random_seed=1
          exe.run(startup_program)
-         train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
+         train_exe = paddle.static.ParallelExecutor(use_cuda=use_cuda,
                                                     main_program=train_program,
                                                     loss_name=loss.name)
-         test_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
-                                           main_program=test_program,
-                                           share_vars_from=train_exe)
+         # Note: if share_vars_from is not set here, the test parameter is different to the train one
+         test_exe = paddle.static.ParallelExecutor(use_cuda=use_cuda,
+                                                   main_program=test_program,
+                                                   share_vars_from=train_exe)
          x = numpy.random.random(size=(10, 1)).astype('float32')
          loss_data, = train_exe.run(feed={"X": x},
@@ -205,11 +206,11 @@ class ParallelExecutor(object):
      fetch_list.
  Args:
-     fetch_list(list): This parameter represents the variables that need to be returned
+     fetch_list(list): This parameter represents the Tensors that need to be returned
          after the model runs. The default is None.
-     feed(list|dict): This parameter represents the input variables of the model.
+     feed(list|dict): This parameter represents the input Tensors of the model.
          If it is single card training, the feed is dict type, and if it is multi-card
-         training, the parameter feed can be dict or list type variable. If the
+         training, the parameter feed can be dict or list of Tensor. If the
          parameter type is dict, the data in the feed will be split and sent to
          multiple devices (CPU/GPU), that is to say, the input data will be evenly
          sent to different devices, so you should make sure the number of samples of
@@ -219,8 +220,8 @@ class ParallelExecutor(object):
          The default is None.
      feed_dict: Alias for feed parameter, for backward compatibility.
          This parameter has been deprecated. Default None.
-     return_numpy(bool): This parameter indicates whether convert the fetched variables
+     return_numpy(bool): This parameter indicates whether convert the fetched Tensors
-         (the variable specified in the fetch list) to numpy.ndarray. if it is False,
+         (the Tensor specified in the fetch list) to numpy.ndarray. if it is False,
          the type of the return value is a list of :code:`LoDTensor`. The default is True.
  Returns:
@@ -241,22 +242,23 @@ class ParallelExecutor(object):
      number of CPU cores or GPU cards, if it is less than, it is recommended that
      the batch be discarded.
      2. If the number of CPU cores or GPU cards available is greater than 1, the fetch
-     results are spliced together in dimension 0 for the same variable values
+     results are spliced together in dimension 0 for the same Tensor values
-     (variables in fetch_list) on different devices.
+     (Tensors in fetch_list) on different devices.
  Examples:
      .. code-block:: python
-         import paddle.fluid as fluid
+         import paddle
          import numpy
          import os
          use_cuda = True
-         place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+         paddle.enable_static()
+         place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
          # NOTE: If you use CPU to run the program, you need
-         # to specify the CPU_NUM, otherwise, fluid will use
+         # to specify the CPU_NUM, otherwise, PaddlePaddle will use
          # all the number of the logic core as the CPU_NUM,
          # in that case, the batch size of the input should be
          # greater than CPU_NUM, if not, the process will be
@@ -264,21 +266,21 @@ class ParallelExecutor(object):
          if not use_cuda:
              os.environ['CPU_NUM'] = str(2)
-         exe = fluid.Executor(place)
+         exe = paddle.static.Executor(place)
-         train_program = fluid.Program()
+         train_program = paddle.static.Program()
-         startup_program = fluid.Program()
+         startup_program = paddle.static.Program()
-         with fluid.program_guard(train_program, startup_program):
+         with paddle.static.program_guard(train_program, startup_program):
-             data = fluid.data(name='X', shape=[None, 1], dtype='float32')
+             data = paddle.static.data(name='X', shape=[None, 1], dtype='float32')
-             hidden = fluid.layers.fc(input=data, size=10)
+             hidden = paddle.static.nn.fc(data, 10)
-             loss = fluid.layers.mean(hidden)
+             loss = paddle.mean(hidden)
-             fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)
+             paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)
          exe.run(startup_program)
-         train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
+         train_exe = paddle.static.ParallelExecutor(use_cuda=use_cuda,
                                                     main_program=train_program,
                                                     loss_name=loss.name)
          # If the feed is a dict:
          # the image will be split into devices. If there is two devices
@@ -314,7 +316,7 @@ class ParallelExecutor(object):
      application and release of temporary variables, the strategy adopted by
      ParallelExecutor is to drop the local execution scopes after several iterations.
      ParallelExecutor provides the num_iteration_per_drop_scope option in
-     :code:`fluid.ExecutionStrategy`, which indicates how many iterations are intervened to
+     :code:`paddle.static.ExecutionStrategy`, which indicates how many iterations are intervened to
      drop the local execution scopes. If the num_iteration_per_drop_scope value
      is 100, but you want to drop the local execution scopes after 50 iterations,
      you can call the interface manually.
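As a supplementary sketch (not taken from this commit) of the two mechanisms described in the hunk above, assuming the 2.0 spelling: configuring num_iteration_per_drop_scope through ExecutionStrategy, and dropping the local scopes early by hand.

    import paddle
    paddle.enable_static()

    exec_strategy = paddle.static.ExecutionStrategy()
    exec_strategy.num_iteration_per_drop_scope = 100   # drop the local scopes every 100 iterations

    # With a ParallelExecutor built as in the example below (passing exec_strategy=exec_strategy),
    # the local execution scopes can also be dropped earlier by calling:
    # parallel_exe.drop_local_exe_scopes()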
@@ -325,13 +327,13 @@ class ParallelExecutor(object):
  Examples:
      .. code-block:: python
-         import paddle.fluid as fluid
+         import paddle
          import numpy
          import os
          use_cuda = True
          # NOTE: If you use CPU to run the program, you need
-         # to specify the CPU_NUM, otherwise, fluid will use
+         # to specify the CPU_NUM, otherwise, PaddlePaddle will use
          # all the number of the logic core as the CPU_NUM,
          # in that case, the batch size of the input should be
          # greater than CPU_NUM, if not, the process will be
@@ -339,26 +341,28 @@ class ParallelExecutor(object):
          if not use_cuda:
              os.environ['CPU_NUM'] = str(2)
-         train_program = fluid.Program()
-         startup_program = fluid.Program()
-         with fluid.program_guard(train_program, startup_program):
-             data = fluid.data(name='X', shape=[None, 1], dtype='float32')
-             hidden = fluid.layers.fc(input=data, size=10)
-             loss = fluid.layers.mean(hidden)
+         paddle.enable_static()
+         train_program = paddle.static.Program()
+         startup_program = paddle.static.Program()
+         with paddle.static.program_guard(train_program, startup_program):
+             data = paddle.static.data(name='X', shape=[None, 1], dtype='float32')
+             hidden = paddle.static.nn.fc(data, 10)
+             loss = paddle.mean(hidden)
-         place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+         place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
-         exe = fluid.Executor(place)
+         exe = paddle.static.Executor(place)
          exe.run(startup_program)
-         parallel_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
+         parallel_exe = paddle.static.ParallelExecutor(use_cuda=use_cuda,
                                                        main_program=train_program,
                                                        loss_name=loss.name)
          x = numpy.random.random(size=(10, 1)).astype('float32')
          loss_data, = parallel_exe.run(feed={"X": x},
                                        fetch_list=[loss.name])
          parallel_exe.drop_local_exe_scopes()
  """
  check_type(self._compiled_program._executor,
             "the Executor of compiled program", core.ParallelExecutor,