Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
PaddlePaddle
FluidDoc
提交
b360aa16
F
FluidDoc
项目概览
PaddlePaddle
/
FluidDoc
通知
7
Star
2
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
23
列表
看板
标记
里程碑
合并请求
111
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
F
FluidDoc
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
23
Issue
23
列表
看板
标记
里程碑
合并请求
111
合并请求
111
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
未验证
提交
b360aa16
编写于
7月 30, 2019
作者:
D
Dong Daxiang
提交者:
GitHub
7月 30, 2019
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
Update fleet_api_howto_cn.rst
test=document_preview
上级
dc0a7938
变更
1
隐藏空白更改
内联
并排
Showing
1 changed file
with
210 addition
and
203 deletion
+210
-203
doc/fluid/user_guides/howto/training/fleet_api_howto_cn.rst
doc/fluid/user_guides/howto/training/fleet_api_howto_cn.rst
+210
-203
未找到文件。
doc/fluid/user_guides/howto/training/fleet_api_howto_cn.rst
浏览文件 @
b360aa16
# 使用FleetAPI进行分布式训练
使用FleetAPI进行分布式训练
==========================
FleetAPI 设计说明
-----------------
Fleet是PaddlePaddle分布式训练的高级API。Fleet的命名出自于PaddlePaddle,象征一个舰队中的多只双桨船协同工作。Fleet的设计在易用性和算法可扩展性方面做出了权衡。用户可以很容易从单机版的训练程序,通过添加几行代码切换到分布式训练程序。此外,分布式训练的算法也可以通过Fleet
API接口灵活定义。具体的设计原理可以参考\ `Fleet
API设计文档 <https://github.com/PaddlePaddle/Fleet/blob/develop/README.md>`__\ 。当前FleetAPI还处于paddle.fluid.incubate目录下,未来功能完备后会放到paddle.fluid目录中,欢迎持续关注。
## FleetAPI 设计说明
Fleet API快速上手示例
---------------------
Fleet是PaddlePaddle分布式训练的高级API。Fleet的命名出自于PaddlePaddle,象征一个舰队中的多只双桨船协同工作。Fleet的设计在易用性和算法可扩展性方面做出了权衡。用户可以很容易从单机版的训练程序,通过添加几行代码切换到分布式训练程序。此外,分布式训练的算法也可以通过Fleet API接口灵活定义。具体的设计原理可以参考[Fleet API设计文档](https://github.com/PaddlePaddle/Fleet/blob/develop/README.md)。当前FleetAPI还处于paddle.fluid.incubate目录下,未来功能完备后会放到paddle.fluid目录中,欢迎持续关注。
下面会针对Fleet
API最常见的两种使用场景,用一个模型做示例,目的是让用户有快速上手体验的模板。快速上手的示例源代码可以在\ `Fleet
Quick
Start <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/quick-start>`__\ 找到。
假设我们定义MLP网络如下:
.. code:: python
## Fleet API快速上手示例
import paddle.fluid as fluid
下面会针对Fleet API最常见的两种使用场景,用一个模型做示例,目的是让用户有快速上手体验的模板。快速上手的示例源代码可以在[Fleet Quick Start](https://github.com/PaddlePaddle/Fleet/tree/develop/examples/quick-start)找到。
def mlp(input_x, input_y, hid_dim=128, label_dim=2):
fc_1 = fluid.layers.fc(input=input_x, size=hid_dim, act='tanh')
fc_2 = fluid.layers.fc(input=fc_1, size=hid_dim, act='tanh')
prediction = fluid.layers.fc(input=[fc_2], size=label_dim, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
avg_cost = fluid.layers.mean(x=cost)
return avg_cost
假设我们定义MLP网络
如下:
定义一个在内存生成数据的Reader
如下:
```python
import paddle.fluid as fluid
.. code:: python
def mlp(input_x, input_y, hid_dim=128, label_dim=2):
fc_1 = fluid.layers.fc(input=input_x, size=hid_dim, act='tanh')
fc_2 = fluid.layers.fc(input=fc_1, size=hid_dim, act='tanh')
prediction = fluid.layers.fc(input=[fc_2], size=label_dim, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
avg_cost = fluid.layers.mean(x=cost)
return avg_cost
```
import numpy as np
定义一个在内存生成数据的Reader如下:
def gen_data():
return {"x": np.random.random(size=(128, 32)).astype('float32'),
"y": np.random.randint(2, size=(128, 1)).astype('int64')}
```python
import numpy as np
def gen_data():
return {"x": np.random.random(size=(128, 32)).astype('float32'),
"y": np.random.randint(2, size=(128, 1)).astype('int64')}
```
#### 单机Trainer定义
```python
import paddle.fluid as fluid
from nets import mlp
from utils import gen_data
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
单机Trainer定义
^^^^^^^^^^^^^^^
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
optimizer.minimize(cost)
place = fluid.CUDAPlace(0)
.. code:: python
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(feed=gen_data(), fetch_list=[cost.name])
print("step%d cost=%f" % (i, cost_val[0]))
```
import paddle.fluid as fluid
from nets import mlp
from utils import gen_data
#### Parameter Server训练方法
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
参数服务器方法对于大规模数据,简单模型的并行训练非常适用,我们基于单机模型的定义给出其实用Parameter Server进行训练的示例如下:
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
optimizer.minimize(cost)
place = fluid.CUDAPlace(0)
```python
import paddle.fluid as fluid
from nets import mlp
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
from utils import gen_data
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(feed=gen_data(), fetch_list=[cost.name])
print("step%d cost=%f" % (i, cost_val[0]))
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
Parameter Server训练方法
^^^^^^^^^^^^^^^^^^^^^^^^
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
参数服务器方法对于大规模数据,简单模型的并行训练非常适用,我们基于单机模型的定义给出其实用Parameter
Server进行训练的示例如下:
role = role_maker.PaddleCloudRoleMaker()
fleet.init(role)
optimizer = fleet.distributed_optimizer(optimizer)
optimizer.minimize(cost)
.. code:: python
if fleet.is_server():
fleet.init_server()
fleet.run_server()
elif fleet.is_worker():
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(
program=fluid.default_main_program(),
feed=gen_data(),
fetch_list=[cost.name])
print("worker_index: %d, step%d cost = %f" %
(fleet.worker_index(), i, cost_val[0]))
```
import paddle.fluid as fluid
from nets import mlp
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
from utils import gen_data
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
#### Collective训练方法
role = role_maker.PaddleCloudRoleMaker()
fleet.init(role)
optimizer = fleet.distributed_optimizer(optimizer)
optimizer.minimize(cost)
if fleet.is_server():
fleet.init_server()
fleet.run_server()
elif fleet.is_worker():
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(
program=fluid.default_main_program(),
feed=gen_data(),
fetch_list=[cost.name])
print("worker_index: %d, step%d cost = %f" %
(fleet.worker_index(), i, cost_val[0]))
Collective训练方法
^^^^^^^^^^^^^^^^^^
collective
training通常在GPU多机多卡训练中使用,一般在复杂模型的训练中比较常见,我们基于上面的单机模型定义给出使用Collective方法进行分布式训练的示例如下:
.. code:: python
import paddle.fluid as fluid
from nets import mlp
from paddle.fluid.incubate.fleet.collective import fleet
from paddle.fluid.incubate.fleet.base import role_maker
from utils import gen_data
collective training通常在GPU多机多卡训练中使用,一般在复杂模型的训练中比较常见,我们基于上面的单机模型定义给出使用Collective方法进行分布式训练的示例如下:
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
```python
import paddle.fluid as fluid
from nets import mlp
from paddle.fluid.incubate.fleet.collective import fleet
from paddle.fluid.incubate.fleet.base import role_maker
from utils import gen_data
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
role = role_maker.PaddleCloudRoleMaker(is_collective=True)
fleet.init(role)
input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
optimizer = fleet.distributed_optimizer(optimizer)
optimizer.minimize(cost)
place = fluid.CUDAPlace(0)
cost = mlp(input_x, input_y)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
role = role_maker.PaddleCloudRoleMaker(is_collective=True)
fleet.init(role)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(
program=fluid.default_main_program(),
feed=gen_data(),
fetch_list=[cost.name])
print("worker_index: %d, step%d cost = %f" %
(fleet.worker_index(), i, cost_val[0]))
optimizer = fleet.distributed_optimizer(optimizer)
optimizer.minimize(cost)
place = fluid.CUDAPlace(0)
更多使用示例
------------
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 1001
for i in range(step):
cost_val = exe.run(
program=fluid.default_main_program(),
feed=gen_data(),
fetch_list=[cost.name])
print("worker_index: %d, step%d cost = %f" %
(fleet.worker_index(), i, cost_val[0]))
```
`点击率预估 <>`__
## 更多使用示例
`语义匹配 <>`__
[点击率预估]()
`向量学习 <>`__
[语义匹配]()
`基于Resnet50的图像分类 <>`__
[向量学习]()
`基于Transformer的机器翻译 <>`__
[基于Resnet50的图像分类]()
`基于Bert的语义表示学习 <>`__
[基于Transformer的机器翻译]()
Fleet API相关的接口说明
-----------------------
[基于Bert的语义表示学习]()
Fleet API接口
~~~~~~~~~~~~~
- init(role\_maker=None)
- fleet初始化,需要在使用fleet其他接口前先调用,用于定义多机的环境配置
- is\_worker()
- Parameter
Server训练中使用,判断当前节点是否是Worker节点,是则返回True,否则返回False
- is\_server(model\_dir=None)
- Parameter
Server训练中使用,判断当前节点是否是Server节点,是则返回True,否则返回False
- init\_server()
- Parameter
Server训练中,fleet加载model\_dir中保存的模型相关参数进行parameter
server的初始化
- run\_server()
- Parameter Server训练中使用,用来启动server端服务
- init\_worker()
- Parameter Server训练中使用,用来启动worker端服务
- stop\_worker()
- 训练结束后,停止worker
- distributed\_optimizer(optimizer, strategy=None)
- 分布式优化算法装饰器,用户可带入单机optimizer,并配置分布式训练策略,返回一个分布式的optimizer
RoleMaker
~~~~~~~~~
## Fleet API相关的接口说明
- MPISymetricRoleMaker
### Fleet API接口
- 描述:MPISymetricRoleMaker会假设每个节点启动两个进程,1worker+1pserver,这种RoleMaker要求用户的集群上有mpi环境。
- init(role_maker=None)
- fleet初始化,需要在使用fleet其他接口前先调用,用于定义多机的环境配置
- is_worker()
- Parameter Server训练中使用,判断当前节点是否是Worker节点,是则返回True,否则返回False
- is_server(model_dir=None)
- Parameter Server训练中使用,判断当前节点是否是Server节点,是则返回True,否则返回False
- init_server()
- Parameter Server训练中,fleet加载model_dir中保存的模型相关参数进行parameter server的初始化
- run_server()
- Parameter Server训练中使用,用来启动server端服务
- init_worker()
- Parameter Server训练中使用,用来启动worker端服务
- stop_worker()
- 训练结束后,停止worker
- distributed_optimizer(optimizer, strategy=None)
- 分布式优化算法装饰器,用户可带入单机optimizer,并配置分布式训练策略,返回一个分布式的optimizer
- 示例:
### RoleMaker
.. code:: python
- MPISymetricRoleMaker
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
- 描述:MPISymetricRoleMaker会假设每个节点启动两个进程,1worker+1pserver,这种RoleMaker要求用户的集群上有mpi环境。
role = role_maker.MPISymetricRoleMaker()
fleet.init(role)
- 示例
:
- 启动方法
:
```python
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
role = role_maker.MPISymetricRoleMaker()
fleet.init(role)
```
.. code:: shell
- 启动方法:
mpirun -np 2 python trainer.py
```shell
mpirun -np 2 python trainer.py
```
- PaddleCloudRoleMaker
-
PaddleCloudRoleMaker
-
描述:PaddleCloudRoleMaker是一个高级封装,支持使用paddle.distributed.launch或者paddle.distributed.launch\_ps启动脚本
- 描述:PaddleCloudRoleMaker是一个高级封装,支持使用paddle.distributed.launch或者paddle.distributed.launch_ps启动脚本
- Parameter Server训练示例:
- Parameter Server训练示例:
.. code:: python
```python
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
role = role_maker.PaddleCloudRoleMaker()
fleet.init(role)
```
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
- 启动方法:
role = role_maker.PaddleCloudRoleMaker()
fleet.init(role)
```python
python -m paddle.distributed.launch_ps --worker_num 2 --server_num 2 trainer.py
```
- 启动方法:
- Collective训练示例:
.. code:: python
```python
from paddle.fluid.incubate.fleet.collective import fleet
from paddle.fluid.incubate.fleet.base import role_maker
role = role_maker.PaddleCloudRoleMaker(is_collective=True)
fleet.init(role)
```
python -m paddle.distributed.launch_ps --worker_num 2 --server_num 2 trainer.py
- 启动方法
:
- Collective训练示例
:
```python
python -m paddle.distributed.launch trainer.py
```
.. code:: python
- UserDefinedRoleMaker
from paddle.fluid.incubate.fleet.collective import fleet
from paddle.fluid.incubate.fleet.base import role_maker
- 描述:用户自定义节点的角色信息,IP和端口信息
role = role_maker.PaddleCloudRoleMaker(is_collective=True)
fleet.init(role)
- 示例
:
- 启动方法
:
```python
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
role = role_maker.UserDefinedRoleMaker(
current_id=int(os.getenv("CURRENT_ID")),
role=role_maker.Role.WORKER if bool(int(os.getenv("IS_WORKER")))
else role_maker.Role.SERVER,
worker_num=int(os.getenv("WORKER_NUM")),
server_endpoints=pserver_endpoints)
fleet.init(role)
```
.. code:: python
python -m paddle.distributed.launch trainer.py
- UserDefinedRoleMaker
- 描述:用户自定义节点的角色信息,IP和端口信息
- 示例:
.. code:: python
### Strategy
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.incubate.fleet.base import role_maker
- Parameter Server Training
- Sync_mode
- Collective Training
- LocalSGD
- ReduceGrad
role = role_maker.UserDefinedRoleMaker(
current_id=int(os.getenv("CURRENT_ID")),
role=role_maker.Role.WORKER if bool(int(os.getenv("IS_WORKER")))
else role_maker.Role.SERVER,
worker_num=int(os.getenv("WORKER_NUM")),
server_endpoints=pserver_endpoints)
fleet.init(role)
### Fleet Mode
Strategy
~~~~~~~~
- Parameter Server Training
- Parameter Server Training
- Sync\_mode
- Collective Training
- LocalSGD
- ReduceGrad
```python
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
```
Fleet Mode
~~~~~~~~~~
- Parameter Server Training
- Collective Training
``python from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet``
```python
from paddle.fluid.incubate.fleet.collective import fleet
```
- Collective Training
``python from paddle.fluid.incubate.fleet.collective import fleet``
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录