Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
Crayon鑫
Paddle
提交
58c3cc8c
P
Paddle
项目概览
Crayon鑫
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
1
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
1
Issue
1
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
58c3cc8c
编写于
9月 04, 2020
作者:
M
mapingshuo
提交者:
GitHub
9月 04, 2020
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
Revert "fix strategy example (#26856)"
This reverts commit
9e4fe923
.
上级
9e4fe923
变更
1
隐藏空白更改
内联
并排
Showing
1 changed file
with
44 addition
and
68 deletion
+44
-68
python/paddle/distributed/fleet/base/distributed_strategy.py
python/paddle/distributed/fleet/base/distributed_strategy.py
+44
-68
未找到文件。
python/paddle/distributed/fleet/base/distributed_strategy.py
浏览文件 @
58c3cc8c
...
...
@@ -118,7 +118,7 @@ class DistributedStrategy(object):
strategy = fleet.DistributedStrategy()
strategy.dgc = True
strategy.recompute = True
strategy.recompute_configs = {"checkpoint
s
": ["x"]}
strategy.recompute_configs = {"checkpoint": ["x"]}
strategy.save_to_prototxt("dist_strategy.prototxt")
"""
with
open
(
output
,
"w"
)
as
fout
:
...
...
@@ -133,7 +133,7 @@ class DistributedStrategy(object):
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.load_from_prototxt("dist_strategy.proto
t
xt")
strategy.load_from_prototxt("dist_strategy.protoxt")
"""
with
open
(
pb_file
,
'r'
)
as
f
:
self
.
strategy
=
google
.
protobuf
.
text_format
.
Merge
(
...
...
@@ -147,7 +147,6 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle
exe_strategy = paddle.fluid.ExecutionStrategy()
exe_strategy.num_threads = 10
exe_strategy.num_iteration_per_drop_scope = 10
...
...
@@ -180,7 +179,6 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle
build_strategy = paddle.fluid.BuildStrategy()
build_strategy.enable_sequential_execution = True
build_strategy.fuse_elewise_add_act_ops = True
...
...
@@ -254,19 +252,14 @@ class DistributedStrategy(object):
a dict.
**Notes**:
k_step(int): number of local optimization updates before communication
max_merge_var_num(int): maximum number of merged gradients before communication
send_queue_size(int): a buffer size of worker communication
independent_recv_thread(bool): if we are using independent recv thread for communication
thread_pool_size(int): number of thread pool
send_wait_times(int): waiting time for sending gradients
runtime_split_send_recv(bool): if we are using Tensor split for send and recv during runtime
**Detailed arguments for a_sync_configs**
**k_step**: number of local optimization updates before communication
**max_merge_var_num**: maximum number of merged gradients before communication
**send_queue_size**: a buffer size of worker communication
**independent_recv_thread**: if we are using independent recv thread for communication
**thread_pool_size**: number of thread pool
**send_wait_times**: waiting time for sending gradients
**runtime_split_send_recv**: if we are using Tensor split for send and recv during runtime
Examples:
.. code-block:: python
...
...
@@ -277,12 +270,11 @@ class DistributedStrategy(object):
strategy = fleet.DistributedStrategy()
strategy.a_sync = True # by default this is True
configs = {"k_step
s": 1024
, "send_queue_size": 32}
configs = {"k_step
": 10000
, "send_queue_size": 32}
strategy.a_sync_configs = configs
# code block for defining loss and local optimizer
# sgd = fleet.distributed_optimizer(optimizer, strategy)
"""
return
get_msg_dict
(
self
.
strategy
.
a_sync_configs
)
...
...
@@ -322,21 +314,14 @@ class DistributedStrategy(object):
settings that can be configured through a dict.
**Notes**:
init_loss_scaling(float): The initial loss scaling factor. Default 32768.
use_dynamic_loss_scaling(bool): Whether to use dynamic loss scaling. Default True.
incr_every_n_steps(int): Increases loss scaling every n consecutive steps with finite gradients. Default 1000.
decr_every_n_nan_or_inf(int): Decreases loss scaling every n accumulated steps with nan or inf gradients. Default 2.
incr_ratio(float): The multiplier to use when increasing the loss scaling. Default 2.0.
decr_ratio(float): The less-than-one-multiplier to use when decreasing the loss scaling. Default 0.5.
custom_white_list(list[str]): Users' custom white list which always execution fp16.
custom_black_list(list[str]): Users' custom black list which forbidden execution fp16.
**init_loss_scaling(float)**: The initial loss scaling factor. Default 32768.
**use_dynamic_loss_scaling(bool)**: Whether to use dynamic loss scaling. Default True.
**incr_every_n_steps(int)**: Increases loss scaling every n consecutive steps with finite gradients. Default 1000.
**decr_every_n_nan_or_inf(int)**: Decreases loss scaling every n accumulated steps with nan or inf gradients. Default 2.
**incr_ratio(float)**: The multiplier to use when increasing the loss scaling. Default 2.0.
**decr_ratio(float)**: The less-than-one-multiplier to use when decreasing the loss scaling. Default 0.5.
**custom_white_list(list[str])**: Users' custom white list which always execution fp16.
**custom_black_list(list[str])**: Users' custom black list which forbidden execution fp16.
Examples:
.. code-block:: python
...
...
@@ -568,7 +553,7 @@ class DistributedStrategy(object):
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.recompute = True
strategy.recompute_configs = {"checkp
oi
nts": ["x", "y"]}
strategy.recompute_configs = {"checkp
io
nts": ["x", "y"]}
"""
return
get_msg_dict
(
self
.
strategy
.
recompute_configs
)
...
...
@@ -618,7 +603,6 @@ class DistributedStrategy(object):
**Notes**:
**Detailed arguments for pipeline_configs**
**micro_batch**: the number of small batches in each user defined batch
Examples:
...
...
@@ -642,10 +626,10 @@ class DistributedStrategy(object):
@
property
def
localsgd
(
self
):
"""
Indicating whether we are using Local SGD training. Default Value: False
For more details, please refer to
`Don't Use Large Mini-Batches, Use Local SGD <https://arxiv.org/pdf/1808.07217.pdf>`_.
Indicating whether we are using Local SGD training. For more details, please refer to
[Don't Use Large Mini-Batches, Use Local SGD](https://arxiv.org/pdf/1808.07217.pdf),
Default Value: False
Examples:
.. code-block:: python
...
...
@@ -671,12 +655,13 @@ class DistributedStrategy(object):
setting that can be configured through a dict.
**Notes**:
k_steps(int) The local steps for training before parameter synchronization. Default 1.
If strategy.auto is set True, the local steps will be calculated automatically during training.
The algorithm is referenced in this paper:
`Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD <https://arxiv.org/pdf/1810.08313.pdf>`_.
In this case, k_steps indicates the first local steps which is suggested setting to 1.
**k_steps(int)**: The local steps for training before parameter
synchronization. Default 1. If strategy.auto is set True, the
local steps will be calculated automatically during training.
The algorithm is referenced in this paper:
[Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD](https://arxiv.org/pdf/1810.08313.pdf).
In this case, k_steps indicates the first local steps which
is suggested setting to 1.
Examples:
.. code-block:: python
...
...
@@ -727,16 +712,14 @@ class DistributedStrategy(object):
settings that can be configured through a dict.
**Notes**:
rampup_begin_step(int): The beginning step from which gradient compression is implemented. Default 0.
rampup_step(int): Time steps used in sparsity warm-up periods. Default is 1.
\
For example, if the sparsity is [0.75, 0.9375, 0.984375, 0.996, 0.999], and the rampup_step is 100,
\
it will use 0.75 at 0~19 steps, and 0.9375 at 20~39 steps, and so on. And when reach sparsity array
\
ends, it will use 0.999 then and after.
sparsity(list[float]): Get top important element from gradient tensor, the ratio is (1 - sparsity).
\
Default is [0.999]. For example, if the sparsity is [0.99, 0.999], the top [1%, 0.1%] important
\
element will be transmitted.
**rampup_begin_step(int)**: The beginning step from which gradient compression is implemented. Default 0.
**rampup_step(int)**: Time steps used in sparsity warm-up periods. Default is 1.
For example, if the sparsity is [0.75, 0.9375, 0.984375, 0.996, 0.999], and the rampup_step is 100,
it will use 0.75 at 0~19 steps, and 0.9375 at 20~39 steps, and so on. And when reach sparsity array
ends, it will use 0.999 then and after.
**sparsity(list[float])**: Get top important element from gradient tensor, the ratio is (1 - sparsity).
Default is [0.999]. For example, if the sparsity is [0.99, 0.999], the top [1%, 0.1%] important
element will be transmitted.
Examples:
.. code-block:: python
...
...
@@ -766,8 +749,7 @@ class DistributedStrategy(object):
to model parameters.
Examples:
.. code-block:: python
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.gradient_merge = True
...
...
@@ -786,15 +768,11 @@ class DistributedStrategy(object):
def
gradient_merge_configs
(
self
):
"""
the key-value configs of distribute_strategy
**Note**:
k_steps(int): the update period of the parameters.
avg(bool): whether to average the gradients of each mini-batch, the default value is `True`
Examples:
.. code-block:: python
Keys:
k_steps (int): the update period of the parameters
avg (bool): whether to average the gradients of each mini-batch,
the default value is `True`
Example:
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.gradient_merge = True
...
...
@@ -848,7 +826,6 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.lars = True
...
...
@@ -905,7 +882,6 @@ class DistributedStrategy(object):
Examples:
.. code-block:: python
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
strategy.lamb = True
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录