BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Unverified commit 958d7212, authored Aug 26, 2020 by JZ-LIANG, committed via GitHub on Aug 26, 2020
【paddle.fleet】Document refine lars & lamb (#26533)
Parent: ada1e129
Showing 1 changed file with 71 additions and 0 deletions (+71 −0)
python/paddle/distributed/fleet/base/distributed_strategy.py (+71 −0)
@@ -750,6 +750,20 @@ class DistributedStrategy(object):

    @property
    def lars(self):
        """
        Set lars configurations. lars is used to deal with the convergence problems when the global
        batch size is larger than 8k. For more details, please refer to
        [Large Batch Training of Convolutional Networks](https://arxiv.org/abs/1708.03888).

        Default Value: False

        Examples:
          .. code-block:: python

            import paddle.distributed.fleet as fleet
            strategy = fleet.DistributedStrategy()
            strategy.lars = True # by default this is false

        """
        return self.strategy.lars

    @lars.setter
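Setting the flag on its own does nothing until the strategy is handed to fleet's optimizer wrapper. Below is a minimal wiring sketch, assuming the 2.0-style collective entry points `fleet.init(is_collective=True)` and `fleet.distributed_optimizer` together with a `paddle.optimizer.Momentum` base optimizer (LARS rewrites momentum-style updates); the same wiring applies to `lamb`.

    import paddle
    import paddle.distributed.fleet as fleet

    fleet.init(is_collective=True)  # collective setup, normally under paddle.distributed.launch

    strategy = fleet.DistributedStrategy()
    strategy.lars = True  # request the LARS meta optimizer

    # plain momentum optimizer; fleet substitutes the LARS variant per the strategy
    optimizer = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
    optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)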
@@ -761,6 +775,29 @@ class DistributedStrategy(object):

    @property
    def lars_configs(self):
        """
        Set Lars training configurations.

        **Notes**:
            **lars_coeff (float)**: trust ratio in lars formula.
            **lars_weight_decay** (float): weight decay coefficient in lars formula.
            **epsilon (float)**: argument is used to avoid potential division-by-zero
            when computing the local lr;
            **exclude_from_weight_decay ([string])**: is a list of name strings of layers which
            will be excluded from weight decay in lars formula.

        Examples:
          .. code-block:: python

            import paddle.distributed.fleet as fleet
            strategy = fleet.DistributedStrategy()
            strategy.lars = True
            strategy.lars_configs = {
                        "lars_coeff": 0.01,
                        "lars_weight_decay": 0.0005,
                        "epsilon": 0,
                        "exclude_from_weight_decay": ['batch_norm', '.b_0']
                    }

        """
        return get_msg_dict(self.strategy.lars_configs)

    @lars_configs.setter
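These fields correspond to the per-layer ("local") learning rate defined in the LARS paper. The standalone NumPy sketch below shows where each field enters that formula; it illustrates the published rule, not Paddle's LarsMomentum kernel, and the default values shown are assumptions.

    import numpy as np

    def lars_local_lr(lr, param, grad,
                      lars_coeff=0.001, lars_weight_decay=0.0005, epsilon=0.0):
        # trust ratio ||w|| / (||g|| + wd * ||w||), scaled by lars_coeff
        w_norm = np.linalg.norm(param)
        g_norm = np.linalg.norm(grad)
        # epsilon keeps the denominator away from zero for freshly initialized layers
        return lr * lars_coeff * w_norm / (g_norm + lars_weight_decay * w_norm + epsilon)

Layers listed in `exclude_from_weight_decay` simply drop the `lars_weight_decay * w_norm` term from the denominator.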
@@ -770,6 +807,22 @@ class DistributedStrategy(object):

    @property
    def lamb(self):
        """
        Set lamb configurations. lamb is used to deal with the convergence problems for large
        batch size training, especially for attention-related models like BERT. For more details,
        please refer to
        [Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).

        Default Value: False

        Examples:
          .. code-block:: python

            import paddle.distributed.fleet as fleet
            strategy = fleet.DistributedStrategy()
            strategy.lamb = True # by default this is false

        """
        return self.strategy.lamb

    @lamb.setter
@@ -781,6 +834,24 @@ class DistributedStrategy(object):

    @property
    def lamb_configs(self):
        """
        Set Lamb training configurations.

        **Notes**:
            **lamb_weight_decay** (float): weight decay coefficient in lamb formula.
            **exclude_from_weight_decay ([string])**: is a list of name strings of layers which
            will be excluded from weight decay in lamb formula.

        Examples:
          .. code-block:: python

            import paddle.distributed.fleet as fleet
            strategy = fleet.DistributedStrategy()
            strategy.lamb = True
            strategy.lamb_configs = {
                    'lamb_weight_decay': 0.01,
                    'exclude_from_weight_decay': [],
                }

        """
        return get_msg_dict(self.strategy.lamb_configs)

    @lamb_configs.setter
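For orientation, the LAMB rule from the cited paper applies an Adam-style update, adds decoupled weight decay, then rescales each layer's step by a trust ratio. The sketch below is a simplified NumPy illustration of where `lamb_weight_decay` and `exclude_from_weight_decay` enter; it omits bias correction, is not Paddle's lamb kernel, and the hyperparameter defaults are assumptions.

    import numpy as np

    def lamb_step(name, param, grad, m, v, lr,
                  beta1=0.9, beta2=0.999, eps=1e-6,
                  lamb_weight_decay=0.01, exclude_from_weight_decay=()):
        # Adam-style first and second moment estimates
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        update = m / (np.sqrt(v) + eps)
        # layers whose name matches an excluded pattern skip weight decay
        decay = 0.0 if any(key in name for key in exclude_from_weight_decay) else lamb_weight_decay
        update = update + decay * param
        # layer-wise trust ratio ||w|| / ||update|| rescales the step
        w_norm = np.linalg.norm(param)
        u_norm = np.linalg.norm(update)
        trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
        return param - lr * trust_ratio * update, m, v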