BaiXuePrincess / Paddle
Forked from PaddlePaddle / Paddle; in sync with the upstream project.
Commit 0e552c08 (unverified)
Authored by Haohongxiang on Oct 20, 2022; committed via GitHub on Oct 20, 2022.
support qat in sharding stage2 (#47169)
Parent 8d2ce06e
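For context, a minimal sketch (not part of this commit) of how stage-2 group sharding is usually reached through Paddle's public entry point, group_sharded_parallel, which wraps the model in GroupShardedStage2 and the optimizer in GroupShardedOptimizerStage2 when level="os_g" is requested; the model, optimizer, and sizes below are illustrative:

# Launch with e.g.: python -m paddle.distributed.launch --gpus 0,1 train.py
import paddle
from paddle.distributed import init_parallel_env
from paddle.distributed.sharding import group_sharded_parallel

init_parallel_env()
model = paddle.nn.Linear(1024, 1024)
opt = paddle.optimizer.AdamW(learning_rate=1e-3,
                             parameters=model.parameters())

# level="os_g" shards optimizer states and gradients, i.e. sharding stage 2.
model, opt, scaler = group_sharded_parallel(model=model,
                                            optimizer=opt,
                                            level="os_g")

x = paddle.randn([8, 1024])
loss = model(x).mean()
loss.backward()
opt.step()
opt.clear_grad()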
Showing 2 changed files with 7 additions and 4 deletions (+7 -4)
python/paddle/distributed/fleet/meta_parallel/sharding/group_sharded_optimizer_stage2.py (+3 -1)
python/paddle/distributed/fleet/meta_parallel/sharding/group_sharded_stage2.py (+4 -3)
python/paddle/distributed/fleet/meta_parallel/sharding/group_sharded_optimizer_stage2.py
@@ -301,7 +301,9 @@ class GroupShardedOptimizerStage2(Optimizer):
         """
         if len(self._dtype_rank_params) == 0:
             # Assign the parameters of each rank according to the type
-            for param in self._local_params:
+            trainable_params = list(
+                filter(lambda x: x.trainable, self._local_params))
+            for param in trainable_params:
                 if param.dtype not in self._dtype_rank_params.keys():
                     self._dtype_rank_params[
                         param.dtype] = [[] for _ in range(self.world_size)]
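The change above makes the optimizer segment only trainable parameters across ranks; under QAT, helper tensors (for example frozen quantization scales) can show up in _local_params with trainable=False, and the added filter keeps them out of the per-rank buckets. Below is a self-contained sketch of that bucketing pattern; FakeParam and the round-robin rank assignment are hypothetical stand-ins for Paddle's real tensors and its param2rank mapping:

# Standalone illustration of filter-then-bucket; FakeParam and the
# i % world_size rank assignment are made up for this sketch
# (the real optimizer assigns ranks via self.param2rank).
class FakeParam:
    def __init__(self, name, dtype, trainable):
        self.name, self.dtype, self.trainable = name, dtype, trainable

def segment_params(local_params, world_size):
    # Mirror the added filter(): non-trainable params never reach the shards.
    trainable_params = list(filter(lambda x: x.trainable, local_params))
    dtype_rank_params = {}
    for i, param in enumerate(trainable_params):
        if param.dtype not in dtype_rank_params:
            dtype_rank_params[param.dtype] = [[] for _ in range(world_size)]
        dtype_rank_params[param.dtype][i % world_size].append(param.name)
    return dtype_rank_params

params = [
    FakeParam("w0", "float16", True),
    FakeParam("quant_scale", "float32", False),  # e.g. a frozen QAT tensor
    FakeParam("w1", "float16", True),
]
print(segment_params(params, world_size=2))
# {'float16': [['w0'], ['w1']]} -- the non-trainable tensor is excluded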
python/paddle/distributed/fleet/meta_parallel/sharding/group_sharded_stage2.py
@@ -102,11 +102,12 @@ class GroupShardedStage2(nn.Layer):
         # sharing stage 2 comm overlap flag
         self._reduce_overlap = False
 
-        self._trainable_params = []
         self._grad_reduced = []
         self._trainable_param2rank = {}
         self._trainable_param2align = {}
-        self._trainable_mask = list(map(_trainable, self._all_params))
+        self._trainable_params = list(
+            filter(lambda x: x.trainable, self._all_params))
+        self._trainable_mask = list(map(_trainable, self._trainable_params))
         self._param_grads = []
 
         # Set grad storage size & Display param sizes and model sizes
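With this constructor change, _trainable_params becomes the filtered list and _trainable_mask is computed over that list instead of over _all_params. A small self-contained sketch of the before/after bookkeeping, assuming _trainable behaves like the module-level helper in this file (it reports param.trainable):

# Plain objects stand in for paddle Tensors in this sketch.
def _trainable(param):
    return param.trainable

class P:
    def __init__(self, trainable):
        self.trainable = trainable

all_params = [P(True), P(False), P(True)]  # P(False): e.g. a frozen QAT tensor

# Old behavior: the mask ranged over every parameter, frozen ones included.
old_mask = list(map(_trainable, all_params))  # [True, False, True]

# New behavior: restrict to trainable params first, then build the mask, so
# it describes exactly the population that stage 2 shards and hooks.
trainable_params = list(filter(lambda x: x.trainable, all_params))
new_mask = list(map(_trainable, trainable_params))  # [True, True]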
@@ -512,7 +513,7 @@ class GroupShardedStage2(nn.Layer):
     def _detect_train_change(self):
         # Current trainable parameters
-        trainable_mask = list(map(_trainable, self._all_params))
+        trainable_mask = list(map(_trainable, self._trainable_params))
 
         # Whether parameters trainability changed
         trainability_changed = trainable_mask != self._trainable_mask
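The last hunk keeps _detect_train_change consistent with the constructor change: the mask recomputed at run time now ranges over _trainable_params, the same list the cached _trainable_mask was built from, so the comparison only fires when the trainability of a tracked parameter actually flips. A minimal sketch of that check, using SimpleNamespace as a stand-in parameter object:

from types import SimpleNamespace

def detect_train_change(trainable_params, cached_trainable_mask):
    # Both masks range over the same parameter list, so a length or value
    # mismatch means some tracked parameter was frozen or unfrozen.
    current_mask = [p.trainable for p in trainable_params]
    return current_mask != cached_trainable_mask

params = [SimpleNamespace(trainable=True), SimpleNamespace(trainable=True)]
cached = [p.trainable for p in params]
params[1].trainable = False  # freeze one parameter mid-training
assert detect_train_change(params, cached)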