PaddlePaddle / PaddleClas

Commit 7040ce83
Authored Jan 11, 2022 by gaotingquan
Committed by Tingquan Gao, Jan 25, 2022

refactor: change params to be consistent with amp

Parent: d6d5efe0
Showing 9 changed files with 49 additions and 38 deletions (+49, -38).
ppcls/configs/ImageNet/ResNet/ResNet50_amp_O1.yaml            +4  -5
ppcls/configs/ImageNet/ResNet/ResNet50_amp_O2.yaml            +7  -6
ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_amp_O2.yaml  +6  -4
ppcls/engine/engine.py                                        +9  -3
ppcls/engine/evaluation/classification.py                     +11 -7
ppcls/engine/train/train.py                                   +6  -4
ppcls/static/program.py                                       +2  -2
ppcls/static/run_dali.sh                                      +3  -6
ppcls/static/train.py                                         +1  -1
ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml → ppcls/configs/ImageNet/ResNet/ResNet50_amp_O1.yaml

@@ -22,7 +22,8 @@ Global:
 AMP:
   scale_loss: 128.0
   use_dynamic_loss_scaling: True
-  use_pure_fp16: &use_pure_fp16 False
+  # O1: mixed fp16
+  level: O1
 
 # model architecture
 Arch:

@@ -44,6 +45,7 @@ Loss:
 Optimizer:
   name: Momentum
   momentum: 0.9
+  multi_precision: True
   lr:
     name: Piecewise
     learning_rate: 0.1

@@ -74,12 +76,11 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
         channel_num: *image_channel
   sampler:
     name: DistributedBatchSampler
-    batch_size: 256
+    batch_size: 64
     drop_last: False
     shuffle: True
 loader:

@@ -104,7 +105,6 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
         channel_num: *image_channel
   sampler:
     name: DistributedBatchSampler

@@ -131,7 +131,6 @@ Infer:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
         channel_num: *image_channel
     - ToCHWImage:
 PostProcess:
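The YAML diff above replaces the boolean `use_pure_fp16` flag (and the anchor that distributed it to `output_fp16`) with a single explicit `level` key, where O1 means mixed fp16 and O2 means pure fp16. A minimal Python sketch of that mapping, assuming a config already loaded into a dict; `migrate_amp_config` is an illustrative helper name, not a PaddleClas API:

```python
def migrate_amp_config(amp_cfg):
    """Map the old `use_pure_fp16` flag to the new `level` key.

    O1 = mixed fp16, O2 = pure fp16, mirroring the comments added
    in the diff. Hypothetical helper for illustration only.
    """
    amp_cfg = dict(amp_cfg)  # leave the caller's config untouched
    if "use_pure_fp16" in amp_cfg:
        amp_cfg["level"] = "O2" if amp_cfg.pop("use_pure_fp16") else "O1"
    return amp_cfg

old_cfg = {"scale_loss": 128.0,
           "use_dynamic_loss_scaling": True,
           "use_pure_fp16": False}
new_cfg = migrate_amp_config(old_cfg)
print(new_cfg["level"])  # O1, since use_pure_fp16 was False
```

The other AMP keys (`scale_loss`, `use_dynamic_loss_scaling`) pass through unchanged, matching the context lines of the hunk.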
ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml → ppcls/configs/ImageNet/ResNet/ResNet50_amp_O2.yaml

@@ -10,8 +10,8 @@ Global:
   epochs: 120
   print_batch_step: 10
   use_visualdl: False
-  # used for static mode and model export
   image_channel: &image_channel 4
+  # used for static mode and model export
   image_shape: [*image_channel, 224, 224]
   save_inference_dir: ./inference
   # training model under @to_static

@@ -22,7 +22,8 @@ Global:
 AMP:
   scale_loss: 128.0
   use_dynamic_loss_scaling: True
-  use_pure_fp16: &use_pure_fp16 True
+  # O2: pure fp16
+  level: O2
 
 # model architecture
 Arch:

@@ -43,7 +44,7 @@ Loss:
 Optimizer:
   name: Momentum
   momentum: 0.9
-  multi_precision: *use_pure_fp16
+  multi_precision: True
   lr:
     name: Piecewise
     learning_rate: 0.1

@@ -74,7 +75,7 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
   sampler:

@@ -104,7 +105,7 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
   sampler:
     name: DistributedBatchSampler

@@ -131,7 +132,7 @@ Infer:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
     - ToCHWImage:
 PostProcess:
ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml → ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_amp_O2.yaml

@@ -35,11 +35,13 @@ Loss:
 AMP:
   scale_loss: 128.0
   use_dynamic_loss_scaling: True
-  use_pure_fp16: &use_pure_fp16 True
+  # O2: pure fp16
+  level: O2
 
 Optimizer:
   name: Momentum
   momentum: 0.9
+  multi_precision: True
   lr:
     name: Cosine
     learning_rate: 0.1

@@ -67,7 +69,7 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
   sampler:
     name: DistributedBatchSampler

@@ -96,7 +98,7 @@ DataLoader:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
   sampler:
     name: BatchSampler

@@ -123,7 +125,7 @@ Infer:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: ''
-        output_fp16: *use_pure_fp16
+        output_fp16: True
         channel_num: *image_channel
     - ToCHWImage:
 PostProcess:
ppcls/engine/engine.py

@@ -211,14 +211,20 @@ class Engine(object):
         self.optimizer, self.lr_sch = build_optimizer(
             self.config["Optimizer"], self.config["Global"]["epochs"],
             len(self.train_dataloader), [self.model])
 
         # for amp training
         if self.amp:
             self.scaler = paddle.amp.GradScaler(
                 init_loss_scaling=self.scale_loss,
                 use_dynamic_loss_scaling=self.use_dynamic_loss_scaling)
-            if self.config['AMP']['use_pure_fp16'] is True:
-                self.model = paddle.amp.decorate(
-                    models=self.model, level='O2', save_dtype='float32')
+            amp_level = self.config['AMP'].get("level", "O1")
+            if amp_level not in ["O1", "O2"]:
+                msg = "[Parameter Error]: The optimize level of AMP only support 'O1' and 'O2'. The level has been set 'O1'."
+                logger.warning(msg)
+                self.config['AMP']["level"] = "O1"
+                amp_level = "O1"
+            self.model = paddle.amp.decorate(
+                models=self.model, level=amp_level, save_dtype='float32')
 
         # for distributed
         self.config["Global"][
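The engine.py hunk above adds fallback handling: any level other than O1 or O2 is coerced back to O1 with a warning before `paddle.amp.decorate` is called. The same guard, isolated as a pure function so it can run without Paddle; `resolve_amp_level` is a hypothetical name, the real code mutates `self.config` in place:

```python
def resolve_amp_level(amp_cfg):
    """Return a valid AMP optimization level, defaulting to O1.

    Mirrors the guard added in Engine.__init__: anything other than
    O1/O2 falls back to O1 with a warning. Illustrative helper only;
    the warning here uses print() in place of PaddleClas's logger.
    """
    level = amp_cfg.get("level", "O1")
    if level not in ["O1", "O2"]:
        print("[Parameter Error]: The optimize level of AMP only "
              "support 'O1' and 'O2'. The level has been set 'O1'.")
        level = "O1"
    return level

print(resolve_amp_level({}))              # O1 (key absent, default)
print(resolve_amp_level({"level": "O2"}))  # O2
print(resolve_amp_level({"level": "O3"}))  # O1, after the warning
```

Note the model is now always decorated when AMP is on, with `level` passed through, whereas before `paddle.amp.decorate` ran only in the pure-fp16 case.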
ppcls/engine/evaluation/classification.py

@@ -56,13 +56,15 @@ def classification_eval(engine, epoch_id=0):
         batch[0] = paddle.to_tensor(batch[0]).astype("float32")
         if not engine.config["Global"].get("use_multilabel", False):
             batch[1] = batch[1].reshape([-1, 1]).astype("int64")
         # image input
         if engine.amp:
-            amp_level = 'O1'
-            if engine.config['AMP']['use_pure_fp16'] is True:
-                amp_level = 'O2'
-            with paddle.amp.auto_cast(
-                    custom_black_list={
-                        "flatten_contiguous_range", "greater_than"
-                    },
-                    level=amp_level):
+            amp_level = engine.config['AMP'].get("level", "O1").upper()
+            with paddle.amp.auto_cast(
+                    custom_black_list={
+                        "flatten_contiguous_range", "greater_than"
+                    },
+                    level=amp_level):
                 out = engine.model(batch[0])
                 # calc loss
                 if engine.eval_loss_func is not None:

@@ -70,7 +72,8 @@ def classification_eval(engine, epoch_id=0):
                     for key in loss_dict:
                         if key not in output_info:
                             output_info[key] = AverageMeter(key, '7.5f')
-                        output_info[key].update(loss_dict[key].numpy()[0], batch_size)
+                        output_info[key].update(loss_dict[key].numpy()[0],
+                                                batch_size)
         else:
             out = engine.model(batch[0])
             # calc loss

@@ -79,7 +82,8 @@ def classification_eval(engine, epoch_id=0):
             for key in loss_dict:
                 if key not in output_info:
                     output_info[key] = AverageMeter(key, '7.5f')
-                output_info[key].update(loss_dict[key].numpy()[0], batch_size)
+                output_info[key].update(loss_dict[key].numpy()[0],
+                                        batch_size)
 
         # just for DistributedBatchSampler issue: repeat sampling
         current_samples = batch_size * paddle.distributed.get_world_size()
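The evaluation loop (and the training loop in the next file) now reads the level straight from the config and upper-cases it, so a lowercase `o2` in a YAML file behaves like `O2`, with `O1` as the default when the key is missing. A standalone sketch of that lookup, assuming a plain dict config; the function name is illustrative, not PaddleClas API:

```python
def amp_level_for_autocast(config):
    """Read the AMP level the way the new eval/train loops do.

    `.get("level", "O1")` keeps O1 as the default when the key is
    absent; `.upper()` makes the lookup case-insensitive. This value
    is then passed as `level=` to paddle.amp.auto_cast.
    Illustrative helper, not part of PaddleClas.
    """
    return config["AMP"].get("level", "O1").upper()

print(amp_level_for_autocast({"AMP": {"level": "o2"}}))  # O2
print(amp_level_for_autocast({"AMP": {}}))               # O1
```

This replaces the old two-step dance of initializing `amp_level = 'O1'` and overwriting it when `use_pure_fp16` was true.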
ppcls/engine/train/train.py

@@ -42,10 +42,12 @@ def train_epoch(engine, epoch_id, print_batch_step):
         # image input
         if engine.amp:
-            amp_level = 'O1'
-            if engine.config['AMP']['use_pure_fp16'] is True:
-                amp_level = 'O2'
-            with paddle.amp.auto_cast(
-                    custom_black_list={
-                        "flatten_contiguous_range", "greater_than"
-                    },
-                    level=amp_level):
+            amp_level = engine.config['AMP'].get("level", "O1").upper()
+            with paddle.amp.auto_cast(
+                    custom_black_list={
+                        "flatten_contiguous_range", "greater_than"
+                    },
+                    level=amp_level):
                 out = forward(engine, batch)
                 loss_dict = engine.train_loss_func(out, batch[1])
         else:
ppcls/static/program.py

@@ -158,7 +158,7 @@ def create_strategy(config):
     exec_strategy.num_threads = 1
     exec_strategy.num_iteration_per_drop_scope = (
         10000
-        if 'AMP' in config and config.AMP.get("use_pure_fp16", False) else 10)
+        if 'AMP' in config and config.AMP.get("level", "O1") == "O2" else 10)
     fuse_op = True if 'AMP' in config else False

@@ -206,7 +206,7 @@ def mixed_precision_optimizer(config, optimizer):
     scale_loss = amp_cfg.get('scale_loss', 1.0)
     use_dynamic_loss_scaling = amp_cfg.get('use_dynamic_loss_scaling', False)
-    use_pure_fp16 = amp_cfg.get('use_pure_fp16', False)
+    use_pure_fp16 = amp_cfg.get("level", "O1") == "O2"
     optimizer = paddle.static.amp.decorate(
         optimizer,
         init_loss_scaling=scale_loss,
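In the static-graph path the old boolean is not dropped but derived: `use_pure_fp16` becomes `amp_cfg.get("level", "O1") == "O2"`, so only an explicit O2 level enables pure fp16. A minimal sketch of that predicate over plain dicts (PaddleClas actually uses an attribute-style config object, and the helper name here is hypothetical):

```python
def derive_use_pure_fp16(config):
    """Derive the static-graph `use_pure_fp16` flag from the AMP level.

    Mirrors `amp_cfg.get("level", "O1") == "O2"` in
    mixed_precision_optimizer: no AMP section or a missing/O1 level
    means mixed precision only. Illustrative helper, not PaddleClas code.
    """
    amp_cfg = config.get("AMP")
    if amp_cfg is None:
        return False
    return amp_cfg.get("level", "O1") == "O2"

print(derive_use_pure_fp16({"AMP": {"level": "O2"}}))  # True
print(derive_use_pure_fp16({"AMP": {}}))               # False
print(derive_use_pure_fp16({}))                        # False
```

The same expression gates `num_iteration_per_drop_scope` in `create_strategy` and `optimizer.amp_init` in ppcls/static/train.py below, keeping all three call sites consistent.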
ppcls/static/run_dali.sh

 #!/usr/bin/env bash
-export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 export FLAGS_fraction_of_gpu_memory_to_use=0.80
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
 
 python3.7 -m paddle.distributed.launch \
-    --gpus="0,1,2,3,4,5,6,7" \
+    --gpus="0,1,2,3" \
     ppcls/static/train.py \
-        -c ./ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml \
-        -o Global.use_dali=True
+        -c ./ppcls/configs/ImageNet/ResNet/ResNet50_amp_O1.yaml
ppcls/static/train.py

@@ -158,7 +158,7 @@ def main(args):
     # load pretrained models or checkpoints
     init_model(global_config, train_prog, exe)
 
-    if 'AMP' in config and config.AMP.get("use_pure_fp16", False):
+    if 'AMP' in config and config.AMP.get("level", "O1") == "O2":
         optimizer.amp_init(
             device,
             scope=paddle.static.global_scope(),