Training a classification model with fp16 hangs
Created by: mzchtx
Environment
- paddle version: 1.5.0.post97
- python: 2.7
- code: image_classification
Issue 1
The indentation at this location in utility.py is wrong: there is one extra space, which raises:
IndentationError: unexpected indent
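A quick way to locate this kind of error without starting a training run is to compile the file's source and report the offending line. This is a minimal sketch; the helper name `find_indentation_error` and the sample snippet are hypothetical and are not taken from utility.py.

```python
def find_indentation_error(source, filename="<string>"):
    """Return (line_number, message) for the first indentation error, or None."""
    try:
        # compile() only parses the source; nothing is executed.
        compile(source, filename, "exec")
    except IndentationError as err:
        return err.lineno, err.msg
    return None


# Hypothetical snippet: the third line has one extra leading space.
sample = "def f():\n    a = 1\n     b = 2\n"
print(find_indentation_error(sample))
```

To check the real file, read it and pass its text, for example `find_indentation_error(open("utility.py").read(), "utility.py")`.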
Issue 2
train.py no longer has the input_dtype argument, but run.sh still passes it in many places.
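A mismatch like this can be found mechanically by comparing the flags used in the script with the flags the trainer accepts. The sketch below assumes flags are written as `--name=value`; the helper `unknown_flags` and the example flag set are hypothetical, and in practice the known set would come from the argument parser in train.py.

```python
import re


def unknown_flags(script_text, known_flags):
    """Return the sorted flag names used in script_text that are not in known_flags."""
    used = set(re.findall(r"--([A-Za-z_][A-Za-z0-9_]*)", script_text))
    return sorted(used - set(known_flags))


# Hypothetical example: input_dtype is passed but no longer accepted.
script = "python train.py --model=ResNet50 --input_dtype=float16 --fp16=True"
print(unknown_flags(script, {"model", "fp16"}))
```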
Issue 3
ResNet50_vd + fp16=True + use_label_smoothing=True
fails with an error.
Command:
python train.py \
--model=ResNet50_vd \
--batch_size=256 \
--fp16=True \
--total_images=1281167 \
--image_shape=3,224,224 \
--class_dim=1000 \
--lr_strategy=cosine_decay \
--lr=0.1 \
--num_epochs=200 \
--with_mem_opt=False \
--model_save_dir=output/ \
--l2_decay=7e-5 \
--use_mixup=True \
--use_label_smoothing=True \
--label_smoothing_epsilon=0.1
Error message:
Traceback (most recent call last):
  File "train.py", line 655, in <module>
    main()
  File "train.py", line 651, in main
    train(args)
  File "train.py", line 530, in train
    loss, lr = train_exe.run(fetch_list=train_fetch_list)
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 280, in run
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/executor.py", line 665, in run
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/executor.py", line 527, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator cross_entropy error.
Python Callstacks:
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1748, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 1547, in cross_entropy
    "ignore_index": ignore_index})
  File "train.py", line 235, in calc_loss
    loss = fluid.layers.cross_entropy(input=softmax_out, label=smooth_label, soft_label=True)
  File "train.py", line 275, in net_config
    loss_a = calc_loss(epsilon,y_a,class_dim,softmax_out,use_label_smoothing)
  File "train.py", line 332, in build_program
    avg_cost = net_config(image=image, y_a=y_a, y_b=y_b, lam=lam, model=model, args=args, label=0, is_train=True)
  File "train.py", line 401, in train
    args=args)
  File "train.py", line 651, in main
    train(args)
  File "train.py", line 655, in <module>
    main()
C++ Callstacks:
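For reference, the failing line in calc_loss computes a cross entropy against a smoothed (soft) label. The plain-Python sketch below shows that computation for a single sample, assuming a uniform prior over classes. It is not Paddle's kernel and does not reproduce the fp16 failure; the function names `smooth_labels` and `soft_cross_entropy` are hypothetical.

```python
import math


def smooth_labels(label_index, class_dim, epsilon):
    """Smoothed one-hot label: (1 - epsilon) * one_hot + epsilon / class_dim."""
    uniform = epsilon / class_dim
    return [
        (1.0 - epsilon) * (1.0 if k == label_index else 0.0) + uniform
        for k in range(class_dim)
    ]


def soft_cross_entropy(probs, soft_label):
    """Cross entropy between predicted probabilities and a soft label."""
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0.0)


# Small example with 4 classes instead of the 1000 used in the command above.
smooth = smooth_labels(label_index=1, class_dim=4, epsilon=0.1)
loss = soft_cross_entropy([0.25, 0.25, 0.25, 0.25], smooth)
print(smooth, loss)
```

Comparing the output of this reference against the fp32 run for a few samples may help confirm whether the smoothed labels themselves are as expected before looking at the fp16 path.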
Issue 4
ResNet50_vd + fp16=True + use_label_smoothing=False
hangs.
Command:
python train.py \
--model=ResNet50_vd \
--batch_size=256 \
--fp16=True \
--total_images=1281167 \
--image_shape=3,224,224 \
--class_dim=1000 \
--lr_strategy=cosine_decay \
--lr=0.1 \
--num_epochs=200 \
--with_mem_opt=False \
--model_save_dir=output/ \
--l2_decay=7e-5 \
--use_mixup=True \
--use_label_smoothing=False \
--label_smoothing_epsilon=0.1
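To tell a real hang from a slow step, one option is to run a single training step under a watchdog with a timeout. The sketch below uses only the standard library and should work on both Python 2.7 and Python 3. The helper `run_with_timeout` is hypothetical and is not part of train.py or Paddle; it only reports that the call did not return in time and does not stop the stuck call.

```python
import threading
import time


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and return (finished, result).

    If fn() does not return within timeout_s seconds, (False, None) is
    returned and the worker thread is left running in the background.
    An exception raised by fn() is re-raised in the caller.
    """
    box = {}

    def target():
        try:
            box["result"] = fn()
        except Exception as exc:
            box["error"] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True  # do not block interpreter exit on a stuck call
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box.get("result")


# Stand-ins for a fast step and a stuck step.
print(run_with_timeout(lambda: 42, 1.0))
print(run_with_timeout(lambda: time.sleep(2), 0.1))
```

In the training loop, the callable could wrap the existing call, for example `lambda: train_exe.run(fetch_list=train_fetch_list)`, with a timeout well above the normal step time.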