Error when training ResNet50
Created by: hwx724221178
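Title: Fine-tuning on Natural Images fails with what looks like a GPU error
Version and environment information:
1) PaddlePaddle version: paddlepaddle-gpu 1.8.0.post107
3) GPU: GTX 950M, CUDA 10.2, V10.2.89
4) System environment: Python 3.7 + Windows 10 Home
Training information:
1) Single machine, single GPU
2) 6169m
Problem description: This is the notebook from a Bilibili video (recorded in 2018) in which someone fine-tunes on Natural Images. In the video it runs without any bugs. My file and directory paths are loaded correctly, but an error is raised when I run it. The module code that was run is shown first, followed by the error.
Code: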
# The global default contexts are fluid.default_main_program() and fluid.default_startup_program()
# For finer control, however, we create our own Program contexts as follows
# Build the training context
train_program = fluid.Program()
train_startup = fluid.Program()
# Use program_guard() to select train_program and train_startup as the context
# so that Operators and Variables created inside the with block are added to them
with fluid.program_guard(train_program, train_startup):
    # fluid.unique_name.guard() keeps Operator and Variable names in this context from colliding
    with fluid.unique_name.guard():
        # Note: from the point where data enters the PaddlePaddle backend, i.e. from py_reader onward,
        # all Operators and Variables must be in the same Program context,
        # otherwise errors will occur
        # py_reader mainly attaches type information to the data coming from the Python side,
        # for example the shape and dtype of each tensor
        train_reader = fluid.layers.py_reader(
            capacity=64,
            shapes=[(-1, 3, 224, 224), (-1, 1)],
            dtypes=('float32', 'int64'),
            name='train_reader')
        train_reader.decorate_paddle_reader(base_train_reader)
        # Despite its name, read_file() does not read a file: it runs the given reader
        # and produces Variable(s) in the data flow
        train_image, train_label = fluid.layers.read_file(train_reader)
        # Build the network in training mode, i.e. add it to train_program
        loss, train_fetch_list = create_network(
            train_image, train_label, class_dim=len(label_names), is_test=False)
        # The optimizer differentiates the network and backpropagates to train it
        # Here we fine-tune with Adam; other optimizers can be used as well
        # The learning rate is set to 0.005; if the loss oscillates visibly or diverges quickly (grows very large or becomes NaN),
        # restart the Python kernel and reduce it
        optimizer = fluid.optimizer.Adam(learning_rate=0.005)
        optimizer.minimize(loss)
# Build the validation context
test_program = fluid.Program()
test_startup = fluid.Program()
with fluid.program_guard(test_program, test_startup):
    with fluid.unique_name.guard():
        test_reader = fluid.layers.py_reader(
            capacity=64,
            shapes=[(-1, 3, 224, 224), (-1, 1)],
            dtypes=('float32', 'int64'),
            name='test_reader')
        test_reader.decorate_paddle_reader(base_test_reader)
        test_image, test_label = fluid.layers.read_file(test_reader)
        _, test_fetch_list = create_network(
            test_image, test_label, class_dim=len(label_names), is_test=True)
        # The validation part does not need an optimizer
# If you have no GPU, change this to False
# Note: PaddlePaddle does not necessarily implement training for every Operator on CPU, so using a GPU is generally recommended
use_gpu = True
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# Run the initialization (startup) programs of both contexts
exe.run(program=train_startup)
exe.run(program=test_startup)
# Load the pretrained model parameters
def _predicate(var):
    # Do not load the last fully connected layer (its name contains 'fc'), because the ImageNet output has 1000 classes
    # while the NaturalImages dataset has 8 classes, so the fully connected layer's parameter matrix sizes do not match
    if 'fc' in var.name:
        return False
    return os.path.exists(os.path.join(pretrained_model_path, var.name))
fluid.io.load_vars(exe, pretrained_model_path, predicate=_predicate, main_program=train_program)
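The filtering rule used by `_predicate` above can be tried in isolation, without PaddlePaddle. The sketch below is only an illustration: `FakeVar`, `make_predicate`, and the file names `conv1_weights` and `fc_0.w_0` are hypothetical stand-ins, not names taken from the original notebook or from a real pretrained model directory.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a Paddle variable: only the .name attribute matters here.
FakeVar = namedtuple("FakeVar", ["name"])


def make_predicate(model_dir):
    """Build a predicate with the same rule as _predicate, bound to model_dir."""
    def predicate(var):
        # Skip the final fully connected layer, whose shape depends on the class count.
        if "fc" in var.name:
            return False
        # Load a variable only if a parameter file with its name exists.
        return os.path.exists(os.path.join(model_dir, var.name))
    return predicate


with tempfile.TemporaryDirectory() as model_dir:
    # Create two empty "parameter files" (hypothetical names).
    for fname in ("conv1_weights", "fc_0.w_0"):
        open(os.path.join(model_dir, fname), "w").close()

    pred = make_predicate(model_dir)
    candidates = [FakeVar("conv1_weights"), FakeVar("fc_0.w_0"), FakeVar("missing_param")]
    selected = [v.name for v in candidates if pred(v)]

print(selected)
```

Because the rule is a substring test, it also skips any other variable whose name happens to contain `fc`, and it silently ignores variables that have no matching file in the directory.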
Error:
2020-05-24 17:57:29,993-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
---------------------------------------------------------------------------
EnforceNotMet Traceback (most recent call last)
<ipython-input-16-6840936cb321> in <module>
35 # restart the Python kernel and reduce it
36 optimizer = fluid.optimizer.Adam(learning_rate=0.005)
---> 37 optimizer.minimize(loss)
38
39 # Build the validation context
<decorator-gen-187> in minimize(self, loss, startup_program, parameter_list, no_grad_set)
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\dygraph\base.py in __impl__(func, *args, **kwargs)
201 def __impl__(func, *args, **kwargs):
202 with _switch_tracer_mode_guard_(is_train=False):
--> 203 return func(*args, **kwargs)
204
205 return __impl__(func)
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\optimizer.py in minimize(self, loss, startup_program, parameter_list, no_grad_set)
832 startup_program=startup_program,
833 parameter_list=parameter_list,
--> 834 no_grad_set=no_grad_set)
835
836 optimize_ops = self.apply_optimize(
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\optimizer.py in backward(self, loss, startup_program, parameter_list, no_grad_set, callbacks)
675 with program_guard(program, startup_program):
676 params_grads = append_backward(loss, parameter_list,
--> 677 act_no_grad_set, callbacks)
678 # Note: since we can't use all_reduce_op now,
679 # dgc_op should be the last op of one grad.
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\backward.py in append_backward(loss, parameter_list, no_grad_set, callbacks, checkpoints)
1414
1415 _append_backward_vars_(target_grad_block, fwd_op_num, grad_to_var,
-> 1416 grad_info_map)
1417
1418 program.current_block_idx = current_block_idx
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\backward.py in _append_backward_vars_(block, start_op_idx, grad_to_var, grad_info_map)
1114 # infer_shape and infer_type
1115 op_desc.infer_var_type(block.desc)
-> 1116 op_desc.infer_shape(block.desc)
1117
1118 for arg in op_desc.output_arg_names():
EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
attrs=kwargs.get("attrs", None))
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\layers\nn.py", line 4197, in batch_norm
type="batch_norm", inputs=inputs, outputs=outputs, attrs=attrs)
File "<ipython-input-9-c0a07b9fd0eb>", line 76, in conv_bn_layer
input=conv, act=act, is_test=self.is_test, param_attr=param_attr)
File "<ipython-input-9-c0a07b9fd0eb>", line 106, in bottleneck_block
trainable=trainable)
File "<ipython-input-9-c0a07b9fd0eb>", line 41, in net
conv = self.bottleneck_block(input=conv, num_filters=512, stride=1, trainable=False)
File "<ipython-input-10-3a2c2fad7823>", line 11, in create_network
out = model.net(image, class_dim=class_dim)
File "<ipython-input-16-6840936cb321>", line 30, in <module>
train_image, train_label, class_dim=len(label_names), is_test=False)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3254, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3063, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 2886, in _run_cell
return runner(coro)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 2858, in run_cell
raw_cell, store_history, silent, shell_futures)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\ipkernel.py", line 300, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
yielded = next(result)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
user_expressions, allow_stdin,
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
yielded = next(result)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
yielded = next(result)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 748, in run
yielded = self.gen.send(value)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 787, in inner
self.run()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
ret = callback()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\ioloop.py", line 690, in <lambda>
lambda f: self._run_callback(functools.partial(callback, future))
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\base_events.py", line 1786, in _run_once
handle._run()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\base_events.py", line 541, in run_forever
self._run_once()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\platform\asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelapp.py", line 583, in start
self.io_loop.start()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
app.start()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
----------------------
Error Message Summary:
----------------------
NotFoundError: Output(Scale@GRAD) and Output(Bias@GRAD) must be null or not be null at same time. But now, has Scale@Grad=[0], has Bias@GRAD=[1]
[Hint: Expected (has_scale_grad == has_bias_grad) == true, but received (has_scale_grad == has_bias_grad):0 != true:1.] at (D:\1.8.0\paddle\paddle\fluid\operators\batch_norm_op.cc:468)
[operator < batch_norm_grad > error]
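The error summary says that `batch_norm_grad` expects the gradients for Scale and Bias to be either both present or both absent. One possible cause, which cannot be confirmed from the traceback alone because the full `conv_bn_layer` is not shown, is that the frozen blocks (`trainable=False`) mark only the scale parameter as non-trainable through `param_attr`, while the bias keeps its default trainable setting. The sketch below is a pure-Python model of that consistency rule, not PaddlePaddle code; `check_bn_grads` and its arguments are hypothetical names.

```python
def check_bn_grads(scale_trainable, bias_trainable):
    """Model of the rule in the error summary: Scale@GRAD and Bias@GRAD
    must both exist or both be absent.

    Assumption: a gradient output exists exactly when the parameter is trainable.
    """
    has_scale_grad = scale_trainable
    has_bias_grad = bias_trainable
    if has_scale_grad != has_bias_grad:
        raise ValueError(
            "Scale@GRAD and Bias@GRAD must both be present or both be absent: "
            "has_scale_grad=%d, has_bias_grad=%d"
            % (has_scale_grad, has_bias_grad))
    return has_scale_grad


# Frozen scale with a trainable bias reproduces the reported combination
# (Scale@GRAD absent, Bias@GRAD present).
try:
    check_bn_grads(scale_trainable=False, bias_trainable=True)
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print(err)

# Freezing (or training) both parameters together satisfies the rule.
both_frozen = check_bn_grads(scale_trainable=False, bias_trainable=False)
both_trained = check_bn_grads(scale_trainable=True, bias_trainable=True)
```

If that is indeed the cause, giving `fluid.layers.batch_norm` a `bias_attr` with the same `trainable` setting as `param_attr` inside `conv_bn_layer` would be the corresponding change; this has not been tested against the notebook.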