Opened May 24, 2020 by saxon_zh (@saxon_zh, Guest)

Error when training ResNet50

Created by: hwx724221178


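Title: Fine-tuning on Natural Images fails with what looks like a GPU error

Version and environment information:
1) PaddlePaddle version: paddlepaddle-gpu 1.8.0.post107
3) GPU: GTX 950M, CUDA 10.2 (V10.2.89)
4) System environment: Python 3.7, Windows 10 Home

Training information:
1) Single machine, single GPU
2) 6169m

Problem description: This code comes from someone else's Bilibili video on fine-tuning with Natural Images (recorded in 2018). In the video it runs fine, with no bugs. My file and directory paths also load correctly, but running the code raises an error. The code being run is shown first.

Code: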

# The global default contexts are fluid.default_main_program() and fluid.default_startup_program().
# For finer control, however, we create our own Program contexts as follows.

# Build the training context
train_program = fluid.Program()
train_startup = fluid.Program()

# Use program_guard() to select train_program and train_startup as the context,
# so that Operators and Variables created inside the with block are added to them
with fluid.program_guard(train_program, train_startup):
    # fluid.unique_name.guard() keeps Operator and Variable names in this context from colliding
    with fluid.unique_name.guard():
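        # Note: from the point where data enters the PaddlePaddle backend, i.e. from py_reader on,
        # all Operators and Variables must live in the same Program context;
        # otherwise errors will occur

        # py_reader mainly attaches type information to the data coming from the Python side,
        # for example the tensor shape and dtype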
        train_reader = fluid.layers.py_reader(
            capacity=64,
            shapes=[(-1, 3, 224, 224), (-1, 1)],
            dtypes=('float32', 'int64'),
            name='train_reader')
        train_reader.decorate_paddle_reader(base_train_reader)
        # Despite its name, read_file() does not read a file; it runs the given reader
        # and produces Variable(s) in the data flow
        train_image, train_label = fluid.layers.read_file(train_reader)
        # Create the network in training mode, i.e. add it to train_program
        loss, train_fetch_list = create_network(
            train_image, train_label, class_dim=len(label_names), is_test=False)
        
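        # The optimizer computes gradients and back-propagates them to train the network
        # Here we use Adam for fine-tuning; other optimizers are also an option
        # The learning rate is set to 0.005. If the loss oscillates noticeably or diverges quickly (grows very large or becomes NaN),
        # restart the Python kernel and lower it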
        optimizer = fluid.optimizer.Adam(learning_rate=0.005)
        optimizer.minimize(loss)

# Build the validation context
test_program = fluid.Program()
test_startup = fluid.Program()

with fluid.program_guard(test_program, test_startup):
    with fluid.unique_name.guard():
        test_reader = fluid.layers.py_reader(
            capacity=64,
            shapes=[(-1, 3, 224, 224), (-1, 1)],
            dtypes=('float32', 'int64'),
            name='test_reader')
        test_reader.decorate_paddle_reader(base_test_reader)
        test_image, test_label = fluid.layers.read_file(test_reader)
        _, test_fetch_list = create_network(
            test_image, test_label, class_dim=len(label_names), is_test=True)
        
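        # The validation part needs no optimizer

# If you have no GPU, change this to False
# Note: PaddlePaddle does not necessarily implement training for every Operator on CPU, so a GPU is generally recommended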
use_gpu = True
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)

# Run the startup programs of both contexts
exe.run(program=train_startup)
exe.run(program=test_startup)

# Load the pretrained model parameters
def _predicate(var):
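    # Do not load the final fully connected layer (its name contains 'fc'): the ImageNet output has 1000 classes,
    # while the NaturalImages dataset has 8, so the fully connected parameter shapes do not match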
    if 'fc' in var.name:
        return False
    return os.path.exists(os.path.join(pretrained_model_path, var.name))
fluid.io.load_vars(exe, pretrained_model_path, predicate=_predicate, main_program=train_program)
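
The selection logic of `_predicate` can be checked without Paddle. The sketch below is only an illustration under stated assumptions: `FakeVar`, `make_predicate`, and the file names are hypothetical stand-ins, and a temporary directory plays the role of `pretrained_model_path`. It shows that a variable is selected only when its name does not contain `fc` and a file with that name exists in the model directory.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a Paddle variable; only the .name attribute matters here.
FakeVar = namedtuple("FakeVar", ["name"])


def make_predicate(pretrained_model_path):
    """Build the same predicate as in the issue, bound to a given directory."""
    def _predicate(var):
        # Skip the final fully connected layer (name contains 'fc').
        if 'fc' in var.name:
            return False
        # Load only variables that have a saved file of the same name.
        return os.path.exists(os.path.join(pretrained_model_path, var.name))
    return _predicate


with tempfile.TemporaryDirectory() as model_dir:
    # Pretend the pretrained model saved these two parameter files.
    for fname in ("conv1_weights", "fc_0.w_0"):
        open(os.path.join(model_dir, fname), "w").close()

    predicate = make_predicate(model_dir)
    candidates = [FakeVar("conv1_weights"), FakeVar("fc_0.w_0"), FakeVar("new_layer.w_0")]
    selected = [v.name for v in candidates if predicate(v)]

print(selected)
```

Because the test is a substring match, any variable whose name happens to contain `fc` is skipped as well, not only the last layer.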

Error

2020-05-24 17:57:29,993-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
---------------------------------------------------------------------------
EnforceNotMet                             Traceback (most recent call last)
<ipython-input-16-6840936cb321> in <module>
     35         # restart the Python kernel and lower it
     36         optimizer = fluid.optimizer.Adam(learning_rate=0.005)
---> 37         optimizer.minimize(loss)
     38 
     39 # Build the validation context

<decorator-gen-187> in minimize(self, loss, startup_program, parameter_list, no_grad_set)

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\dygraph\base.py in __impl__(func, *args, **kwargs)
    201         def __impl__(func, *args, **kwargs):
    202             with _switch_tracer_mode_guard_(is_train=False):
--> 203                 return func(*args, **kwargs)
    204 
    205         return __impl__(func)

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\optimizer.py in minimize(self, loss, startup_program, parameter_list, no_grad_set)
    832             startup_program=startup_program,
    833             parameter_list=parameter_list,
--> 834             no_grad_set=no_grad_set)
    835 
    836         optimize_ops = self.apply_optimize(

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\optimizer.py in backward(self, loss, startup_program, parameter_list, no_grad_set, callbacks)
    675             with program_guard(program, startup_program):
    676                 params_grads = append_backward(loss, parameter_list,
--> 677                                                act_no_grad_set, callbacks)
    678                 # Note: since we can't use all_reduce_op now,
    679                 #  dgc_op should be the last op of one grad.

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\backward.py in append_backward(loss, parameter_list, no_grad_set, callbacks, checkpoints)
   1414 
   1415     _append_backward_vars_(target_grad_block, fwd_op_num, grad_to_var,
-> 1416                            grad_info_map)
   1417 
   1418     program.current_block_idx = current_block_idx

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\backward.py in _append_backward_vars_(block, start_op_idx, grad_to_var, grad_info_map)
   1114         # infer_shape and infer_type
   1115         op_desc.infer_var_type(block.desc)
-> 1116         op_desc.infer_shape(block.desc)
   1117 
   1118         for arg in op_desc.output_arg_names():

EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\layers\nn.py", line 4197, in batch_norm
    type="batch_norm", inputs=inputs, outputs=outputs, attrs=attrs)
  File "<ipython-input-9-c0a07b9fd0eb>", line 76, in conv_bn_layer
    input=conv, act=act, is_test=self.is_test, param_attr=param_attr)
  File "<ipython-input-9-c0a07b9fd0eb>", line 106, in bottleneck_block
    trainable=trainable)
  File "<ipython-input-9-c0a07b9fd0eb>", line 41, in net
    conv = self.bottleneck_block(input=conv, num_filters=512, stride=1, trainable=False)
  File "<ipython-input-10-3a2c2fad7823>", line 11, in create_network
    out = model.net(image, class_dim=class_dim)
  File "<ipython-input-16-6840936cb321>", line 30, in <module>
    train_image, train_label, class_dim=len(label_names), is_test=False)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3254, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 3063, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 2886, in _run_cell
    return runner(coro)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\IPython\core\interactiveshell.py", line 2858, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\ipkernel.py", line 300, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
    yielded = next(result)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
    yielded = next(result)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 209, in wrapper
    yielded = next(result)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\gen.py", line 787, in inner
    self.run()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\ioloop.py", line 690, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\base_events.py", line 1786, in _run_once
    handle._run()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\asyncio\base_events.py", line 541, in run_forever
    self._run_once()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\tornado\platform\asyncio.py", line 149, in start
    self.asyncio_loop.run_forever()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel\kernelapp.py", line 583, in start
    self.io_loop.start()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
    app.start()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

----------------------
Error Message Summary:
----------------------
NotFoundError: Output(Scale@GRAD) and Output(Bias@GRAD) must be null or not be null at same time. But now, has Scale@Grad=[0], has Bias@GRAD=[1]
  [Hint: Expected (has_scale_grad == has_bias_grad) == true, but received (has_scale_grad == has_bias_grad):0 != true:1.] at (D:\1.8.0\paddle\paddle\fluid\operators\batch_norm_op.cc:468)
  [operator < batch_norm_grad > error]
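
One reading of the summary above is that, inside a single `batch_norm` op, the scale parameter is frozen while the bias is still trainable. The traceback shows `conv_bn_layer` passing only `param_attr` to `batch_norm`, in a block built with `trainable=False`. The notebook source is not shown in full, so this is an inference rather than a confirmed diagnosis. The sketch below first restates the rule from the error message in plain Python, then shows a hypothetical helper that gives scale and bias the same `trainable` flag. `bn_grads_consistent`, `batch_norm_with_matched_attrs`, `scale_name`, and `bias_name` are assumptions made for illustration; the parameter names must match whatever the pretrained model uses.

```python
def bn_grads_consistent(scale_trainable, bias_trainable):
    """Restates the reported check: Scale@GRAD and Bias@GRAD must both exist or both be absent."""
    return scale_trainable == bias_trainable


def batch_norm_with_matched_attrs(conv, act, is_test, scale_name, bias_name, trainable):
    """Hypothetical helper: scale and bias share one `trainable` flag."""
    import paddle.fluid as fluid  # imported lazily; needs a Paddle 1.x install

    return fluid.layers.batch_norm(
        input=conv,
        act=act,
        is_test=is_test,
        # Freeze (or train) scale and bias together so their gradients match.
        param_attr=fluid.ParamAttr(name=scale_name, trainable=trainable),
        bias_attr=fluid.ParamAttr(name=bias_name, trainable=trainable),
    )


# The reported case: scale frozen, bias still trainable.
print(bn_grads_consistent(False, True))   # False
# Both frozen: the check passes.
print(bn_grads_consistent(False, False))  # True
```

If this reading is right, passing a matching `bias_attr` wherever `conv_bn_layer` calls `batch_norm` should let `optimizer.minimize(loss)` build the backward pass.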
Reference: paddlepaddle/Paddle#24715