Error during training: Tensor holds no memory. Call Tensor::mutable_data first.
Created by: dipthomas
- Version and environment info:
  1) PaddlePaddle version: 1.6
  2) CPU:
  3) GPU: K40, CUDA 8.0, cuDNN 7
  4) System environment: CentOS, Python 2.7
- Training info:
  1) Single machine, single card
  2) GPU memory info:
  3) Operator info:

On Windows, CPU training (Python 3 + Paddle 1.6) runs without errors; GPU training was not tested there because the model is too large to fit on the card. On CentOS, GPU training fails with the error below:
I0113 16:03:55.668917 18128 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0113 16:03:56.319447 18128 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0113 16:03:59.995787 18128 parallel_executor.cc:315] Cross op memory reuse strategy is enabled, when build_strategy.memory_optimize = True or garbage collection strategy is disabled, which is not recommended
/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py:779: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "infer_attention_flow.py", line 90, in <module>
    run(conf=conf)
  File "infer_attention_flow.py", line 81, in run
    attention_flow.train(use_gpu=True)
  File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 538, in train
    fetch_list=[inf_pred, inf_loss, inf_acc]
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 780, in run
    six.reraise(*sys.exc_info())
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 775, in run
    use_program_cache=use_program_cache)
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 834, in _run_impl
    return_numpy=return_numpy)
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 674, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::framework::Tensor::check_memory_size() const
3 float const* paddle::framework::Tensor::data<float>() const
4 paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
9 paddle::framework::details::ComputationOpHandle::RunImpl()
10 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
11 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
12 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
13 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
14 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/framework.py", line 2488, in append_op
attrs=kwargs.get("attrs", None))
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 4363, in batch_norm
attrs=attrs)
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 399, in _build_graph
moving_variance_name='final_conv.bn.pop_var'
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 438, in build_graph
output_pl = self._build_graph()
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 473, in train
inf_pred, inf_loss, inf_acc = self.build_graph()
File "infer_attention_flow.py", line 81, in run
attention_flow.train(use_gpu=True)
File "infer_attention_flow.py", line 90, in <module>
run(conf=conf)
----------------------
Error Message Summary:
----------------------
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/Paddle/paddle/fluid/framework/tensor.cc:23)
[operator < batch_norm_grad > error]
The offending code snippet is as follows:
final_pl = fluid.layers.conv2d(
    input=decodoing_pl,
    num_filters=self._num_classes,
    filter_size=1,
    padding='SAME',
    name='final_conv'
)
final_pl = fluid.layers.batch_norm(
    input=final_pl,
    act=None,
    is_test=self._is_training_pl,
    param_attr=fluid.ParamAttr(name='final_conv.bn.gamma'),
    bias_attr=fluid.ParamAttr(name='final_conv.bn.beta'),
    moving_mean_name='final_conv.bn.pop_mean',
    moving_variance_name='final_conv.bn.pop_var'
    # name='final_conv.bn'
)
final_pl = fluid.layers.reshape(final_pl, shape=[b, t, -1, h, w])
return final_pl
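
For reference, one possible cause (an assumption based only on the snippet above, not a confirmed diagnosis): in Paddle 1.6, fluid.layers.batch_norm expects is_test to be a plain Python bool. If self._is_training_pl is a placeholder/Variable rather than False during training, the op can end up built in test mode, in which case the forward pass never allocates the SavedMean/SavedVariance tensors that batch_norm_grad later reads, which would match the "holder_ should not be null" hint above. Below is a minimal sketch of the usual fluid pattern, with is_test kept as a bool and the inference graph derived via clone(for_test=True); the names x, conv, bn, loss, and test_program are illustrative, not from the original code:

import paddle.fluid as fluid

# Hypothetical minimal sketch: keep is_test a plain Python bool and derive
# the inference graph with clone(for_test=True) instead of a runtime flag.
x = fluid.layers.data(name='x', shape=[3, 32, 32], dtype='float32')

conv = fluid.layers.conv2d(input=x, num_filters=8, filter_size=1,
                           padding='SAME', name='final_conv')
bn = fluid.layers.batch_norm(
    input=conv,
    act=None,
    is_test=False,  # training mode: SavedMean/SavedVariance get allocated
    param_attr=fluid.ParamAttr(name='final_conv.bn.gamma'),
    bias_attr=fluid.ParamAttr(name='final_conv.bn.beta'),
    moving_mean_name='final_conv.bn.pop_mean',
    moving_variance_name='final_conv.bn.pop_var')
loss = fluid.layers.reduce_mean(bn)

# Clone for inference *before* adding the optimizer; the cloned program
# flips ops such as batch_norm to test mode automatically.
test_program = fluid.default_main_program().clone(for_test=True)

fluid.optimizer.SGD(learning_rate=1e-3).minimize(loss)

At run time the executor then runs fluid.default_main_program() for training steps and test_program for evaluation, so no is_test flag needs to be fed into the graph.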