Error during training: Tensor holds no memory. Call Tensor::mutable_data first.
Created by: dipthomas
- Version and environment info:
  1) PaddlePaddle version: 1.6
  2) CPU:
  3) GPU: K40, CUDA 8.0, cuDNN 7
  4) System environment: CentOS, Python 2.7
- Training info:
  1) Single machine, single card
  2) GPU memory info:
  3) Operator info:

On Windows, CPU training (Python 3 + Paddle 1.6) runs without errors; GPU training was not tested there because the model is too large to fit on the card. On CentOS, GPU training fails with the error below:
I0113 16:03:55.668917 18128 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0113 16:03:56.319447 18128 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0113 16:03:59.995787 18128 parallel_executor.cc:315] Cross op memory reuse strategy is enabled, when build_strategy.memory_optimize = True or garbage collection strategy is disabled, which is not recommended
/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py:779: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "infer_attention_flow.py", line 90, in <module>
    run(conf=conf)
  File "infer_attention_flow.py", line 81, in run
    attention_flow.train(use_gpu=True)
  File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 538, in train
    fetch_list=[inf_pred, inf_loss, inf_acc]
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 780, in run
    six.reraise(*sys.exc_info())
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 775, in run
    use_program_cache=use_program_cache)
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 834, in _run_impl
    return_numpy=return_numpy)
  File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/executor.py", line 674, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::framework::Tensor::check_memory_size() const
3 float const* paddle::framework::Tensor::data<float>() const
4 paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormGradKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
9 paddle::framework::details::ComputationOpHandle::RunImpl()
10 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
11 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
12 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
13 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
14 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/framework.py", line 2488, in append_op
attrs=kwargs.get("attrs", None))
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/map/wuwenda/Python-2.7.14/Python27/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 4363, in batch_norm
attrs=attrs)
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 399, in _build_graph
moving_variance_name='final_conv.bn.pop_var'
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 438, in build_graph
output_pl = self._build_graph()
File "/home/map/wuwenda/kongming/Attention_flow_net_Paddle/network/attention_flow_net.py", line 473, in train
inf_pred, inf_loss, inf_acc = self.build_graph()
File "infer_attention_flow.py", line 81, in run
attention_flow.train(use_gpu=True)
File "infer_attention_flow.py", line 90, in <module>
run(conf=conf)
----------------------
Error Message Summary:
----------------------
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/Paddle/paddle/fluid/framework/tensor.cc:23)
[operator < batch_norm_grad > error]
The offending code snippet is as follows:
final_pl = fluid.layers.conv2d(
    input=decodoing_pl,
    num_filters=self._num_classes,
    filter_size=1,
    padding='SAME',
    name='final_conv'
)
final_pl = fluid.layers.batch_norm(
    input=final_pl,
    act=None,
    is_test=self._is_training_pl,
    param_attr=fluid.ParamAttr(name='final_conv.bn.gamma'),
    bias_attr=fluid.ParamAttr(name='final_conv.bn.beta'),
    moving_mean_name='final_conv.bn.pop_mean',
    moving_variance_name='final_conv.bn.pop_var'
    # name='final_conv.bn'
)
final_pl = fluid.layers.reshape(final_pl, shape=[b, t, -1, h, w])
return final_pl
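
For reference, one possible cause (an assumption based only on the snippet above, not a confirmed diagnosis): in Paddle 1.6, fluid.layers.batch_norm expects is_test to be a plain Python bool. If self._is_training_pl is a placeholder/Variable rather than False during training, the op can end up built in test mode, in which case the forward pass never allocates the SavedMean/SavedVariance tensors that batch_norm_grad later reads, which would match the "holder_ should not be null" hint above. Below is a minimal sketch of the usual fluid pattern, with is_test kept as a bool and the inference graph derived via clone(for_test=True); the names x, conv, bn, loss, and test_program are illustrative, not from the original code:

import paddle.fluid as fluid

# Hypothetical minimal sketch: keep is_test a plain Python bool and derive
# the inference graph with clone(for_test=True) instead of a runtime flag.
x = fluid.layers.data(name='x', shape=[3, 32, 32], dtype='float32')

conv = fluid.layers.conv2d(input=x, num_filters=8, filter_size=1,
                           padding='SAME', name='final_conv')
bn = fluid.layers.batch_norm(
    input=conv,
    act=None,
    is_test=False,  # training mode: SavedMean/SavedVariance get allocated
    param_attr=fluid.ParamAttr(name='final_conv.bn.gamma'),
    bias_attr=fluid.ParamAttr(name='final_conv.bn.beta'),
    moving_mean_name='final_conv.bn.pop_mean',
    moving_variance_name='final_conv.bn.pop_var')
loss = fluid.layers.reduce_mean(bn)

# Clone for inference *before* adding the optimizer; the cloned program
# flips ops such as batch_norm to test mode automatically.
test_program = fluid.default_main_program().clone(for_test=True)

fluid.optimizer.SGD(learning_rate=1e-3).minimize(loss)

At run time the executor then runs fluid.default_main_program() for training steps and test_program for evaluation, so no is_test flag needs to be fed into the graph.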