Training fails with paddle.fluid.core_avx.EnforceNotMet: Invoke operator reshape2 error
Created by: szqxx
1) PaddlePaddle version: 1.5.2.post10.7 3) GPU: NVIDIA-SMI 418.39, Driver Version: 418.39, CUDA Version: 10.1, CUDNN 7.0 4) System environment: CentOS 7, Python 3.7.2
-
Training info: 1) single GPU, V100 2) GPU memory: 16 GB
-
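Reproduction info: I am reimplementing ThunderNet, adapted from the Faster R-CNN code. Printing the shape of every layer shows nothing wrong, but training fails with an error.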
-
Error log:
Traceback (most recent call last):
  File "train.py", line 248, in <module>
    train()
  File "train.py", line 240, in train
    train_loop()
  File "train.py", line 212, in train_loop
    outs = train_exe.run(feed=feeder.feed(data), fetch_list=[v.name for v in fetch_list])
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/parallel_executor.py", line 280, in run
    return_numpy=return_numpy)
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 672, in run
    return_numpy=return_numpy)
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 534, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator reshape2 error.
Python Call stacks:
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 1774, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/wangbh/miniconda3/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 6840, in reshape
    "XShape": x_shape})
  File "/home/wangbh/astar_detection/rcnn_test/utils.py", line 91, in channel_shuffle
    x = fluid.layers.reshape(x=x, shape=[batchsize, 2, channels_per_group, height, width])
  File "/home/wangbh/astar_detection/rcnn_test/models/snet.py", line 187, in inverted_residual_unit
    return channel_shuffle(out)
  File "/home/wangbh/astar_detection/rcnn_test/models/snet.py", line 62, in net
    semodule=False, use_res_connect=False, name=str(idxstage+2)+'_'+str(i+1))
  File "/home/wangbh/astar_detection/rcnn_test/models/model_builder.py", line 43, in build_model
    c4,c5,cglb = snet.net(self.image)
  File "train.py", line 80, in train
    model.build_model(image_shape, backbone = 'SNet535')
  File "train.py", line 248, in <module>
    train()
C++ Call stacks:
Enforce failed. Expected output_shape[unk_dim_idx] * capacity == -in_size, but received output_shape[unk_dim_idx] * capacity:0 != -in_size:-267840.
What puzzles me is that the final error message reads "but received output_shape[unk_dim_idx] * capacity:0 != -in_size:-267840". What does it mean that the left-hand side of the comparison is 0?
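One way to reason about the 0 is to redo the arithmetic by hand. The sketch below is plain Python, not Paddle's implementation. It assumes the check works the way the message suggests: the target shape contains one -1, `capacity` is the product of all target dimensions (including the -1, hence negative), and the unknown dimension is obtained by integer division of `-in_size` by `capacity`. The function name and all dimensions are hypothetical, chosen only so that the input has 267840 elements as in the log.

```python
def reshape_unknown_dim_check(in_shape, target_shape):
    """Mimic the arithmetic behind the enforce message (illustrative only)."""
    in_size = 1
    for d in in_shape:
        in_size *= d

    capacity = 1
    unk_dim_idx = None
    for i, d in enumerate(target_shape):
        if d == -1:
            unk_dim_idx = i
        capacity *= d  # the -1 is multiplied in, so capacity ends up negative

    if unk_dim_idx is None:
        raise ValueError("this sketch only covers target shapes with one -1")

    # Both operands are negative, so the quotient is non-negative and
    # Python's floor division agrees with truncating division here.
    inferred = (-in_size) // capacity
    return inferred, inferred * capacity, -in_size


# Hypothetical run-time tensor: 1 * 60 * 62 * 72 = 267840 elements.
runtime_shape = (1, 60, 62, 72)

# Target built from a different (larger) height and width: 2 * 30 * 80 * 80 = 384000.
print(reshape_unknown_dim_check(runtime_shape, [-1, 2, 30, 80, 80]))
# (0, 0, -267840)

# Target built from the dimensions the tensor really has.
print(reshape_unknown_dim_check(runtime_shape, [-1, 2, 30, 62, 72]))
# (1, -267840, -267840)
```

Under these assumptions, a left-hand side of 0 means the fixed dimensions in the target shape multiply to more than the number of elements actually present, so the integer division for the -1 dimension rounds down to 0. That would be consistent with `height` and `width` (or `channels_per_group`) in `channel_shuffle` not matching the tensor that arrives at run time, although the log alone does not prove it.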
The network-building code is as follows:
def build_model(self, image_shape, backbone):
    self.build_input(image_shape)
    snet = SNet(backbone)
    c4,c5,cglb = snet.net(self.image)
    cem = context_enhancement_module(c4,c5,cglb)
    # 5x5 dw and 1x1 conv
    rpn = depthwise_separable(cem,
                              245, 256, filter_size=5, groups=1, stride=1, scale=1,
                              name='rpn')
    print('rpn.shape = ',rpn.shape)
    sam = spatial_attention_module(cem,rpn)
    print('sam.shape = ',sam.shape)
    # RPN
    self.rpn_heads(rpn)
    # Fast RCNN
    self.fast_rcnn_heads(sam)
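I can't share the full code at the moment. I am confident that my reimplementation of the network itself is fine, but the error occurs at the feed or training step. Any pointers would be appreciated. Thanks!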
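For reference, the `channel_shuffle` reshape that appears in the traceback can be sketched in NumPy. This is not the project's Paddle code (which is not shown here). It assumes a 4-D NCHW input and 2 groups, as the `shape=[batchsize, 2, channels_per_group, height, width]` argument suggests. `reshape_with_fixed_hw` and the sample sizes are hypothetical and only illustrate that shapes which look right when the network is built can still fail once data with a different spatial size is fed.

```python
import numpy as np


def channel_shuffle(x, groups=2):
    """Channel shuffle on an NCHW array, reading all dims from the input itself."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap the group axis and the per-group channel axis
    return x.reshape(n, c, h, w)


def reshape_with_fixed_hw(x, fixed_hw, groups=2):
    """Same first reshape, but with height/width recorded earlier (hypothetical)."""
    n, c = x.shape[:2]
    h, w = fixed_hw
    return x.reshape(n, groups, c // groups, h, w)


x = np.arange(1 * 4 * 2 * 3).reshape(1, 4, 2, 3)

y = channel_shuffle(x)
print(y.shape)  # (1, 4, 2, 3); channel order becomes 0, 2, 1, 3

try:
    # Height/width taken from a different input size than the one actually fed.
    reshape_with_fixed_hw(x, fixed_hw=(4, 4))
except ValueError as err:
    print("reshape failed:", err)
```

If the real `channel_shuffle` takes `height` and `width` from the variable's shape at graph-construction time, it may be worth checking whether the images fed during training really have the size used when the network was built.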