bug in NeonDepthwiseConv
Created by: NHZlX
Conditions:
input is (256, 6, 4), kernel = 3, stride = 1, output is (256, 6, 4)
The bug appears while sliding the window over the feature map of the last channel. When execution reaches
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/neon/NeonDepthwiseConv.h#L103
with:
c = 255, h = 5, s = 0
the code at https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/neon/NeonDepthwiseConv.h#L116
fails.
The cause is that tmp = vld1q_f32(r2 + 4); reads out of bounds under these conditions.
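
The index arithmetic behind this can be sketched in portable C++ (no NEON needed). This is not copied from the Paddle source: it assumes a padding of 1 (inferred from input and output both being (256, 6, 4) with kernel = 3, stride = 1), a row-major padded buffer, and that r2 points at the third row of the 3x3 window. The helper name floatsPastEnd is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: how many floats a 4-wide load at (r2 + 4) reads
// past the end of the padded input buffer, for output position (c, h, s).
// Assumed layout: [channels][height + 2*pad][width + 2*pad], row-major.
constexpr std::size_t floatsPastEnd(int channels, int height, int width,
                                    int pad, int c, int h, int s) {
  const std::size_t padH = static_cast<std::size_t>(height + 2 * pad);
  const std::size_t padW = static_cast<std::size_t>(width + 2 * pad);
  const std::size_t bufferSize = static_cast<std::size_t>(channels) * padH * padW;

  // Assumption: r2 is the start of window row h + 2 in channel c,
  // advanced by 4 floats per step s.
  const std::size_t r2 = static_cast<std::size_t>(c) * padH * padW +
                         static_cast<std::size_t>(h + 2) * padW +
                         static_cast<std::size_t>(s) * 4;

  // vld1q_f32(r2 + 4) loads four consecutive floats: r2 + 4 .. r2 + 7.
  const std::size_t lastRead = r2 + 4 + 3;
  return lastRead >= bufferSize ? lastRead - bufferSize + 1 : 0;
}

// Last channel, last output row: the load runs past the buffer.
static_assert(floatsPastEnd(256, 6, 4, 1, 255, 5, 0) == 2, "overrun expected");
// Any earlier channel: the same load spills into the next channel's data,
// which is still inside the buffer.
static_assert(floatsPastEnd(256, 6, 4, 1, 254, 5, 0) == 0, "no overrun expected");
```

Under these assumptions the padded channel holds 8 * 6 = 48 floats, and the load covers offsets 46 to 49 within the last channel, so only the final channel's last row leaves the allocation.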
Current workaround: increase the value of newSize so that the load no longer accesses memory out of bounds. https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/neon/NeonDepthwiseConv.cpp#L70
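
A minimal sketch of this kind of over-allocation, under assumptions: the issue does not state how much newSize was increased, so the slack of 4 floats (one 128-bit vector) and the names kTailSlack and paddedBufferSize below are hypothetical, as is the padding of 1.

```cpp
#include <cstddef>

// Hypothetical slack: one 4-float vector of headroom at the tail, so a
// trailing 4-wide load that starts inside the buffer cannot leave it.
constexpr std::size_t kTailSlack = 4;

// Hypothetical helper: element count to allocate for the padded input.
constexpr std::size_t paddedBufferSize(int channels, int height, int width,
                                       int pad) {
  return static_cast<std::size_t>(channels) *
             static_cast<std::size_t>(height + 2 * pad) *
             static_cast<std::size_t>(width + 2 * pad) +
         kTailSlack;
}

// For the shapes in this issue: 256 * 8 * 6 padded floats plus the slack.
static_assert(paddedBufferSize(256, 6, 4, 1) == 12292, "size with slack");
```

The extra elements are never used as real input. They only keep the trailing vector load inside the allocation, which trades a few bytes of memory for not having to special-case the last row of the last channel.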