depthwise conv2d: large diff between the GPU and CPU implementations
Created by: jiangjiajun
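import numpy
import paddle.fluid as fluid

# NCHW input with a variable batch dimension: 384 channels, 28x28 spatial size.
input = fluid.layers.data(dtype='float32', shape=[None, 384, 28, 28], name='data')
# Zero-pad H and W by 2 on each side so the 5x5 kernel keeps a 28x28 output.
pad_input = fluid.layers.pad2d(input, paddings=[2, 2, 2, 2], pad_value=0.0)
# groups == num_filters == input channels (384), i.e. a depthwise convolution.
result = fluid.layers.conv2d(pad_input, bias_attr=False, param_attr="depthwise_kernel",
                             num_filters=384, filter_size=[5, 5], stride=[1, 1], groups=384)

# Replace CPUPlace with CUDAPlace (e.g. fluid.CUDAPlace(0)) to get the GPU result.
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# save_params was presumably run once to create test_model/, so the CPU and GPU runs load the same weights.
#fluid.io.save_params(executor=exe, dirname='test_model', main_program=None)
fluid.io.load_params(executor=exe, dirname='test_model', main_program=fluid.default_main_program())

# Fixed seed so both runs are fed identical input.
numpy.random.seed(13)
data = numpy.random.rand(5, 384, 28, 28).astype('float32')
res, = exe.run(feed={'data': data}, fetch_list=[result])
numpy.save('res_cpu.npy', res)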
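Above is my test code. When I replace CPUPlace with CUDAPlace, the same model produces results that differ by about 1e-03 between the two devices. So far, the CPU result matches TensorFlow, but the GPU result does not (in TensorFlow, the CPU and GPU results agree with each other).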
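To tell which device is off, it can help to compare both outputs against an independent reference. The sketch below is a minimal NumPy depthwise convolution. It assumes that fluid's conv2d computes a cross-correlation without flipping the kernel (as TensorFlow does) and that the filter uses fluid's [num_filters, C / groups, KH, KW] layout, i.e. [384, 1, 5, 5] here. The helper names depthwise_conv2d_ref and diff_report are hypothetical and not part of Paddle.

```python
import numpy as np


def depthwise_conv2d_ref(x, w, pad=2):
    """Reference depthwise conv2d: stride 1, no bias, symmetric zero padding.

    x:   input of shape [N, C, H, W]
    w:   filter of shape [C, 1, KH, KW] (assumed fluid layout when groups == C)
    pad: zero padding on each side of H and W, like pad2d(paddings=[2, 2, 2, 2])
    """
    n, c, h, wd = x.shape
    kh, kw = w.shape[2], w.shape[3]
    # Accumulate in float64 so the reference itself adds as little rounding error as possible.
    x64 = x.astype(np.float64)
    w64 = w.astype(np.float64)
    xp = np.pad(x64, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    oh = h + 2 * pad - kh + 1
    ow = wd + 2 * pad - kw + 1
    out = np.zeros((n, c, oh, ow), dtype=np.float64)
    for p in range(kh):
        for q in range(kw):
            # Cross-correlation: kernel tap (p, q) multiplies the window shifted by (p, q).
            # Each channel only sees its own kernel, broadcast as shape [1, C, 1, 1].
            out += xp[:, :, p:p + oh, q:q + ow] * w64[:, 0, p, q][None, :, None, None]
    return out


def diff_report(actual, expected):
    """Return (max absolute diff, max relative diff) of actual vs. expected."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    abs_diff = np.abs(actual - expected)
    # Guard against division by zero where the expected value is exactly 0.
    rel_diff = abs_diff / np.maximum(np.abs(expected), 1e-12)
    return float(abs_diff.max()), float(rel_diff.max())


if __name__ == "__main__":
    # Small smoke test with random data; these are NOT the real weights from test_model/.
    np.random.seed(13)
    x = np.random.rand(1, 3, 8, 8).astype("float32")
    w = np.random.rand(3, 1, 5, 5).astype("float32")
    ref = depthwise_conv2d_ref(x, w)
    print(ref.shape)  # (1, 3, 8, 8): pad 2 with a 5x5 kernel at stride 1 keeps H and W
    # Rounding the reference to float32 gives a rough feel for float32-level error.
    print(diff_report(ref.astype("float32"), ref))
```

For the actual repro, the loaded kernel should be readable after load_params with np.array(fluid.global_scope().find_var('depthwise_kernel').get_tensor()). Calling diff_report(np.load('res_cpu.npy'), depthwise_conv2d_ref(data, kernel)), and doing the same for the GPU output (saved under a file name of your choosing, e.g. a hypothetical res_gpu.npy), would then show which result strays from the reference and by how much.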