Created by: zhangting2020
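The Shape Op kernel only uses the dims of its input variable. When the input variable resides on a different device from the one where shape executes, no data copy between devices is needed.
Test code: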
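The reasoning can be illustrated with a toy model in plain Python. This is a minimal sketch and not Paddle's actual implementation: `ToyTensor`, `prepare_input`, `shape_kernel`, and the `needs_data` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a framework tensor: dims metadata plus a device tag."""
    dims: Tuple[int, ...]
    place: str  # e.g. "CPUPlace" or "CUDAPlace(0)"


copy_log: List[str] = []  # records every simulated cross-device copy


def prepare_input(tensor: ToyTensor, op_place: str, needs_data: bool) -> ToyTensor:
    """Copy the input to the op's device only if the kernel reads its data."""
    if tensor.place == op_place or not needs_data:
        return tensor
    copy_log.append(f"{tensor.place} -> {op_place}")
    return ToyTensor(dims=tensor.dims, place=op_place)


def shape_kernel(tensor: ToyTensor) -> List[int]:
    """A shape kernel reads only dims, never the element buffer."""
    return list(tensor.dims)


cpu_input = ToyTensor(dims=(3, 100, 100), place="CPUPlace")

# Treating the input like any other op input triggers a copy.
shape_before = shape_kernel(prepare_input(cpu_input, "CUDAPlace(0)", needs_data=True))
copies_before = len(copy_log)

# Marking the input as metadata-only skips the copy; the result is unchanged.
copy_log.clear()
shape_after = shape_kernel(prepare_input(cpu_input, "CUDAPlace(0)", needs_data=False))
copies_after = len(copy_log)
```

In this sketch both calls return the same shape, and only the first one records a simulated CPU to GPU copy.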
import paddle.fluid as fluid
import numpy as np
# inputs will be on CPUPlace
inputs = fluid.layers.fill_constant(shape=[3, 100, 100], dtype="float32", value=0.5, force_cpu=True)
output = fluid.layers.shape(inputs) # shape will be executed on CUDAPlace
exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(fluid.default_startup_program())
res = exe.run(fluid.default_main_program(), fetch_list=[output])
Log before the change: inputs is copied from CPU to GPU
I0316 08:26:28.890712 1537 operator.cc:1075] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0316 08:26:28.893126 1537 operator.cc:180] CUDAPlace(0) Op(fill_constant), inputs:{ShapeTensor[], ShapeTensorList[]}, outputs:{Out[fill_constant_0.tmp_0:float[3, 100, 100]({})]}.
I0316 08:26:28.893184 1537 operator.cc:1075] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0316 08:26:28.893223 1537 operator.cc:1207] Transform Variable fill_constant_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0316 08:26:28.893250 1537 scope.cc:169] Create variable fill_constant_0.tmp_0
I0316 08:26:28.893293 1537 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0316 08:26:28.893330 1537 tensor_util.cu:129] TensorCopySync 3, 100, 100 from CPUPlace to CUDAPlace(0)
I0316 08:26:28.893654 1537 auto_growth_best_fit_allocator.cc:97] Not found and reallocate 120320, and remaining 256
I0316 08:26:28.893820 1537 operator.cc:180] CUDAPlace(0) Op(shape), inputs:{Input[fill_constant_0.tmp_0:float[3, 100, 100]({})]}, outputs:{Out[shape_0.tmp_0:int[3]({})]}.
Log after the change: the CPU->GPU data copy is avoided
I0316 08:47:48.874794 13387 operator.cc:1075] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0316 08:47:48.891762 13387 operator.cc:180] CUDAPlace(0) Op(fill_constant), inputs:{ShapeTensor[], ShapeTensorList[]}, outputs:{Out[fill_constant_0.tmp_0:float[3, 100, 100]({})]}.
I0316 08:47:48.891891 13387 operator.cc:1075] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0316 08:47:48.891978 13387 operator.cc:180] CUDAPlace(0) Op(shape), inputs:{Input[fill_constant_0.tmp_0:float[3, 100, 100]({})]}, outputs:{Out[shape_0.tmp_0:int[3]({})]}.
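The comparison of the two logs can also be done mechanically. The following is a minimal sketch that scans log text for `TensorCopySync` lines in the format shown above. The helper name `find_device_copies` and the regular expression are assumptions for illustration, and the sample strings are single lines taken from the logs in this description.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   "... TensorCopySync 3, 100, 100 from CPUPlace to CUDAPlace(0)"
COPY_PATTERN = re.compile(r"TensorCopySync (.*) from (\S+) to (\S+)")


def find_device_copies(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (dims, src_place, dst_place) for each TensorCopySync line found."""
    copies = []
    for line in log_text.splitlines():
        match = COPY_PATTERN.search(line)
        if match:
            copies.append((match.group(1), match.group(2), match.group(3)))
    return copies


# One line from the log before the change.
log_before = (
    "I0316 08:26:28.893330 1537 tensor_util.cu:129] "
    "TensorCopySync 3, 100, 100 from CPUPlace to CUDAPlace(0)"
)

# One line from the log after the change.
log_after = (
    "I0316 08:47:48.891978 13387 operator.cc:180] CUDAPlace(0) Op(shape), "
    "inputs:{Input[fill_constant_0.tmp_0:float[3, 100, 100]({})]}, "
    "outputs:{Out[shape_0.tmp_0:int[3]({})]}."
)

copies_in_before = find_device_copies(log_before)
copies_in_after = find_device_copies(log_after)
```

An empty result for the log after the change indicates that no synchronous tensor copy was logged for the shape input.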