argsort fails during GPU inference (temporary_buffer::allocate: get_temporary_buffer failed)
Created by: lxastro
Environment: CentOS 6.3, Python 3.6.5, PaddlePaddle 1.5.0 (GPU build), GPU: Tesla P40, CUDA: cuda-9.0. Configuration:
FLAGS_eager_delete_tensor_gb=0.0
FLAGS_fast_eager_deletion_mode=1
FLAGS_fraction_of_gpu_memory_to_use=0.9
FLAGS_limit_of_tmp_allocation=0
CUDA_VISIBLE_DEVICES=0
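If these flags are set from inside the script rather than exported in the shell, they presumably have to be in os.environ before paddle.fluid is imported. This sketch assumes Paddle reads its FLAGS_* settings from the environment at import time, and CUDA_VISIBLE_DEVICES has to be set before CUDA is initialized. The helper apply_env and its overrides argument are hypothetical, not part of Paddle:

```python
import os

# Flag values copied from the configuration listed above.
PADDLE_ENV = {
    "FLAGS_eager_delete_tensor_gb": "0.0",
    "FLAGS_fast_eager_deletion_mode": "1",
    "FLAGS_fraction_of_gpu_memory_to_use": "0.9",
    "FLAGS_limit_of_tmp_allocation": "0",
    "CUDA_VISIBLE_DEVICES": "0",
}


def apply_env(overrides=None):
    """Hypothetical helper: export the flags (plus any overrides) into os.environ.

    Assumption: this must run before `import paddle.fluid`, because the flags
    are only read from the environment when the module is first imported.
    """
    env = dict(PADDLE_ENV)  # copy so the defaults above are never mutated
    if overrides:
        env.update(overrides)
    os.environ.update(env)
    return env
```

For example, calling apply_env({"FLAGS_fraction_of_gpu_memory_to_use": "0.5"}) and then importing paddle.fluid reruns the reproduction with a different memory setting. This is one way to check whether the failure depends on the memory flags.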
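Problem description: Running inference on the CPU works fine. Running inference on the GPU, i.e. after changing place = fluid.CPUPlace() to place = fluid.CUDAPlace(0), produces the following error: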
temporary_buffer::allocate: get_temporary_buffer failed
temporary_buffer::allocate: get_temporary_buffer failed
temporary_buffer::allocate: get_temporary_buffer failed
temporary_buffer::allocate: get_temporary_buffer failed
temporary_buffer::allocate: get_temporary_buffer failed
terminate called after throwing an instance of 'thrust::system::system_error'
what(): device free failed: unspecified launch failure
Aborted
Minimal reproduction code:
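import paddle.fluid as fluid
import numpy as np
import sys

# Length of the sorted (last) axis; can be overridden by the first command-line argument.
N = 900
if len(sys.argv) > 1:
    N = int(sys.argv[1])

def sort(feat):
    # argsort returns (sorted values, indices); keep only the sorted values.
    feat, _ = fluid.layers.argsort(input=feat, axis=2)
    return feat

def get_data():
    # A batch of 2 all-zero samples, shape (2, 1024, N).
    return np.zeros((2, 1024, N), dtype='float32')

if __name__ == "__main__":
    # The batch dimension is prepended automatically, so x has shape [-1, 1024, N].
    x = fluid.layers.data(name='x', dtype='float32', shape=[1024, N])
    y = sort(x)
    place = fluid.CUDAPlace(0)  # fluid.CPUPlace() runs without error
    exe = fluid.Executor(place)
    program = fluid.default_main_program()
    exe.run(fluid.default_startup_program())
    test_program = program.clone(for_test=True)
    indata = get_data()
    result = exe.run(test_program, feed={"x": indata}, fetch_list=[y])
    print('output:', result[0].shape)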
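In the code above: with N at 200 or below, the error almost never occurs. With N around 250, it occurs intermittently. With N at 300 or above, it occurs almost every time.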
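Because the failure is probabilistic near N ≈ 250, you can pin down the threshold more precisely by rerunning the reproduction script several times per size and counting non-zero exit codes. An abort like the one above ends the process with a non-zero status. This is a minimal sketch that assumes the reproduction code is saved as a standalone script taking N as its first argument. The helpers failure_rate and sweep are hypothetical:

```python
import subprocess
import sys


def failure_rate(script_path, n, trials=5, timeout=600):
    """Run `python script_path n` `trials` times; return the fraction that failed.

    Any non-zero return code counts as a failure, including a negative code
    from a signal such as SIGABRT ("Aborted").
    """
    failures = 0
    for _ in range(trials):
        proc = subprocess.run(
            [sys.executable, script_path, str(n)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        if proc.returncode != 0:
            failures += 1
    return failures / trials


def sweep(script_path, sizes, trials=5):
    """Map each N in `sizes` to its observed failure fraction."""
    return {n: failure_rate(script_path, n, trials) for n in sizes}
```

For example, sweep("repro.py", [200, 250, 300], trials=10) returns a mapping from N to the observed failure fraction. Here repro.py is a hypothetical file name for the script above. Each trial runs in a separate process, so a crash in one trial cannot leave a corrupted CUDA context behind for the next.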