Created by: NHZlX
The bug in the multihead matmul op:
Tensor multihead_temp_tensor;
// B * head_number * S * S * 1 + B * S * 3 * N * H
int scratch_size = batch * head_number * seq_len * seq_len * 1;
multihead_temp_tensor.Resize({scratch_size + temp_out_tensor.numel()});
auto *multihead_temp_data =
multihead_temp_tensor.mutable_data<T>(context.GetPlace());
auto *qkptr = multihead_temp_data;
auto *tptr = multihead_temp_data + scratch_size;
TransQKVWithBias(batch, seq_len, head_size, head_number, temp_out_data,
bias_d, tptr, stream);
When the TransQKVWithBias function runs, it may use float4 loads/stores, so the address held in tptr must be 16-byte aligned. Since tptr = multihead_temp_data + scratch_size, this means scratch_size must be guaranteed to be a multiple of 4 (floats); otherwise the float4 writes inside the kernel fault on a misaligned address.
The error log:
========= Invalid __global__ write of size 16
========= at 0x00000250 in void paddle::operators::transpose_qkv_kernel<float4>(int, float4 const *, float4 const *, float4*)
========= by thread (0,0,0) in block (20,0,0)
========= Address 0x7f8685d19048 is misaligned
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/local/nvidia/lib64/libcuda.so.1 (cuLaunchKernel + 0x2c5) [0x269e85]
========= Host Frame:/paddle/pd_dynamic_shape/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so [0x3a05c59]
========= Host Frame:/paddle/pd_dynamic_shape/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so [0x3a05ce7]
========= Host Frame:/paddle/pd_dynamic_shape/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so [0x3a3c035]
========= Host Frame:/paddle/pd_dynamic_shape/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so (_ZN6paddle9operators16TransQKVWithBiasEiiiiPKfS2_PfP11CUstream_st + 0x552) [0x1f3c1d2]