add index initialization in the block loop for index_sample kernel when...

add index initialization in the block loop for index_sample kernel when dealing with an input tensor whose shape is larger than block_dim * grid_dim (#39736)

* add block and grid loop for index_sample kernel to deal with a large-shape tensor
* fix code format
* limit grid dim
* fix the missing re-initialization of index_i on later iterations of the outer loop in the index_sample kernel
* fix conflicts
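The effect of the missing re-initialization can be checked without a GPU. The sketch below is plain C++. It runs every emulated thread of a 2-D launch serially through the same nested grid-stride loops as `IndexSampleForward` and counts how often each (row, column) output element is visited. `Dim2`, `VisitCounts` and `CountMissed` are hypothetical names, and the launch sizes used with them are arbitrary illustration values. Without the reset, a thread's `index_i` is already at or past `index_length` after its first row, so every row with `index_j >= blockDim.y * gridDim.y` is skipped.

```cpp
#include <cassert>
#include <vector>

// Stand-in for CUDA's dim3, restricted to the two axes the kernel uses.
struct Dim2 {
  unsigned x;
  unsigned y;
};

// Emulates every thread of a grid x block launch serially and returns, for
// each element of a batch_size x index_length output, how many times the
// IndexSampleForward-style loop nest touches it.
// reset_inner == false mirrors the loop before the patch;
// reset_inner == true adds the patch's `index_i = ...` line.
std::vector<int> VisitCounts(Dim2 grid, Dim2 block, unsigned batch_size,
                             unsigned index_length, bool reset_inner) {
  // A zero dimension would make the grid-stride increments zero.
  assert(grid.x > 0 && grid.y > 0 && block.x > 0 && block.y > 0);
  std::vector<int> visits(batch_size * index_length, 0);
  for (unsigned block_idx_y = 0; block_idx_y < grid.y; ++block_idx_y) {
    for (unsigned block_idx_x = 0; block_idx_x < grid.x; ++block_idx_x) {
      for (unsigned thread_idx_y = 0; thread_idx_y < block.y; ++thread_idx_y) {
        for (unsigned thread_idx_x = 0; thread_idx_x < block.x; ++thread_idx_x) {
          // Same start indices and strides as the kernel.
          unsigned index_i = block.x * block_idx_x + thread_idx_x;
          unsigned index_j = block.y * block_idx_y + thread_idx_y;
          for (; index_j < batch_size; index_j += block.y * grid.y) {
            if (reset_inner) {
              index_i = block.x * block_idx_x + thread_idx_x;
            }
            for (; index_i < index_length; index_i += block.x * grid.x) {
              ++visits[index_j * index_length + index_i];
            }
          }
        }
      }
    }
  }
  return visits;
}

// Number of output elements that no thread touched.
int CountMissed(const std::vector<int>& visits) {
  int missed = 0;
  for (int count : visits) {
    if (count == 0) ++missed;
  }
  return missed;
}
```

For example, with `grid = {2, 2}`, `block = {4, 2}`, `batch_size = 10` and `index_length = 20`, the unpatched loop covers only rows 0 to 3 and leaves rows 4 to 9 untouched (120 missed elements). The patched loop visits every element exactly once.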

c6950ab2 · FlyingQianMM · GitHub · 553afc07 · c6950ab2

Showing 2 additions and 0 deletions

paddle/fluid/operators/index_sample_op.cu +2 -0

--- a/paddle/fluid/operators/index_sample_op.cu
+++ b/paddle/fluid/operators/index_sample_op.cu
@@ -44,6 +44,7 @@ __global__ void IndexSampleForward(const IndexT* index, const T* in_data,
  unsigned int index_i = blockDim.x * blockIdx.x + threadIdx.x;
  unsigned int index_j = blockDim.y * blockIdx.y + threadIdx.y;
  for (; index_j < batch_size; index_j += blockDim.y * gridDim.y) {
+    index_i = blockDim.x * blockIdx.x + threadIdx.x;
    for (; index_i < index_length; index_i += blockDim.x * gridDim.x) {
      unsigned int index_idx = index_j * index_length + index_i;
      unsigned int in_idx = index_j * input_length + index_i;
@@ -62,6 +63,7 @@ __global__ void IndexSampleGrad(const IndexT* index, T* in_grad,
  unsigned int index_j = blockDim.y * blockIdx.y + threadIdx.y;

  for (; index_j < batch_size; index_j += blockDim.y * gridDim.y) {
+    index_i = blockDim.x * blockIdx.x + threadIdx.x;
    for (; index_i < index_length; index_i += blockDim.x * gridDim.x) {
      unsigned int index_idx = index_j * index_length + index_i;
      unsigned int in_idx = index_j * input_length + index_i;
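A serial host reference is one way to check the patched kernels on shapes larger than `blockDim * gridDim`. The sketch below assumes the documented 2-D semantics of `paddle.index_sample`, `out[j][i] = x[j][index[j][i]]`. It also assumes row-major rows of width `input_length` for the input and `index_length` for the index and output, matching the `index_j * input_length` and `index_j * index_length` offsets in the diff. `IndexSampleForwardRef` and `IndexSampleGradRef` are hypothetical helpers, not Paddle functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for the forward pass (assumed semantics):
// out[j][i] = in[j][index[j][i]] for a row-major batch_size x input_length input.
template <typename T, typename IndexT>
std::vector<T> IndexSampleForwardRef(const std::vector<IndexT>& index,
                                     const std::vector<T>& in_data,
                                     std::size_t batch_size,
                                     std::size_t index_length,
                                     std::size_t input_length) {
  assert(index.size() == batch_size * index_length);
  assert(in_data.size() == batch_size * input_length);
  std::vector<T> out(batch_size * index_length);
  for (std::size_t j = 0; j < batch_size; ++j) {
    for (std::size_t i = 0; i < index_length; ++i) {
      std::size_t index_idx = j * index_length + i;
      IndexT sample = index[index_idx];
      // Sampled column must lie inside the row.
      assert(sample >= 0 && static_cast<std::size_t>(sample) < input_length);
      out[index_idx] = in_data[j * input_length + static_cast<std::size_t>(sample)];
    }
  }
  return out;
}

// Hypothetical host reference for the backward pass: each out_grad element is
// added to the input position it was sampled from; repeated indices accumulate.
template <typename T, typename IndexT>
std::vector<T> IndexSampleGradRef(const std::vector<IndexT>& index,
                                  const std::vector<T>& out_grad,
                                  std::size_t batch_size,
                                  std::size_t index_length,
                                  std::size_t input_length) {
  assert(index.size() == batch_size * index_length);
  assert(out_grad.size() == batch_size * index_length);
  std::vector<T> in_grad(batch_size * input_length, T(0));
  for (std::size_t j = 0; j < batch_size; ++j) {
    for (std::size_t i = 0; i < index_length; ++i) {
      std::size_t index_idx = j * index_length + i;
      IndexT sample = index[index_idx];
      assert(sample >= 0 && static_cast<std::size_t>(sample) < input_length);
      in_grad[j * input_length + static_cast<std::size_t>(sample)] += out_grad[index_idx];
    }
  }
  return in_grad;
}
```

Two entries of the same index row can point at the same input column, so the gradient reference accumulates with `+=`. A parallel version has to combine such writes, for example with atomic adds, to match it.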