Created by: chengduoZH
- wrap the __shfl_down_sync
- make __shfl_down_sync support unsigned, long, long long, double, etc.
For CUDA 9.0 and above, the value passed to __shfl_down_sync can be int, unsigned int, long, unsigned long, long long, unsigned long long, float or double. With the cuda_fp16.h header included, it can also be __half or __half2.
Below CUDA 9.0, the value passed to __shfl_down should be int or float; any other type must first be cast. The casts from other types to int or float are already done in sm_30_intrinsics.h or sm_30_intrinsics.hpp. Please refer to: /home/work/cuda-6.5/include/sm_30_intrinsics.h
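
A minimal sketch of what such a wrapper could look like, assuming it dispatches on CUDA_VERSION. The names ShflDownSync, WarpSum and FULL_WARP_MASK are hypothetical and chosen for illustration only; they are not taken from this PR. The pre-9.0 branch relies on the overloads provided by the toolkit's sm_30_intrinsics header, so which types it accepts depends on the toolkit version.

```cuda
#include <cstdio>
#include <cuda.h>          // defines CUDA_VERSION
#include <cuda_runtime.h>

// Hypothetical constant: all 32 lanes of the warp participate.
#define FULL_WARP_MASK 0xffffffffu

// Hypothetical wrapper (illustrative name): one call site for both toolkits.
template <typename T>
__forceinline__ __device__ T ShflDownSync(unsigned mask, T val, int delta,
                                          int width = 32) {
#if CUDA_VERSION >= 9000
  // CUDA 9.0+: the _sync variant takes an explicit participation mask.
  return __shfl_down_sync(mask, val, static_cast<unsigned>(delta), width);
#else
  // Older toolkits: no mask parameter; type support comes from the
  // overloads in sm_30_intrinsics.h / sm_30_intrinsics.hpp.
  (void)mask;
  return __shfl_down(val, delta, width);
#endif
}

// Example use: sum 32 doubles within a single warp.
__global__ void WarpSum(const double* in, double* out) {
  double v = in[threadIdx.x];
  // Halve the shuffle distance each step: 16, 8, 4, 2, 1.
  for (int offset = 16; offset > 0; offset /= 2) {
    v += ShflDownSync(FULL_WARP_MASK, v, offset);
  }
  // After the loop, lane 0 holds the sum over all 32 lanes.
  if (threadIdx.x == 0) *out = v;
}

int main() {
  double h_in[32], h_out = 0.0;
  for (int i = 0; i < 32; ++i) h_in[i] = i + 1;  // values 1..32

  double *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(double));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  WarpSum<<<1, 32>>>(d_in, d_out);  // exactly one warp
  cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);

  printf("warp sum = %.1f\n", h_out);
  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

The template keeps the call site identical for every supported type, so a reduction written against the wrapper does not need per-type casts or version checks of its own.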