Shall we import the CUB library to simplify writing CUDA kernels?
Created by: typhoonzero
I saw that Caffe2 uses an external CUDA library, "CUB", to simplify the development of CUDA kernels, for example with cub::BlockReduce. Shall we import this library to simplify our CUDA kernel development? If we later need to write some kernels by hand to gain performance, we can do that then.
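To make the proposal concrete, here is a minimal sketch of what a block-level sum could look like with cub::BlockReduce. The kernel name `BlockSumKernel`, the host helper `SumOnDevice`, and the block size `kBlockThreads` are hypothetical names chosen for illustration and are not taken from Caffe2 or from our code base. Error checking on the CUDA runtime calls is omitted to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical block size for this sketch; must match the launch configuration.
constexpr int kBlockThreads = 128;

// Each block sums up to kBlockThreads inputs and writes one partial sum.
__global__ void BlockSumKernel(const int* in, int* out, int n) {
  // Specialize BlockReduce for int values and a 1D block of kBlockThreads threads.
  using BlockReduce = cub::BlockReduce<int, kBlockThreads>;
  // Shared memory scratch space required by the collective.
  __shared__ typename BlockReduce::TempStorage temp_storage;

  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  // Out-of-range threads contribute the identity element for addition.
  int value = (idx < n) ? in[idx] : 0;

  // All threads in the block must call Sum; the result is only valid in thread 0.
  int block_sum = BlockReduce(temp_storage).Sum(value);
  if (threadIdx.x == 0) {
    out[blockIdx.x] = block_sum;
  }
}

// Hypothetical host helper: copies the input to the device, reduces each block
// with the kernel above, and adds the per-block partial sums on the host.
int SumOnDevice(const std::vector<int>& host_in) {
  const int n = static_cast<int>(host_in.size());
  if (n == 0) return 0;
  const int blocks = (n + kBlockThreads - 1) / kBlockThreads;

  int* d_in = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, blocks * sizeof(int));
  cudaMemcpy(d_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  BlockSumKernel<<<blocks, kBlockThreads>>>(d_in, d_out, n);

  // This blocking copy on the default stream also waits for the kernel to finish.
  std::vector<int> partial(blocks);
  cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  int total = 0;
  for (int p : partial) total += p;
  return total;
}
```

The point of the sketch is that the shared-memory tree reduction and its synchronization are handled inside the library call, so the kernel body stays a few lines long. A hand-tuned version could replace it later if profiling shows a need.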