Created by: NHZlX
Performance optimization
OPs
To minimize the size of the inference library, we reimplemented the reduce op using CUB.
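
For readers unfamiliar with CUB, the following is a minimal sketch of a sum reduction built on `cub::DeviceReduce::Sum`. It is not the code from this change: the helper name `SumOnDevice`, the choice of a full-array `float` sum, and the host-vector interface are all assumptions made for illustration, and error checking is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not part of the original change):
// copies a non-empty host vector to the GPU and returns the sum of its elements.
// CUDA return codes are ignored here; real code should check them.
float SumOnDevice(const std::vector<float>& host_in) {
  const int num_items = static_cast<int>(host_in.size());

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, sizeof(float) * num_items);
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, host_in.data(), sizeof(float) * num_items,
             cudaMemcpyHostToDevice);

  // First call: with a null workspace pointer, CUB only reports how many
  // bytes of temporary storage the reduction needs.
  void* d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, num_items);

  // Second call: allocate the workspace and run the actual reduction.
  cudaMalloc(&d_temp, temp_bytes);
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, num_items);

  // cudaMemcpy on the default stream waits for the reduction to finish.
  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_temp);
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

Compiled with `nvcc`, a call such as `SumOnDevice({1.0f, 2.0f, 3.0f, 4.0f})` would run the whole reduction through CUB. An actual reduce op would additionally need to handle reduction over selected dimensions and other reduce types, which this sketch does not attempt.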