Add GPU kernels for segment ops, supporting sum, max, min, and mean
Add the CPU version of the segment sum, mean, max, and min ops