Created by: donproc
PR types
Performance optimization
PR changes
OPs
Describe
Optimize SimpleElemwiseSubGradCUDAKernel for the half data type.
Profiled in Docker with nsys on a Tesla V100-SXM2 (CUDA 10.1, NVIDIA-SMI 440.33.01, Driver Version: 440.33.01).
size | before | after | speedup |
---|---|---|---|
16,2048,16,16 | 69312 | 64624 | 1.07x |
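
As a quick sanity check on the table above, the sketch below recomputes the element count and the speedup from the reported numbers. It assumes that `before` and `after` are kernel durations where lower is better, so speedup is `before / after`; the PR does not state the unit, and the variable names are illustrative only.

```python
from math import prod

# Values copied from the results table; the measurement unit is not stated in the PR.
shape = (16, 2048, 16, 16)
before, after = 69312, 64624

# Total number of elements processed by the kernel for this shape.
num_elements = prod(shape)

# Assumption: lower is better, so speedup is the ratio before / after.
speedup = before / after

print(num_elements)
print(f"{speedup:.2f}x")
```

Rounded to two decimals, the ratio agrees with the 1.07x reported in the table.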