Created by: zlsh80826
Function optimization
Others
Call cublasGemmStridedBatchedEx for batched GEMM when the data type is fp16 and Tensor Cores are available
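The dispatch condition described above can be sketched roughly as follows. This is a minimal host-side illustration, not the code from this change: `DType`, `BatchedGemmPath`, `TensorCoreAvailable`, and `SelectBatchedGemmPath` are hypothetical names, and the check assumes that Tensor Cores are present on devices with compute capability 7.0 or higher. The actual cuBLAS call is only named in a comment so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical element types, for illustration only.
enum class DType { kFloat16, kFloat32, kFloat64 };

// Hypothetical labels for the two batched GEMM code paths.
enum class BatchedGemmPath {
  kGemmStridedBatchedEx,  // would call cublasGemmStridedBatchedEx
  kDefault                // would keep the previously used batched GEMM routine
};

// Assumption: Tensor Cores exist on compute capability 7.0 (Volta) and newer.
inline bool TensorCoreAvailable(int compute_capability_major) {
  assert(compute_capability_major > 0);
  return compute_capability_major >= 7;
}

// Pick the Ex path only when both conditions hold: fp16 data and Tensor Cores.
inline BatchedGemmPath SelectBatchedGemmPath(DType dtype,
                                             int compute_capability_major) {
  if (dtype == DType::kFloat16 &&
      TensorCoreAvailable(compute_capability_major)) {
    return BatchedGemmPath::kGemmStridedBatchedEx;
  }
  return BatchedGemmPath::kDefault;
}
```

With this sketch, `SelectBatchedGemmPath(DType::kFloat16, 7)` selects the `cublasGemmStridedBatchedEx` path, while fp16 on an older device or fp32 on any device falls back to the default path.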