[cherry-pick] improve performance of cast and tril op (#30498)
* add fp16 support for tril_triu op (#30186)
* add VecCastCUDAKernel (#30296)
Co-authored-by: Nfurnace <34057289+windstamp@users.noreply.github.com>