[Paddle-TRT] Fixes #24731, opt for SoftmaxKernelWithEltadd kernel, test=develop (#24834)
* blockReduce opt
* launch threads align to warpSize
* reduce unnecessary shared memory for broadcast reduced value
* vectorize SoftmaxKernelWithEltadd
* add fp16 constraint
* test=develop
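As an illustration of the "launch threads align to warpSize" item, here is a minimal host-side sketch of rounding a requested thread count up to a whole number of warps. The helper names `AlignToWarpSize` and `NumWarps` and the constant `kWarpSize` are hypothetical and not taken from this commit. The sketch assumes a warp size of 32, which holds on current NVIDIA GPUs; production code would normally read the value from the device properties.

```cpp
#include <cassert>

// Assumption: 32 threads per warp (true for current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Hypothetical helper: round a requested thread count up to the next
// multiple of the warp size, so that every launched warp is full.
inline int AlignToWarpSize(int num_threads) {
  assert(num_threads > 0);
  return (num_threads + kWarpSize - 1) / kWarpSize * kWarpSize;
}

// Hypothetical helper: number of warps a block of that size occupies,
// i.e. how many per-warp partial results a block-level reduction
// has to combine.
inline int NumWarps(int num_threads) {
  return AlignToWarpSize(num_threads) / kWarpSize;
}
```

With this sketch, a request for 100 threads becomes a launch of 128 threads (4 warps). Threads beyond the real data length would typically contribute a neutral value (0 for a sum, a very small number for a max), so that warp-level shuffles with a full lane mask see every lane participating.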