PaddlePaddle / Paddle

[Paddle-TRT] Fixes #24731, opt for SoftmaxKernelWithEltadd kernel, test=develop !24834

Created by: zlsh80826

PR types

Function optimization

PR changes

OPs

Describe

A few changes that give the SoftmaxKernelWithEltadd kernel a 1.25x speedup:

- Change the blockReduce behavior as issue #24731 (closed) describes. Every thread that calls blockReduceXXX now obtains the reduced value directly, so the shared-memory broadcast is no longer needed.
- Align the number of threads launched for SoftmaxKernelWithEltadd to a multiple of 32 instead of rounding up to a power of 2.
- Use vectorization when available.
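
The first two changes can be illustrated with a small host-side C++ sketch. It is not the code from this PR: the helper names (`RoundUpToPowerOfTwo`, `RoundUpToWarpMultiple`, `WarpAllReduceSum`) are hypothetical, and the butterfly (XOR) exchange pattern is only one common way to let every lane end up with the reduced value. The sketch simulates the 32 lanes of a warp in a plain array rather than calling CUDA shuffle intrinsics.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; names are not taken from Paddle.
constexpr int kWarpSize = 32;

// Round n (n >= 1) up to the next power of two.
int RoundUpToPowerOfTwo(int n) {
  int p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Round n (n >= 1) up to the next multiple of the warp size.
int RoundUpToWarpMultiple(int n) {
  return (n + kWarpSize - 1) / kWarpSize * kWarpSize;
}

// Host-side simulation of a butterfly (XOR) sum reduction over one warp.
// At each step, lane i adds the value held by lane (i ^ offset). After
// log2(32) = 5 steps every lane holds the full sum, so no separate
// broadcast step is needed afterwards.
void WarpAllReduceSum(float lanes[kWarpSize]) {
  for (int offset = kWarpSize / 2; offset > 0; offset >>= 1) {
    float next[kWarpSize];
    for (int lane = 0; lane < kWarpSize; ++lane) {
      next[lane] = lanes[lane] + lanes[lane ^ offset];
    }
    for (int lane = 0; lane < kWarpSize; ++lane) {
      lanes[lane] = next[lane];
    }
  }
}
```

Assuming one thread per element, a row of 129 elements would need 256 threads when rounded up to a power of 2 but only 160 when rounded up to a multiple of 32, which leaves fewer idle threads in the block.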