[OP Update]: Replace the current Eigen arg min or arg max implementation with the cub one (!25941) · Merge request · PaddlePaddle / Paddle

Created by: NHZlX

PR types

Performance optimization

PR changes

OPs

Describe

The current arg max and arg min implementations are based on Eigen and instantiate many templates at the same time. This increases the size of the inference lib (60M for each op). So this PR uses cub to implement them instead.
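As a rough illustration of the idea behind a reduction-based arg max, the sketch below reduces (index, value) pairs with a single combiner on the CPU. It is not the kernel from this PR. `IndexValue`, `ArgMaxCombine` and `ArgMaxRow` are hypothetical names chosen for this example; cub offers comparable building blocks (`cub::KeyValuePair`, `cub::ArgMax`), and a GPU block reduction would apply the same kind of combiner across threads instead of in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: an (index, value) pair carried through the reduction.
struct IndexValue {
  std::size_t index;
  float value;
};

// Combiner for arg max: keep the larger value; on a tie keep the smaller index.
// The operation is associative, so it can be applied in any grouping.
inline IndexValue ArgMaxCombine(const IndexValue& a, const IndexValue& b) {
  if (b.value > a.value || (b.value == a.value && b.index < a.index)) {
    return b;
  }
  return a;
}

// Sequential reduction over one row (assumption: the row is non-empty).
inline std::size_t ArgMaxRow(const std::vector<float>& row) {
  assert(!row.empty());
  IndexValue best{0, row[0]};
  for (std::size_t i = 1; i < row.size(); ++i) {
    best = ArgMaxCombine(best, IndexValue{i, row[i]});
  }
  return best.index;
}
```

Because the combiner only needs two pairs at a time, the same function works for arg min by flipping the value comparison.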

The tables below compare the performance and lib size of the cub implementation with the Eigen one.

Input shape, axis	Eigen	CUDA cub
(1000, 10), axis = -1	0.068563 ms	0.044079 ms
(1000, 100), axis = -1	0.101589 ms	0.042587 ms
(1000, 1000), axis = -1	0.947479 ms	0.170187 ms
(1000, 10000), axis = -1	9.41424 ms	0.336944 ms

Input shape, axis	Eigen	CUDA cub
(1000, 10, 10), axis = 1	0.12406 ms	0.13495 ms
(1000, 100, 10), axis = 1	0.121825 ms	0.185797 ms
(1000, 1000, 10), axis = 1	0.406775 ms	1.35506 ms
(1000, 10000, 10), axis = 1	3.57772 ms	3.65801 ms
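To make the two benchmark layouts concrete, here is a minimal CPU reference for arg max along an arbitrary axis of a row-major tensor. It is a sketch for reading the tables, not code from this PR, and `ArgMaxAlongAxis` is a hypothetical name. The shape is split into `pre` (dimensions before the axis), `n` (the reduced dimension) and `post` (dimensions after the axis). For (1000, 100, 10) with axis = 1 this gives pre = 1000, n = 100, post = 10, so the compared elements are 10 floats apart in memory. With axis = -1, post is 1 and each reduced row is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference: arg max along `axis` of a row-major tensor.
// Negative axes count from the end, so axis = -1 is the last dimension.
inline std::vector<std::int64_t> ArgMaxAlongAxis(
    const std::vector<float>& data,
    const std::vector<std::size_t>& shape,
    int axis) {
  const int rank = static_cast<int>(shape.size());
  if (axis < 0) axis += rank;
  assert(axis >= 0 && axis < rank);

  // Collapse the shape into (pre, n, post) around the reduced axis.
  std::size_t pre = 1, post = 1;
  for (int i = 0; i < axis; ++i) pre *= shape[i];
  for (int i = axis + 1; i < rank; ++i) post *= shape[i];
  const std::size_t n = shape[axis];
  assert(n > 0 && data.size() == pre * n * post);

  std::vector<std::int64_t> out(pre * post);
  for (std::size_t p = 0; p < pre; ++p) {
    for (std::size_t q = 0; q < post; ++q) {
      const std::size_t base = p * n * post + q;
      std::size_t best = 0;
      // Elements along the axis are `post` apart; ties keep the first index.
      for (std::size_t k = 1; k < n; ++k) {
        if (data[base + k * post] > data[base + best * post]) best = k;
      }
      out[p * post + q] = static_cast<std::int64_t>(best);
    }
  }
  return out;
}
```

The output has one index per (pre, post) position, which matches the input shape with the reduced axis removed.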

Op	Eigen	CUDA cub
ArgMin	60M	1.3M
ArgMax	60M	1.3M
