update histogram op for performance optimization, test=develop (!24912) · Merge request · PaddlePaddle / Paddle

update histogram op for performance optimization, test=develop !24912

Created by: qili93

PR types

Performance optimization

PR changes

OPs

Describe

Address review comments in PR https://github.com/PaddlePaddle/Paddle/pull/24562

paddle/fluid/operators/histogram_op.cu
1.1 Renamed variables, e.g. bVal => b_val
1.2 The kernel now uses shared memory to reduce the cost of CudaAtomicAdd
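The idea behind item 1.2 can be modeled in plain Python. This is only a sketch of the general privatization technique, not the kernel in histogram_op.cu: the function names, the block size, and the update counters are hypothetical and exist only for illustration. Each simulated block accumulates into its own local array (standing in for shared memory) and merges it into the global histogram once, so the number of updates to the global array drops from one per element to one per bin per block.

```python
def histogram_direct(bin_ids, num_bins):
    """Model of the original approach: every element updates the global histogram."""
    global_hist = [0] * num_bins
    global_updates = 0
    for b in bin_ids:
        global_hist[b] += 1      # stands in for an atomic add on global memory
        global_updates += 1
    return global_hist, global_updates


def histogram_privatized(bin_ids, num_bins, block_size):
    """Model of the shared-memory approach: per-block local histogram, then one merge."""
    global_hist = [0] * num_bins
    global_updates = 0
    for start in range(0, len(bin_ids), block_size):
        local_hist = [0] * num_bins          # stands in for a block's shared memory
        for b in bin_ids[start:start + block_size]:
            local_hist[b] += 1               # cheap update, contention stays in the block
        for i in range(num_bins):
            global_hist[i] += local_hist[i]  # one global atomic add per bin per block
            global_updates += 1
    return global_hist, global_updates


bin_ids = [i % 4 for i in range(1024)]
direct_hist, direct_updates = histogram_direct(bin_ids, 4)
private_hist, private_updates = histogram_privatized(bin_ids, 4, block_size=256)
print(direct_hist == private_hist, direct_updates, private_updates)
```

In this model both variants produce the same counts, while the privatized one touches the global array far less often. On a real GPU the benefit also depends on the number of bins fitting in shared memory and on how contended individual bins are.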

python/paddle/tensor/linalg.py
2.1 Added the blank line required below the code-block directive
2.2 Confirmed that the English documentation previews correctly

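python/paddle/fluid/tests/unittests/test_histogram_op.py
3.1 Added unit tests for the error cases
3.2 The unit tests now require fp64 input data
3.3 Added a floating-point example that illustrates how floating-point values are binned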
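For item 3.3, the binning rule for floating-point inputs can be sketched in pure Python. This is an assumption-laden reference model, not the test code from this PR: the function name `histogram_reference` is hypothetical, and the rule shown (equal-width bins over [min, max], values outside the range ignored, a value equal to max counted in the last bin) follows the common histc-style convention that the op is assumed to use. The real op also supports deriving the range from the data, which this sketch leaves out.

```python
def histogram_reference(values, bins, lo, hi):
    """Count values into `bins` equal-width bins over [lo, hi] (hypothetical helper)."""
    if hi <= lo:
        # Simplification for this sketch: require an explicit, non-empty range.
        raise ValueError("hi must be greater than lo")
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue                      # out-of-range values are ignored
        idx = int((v - lo) * bins / (hi - lo))
        idx = min(idx, bins - 1)          # v == hi belongs to the last bin
        counts[idx] += 1
    return counts


int_like = histogram_reference([1.0, 2.0, 1.0], bins=4, lo=0.0, hi=3.0)
float_case = histogram_reference([0.5, 1.5, 2.5, 3.0, 3.5], bins=3, lo=0.0, hi=3.0)
print(int_like, float_case)
```

In the second call, 3.0 lands in the last bin together with 2.5, and 3.5 is dropped because it lies outside the range.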

After the CUDA code changes, the CUDA kernel run times compare as follows:

Original approach without shared memory (input data shape = shape x shape):
shape = 512, time = 0.23
shape = 1024, time = 0.86
shape = 4096, time = 3.37
shape = 8192, time = 13.43

Current approach with shared memory (input data shape = shape x shape):
shape = 512, time = 0.03
shape = 1024, time = 0.16
shape = 4096, time = 0.58
shape = 8192, time = 4.27
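The speedup implied by these two sets of timings can be worked out with a few lines of Python. The numbers are copied from the measurements above; the time unit is not stated in the description, which does not matter for the ratio. The variable names are illustrative only.

```python
# Kernel times from the description, keyed by the `shape` value (input is shape x shape).
time_without_shared = {512: 0.23, 1024: 0.86, 4096: 3.37, 8192: 13.43}
time_with_shared = {512: 0.03, 1024: 0.16, 4096: 0.58, 8192: 4.27}

# Speedup = old time / new time for each input size.
speedup = {
    shape: time_without_shared[shape] / time_with_shared[shape]
    for shape in time_without_shared
}

for shape, ratio in speedup.items():
    print(f"shape = {shape}: {ratio:.2f}x faster")
```

The ratios fall between roughly 3x and 8x, with the largest gain at the smallest input and the smallest gain at shape = 8192.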
