PaddlePaddle / Paddle

Blas optimized elementwise_add forward and backward passes !10913

Created by: tpatejko

This PR optimizes the elementwise_add forward and backward passes. For the forward pass, it adds:

- an MKL VML-based optimization using v?Add when MKL/MKLDNN is enabled;
- a BLAS-based optimization using VCopy and SAXPY operations when MKL is disabled.
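Both forward-pass strategies compute z = x + y but decompose the work differently. The sketch below models them in plain Python, assuming same-length inputs. `v_add`, `vcopy`, and `saxpy` are hypothetical pure-Python stand-ins that mimic the semantics of MKL's v?Add and the BLAS ?copy/?axpy routines. They are not Paddle's or MKL's API, and the real kernels operate on contiguous float buffers rather than Python lists.

```python
from typing import List


def v_add(x: List[float], y: List[float]) -> List[float]:
    """Hypothetical stand-in for MKL VML v?Add: z[i] = x[i] + y[i] in one vector call."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a + b for a, b in zip(x, y)]


def vcopy(src: List[float], dst: List[float]) -> None:
    """Hypothetical stand-in for BLAS ?copy: dst[i] = src[i], written in place."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    dst[:] = src


def saxpy(alpha: float, x: List[float], y: List[float]) -> None:
    """Hypothetical stand-in for BLAS ?axpy: y[i] = alpha * x[i] + y[i], updated in place."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] += alpha * xi


def add_forward_vml(x: List[float], y: List[float]) -> List[float]:
    # MKL path: a single vector-add call produces z directly.
    return v_add(x, y)


def add_forward_blas(x: List[float], y: List[float]) -> List[float]:
    # Non-MKL path: z <- y (copy), then z <- 1.0 * x + z (axpy).
    z = [0.0] * len(y)
    vcopy(y, z)
    saxpy(1.0, x, z)
    return z


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(add_forward_vml(x, y))   # [11.0, 22.0, 33.0]
print(add_forward_blas(x, y))  # [11.0, 22.0, 33.0]
```

The BLAS variant makes two passes over z: the copy writes it, then axpy reads and rewrites it. v?Add produces z in a single pass, which is one plausible reason to prefer the VML path when MKL is available.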

For the backward pass:

- BLAS level-1 VCopy is used to copy the output gradient into the dx and dy vectors.
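A plain copy is enough because for z = x + y, ∂z/∂x = ∂z/∂y = 1. When x and y have the same shape, both dx and dy therefore equal dout. The minimal sketch below covers only that same-shape case. `vcopy` is a hypothetical stand-in for BLAS ?copy. The `need_dx`/`need_dy` flags are hypothetical and model gradients that a caller may not request. A broadcast y would instead need its gradient reduced, which this sketch does not attempt.

```python
from typing import List, Optional, Tuple


def vcopy(src: List[float], dst: List[float]) -> None:
    """Hypothetical stand-in for BLAS level-1 ?copy: dst[i] = src[i], in place."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    dst[:] = src


def add_backward(
    dout: List[float], need_dx: bool = True, need_dy: bool = True
) -> Tuple[Optional[List[float]], Optional[List[float]]]:
    # For z = x + y, dz/dx = dz/dy = 1, so each requested gradient is a copy of dout.
    dx: Optional[List[float]] = None
    dy: Optional[List[float]] = None
    if need_dx:
        dx = [0.0] * len(dout)
        vcopy(dout, dx)
    if need_dy:
        dy = [0.0] * len(dout)
        vcopy(dout, dy)
    return dx, dy


dx, dy = add_backward([0.5, -1.0, 2.0])
print(dx, dy)  # [0.5, -1.0, 2.0] [0.5, -1.0, 2.0]
```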

When integral or float16 types are used, or when the operator runs on a GPU device, the implementation falls back to the default (generic) elementwise_add operation.
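The selection rules described in this PR can be condensed into one decision function. In this sketch, `select_add_kernel` and its string-valued `dtype`, `device`, and `use_mkl` parameters are hypothetical. They stand in for the kernel's element type, its place, and whether the build enables MKL/MKLDNN. Treating float64 like float32 is also an assumption: v?Add has single- and double-precision variants, but the text above only names SAXPY.

```python
def select_add_kernel(dtype: str, device: str, use_mkl: bool) -> str:
    """Return which elementwise_add implementation this sketch would pick.

    dtype, device and use_mkl are hypothetical parameters standing in for the
    kernel's element type, place, and whether MKL/MKLDNN is enabled.
    """
    # Integral types, float16, and GPU devices fall back to the generic kernel.
    if device != "cpu" or dtype not in ("float32", "float64"):
        return "generic"
    # CPU floating point: VML v?Add with MKL, BLAS copy + axpy otherwise.
    return "vml" if use_mkl else "blas"


for case in [("float32", "cpu", True), ("float32", "cpu", False),
             ("float16", "cpu", True), ("int64", "cpu", False),
             ("float32", "gpu", True)]:
    print(case, "->", select_add_kernel(*case))
```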