Commits · 5648bd80d9dc07afea3b93395e61888cdeb40424 · 机器未来 / Paddle

12 Apr, 2021 · 1 commit

[NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994) · 5648bd80

Committed by pangyoki on Apr 12, 2021

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>


02 Mar, 2021 · 1 commit

[NPU] add npu kernel for elementwise_add_grad (#31347) · 8497e2aa

Committed by Leo Chen on Mar 02, 2021

* fix reading flags from env

* fix problem caused by async run

* support partial grad

* support elementwise_add_grad npu kernel

* add unittest

* fix bug?


22 Feb, 2021 · 1 commit

add npu kernel for elementwise_sub and elementwise_sub_grad (#30973) · 5cb20f30

Committed by Leo Chen on Feb 22, 2021

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op

Co-authored-by: frankwhzhang <frankwhzhang@126.com>

