提交 · 064bc4b819b505682fa46dedfb2fc4d8d5813c61 · 机器未来 / Paddle

21 1月, 2022 1 次提交

[PTEN] Add cpu context (#38979) · 064bc4b8

由 Wilber 提交于 1月 21, 2022

* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile

064bc4b8

18 1月, 2022 1 次提交

[Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

由 Zhanlue Yang 提交于 1月 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

2052f1e3

15 1月, 2022 1 次提交

[Unify Tensors PR ] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

由 Zhanlue Yang 提交于 1月 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

88966b28

04 1月, 2022 1 次提交
- A
  Fix memcpyD2H sync behavior with other stream (#38647) · c0c54ba3
  由 Aurelius84 提交于 1月 04, 2022
```
* Fix memcpyD2H sync behavior with other stream

* add wait

* add wait

* add wait
```
  c0c54ba3
29 11月, 2021 1 次提交

Support fetch lodtensor array (#37580) · a0678eb1

由 wanghuancoder 提交于 11月 29, 2021

* suport fetch lodtensor array, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

a0678eb1

26 8月, 2021 1 次提交

Support Multi-Stream, Single-Thread in New Executor (#35024) · 678a259a

由 Aurelius84 提交于 8月 26, 2021

* Modify into QueueSync QueueAsync

* fix complie on MacOS

* fix pointer

* fix conflict

* polish unittest

* fix windows fetch error

* polish code according reviewer

* fix device_guard on CPU place

678a259a

22 7月, 2021 1 次提交

copy found_inf to cpu in advance to improve performance (#34274) · 781f4028

由 Leo Chen 提交于 7月 22, 2021

* copy found_inf to cpu in advance to improve performance

* add npu test

* add npu test

* refine code

* refine memcpy op

* fix adam

781f4028

19 4月, 2021 1 次提交

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

04 2月, 2021 1 次提交
- W
  use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  由 wanghuancoder 提交于 2月 04, 2021
```
* use iwyu clean include second time, test=develop
```
  35c5b23f
18 1月, 2021 1 次提交
- J
  
  Recompute Offload: fixed bug in memcpy (#30484) · 16ba0abc
  由 JZ-LIANG 提交于 1月 18, 2021
  
  16ba0abc
12 1月, 2021 1 次提交
- J
  
  Recompute Offload (#30233) · 75936d83
  由 JZ-LIANG 提交于 1月 12, 2021
  
  75936d83
15 10月, 2020 1 次提交
- Z
  add tensor clone (#27953) · bf412f46
  由 Zhou Wei 提交于 10月 15, 2020
```
* add tensor clone

* fix unittest test_var_base
```
  bf412f46
24 9月, 2020 1 次提交

use iwyu clean include (#27267) · df43905f

由 wanghuancoder 提交于 9月 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

16 9月, 2020 1 次提交
- W
  add the error message check for the some operator · d003573f
  由 wawltor 提交于 9月 16, 2020
```
add the error message check for the some operator
```
  d003573f
20 11月, 2019 1 次提交

optimize assign op to avoid copy data from GPU to GPU (#21181) · 01a96463

由 Zhang Ting 提交于 11月 20, 2019

* optimize assign op to avoid copy data from GPU to GPU, test=develop

* modified GetkernelTypeForVar and just avoid device transform, test=develop

01a96463

07 11月, 2019 1 次提交
- H
  Add select_input_op and select_output_op (#21016) · 1957192f
  由 Huihuang Zheng 提交于 11月 07, 2019
```
These ops are useful in control flow.
```
  1957192f

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致