Commits · 5c68e79d78372b73ad9b74fe1b32259da577355c · 机器未来 / Paddle

16 Jun, 2021 1 commit

[cherry pick] Fix issue #33021 setCacheCapacity could not limit memory consumption (#33571) · 5c68e79d

Committed by lidanqing on Jun 16, 2021

* [oneDNN] First fix to #33021  (#33174)

* First fix to #33021

* [oneDNN] Second fix to #33021 (#33471)

* use older download_data function
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

29 Apr, 2021 1 commit

Added clearing oneDNN per executor (#32664) · 7ae0a80f
Committed by Jacek Czaja on Apr 29, 2021
```
- Executor does not always have FLAGS_use_mkldnn set to true
```
19 Apr, 2021 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

09 Apr, 2021 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

04 Mar, 2021 1 commit

[ROCM] update fluid platform for rocm (part5), test=develop (#31315) · 4d647ec1
Committed by Qi Li on Mar 04, 2021
  
24 Feb, 2021 1 commit

Add cublas_handle() to expose cublas_handle to ops (#31157) · ae2be49f
Committed by liu zhengxi on Feb 24, 2021
```
* add get_cublas_handle() api

* update format

* add unittests

* alter function name
```
09 Feb, 2021 1 commit

update eigen version on Windows (#30573) · 9b3c80c8
Committed by wuhuanzhou on Feb 09, 2021
```
* update eigen version on Windows, test=develop

* add /bigobj for cl, test=develop
```
08 Feb, 2021 1 commit

[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913) · 93c1d9e7
Committed by Qi Li on Feb 08, 2021
  
04 Feb, 2021 1 commit

use iwyu clean include second time, test=develop (#30829) · 35c5b23f
Committed by wanghuancoder on Feb 04, 2021
```
* use iwyu clean include second time, test=develop
```
03 Feb, 2021 1 commit

【kunlun】dygraph supports multi xpu card training (#30671) · b1026f64
Committed by WangXi on Feb 03, 2021
  
01 Feb, 2021 1 commit

fix malloc L3 failed bug for kunlun (#30745) · c35a9880
Committed by QingshuChen on Feb 01, 2021
```
* fix malloc L3 failed bug for kunlun

* minor
```
25 Jan, 2021 1 commit

[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358) · 173660be
Committed by Jacek Czaja on Jan 25, 2021
  
18 Jan, 2021 2 commits

[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) · 843dc3cd
Committed by liuyuhui on Jan 18, 2021
  
optimize batch_norm & pool op for kunlun (#30490) · 8489d4f7
Committed by QingshuChen on Jan 18, 2021
  
13 Jan, 2021 1 commit

optimize memcpy perf for kunlun (#30291) · 2c1bba02
Committed by QingshuChen on Jan 13, 2021
```
* optimize memcpy perf for kunlun

* remove useless unittest for kunlun mean

* minor
```
11 Jan, 2021 1 commit

Add tf32 switch for cuDNN (#29192) · 924aac22
Committed by AshburnLee on Jan 11, 2021
  
23 Dec, 2020 1 commit

[oneDNN] Unit test for checking oneDNN caching (#29606) · c9e874fc
Committed by Jacek Czaja on Dec 23, 2020
  
16 Dec, 2020 1 commit

[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) · f13c3a9c
Committed by liuyuhui on Dec 16, 2020
  
15 Dec, 2020 1 commit

Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) · efea540c
Committed by AshburnLee on Dec 15, 2020
  
14 Dec, 2020 1 commit

Added verbose oneDNN lib version (#29378) · 62d44836
Committed by arlesniak on Dec 14, 2020
  
26 Nov, 2020 1 commit

Polish CUDA Information stdout (#29109) · 7ae3cb55
Committed by Aurelius84 on Nov 26, 2020
  
25 Nov, 2020 1 commit

remove eigen threadpool for the speed up · b2c8a007
Committed by wawltor on Nov 25, 2020
```
remove eigen threadpool for the speed up
```
29 Sep, 2020 1 commit

test=develop, optimize geo communicator (#26857) · cc780b19
Committed by 123malin on Sep 29, 2020
```
* test=develop, optimize geo communicator 
```
17 Sep, 2020 1 commit

enhance reduce op which can reduce tensor with arbitrary rank · 63203c4a
Committed by Jack Zhou on Sep 17, 2020
```
enhance reduce op which can reduce tensor with arbitrary rank 
```
21 Aug, 2020 2 commits

Add mechanism for blocking oneDNN cache clearing (#26502) · f3909020
Committed by Adam on Aug 21, 2020
```
* Add mechanism for blocking oneDNN cache clearing

* Review changes and Add thread guards
```

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
  * test=kunlun

* support xpu op in separate file
  * test=kunlun

* update XPU error message and remove duplicated code
  * test=kunlun

* minor
  * test=kunlun

* minor
  * test=kunlun

15 Jul, 2020 1 commit

refine PADDLE_ENFORCE (#25456) · c10dcff1
Committed by GaoWei8 on Jul 15, 2020
```
* Refine PADDLE_ENFORCE in paddle/fluid/platform
test=develop
```
07 Jul, 2020 1 commit

Refine PADDLE_ENFORCE (#25369) · ea7e5325
Committed by GaoWei8 on Jul 07, 2020
```
* refine PADDLE_ENFORCE
test=develop
```
03 Jun, 2020 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

14 May, 2020 1 commit

Hide globals & redesign restore PR (#24279) · db2b6b65
Committed by pawelpiotrowicz on May 14, 2020
```
test=develop
```
11 May, 2020 1 commit

Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

28 Apr, 2020 1 commit

added reshape transpose matmul fuse pass (#23754) · e1a7a880
Committed by Sylwester Fraczek on Apr 28, 2020
  
24 Apr, 2020 1 commit

Add cholesky_op (#23543) · a8c0fb4e

Committed by Guo Sheng on Apr 24, 2020

* Add cholesky_op forward part. test=develop

* Complete cholesky_op forward part. test=develop

* Add cholesky_op backward part. test=develop

* Complete cholesky_op backward part. test=develop

* Refine cholesky_op error check and docs. test=develop

* Add grad_check unit test for cholesky_op. test=develop

* Fix sample code in cholesky doc. test=develop

* Refine some error messages of cholesky_op. test=develop

* Refine some error messages of cholesky_op. test=develop

* Remove unused input in cholesky_grad. test=develop

* Remove unused input in cholesky_grad. test=develop

* Fix stream for cusolverDnSetStream. test=develop

* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop

* Add CUSOLVER ERROR in enforce.h
test=develop

* Fix the missing return value in cholesky. test=develop

23 Apr, 2020 1 commit

declare the stream::Priority as enum class, test=develop (#24013) · 34d7d6ae
Committed by 石晓伟 on Apr 23, 2020
  
18 Apr, 2020 1 commit

Update eigen (#23203) · b89dd86f

Committed by Zhang Ting on Apr 18, 2020

* update eigen, test=develop

* remove patches, test=develop

* add definition of -fabi-version, test=develop

* add patch for TensorBlock.h, test=develop

* test windows, test=develop

* only update eigen for Linux, test=develop

* add code comments, test=develop

17 Apr, 2020 1 commit

DeviceContext Split, test=develop (#23737) · 2d01cc85

Committed by 石晓伟 on Apr 17, 2020

* supports thread-binding stream, test=develop

* avoid using thread_local variables in dtor, test=develop

* modify the stream priority enum, test=develop

01 Apr, 2020 1 commit

reverts the commit 23177, test=develop (#23363) · 5c59d213
Committed by 石晓伟 on Apr 01, 2020
  
31 Mar, 2020 1 commit

fix nccl comm double free bug (#23344) · 0471476a

Committed by Yi Liu on Mar 31, 2020

Because the NCCL comm is not created by CUDADeviceContext, it should be destroyed by its creator, following RAII best practice.
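
The ownership rule in this commit message can be illustrated with a minimal sketch. This is not Paddle's code: `FakeComm`, `CommOwner`, `BorrowingContext`, and `RunOwnershipDemo` are hypothetical names standing in for the NCCL communicator, its creator, and a device context that only borrows it. The point is that the handle is destroyed exactly once, by the object that created it.

```cpp
#include <cassert>

// Hypothetical stand-in for a communicator handle (not NCCL's real type).
struct FakeComm {
  int* destroy_count;  // incremented every time the handle is destroyed
};

inline void DestroyFakeComm(FakeComm* comm) { ++*comm->destroy_count; }

// The creator owns the handle: it is the only class that destroys it (RAII).
class CommOwner {
 public:
  explicit CommOwner(int* destroy_count) : comm_{destroy_count} {}
  ~CommOwner() { DestroyFakeComm(&comm_); }
  CommOwner(const CommOwner&) = delete;
  CommOwner& operator=(const CommOwner&) = delete;
  FakeComm* get() { return &comm_; }

 private:
  FakeComm comm_;
};

// Analogous to a device context that merely uses a handle created elsewhere.
class BorrowingContext {
 public:
  void set_comm(FakeComm* comm) { comm_ = comm; }
  FakeComm* comm() const { return comm_; }
  // The implicit destructor deliberately leaves comm_ alone.

 private:
  FakeComm* comm_ = nullptr;
};

// Returns how many times the handle was destroyed once both objects are gone.
inline int RunOwnershipDemo() {
  int destroy_count = 0;
  {
    CommOwner owner(&destroy_count);
    BorrowingContext ctx;
    ctx.set_comm(owner.get());
    assert(ctx.comm() == owner.get());
  }  // ctx is destroyed first, then owner; only owner destroys the handle
  return destroy_count;
}
```

Calling `RunOwnershipDemo()` from a `main` function yields one destroy call. If the borrowing class also destroyed the handle in its destructor, the count would be two, which is the double-free pattern the commit title refers to.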

30 Mar, 2020 1 commit

supports thread-binding stream, test=develop (#23177) · 75ebb48a
Committed by 石晓伟 on Mar 30, 2020
  
05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

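Added a WITH_NCCL cmake option that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

Single-machine, single-card setups can disable NCCL compilation; multi-card setups need NCCL enabled (the default). If NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>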
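
As a rough illustration of how a compile-time definition such as PADDLE_WITH_NCCL can gate multi-card behavior, consider the sketch below. Only the macro name comes from the commit message; `MaxUsableCards` is a hypothetical helper, not a Paddle function, and the real code paths guarded by the macro are not shown here.

```cpp
#include <cassert>

// Hypothetical helper (not a Paddle API): how many of the visible cards a
// build may use, depending on whether NCCL support was compiled in.
inline int MaxUsableCards(int visible_cards) {
#ifdef PADDLE_WITH_NCCL
  // NCCL compiled in: every visible card can take part in training.
  return visible_cards;
#else
  // NCCL compiled out: at most a single card can be used.
  return visible_cards > 0 ? 1 : 0;
#endif
}
```

Compiled without `-DPADDLE_WITH_NCCL`, `MaxUsableCards(4)` returns 1; compiled with it, the same call returns 4. This mirrors the rule stated above that disabling NCCL restricts a build to a single card.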
