Commits · ac89174e5ad69d84d4713ef4b2f01f737088be4e · PaddlePaddle / Paddle

26 Mar, 2021 · 1 commit

[NPU] support GarbageCollector for npu (#31874) · ac89174e

Committed by Leo Chen on Mar 26, 2021

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

ac89174e

02 Mar, 2021 · 1 commit

Refactor HCCLCommContext to be compatible with Paddle (#31359) · 45765d6e

Committed by Void Main on Mar 02, 2021

45765d6e

09 Feb, 2021 · 3 commits

[feature] support npu allocator, part 2 (#30972) · 1201cd2e

Committed by Leo Chen on Feb 09, 2021

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

1201cd2e

[feature] support npu operator (#30951) · 7e049108

Committed by Leo Chen on Feb 09, 2021

7e049108

[feature] support npu allocator (#30840) · 81138239

Committed by Leo Chen on Feb 09, 2021

81138239

11 Jan, 2021 · 1 commit

Add tf32 switch for cuDNN (#29192) · 924aac22

Committed by AshburnLee on Jan 11, 2021

924aac22

28 Dec, 2020 · 1 commit

[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926) · 3d1741b7

Committed by liuyuhui on Dec 28, 2020

3d1741b7

26 Dec, 2020 · 1 commit

[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) · 4427df37

Committed by liuyuhui on Dec 26, 2020

4427df37

23 Dec, 2020 · 1 commit

[oneDNN] Unit test for checking oneDNN caching (#29606) · c9e874fc

Committed by Jacek Czaja on Dec 23, 2020

c9e874fc

15 Dec, 2020 · 1 commit

Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) · efea540c

Committed by AshburnLee on Dec 15, 2020

efea540c

14 Dec, 2020 · 2 commits

Added verbose oneDNN lib version (#29378) · 62d44836

Committed by arlesniak on Dec 14, 2020

62d44836

[oneDNN] Making ThreadID info in caching key optional (#29272) · f6cca625

Committed by Jacek Czaja on Dec 14, 2020

f6cca625

25 Nov, 2020 · 1 commit

remove eigen threadpool for the speed up · b2c8a007

Committed by wawltor on Nov 25, 2020

b2c8a007

23 Nov, 2020 · 1 commit

extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b

Committed by Jacek Czaja on Nov 23, 2020

bd1d6d3b

02 Nov, 2020 · 1 commit

Retry CUDA Initialization to Fix Random Failure, test=develop (#28323) · acc11c2a

Committed by Huihuang Zheng on Nov 02, 2020

This PR is a follow-up to #28213. That PR tried to decrease GPU usage, but the CI still failed randomly, so I added retry logic for the initialization of nccl and cusolver. If the initialization fails, we retry it to avoid the random failure.
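The retry idea described in this commit message can be sketched in a few lines. This is only an illustration of the general pattern, not the code from the PR: `retry_init`, `FlakyInit`, the attempt count, and the use of `RuntimeError` as the failure signal are all assumptions made for the example.

```python
import time


def retry_init(init_fn, max_attempts=3, delay_s=0.0):
    """Call init_fn until it succeeds or max_attempts is reached (hypothetical helper)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return init_fn()
        except RuntimeError as err:  # assumed failure type for this sketch
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_s)  # optional pause before the next attempt
    raise RuntimeError(
        f"initialization failed after {max_attempts} attempts"
    ) from last_error


class FlakyInit:
    """Stand-in for a library handle initializer that fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient init failure")
        return "handle"
```

With `FlakyInit(failures=2)` and `max_attempts=3`, the third call succeeds and the handle is returned; if every attempt fails, the last error is re-raised with the attempt count attached.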

acc11c2a

24 Sep, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

17 Sep, 2020 · 1 commit

enhance reduce op which can reduce tensor with arbitrary rank · 63203c4a

Committed by Jack Zhou on Sep 17, 2020

63203c4a

21 Aug, 2020 · 2 commits

Add mechanism for blocking oneDNN cache clearing (#26502) · f3909020

Committed by Adam on Aug 21, 2020

* Add mechanism for blocking oneDNN cache clearing

* Review changes and add thread guards

f3909020

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
  * test=kunlun

* support xpu op in separate file
  * test=kunlun

* update XPU error message and remove duplicated code
  * test=kunlun

* minor
  * test=kunlun

* minor
  * test=kunlun

138ecf24

03 Jul, 2020 · 1 commit

fix PADDLE_ENFORCE (#25297) · fb70682f

Committed by GaoWei8 on Jul 03, 2020

* fix PADDLE_ENFORCE and refine the description
test=develop

fb70682f

14 May, 2020 · 1 commit

Hide globals & redesign restore PR (#24279) · db2b6b65

Committed by pawelpiotrowicz on May 14, 2020

test=develop

db2b6b65

24 Apr, 2020 · 1 commit

Add cholesky_op (#23543) · a8c0fb4e

Committed by Guo Sheng on Apr 24, 2020

* Add cholesky_op forward part. test=develop

* Complete cholesky_op forward part. test=develop

* Add cholesky_op backward part. test=develop

* Complete cholesky_op backward part. test=develop

* Refine cholesky_op error check and docs. test=develop

* Add grad_check unit test for cholesky_op. test=develop

* Fix sample code in cholesky doc. test=develop

* Refine some error messages of cholesky_op. test=develop

* Refine some error messages of cholesky_op. test=develop

* Remove unused input in cholesky_grad. test=develop

* Remove unused input in cholesky_grad. test=develop

* Fix stream for cusolverDnSetStream. test=develop

* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop

* Add CUSOLVER ERROR in enforce.h
test=develop

* Fix the missing return value in cholesky. test=develop

a8c0fb4e

23 Apr, 2020 · 1 commit

declare the stream::Priority as enum class, test=develop (#24013) · 34d7d6ae

Committed by 石晓伟 on Apr 23, 2020

34d7d6ae

20 Apr, 2020 · 1 commit

Optimize the error messages of paddle CUDA API (#23816) · 78170037

Committed by Zhou Wei on Apr 20, 2020

* Optimize the error messages of paddle CUDA API, test=develop

* fix the error messages of paddle CUDA API, test=develop

* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop

* remove build_ex_string,test=develop

* merge conflict,test=develop

78170037

17 Apr, 2020 · 1 commit

DeviceContext Split, test=develop (#23737) · 2d01cc85

Committed by 石晓伟 on Apr 17, 2020

* supports thread-binding stream, test=develop

* avoid using thread_local variables in dtor, test=develop

* modify the stream priority enum, test=develop

2d01cc85

01 Apr, 2020 · 1 commit

reverts the commit 23177, test=develop (#23363) · 5c59d213

Committed by 石晓伟 on Apr 01, 2020

5c59d213

30 Mar, 2020 · 1 commit

supports thread-binding stream, test=develop (#23177) · 75ebb48a

Committed by 石晓伟 on Mar 30, 2020

75ebb48a

05 Feb, 2020 · 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

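Committed by Wilber on Feb 05, 2020

Added a WITH_NCCL option to cmake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

NCCL compilation can be turned off for single-machine, single-card use. Multi-card use requires NCCL to stay on, which is the default; if NCCL is turned off, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>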

7bc4b095

08 Jan, 2020 · 1 commit

Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6

Committed by zhaoyuchen2018 on Jan 08, 2020

The stack op's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy
reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

3d4f2aa6

29 Nov, 2019 · 1 commit

[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375) · cd43c444

Committed by Jacek Czaja on Nov 29, 2019

cd43c444

18 Nov, 2019 · 1 commit

Fix warn of gcc8 (#21205) · cdb3d279

Committed by Zeng Jinle on Nov 18, 2019

* fix warnings of gcc 8 compilation, test=develop

* fix boost::bad_get, test=develop

* refine PADDLE_ENFORCE, test=develop

cdb3d279

14 Nov, 2019 · 1 commit

Improve topk performance. (#21087) · b93870e6

Committed by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

give 200000 data to compute topk,
before opt: cost 1s
after opt: cost 0.0028s.

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

b93870e6

28 Sep, 2019 · 1 commit

Enable users to create custom cpp op outside framework. (#19256) · 1a3eef02

Committed by qingqing01 on Sep 28, 2019

* Writing a custom op needs to follow the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.

1a3eef02

24 Sep, 2019 · 1 commit

fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407

Committed by Zeng Jinle on Sep 24, 2019

37f76407

22 Sep, 2019 · 1 commit

Add lock to cudnn handle calls (#19845) · c7f36e7c

Committed by Zeng Jinle on Sep 22, 2019

* refine reallocate of workspace size, test=develop

* add lock to cudnn handle calls, test=develop

c7f36e7c

18 Sep, 2019 · 1 commit

refine reallocate of workspace size, test=develop (#19843) · 5eb381a3

Committed by Zeng Jinle on Sep 18, 2019

5eb381a3

11 Sep, 2019 · 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used for allocating memory for cuDNN. Since it is a singleton, we can delete it for better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, avoiding the singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation.
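The stream-callback idea in this commit message can be illustrated with a toy model. This is a conceptual sketch only and does not reflect the actual Paddle classes: `FakeStream` and `StreamScopedAllocator` are hypothetical names, and the "stream" is just a FIFO queue of Python callables standing in for device work.

```python
from collections import deque


class FakeStream:
    """Toy stand-in for a device stream: queued work and callbacks run in FIFO order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, fn):
        self._queue.append(fn)

    def synchronize(self):
        # Drain everything that was queued, in submission order.
        while self._queue:
            self._queue.popleft()()


class StreamScopedAllocator:
    """Hands out buffers and frees them through a callback queued on the stream."""

    def __init__(self, stream):
        self.stream = stream
        self.live = {}  # buffer id -> buffer currently allocated
        self._next_id = 0

    def allocate(self, size):
        buf_id = self._next_id
        self._next_id += 1
        self.live[buf_id] = bytearray(size)
        return buf_id

    def release_after_pending_work(self, buf_id):
        # The free runs only after all work already queued on the stream.
        self.stream.enqueue(lambda: self.live.pop(buf_id, None))
```

Because the release is queued behind earlier work, anything submitted before `release_after_pending_work` still sees the buffer, and no global singleton is needed to track when it becomes safe to free.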

12542320

03 Sep, 2019 · 1 commit

refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719

Committed by Tao Luo on Sep 03, 2019

test=develop

75d15719

08 Jul, 2019 · 1 commit

add mkldnn shapeblob cache clear strategy (#18513) · fe32879d

Committed by Tao Luo on Jul 08, 2019

* add mkldnn shapeblob cache clear strategy

test=develop

* refine with comments

test=develop

* make cache clear strategy safer

test=develop

* add lock for GetShapeBlobSize

test=develop

fe32879d

03 Jul, 2019 · 1 commit

add shape_blob for cache mkldnn primitive (#18454) · 3f3112ce

Committed by Tao Luo on Jul 03, 2019

test=develop

3f3112ce
