Commits · 7e18106a29d4ebcfdb921994901fe0e9f23dfb94 · BaiXuePrincess / Paddle

15 Sep, 2021 1 commit
- add nvidia cusparse library, test=develop (#35675) · 7e18106a
  Committed by Liu-xiandong on Sep 15, 2021
```
Put Nvidia's cusparse library into paddle.
```
  7e18106a
09 Sep, 2021 1 commit

Add matrix_rank Op and its GPU and CPU kernel (#34823) · eb1fbf12

Committed by 0x45f on Sep 09, 2021

* init matrix_rank op, add matrix_rank CPU code and test

* add GPU kernel, remove svd_eigen.h

* add CPU kernel when tol is tensor

* add cpu and gpu code when tol is tensor

* fix CI-ROCM error

* add matrix_rank API description, fix PR-CI-Py3 error

* fix PR-CI-Windows error, add matrix_rank API test

* delete useless comments

* fix review

* add my code in svd_helper.h

* update doc comments

* remove spaces

eb1fbf12

02 Sep, 2021 1 commit

Add SVD Op and its GPU and CPU kernel (#34953) · 7e5fb462

Committed by xiongkun on Sep 02, 2021

* Add SVD Op and its GPU and CPU kernel

* Remove CUDAPlace in test_svd_op, make the test available in CPU package

* modify the file

* fix windows bug/ fix ROCM / fix test timeout

* to pass the CIs

* improve error report

* for code review

* some modification to test_svd_op

* change python code style

* expose the svd interface for document

7e5fb462

13 Aug, 2021 1 commit

New Einsum API (#33821) · 8c8667f0

Committed by Tongxin Bai on Aug 13, 2021

* OP dot: refactor CPU kernels and get better loop performance.

* Minor fix on code format.

* Fixed minor errors.

* Add new API: einsum

* Update the Einsum unit test.

One case failed with matmul_v2, where the dtype is int64:

a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
b = np.arange(1)
paddle.einsum("...i, ...i", a, b)

* Test cases in test_einsum test floating point dtypes only.

As of now Paddle only supports float/double dtypes in matmul, which is
one of the building blocks of this Einsum implementation. We decided not to
test einsum against other dtypes.

* Polish format.

* More formatting.

* Format...

* Einsum: improve test coverage.

* Einsum: bug fixes and more testcases for testing error messages

* Einsum: fix format..

* Einsum: fixed typo and format.

* Einsum: format again...

* Einsum: applied suggested changes.

* Einsum API: improve API documentation.

* Einsum API: apply suggested changes.

* Einsum API: Add dygraph only note.

* Einsum API: Add dygraph only note.

* Einsum API: fixed unittest.

8c8667f0
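
The einsum case quoted in the commit message above (#33821) can be illustrated with NumPy. This is only a sketch of what the subscript string `"...i,...i"` computes; it uses `np.einsum` as a stand-in and does not reproduce Paddle's matmul_v2 code path or the reported int64 failure. The explicit `dtype=np.int64` is an assumption added to match the dtype named in the message.

```python
import numpy as np

# Operands from the commit message: an int64 batch of shape (2, 3, 1)
# and a length-1 vector.
a = np.arange(2 * 3 * 1, dtype=np.int64).reshape(2, 3, 1)
b = np.arange(1, dtype=np.int64)

# "...i,...i" multiplies the operands along the last axis, sums over i,
# and keeps the broadcast leading ("...") dimensions.
result = np.einsum("...i,...i", a, b)

# The same contraction written with broadcasting and an explicit sum.
reference = (a * b).sum(axis=-1)

print(result.shape, result.dtype)
```

Both expressions contract the trailing axis, so they should agree element for element.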

29 Jul, 2021 1 commit

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverage again and fix cpu test case

* follow some comments

79e758c6

07 Jul, 2021 1 commit
- add no tensorrt warning (#33874) · 758dd7bb
  Committed by feng_shuai on Jul 07, 2021
  758dd7bb
24 Jun, 2021 1 commit
- Modify the search order of dynamic library (#33722) · 6801b6e2
  Committed by Zhou Wei on Jun 24, 2021
```
* Modify the search order of dynamic library

* Modify the search order of dynamic library
```
  6801b6e2
02 Jun, 2021 1 commit
- [ROCM] update paddle inference cmake, test=develop (#33260) · e7541209
  Committed by Qi Li on Jun 02, 2021
  e7541209
07 May, 2021 1 commit
- Fix compile error on jetson platform (#32748) · 8ce6b393
  Committed by LielinJiang on May 07, 2021
```
* fix compile error on jetson platform
```
  8ce6b393
06 May, 2021 1 commit

[ROCM] bugfix for unittest (#32392) · 31392627

Committed by ronnywang on May 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

29 Apr, 2021 1 commit
- Add op read_file and decode_jpeg (#32564) · b22f6d69
  Committed by LielinJiang on Apr 29, 2021
```
* add op read_file and decode_jpeg
```
  b22f6d69
25 Apr, 2021 1 commit
- [Paddle-TRT] Add trt runtime version check (#32443) · b0556764
  Committed by Pei Yang on Apr 25, 2021
```
* add trt runtime version check

* use different wrap, and change to major version check
```
  b0556764
21 Apr, 2021 1 commit

[NPU] Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

15 Apr, 2021 1 commit

[ROCM] bugfix for unit tests (#32258) · 90133d24

Committed by furnace on Apr 15, 2021

* [ROCM] bugfix for test_conv_transpose_nn_grad

* [ROCM] bugfix for test_batch_norm_op_v2

* [ROCM] bugfix for test_empty_like_op

* [ROCM] bugfix for test_conv_transpose_nn_grad

90133d24

09 Apr, 2021 2 commits

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

[CustomOp]Support MacOS platform and Remove libpaddle_custom_op.so dependency (#31976) · d815fbf9
Committed by Aurelius84 on Apr 09, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume

* support macos
```
d815fbf9

06 Apr, 2021 1 commit
- Del cudnn6 code2 (#31986) · b8b82b72
  Committed by tianshuo78520a on Apr 06, 2021
  b8b82b72
02 Apr, 2021 1 commit
- [ROCM] fix softmax_with_cross_entropy_op (#31982) · 9e06a641
  Committed by ronnywang on Apr 02, 2021
  9e06a641
19 Mar, 2021 1 commit
- [ROCM] fix test_rnn_op (#31735) · c9e1d9dc
  Committed by ronnywang on Mar 19, 2021
  c9e1d9dc
22 Feb, 2021 2 commits
- [ROCM] update fluid platform for rocm39 (part4), test=develop (#30936) · 33429630
  Committed by Qi Li on Feb 22, 2021
  33429630
- [2.0Custom OP]Support New Custom OP on Windows (#31063) · adaec007
  Committed by Zhou Wei on Feb 22, 2021
```
* [2.0.1]Support New Custom OP on windows

* fix CI

* fix code style

* fix CI

* fix CI

* fix coverage

* fix CI

* fix CI
```
  adaec007
04 Feb, 2021 1 commit
- use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  Committed by wanghuancoder on Feb 04, 2021
```
* use iwyu clean include second time, test=develop
```
  35c5b23f
28 Jan, 2021 1 commit
- [ROCM] update fluid platform for rocm35 (part1), test=develop (#30639) · f89da4ab
  Committed by Qi Li on Jan 28, 2021
```
* [ROCM] update fluid platform for rocm35 (part1), test=develop

* address review comments, test=develop
```
  f89da4ab
20 Jan, 2021 1 commit

use nvtx push pop in timeline (#30567) · 90773473

Committed by wanghuancoder on Jan 20, 2021

* delete empty line of pybind.cc, test=develop

* use nvtx push pop in timeline, test=develop

* change year, test=develop

* add #ifdef PADDLE_WITH_CUDA, test=develop

* add #ifndef WIN32, test=develop

* is_pushed to is_pushed_, test=develop

90773473

06 Jan, 2021 1 commit
- Polish and Optimize the print/repr information of Layer (#29998) · 30888ca3
  Committed by Zhou Wei on Jan 06, 2021
```
* Polish and Optimize the print/repr message of all layer

* fix some code format
```
  30888ca3
25 Dec, 2020 1 commit

[Complex] Add support for complex grad accumulated (#29889) · 1a304e6c

Committed by Chen Weihang on Dec 25, 2020

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

1a304e6c

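16 Dec, 2020 1 commit

Add ROCm platform support code (#29342) · 76738504

Committed by Y_Xuan on Dec 16, 2020

* Add ROCm platform support code

* Fix some issues

* Resolve some ambiguities and add notes

* Fix code formatting

* Code changes after resolving conflicts

* Modify operators.cmake

* Fix formatting

* Fix errors

* Unify interfaces

* Update dates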

76738504

01 Dec, 2020 1 commit

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

27 Nov, 2020 1 commit
- fix CUDA 11 error on windows (#29101) · e668cb07
  Committed by Zhou Wei on Nov 27, 2020
  e668cb07
23 Nov, 2020 1 commit
- change avg pooling and global pooling to trt layer in dynamic shape mode (#28702) · 994673bf
  Committed by Pei Yang on Nov 23, 2020
```
* change avg pooling and global pooling to trt layer

* add support for static shape global pooling

* modify trt errmsg
```
  994673bf
17 Nov, 2020 1 commit
- bug fix, test=develop (#28674) · 80d20246
  Committed by lilong12 on Nov 17, 2020
  80d20246
03 Nov, 2020 1 commit

Optimize ERNIE model inference performance in TensorRT and support variable-length input (#28367) · ea851796

Committed by Shang Zhizhou on Nov 03, 2020

* fp16 result ok

* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS

* auto detect special slice op converter for ernie with trt oss

* ernie oss only support fp16

* fix special_slice_plugin serialize bug

* matmul in tensorrt ok

* ernie unittest ok

* add matmul tensorrt unittest

* remove demo code

ea851796

21 Oct, 2020 1 commit
- fix dynamic_loader more safe and error message on windows (#28117) · 5d700021
  Committed by Zhou Wei on Oct 21, 2020
  5d700021
19 Oct, 2020 1 commit
- reduce trt warning message (#28011) · a0b2f936
  Committed by Pei Yang on Oct 19, 2020
  a0b2f936
14 Oct, 2020 1 commit
- tune backward filter algorithm for float16 (#27529) · d5cc144c
  Committed by Zhang Ting on Oct 14, 2020
```
* use exhaustive_search for float16

* tune algo only when dtype is float16
```
  d5cc144c
28 Sep, 2020 1 commit
- add ncclSend and ncclRecv (#27621) · 5218b7af
  Committed by lilong12 on Sep 28, 2020
```
* include ncclRecv and ncclSend, test=develop
```
  5218b7af
27 Sep, 2020 1 commit

add support to float64 input of warpctc op. (#27399) · 1501a80f

Committed by Li Fuchen on Sep 27, 2020

* add float64 input to ctc_loss

* modified error message of  warpctc

* update repo and tag of warpctc

* add test for warpctc with float64 input

* modified warpctc.cmake to make sure build always

* resolved sample code bug of warpctc

* add core.ops in warpctc dygraph

* fix a bug of test

1501a80f

24 Sep, 2020 2 commits

fix tensorrt 6 build error. test=develop (#27511) · 8f7bb52b
Committed by Shibo Tao on Sep 24, 2020
```
* fix tensorrt 6 build error. test=develop

* fix. test=develop

* bug fix

* test=develop
```
8f7bb52b

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

23 Sep, 2020 1 commit
- [bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436) · c17f9cf2
  Committed by Shang Zhizhou on Sep 23, 2020
```
* [bug fix]:Memory increases after adapting the cudnn version to 8

* [bug fix]cudnnGetConvolutionForwardAlgorithm not defined
```
  c17f9cf2

BaiXuePrincess / Paddle is in sync with the fork source project