提交 · 16590799e10894ed1b2a39cb6d25c43374fea4e6 · PaddlePaddle / Paddle

24 11月, 2021 1 次提交

[Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799

由 Wangzheee 提交于 11月 24, 2021

* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

16590799

08 11月, 2021 1 次提交

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

由 wanghuancoder 提交于 11月 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test =develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutxt, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

a1ec1d5a

19 10月, 2021 1 次提交
- X
  
  add rocm support for fft api (#36415) · 1d5746bd
  由 Xiaoxu Chen 提交于 10月 19, 2021
  
  1d5746bd
15 10月, 2021 1 次提交
- F
  
  dynamic load mkl as a fft backend when it is avaialble and requested (#36414) · f45e6cf6
  由 Feiyu Chan 提交于 10月 15, 2021
  
  f45e6cf6
22 9月, 2021 1 次提交
- [2.2]support extern third_party lapack API on Linux/Windows/Mac (#35690) · ae65257d
  由 zhouweiwei2014 提交于 9月 22, 2021
```
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
```
  ae65257d
18 9月, 2021 1 次提交

由 Feiyu Chan 提交于 9月 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with complie and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, becase the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernal

* fix bugs in python APIs

* add cuda fft c2r grad kernal functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simple parameterize test function by auto generate test case from parm list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation is n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsquuze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: Njeff41404 <jeff41404@gmail.com>
Co-authored-by: Nroot <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: NKP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: NXiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Nlijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43

15 9月, 2021 1 次提交
- L
  add nvidia cusparse library, test=develop (#35675) · 7e18106a
  由 Liu-xiandong 提交于 9月 15, 2021
```
Put Nvidia's cusparse library into paddle.
```
  7e18106a
07 5月, 2021 1 次提交
- L
  Fix compile error on jetson platform (#32748) · 8ce6b393
  由 LielinJiang 提交于 5月 07, 2021
```
* fix compile error on jetson platform
```
  8ce6b393
29 4月, 2021 1 次提交
- L
  Add op read_file and decode_jpeg (#32564) · b22f6d69
  由 LielinJiang 提交于 4月 29, 2021
```
* add op read_file and decode_jpeg
```
  b22f6d69
21 4月, 2021 1 次提交

【NPU】Merge NPU ccl code (#32381) · c3158527

由 zhang wenhui 提交于 4月 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>
Co-authored-by: Nxiayanming <41795079@qq.com>

c3158527

28 1月, 2021 1 次提交
- Q
  [ROCM] update fluid platform for rocm35 (part1), test=develop (#30639) · f89da4ab
  由 Qi Li 提交于 1月 28, 2021
```
* [ROCM] update fluid platform for rocm35 (part1), test=develop

* address review comments, test=develop
```
  f89da4ab
20 1月, 2021 1 次提交

use nvtx push pop in timeline (#30567) · 90773473

由 wanghuancoder 提交于 1月 20, 2021

* delete empty line of pybing.cc, test=develop

* use nvtx push pop in timeline, test=develop

* change year, test=develop

* add #ifdef PADDLE_WITH_CUDA, test=develop

* add #ifndef WIN32, test=develop

* is_pushed to is_pushed_, test=develop

90773473

16 12月, 2020 1 次提交

添加rocm平台支持代码 (#29342) · 76738504

由 Y_Xuan 提交于 12月 16, 2020

* 添加rocm平台支持代码

* 修改一些问题

* 修改一些歧义并添加备注

* 修改代码格式

* 解决冲突后的代码修改

* 修改operators.cmake

* 修改格式

* 修正错误

* 统一接口

* 修改日期

76738504

24 4月, 2020 1 次提交

Add cholesky_op (#23543) · a8c0fb4e

由 Guo Sheng 提交于 4月 24, 2020

* Add cholesky_op forward part. test=develop

* Complete cholesky_op forward part. test=develop

* Add cholesky_op backward part. test=develop

* Complete cholesky_op backward part. test=develop

* Refine cholesky_op error check and docs. test=develop

* Add grad_check unit test for cholesky_op. test=develop

* Fix sample code in cholesky doc. test=develop

* Refine some error messages of cholesky_op. test=develop

* Refine some error messages of cholesky_op. test=develop

* Remove unused input in cholesky_grad. test=develop

* Remove unused input in cholesky_grad. test=develop

* Fix stream for cusolverDnSetStream. test=develop

* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop

* Add CUSOLVER ERROR in enforce.h
test=develop

* Fix the missing return value in cholesky. test=develop

a8c0fb4e

05 2月, 2020 1 次提交

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

由 Wilber 提交于 2月 05, 2020

cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL

添加了PADDLE_WITH_NCCL定义

单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

7bc4b095

05 9月, 2019 1 次提交

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

由 Yiqun Liu 提交于 9月 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

42b5bec6

05 8月, 2019 1 次提交

fix warpctc.dll not found issue (#18761) · a43a763b

由 liuwei1031 提交于 8月 05, 2019

* fix warpctc.dll not found issue, test=develop

* revert the linux platform change, test=develop

* delete warpctc_lib_path.h.in, test=develop

* add SetPySitePackagePath function

* fix warpctc.dylib not found issue on Mac, test=develop

* improve the paddle lib path setting logic, test=develop

* fix mac ci issue caused by test_warpctc_op unittest, test=develop

* tweak code, test=develop

a43a763b

29 7月, 2019 1 次提交
- H
  
  Try to modify external gflags to solve CI compilation (#18872) · 0d3f16f5
  由 Huihuang Zheng 提交于 7月 29, 2019
  
  0d3f16f5
27 7月, 2019 1 次提交
- H
  Merge cuda 9/10 dockerfile with root dockerfile (#18693) · cfce4994
  由 Huihuang Zheng 提交于 7月 27, 2019
```
Also fix a dependency error which may cause compile error
```
  cfce4994
07 5月, 2019 1 次提交
- T
  remove unused FLAGS_warpctc_dir (#17162) · ff1661f1
  由 Tao Luo 提交于 5月 07, 2019
```
* remove unused FLAGS_warpctc_dir

test=develop

* remove FLAGS_warpctc_dir

test=develop
```
  ff1661f1
03 4月, 2019 1 次提交
- C
  Revert "Model data cryption link all lib (#16555)" · 0b2aec14
  由 Chen Weihang 提交于 4月 03, 2019
```
test=develop
This reverts commit c38c7c56.
```
  0b2aec14
02 4月, 2019 1 次提交

Model data cryption link all lib (#16555) · c38c7c56

由 Chen Weihang 提交于 4月 02, 2019

* link the libwbaes.so into paddle

* polish detail, test=develop

* try fix mac_pr_ci error, test=develop

* add compile option, test=develop

* fix ci error, test=develop

* ignore failed to find mac lib, test=develop

* change cdn to bj, cdn can't get the latest version

* trigger ci, test=develop

* temporary delete win32 lib linking, test=develop

* change https to http, test=develop

* turn compile option on to off

* turn compile option off to on, test=develop

* try lib compiled by gcc4.8, test=develop

* update lib version, test=develop

* link other lib, test=develop

* add setup config

* delete false, test=develop

* delete no_soname, test=develop

* recover so name set

* fix, test=develop

* adjust make config, test=develop

* remove link to wbaes, test=develop

* remove useless define, test=develop

c38c7c56

18 12月, 2018 3 次提交
- P
  
  test=develop · ed5bd5e5
  由 peizhilin 提交于 12月 18, 2018
  
  ed5bd5e5
- P
  include the mkl fix only · b601f2de
  由 peizhilin 提交于 12月 18, 2018
```
test=develop
```
  b601f2de
- P
  
  add mkl,ctc support for windows · 5a6d7fe2
  由 peizhilin 提交于 12月 18, 2018
  
  5a6d7fe2
27 8月, 2018 3 次提交
- D
  
  add unstack_op · 0153c21d
  由 dzhwinter 提交于 8月 27, 2018
  
  0153c21d
- D
  
  fix concat synchronization bug · 6cc78705
  由 dzhwinter 提交于 8月 27, 2018
  
  6cc78705
- D
  platform module (#12932) · d361624c
  由 dzhwinter 提交于 8月 27, 2018
```
* platform module

* Update profiler.h
```
  d361624c
24 8月, 2018 1 次提交
- D
  
  windows port · 34f8c9b6
  由 dzhwinter 提交于 8月 24, 2018
  
  34f8c9b6
20 8月, 2018 1 次提交
- D
  cudnn windows support (#12757) · 00463fdf
  由 dzhwinter 提交于 8月 20, 2018
```
* cudnn widndows

* "add comment"

* "windows support"

* "fix cmake error"
```
  00463fdf
17 8月, 2018 2 次提交
- D
  
  "windows support" · 64ce1210
  由 dzhwinter 提交于 8月 17, 2018
  
  64ce1210
- D
  
  "windows support" · 59160e8d
  由 dzhwinter 提交于 8月 17, 2018
  
  59160e8d
23 6月, 2018 1 次提交
- Y
  No NCCL on macOS (#11652) · 2625178a
  由 Yi Wang 提交于 6月 22, 2018
```
* Make paddle no longer depend on boost

* Update enforce.h
```
  2625178a
20 6月, 2018 1 次提交
- T
  
  enable dynamic load mklml lib on fluid · f503f129
  由 tensor-tang 提交于 6月 20, 2018
  
  f503f129
16 4月, 2018 2 次提交
- L
  
  auto find tensorrt library · d4682247
  由 Luo Tao 提交于 4月 16, 2018
  
  d4682247
- Y
  
  add tensorrt build support(#9891) · 18665979
  由 Yan Chunwei 提交于 4月 16, 2018
  
  18665979
28 2月, 2018 1 次提交
- Y
  Fix the compilation on CUDA 9.1/GCC 5.3 · 22b5c07a
  由 Yu Yang 提交于 2月 28, 2018
```
* Make CUPTI_LIB_PATH not passing by macro.
* Add missing header
```
  22b5c07a
26 2月, 2018 1 次提交
- X
  
  Extend current profiler for timeline and more features. · b9ec24c6
  由 Xin Pan 提交于 2月 24, 2018
  
  b9ec24c6
14 2月, 2018 1 次提交

compile with nccl2 (#8411) · 87f4311a

由 Yang Yang(Tony) 提交于 2月 13, 2018

* compile with nccl2

* add ncclGroup; it is necessary in nccl2

* add back libnccl-dev

87f4311a

10 2月, 2018 1 次提交
- Y
  
  Move file to fluid/; Edit CMakeLists.txt · 90648f33
  由 Yi Wang 提交于 2月 09, 2018
  
  90648f33

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功