Commits · c17f9cf25fd42ab868983a85c03d8c9a2b4a007d · 机器未来 / Paddle

23 Sep, 2020 · 1 commit
- [bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436) · c17f9cf2
  Committed by Shang Zhizhou on Sep 23, 2020
```
* [bug fix]:Memory increases after adapting the cudnn version to 8

* [bug fix]cudnnGetConvolutionForwardAlgorithm not defined
```
18 Sep, 2020 · 1 commit
- fix cudnn dyload (#27308) · 1a755971
  Committed by GaoWei8 on Sep 18, 2020
```
* fix cudnn dyload error
```
07 Sep, 2020 · 1 commit
- Add padding cudnn interface (#26370) · 4ff16eb2
  Committed by GaoWei8 on Sep 07, 2020
```
* add lstm cudnn of padding data and refine cudnn codes
```
03 Sep, 2020 · 1 commit
- [cuda11 support] add support for cublas load of same function name (parameter diff) (#26963) · 3eacced9
  Committed by wangchaochaohu on Sep 03, 2020
  
19 Aug, 2020 · 1 commit
- remove scope in cudnn lstm (#25188) · 1fbee267
  Committed by GaoWei8 on Aug 19, 2020
  
07 Aug, 2020 · 1 commit
- Fix TRT plugin registry without TRT lib (#25982) · beb0ca5f
  Committed by Pei Yang on Aug 07, 2020
```
* fix trt plugin registry without trt lib

* support trt4

* refine code style
```
05 Aug, 2020 · 2 commits

[CUDNN8 support] : support CUDNN8 (#25664) · 358bc06c
Committed by Zhaolong Xing on Aug 05, 2020
```
* cudnn8 support
test=develop

* fix ci error
test=develop
```

Fix registering trt plugin (#25744) · b717895f

Committed by Pei Yang on Aug 05, 2020

* develop dynamic shape serialization

* add test param for gelu

* fix bugs

* delete redundant comments

* debug

* fix conflict. test=develop

* fix bug. test=develop

* add trt dynamic shape serialized support

* fix ernie serialized bug
test=develop

* fix codestyle
test=develop

* fix bug
test=develop

* fix bug.test=develop

* modify cmakelist test=develop

* fix bug
test=develop

* fix error message.  test=develop

* fix trt register plugin based on pr#25003

* add trt dynload

* fix deserialization bug of not finding plugin registration

* refine code style

* recover engine key in tensorrt_subgraph_pass

* for ci coverage

* add unittest for deserialization
Co-authored-by: haozech <chenhaoze94@gmail.com>

20 Jul, 2020 · 1 commit

Polish install error hint message (#25531) · a6abd92d

Committed by Chen Weihang on Jul 20, 2020

* polish install error hint msg, test=develop

* fix variable error, test=develop

* polish hint message again

15 Jul, 2020 · 1 commit
- refine PADDLE_ENFORCE (#25456) · c10dcff1
  Committed by GaoWei8 on Jul 15, 2020
```
* Refine PADDLE_ENFORCE in paddle/fluid/platform
test=develop
```
09 Jul, 2020 · 2 commits
- remove WITH_DSO compile option (#25444) · 172d4ecb
  Committed by Chen Weihang on Jul 09, 2020
  
- add the c++ part of Imperative QAT. test=develop (#25446) · bb45af02
  Committed by Zhen Wang on Jul 09, 2020
  
07 Jul, 2020 · 1 commit
- Refine PADDLE_ENFORCE (#25369) · ea7e5325
  Committed by GaoWei8 on Jul 07, 2020
```
* refine PADDLE_ENFORCE
test=develop
```
03 Jul, 2020 · 1 commit
- fix PADDLE_ENFORCE (#25297) · fb70682f
  Committed by GaoWei8 on Jul 03, 2020
```
* fix PADDLE_ENFORCE and refine the description
test=develop
```
02 Jul, 2020 · 1 commit

Refactor dynamic dso search functions (#25214) · 5a959f6e

Committed by Chen Weihang on Jul 02, 2020

* refactor dynamic dso search func, test=develop

* polish details, test=develop

* polish detail based review comments, test=develop

* revert string type change, test=develop

24 Jun, 2020 · 1 commit

Add default cudnn lib path (#25175) · 353ea9e8

Committed by Chen Weihang on Jun 24, 2020

* add default cudnn lib path, test=develop

* change default path in func, test=develop

* move to linux branch, test=develop

* fix var error in other plat, test=develop

05 Jun, 2020 · 1 commit

Support SelectedRows allreduce in multi-cards imperative mode (#24690) · 4a702ef3

Committed by Chen Weihang on Jun 05, 2020

* support selectedrows allreduce in multi-cards dygraph, test=develop

* remove useless import modules in unittests, test=develop

* add nccl cmake to get nccl version, test=develop

* add if-condition to compiled correctly, test=develop

* add detailed version parsing for old nccl, test=develop

* polish cmake details, test=develop

* fix remove test cmake error, test=develop

* fix cmake condition, test=develop

* change unittest cmake list, test=develop

* fix unittest cmake rule, test=develop, test=framep0

18 May, 2020 · 1 commit

Add some check for CUDA Driver API and NVRTC (#22719) · 560c8153

Committed by Yiqun Liu on May 18, 2020

* Add a check for whether the CUDA Driver and NVRTC are available on the runtime system.

* Call cuInit to initialize the CUDA Driver API before any CUDA calls.
test=develop

* Change the behavior when libnvrtc.so can not be found, printing a warning instead of exiting.
test=develop

* Do not initialize CUDA Driver API for windows and macos.
test=develop

* Remove the call of cuInit when entering paddle and enable the test_code_generator.
test=develop

* Add some built-in functions for __half.
test=develop

* Change save_intermediate_out to false in unittest.
test=develop

* Fix an erroneous reference to a temporary variable when setting the include path for device_code.
test=develop

08 May, 2020 · 1 commit
- Remove cusolver potrfBatched support on Windows. (#24338) · 4a5de144
  Committed by Guo Sheng on May 08, 2020
```
test=develop
test=win_gpu
```
30 Apr, 2020 · 1 commit

Fix cusolver loader for Windows (#24157) · 1fc6cc50

Committed by Guo Sheng on Apr 30, 2020

* Fix cusolver loader for Windows in dynamic_loader.cc. test=develop

* Fix missing CUSOLVER_ROUTINE_EACH_R1.
test=gpu
test=develop

* Mark cusolver as unsupported on Windows temporarily. test=develop

* Fix GetCusolverDsoHandle error message. test=develop

27 Apr, 2020 · 1 commit
- Add the implementation of inverse (#23310) · ecfddebb
  Committed by Yiqun Liu on Apr 27, 2020
  
24 Apr, 2020 · 1 commit

Add cholesky_op (#23543) · a8c0fb4e

Committed by Guo Sheng on Apr 24, 2020

* Add cholesky_op forward part. test=develop

* Complete cholesky_op forward part. test=develop

* Add cholesky_op backward part. test=develop

* Complete cholesky_op backward part. test=develop

* Refine cholesky_op error check and docs. test=develop

* Add grad_check unit test for cholesky_op. test=develop

* Fix sample code in cholesky doc. test=develop

* Refine some error messages of cholesky_op. test=develop

* Refine some error messages of cholesky_op. test=develop

* Remove unused input in cholesky_grad. test=develop

* Remove unused input in cholesky_grad. test=develop

* Fix stream for cusolverDnSetStream. test=develop

* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop

* Add CUSOLVER ERROR in enforce.h
test=develop

* Fix the missing return value in cholesky. test=develop

10 Apr, 2020 · 2 commits
- test=develop, add addmm op (#23384) · 1c08a213
  Committed by littletomatodonkey on Apr 10, 2020
```
add addmm op
```
- solve mklml memory leak (#23557) · e4f1b1c5
  Committed by Tao Luo on Apr 10, 2020
  
05 Feb, 2020 · 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

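Added a WITH_NCCL option to CMake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

Single-machine, single-card setups can disable NCCL compilation. Multi-card setups need NCCL enabled, which is the default; if NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>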
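
The option rule described in this commit message can be sketched in a few lines. This is only an illustration in Python, not the actual CMake code: the helper `resolve_build_flags` and the `multi_card` key are hypothetical names, while `WITH_GPU`, `WITH_NCCL`, and `PADDLE_WITH_NCCL` come from the message itself.

```python
def resolve_build_flags(with_gpu: bool, with_nccl: bool = True) -> dict:
    """Mirror the WITH_NCCL rule from the commit message (illustrative only).

    Hypothetical helper: the real logic lives in the project's CMake files.
    """
    # WITH_NCCL defaults to ON, but is forced OFF when WITH_GPU is OFF.
    effective_nccl = with_nccl and with_gpu
    return {
        "WITH_GPU": with_gpu,
        "WITH_NCCL": effective_nccl,
        # PADDLE_WITH_NCCL is defined only when the NCCL code is compiled.
        "PADDLE_WITH_NCCL": effective_nccl,
        # Hypothetical key: without NCCL only a single card can be used.
        "multi_card": with_gpu and effective_nccl,
    }


if __name__ == "__main__":
    print(resolve_build_flags(with_gpu=True))
    print(resolve_build_flags(with_gpu=True, with_nccl=False))
    print(resolve_build_flags(with_gpu=False))
```

The sketch keeps the default and the override in one expression so that a CPU-only build can never end up with NCCL enabled.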

03 Jan, 2020 · 1 commit

Add the first implementation of fusion_group op (#19621) · d4832077

Committed by Yiqun Liu on Jan 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation of fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

01 Dec, 2019 · 1 commit
- nhwc optimization for batchnorm (#21090) · 5e813b53
  Committed by Jie Fang on Dec 01, 2019
  
30 Sep, 2019 · 1 commit
- Improve elementwise operators performance in same dimensions. (#19763) · 425279a5
  Committed by danleifeng on Sep 30, 2019
```
Improve elementwise operators performance in same dimensions
```
28 Sep, 2019 · 2 commits

Enable users to create custom cpp op outside framework. (#19256) · 1a3eef02

Committed by qingqing01 on Sep 28, 2019

* Custom ops must be written following the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
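
As a rough illustration of how the two `paddle.sysconfig` functions named above might be used when building an out-of-tree op, the sketch below turns their return values into compiler flags. The helper `custom_op_compile_flags` is hypothetical, and the exact flags and libraries a real build needs are not stated in this commit message, so only the include and library search paths are shown.

```python
def custom_op_compile_flags(include_dir: str, lib_dir: str) -> list:
    """Build basic shared-library compiler flags (hypothetical helper)."""
    # -I / -L point the compiler at the packaged headers and libraries.
    return ["-shared", "-fPIC", f"-I{include_dir}", f"-L{lib_dir}"]


if __name__ == "__main__":
    try:
        # Functions named in the commit message above.
        import paddle.sysconfig as sysconfig
    except ImportError:
        print("paddle is not installed; skipping the live example")
    else:
        flags = custom_op_compile_flags(
            sysconfig.get_include(), sysconfig.get_lib()
        )
        print(" ".join(flags))
```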

fix pool2d pool3d, support asymmetric padding and channel_last (#19739) · 24010472

Committed by liym27 on Sep 28, 2019

* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.

test=develop,test=document_preview

* fix 'tensors' to 'Tensors'. test=develop,test=document_preview

* add test for coverage ValueError. test=develop,test=document_preview

* resolve conflict in test_pool2d. test=develop
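
To make the padding options listed in this commit message concrete, here is a minimal one-dimensional sketch of how "SAME", "VALID", and explicit asymmetric padding typically determine the pooled output size. It assumes the widely used convention (output of `ceil(in / stride)` for "SAME", no padding for "VALID") and floor-mode output rounding; whether the operator matches this in every detail is an assumption, and `pool_output_1d` is a hypothetical name.

```python
import math


def pool_output_1d(in_size, ksize, stride, padding):
    """Return (out_size, (pad_before, pad_after)) along one spatial axis.

    `padding` is "SAME", "VALID", or an explicit (pad_before, pad_after)
    pair, which may be asymmetric.
    """
    if padding == "VALID":
        # No padding: only windows that fit entirely inside the input.
        pad_before = pad_after = 0
    elif padding == "SAME":
        # Pad just enough that out_size == ceil(in_size / stride).
        target = math.ceil(in_size / stride)
        pad_total = max((target - 1) * stride + ksize - in_size, 0)
        pad_before = pad_total // 2
        pad_after = pad_total - pad_before  # the extra element goes last
    else:
        pad_before, pad_after = padding
    out_size = (in_size + pad_before + pad_after - ksize) // stride + 1
    return out_size, (pad_before, pad_after)


if __name__ == "__main__":
    for mode in ("SAME", "VALID", (1, 2)):
        print(mode, pool_output_1d(8, 3, 2, mode))
```

For a channel_last layout (NHWC or NDHWC) the same per-axis calculation applies; only the position of the spatial axes in the shape changes.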

14 Sep, 2019 · 1 commit
- Fix the definition issue when used mkl_scsrmm and mkl_dcsrmm functions. (#19774) · 0d6ea529
  Committed by Yihua Xu on Sep 13, 2019
```
test=develop
```
05 Sep, 2019 · 1 commit

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

Committed by Yiqun Liu on Sep 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

02 Sep, 2019 · 1 commit
- fix the compilation issue on windows caused by mkl_CSRMM (#19533) · 84c72801
  Committed by zhouwei25 on Sep 02, 2019
  
20 Aug, 2019 · 1 commit

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

Committed by Yihua Xu on Aug 20, 2019

* Implement the operator with sparse matrix multiply

* Update the URL of mklml library.

test=develop

* Disable the MKLML implementation on non-Linux platforms.

test=develop

* Ignore the deprecated status for windows

test=develop

12 Aug, 2019 · 1 commit
- add tensorrt support for windows (#19084) · 80b7ef6f
  Committed by wopeizl on Aug 12, 2019
```
* add tensorrt support for windows
```
05 Aug, 2019 · 1 commit

fix warpctc.dll not found issue (#18761) · a43a763b

Committed by liuwei1031 on Aug 05, 2019

* fix warpctc.dll not found issue, test=develop

* revert the linux platform change, test=develop

* delete warpctc_lib_path.h.in, test=develop

* add SetPySitePackagePath function

* fix warpctc.dylib not found issue on Mac, test=develop

* improve the paddle lib path setting logic, test=develop

* fix mac ci issue caused by test_warpctc_op unittest, test=develop

* tweak code, test=develop

29 Jul, 2019 · 1 commit
- Try to modify external gflags to solve CI compilation (#18872) · 0d3f16f5
  Committed by Huihuang Zheng on Jul 29, 2019
  
27 Jul, 2019 · 1 commit
- Merge cuda 9/10 dockerfile with root dockerfile (#18693) · cfce4994
  Committed by Huihuang Zheng on Jul 27, 2019
```
Also fix a dependency error which may cause compile error
```
27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

03 Jun, 2019 · 1 commit
- revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753) · c10157a5
  Committed by wangchaochaohu on Jun 03, 2019
```
* revise conv layer cudnn algo choose test=develop

* update for code style test=develop

* update for code style test=develop
```