Commits · ea7e532598448ea5912e8f7c63a7a034376242e3 · BaiXuePrincess / Paddle

07 Jul, 2020 · 1 commit
- Refine PADDLE_ENFORCE (#25369) · ea7e5325
  Committed by GaoWei8 on Jul 07, 2020
```
* refine PADDLE_ENFORCE
test=develop
```
  ea7e5325
03 Jul, 2020 · 1 commit
- fix PADDLE_ENFORCE (#25297) · fb70682f
  Committed by GaoWei8 on Jul 03, 2020
```
* fix PADDLE_ENFORCE and refine the description
test=develop
```
  fb70682f
02 Jul, 2020 · 1 commit

Refactor dynamic dso search functions (#25214) · 5a959f6e

Committed by Chen Weihang on Jul 02, 2020

* refactor dynamic dso search func, test=develop

* polish details, test=develop

* polish detail based review comments, test=develop

* revert string type change, test=develop

5a959f6e

24 Jun, 2020 · 1 commit

Add default cudnn lib path (#25175) · 353ea9e8

Committed by Chen Weihang on Jun 24, 2020

* add default cudnn lib path, test=develop

* change default path in func, test=develop

* move to linux branch, test=develop

* fix var error in other plat, test=develop

353ea9e8

05 Jun, 2020 · 1 commit

Support SelectedRows allreduce in multi-cards imperative mode (#24690) · 4a702ef3

Committed by Chen Weihang on Jun 05, 2020

* support selectedrows allreduce in multi-cards dygraph, test=develop

* remove useless import modules in unittests, test=develop

* add nccl cmake to get nccl version, test=develop

* add if-condition to compile correctly, test=develop

* add detailed version parsing for old nccl, test=develop

* polish cmake details, test=develop

* fix remove test cmake error, test=develop

* fix cmake condition, test=develop

* change unittest cmake list, test=develop

* fix unittest cmake rule, test=develop, test=framep0

4a702ef3

18 May, 2020 · 1 commit

Add some check for CUDA Driver API and NVRTC (#22719) · 560c8153

Committed by Yiqun Liu on May 18, 2020

* Add the check for whether CUDA Driver and NVRTC are available for the runtime system.

* Call cuInit to initialize the CUDA Driver API before all CUDA calls.
test=develop

* Change the behavior when libnvrtc.so cannot be found, printing a warning instead of exiting.
test=develop

* Do not initialize CUDA Driver API for windows and macos.
test=develop

* Remove the call of cuInit when entering paddle and enable the test_code_generator.
test=develop

* Add some built-in functions for __half.
test=develop

* Change save_intermediate_out to false in unittest.
test=develop

* Fix the erroneous reference to a temporary variable when setting the include path for device_code.
test=develop

560c8153

08 May, 2020 · 1 commit
- Remove cusolver potrfBatched support on Windows. (#24338) · 4a5de144
  Committed by Guo Sheng on May 08, 2020
```
test=develop
test=win_gpu
```
  4a5de144
30 Apr, 2020 · 1 commit

Fix cusolver loader for Windows (#24157) · 1fc6cc50

Committed by Guo Sheng on Apr 30, 2020

* Fix cusolver loader for Windows in dynamic_loader.cc. test=develop

* Fix missing CUSOLVER_ROUTINE_EACH_R1.
test=gpu
test=develop

* Temporarily mark cusolver as unsupported on Windows. test=develop

* Fix GetCusolverDsoHandle error message. test=develop

1fc6cc50

27 Apr, 2020 · 1 commit
- Add the implementation of inverse (#23310) · ecfddebb
  Committed by Yiqun Liu on Apr 27, 2020
  
  ecfddebb
24 Apr, 2020 · 1 commit

Add cholesky_op (#23543) · a8c0fb4e

Committed by Guo Sheng on Apr 24, 2020

* Add cholesky_op forward part. test=develop

* Complete cholesky_op forward part. test=develop

* Add cholesky_op backward part. test=develop

* Complete cholesky_op backward part. test=develop

* Refine cholesky_op error check and docs. test=develop

* Add grad_check unit test for cholesky_op. test=develop

* Fix sample code in cholesky doc. test=develop

* Refine some error messages of cholesky_op. test=develop

* Refine some error messages of cholesky_op. test=develop

* Remove unused input in cholesky_grad. test=develop

* Remove unused input in cholesky_grad. test=develop

* Fix stream for cusolverDnSetStream. test=develop

* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop

* Add CUSOLVER ERROR in enforce.h
test=develop

* Fix the missing return value in cholesky. test=develop

a8c0fb4e

10 Apr, 2020 · 2 commits
- test=develop, add addmm op (#23384) · 1c08a213
  Committed by littletomatodonkey on Apr 10, 2020
```
add addmm op
```
  1c08a213
- solve mklml memory leak (#23557) · e4f1b1c5
  Committed by Tao Luo on Apr 10, 2020
  
  e4f1b1c5
05 Feb, 2020 · 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

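Added a WITH_NCCL option to cmake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

Single-machine, single-card builds can disable NCCL compilation. Multi-card use requires NCCL to stay enabled (the default); if NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>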

7bc4b095
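
The option rule described in this commit message can be sketched as below. This is only a Python illustration of the stated rule, not the actual cmake logic from the commit; the function names `resolve_with_nccl` and `multi_card_allowed` are hypothetical.

```python
def resolve_with_nccl(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Effective WITH_NCCL value (hypothetical helper).

    WITH_NCCL defaults to ON, but is forced OFF when WITH_GPU is OFF.
    """
    return with_gpu and with_nccl


def multi_card_allowed(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Multi-card runs need NCCL; without it only a single card is usable."""
    return resolve_with_nccl(with_gpu, with_nccl)
```

For example, a CPU-only build (`with_gpu=False`) ends up with NCCL disabled even if the user left `with_nccl` at its default.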

03 Jan, 2020 · 1 commit

Add the first implementation of fusion_group op (#19621) · d4832077

Committed by Yiqun Liu on Jan 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

01 Dec, 2019 · 1 commit
- nhwc optimization for batchnorm (#21090) · 5e813b53
  Committed by Jie Fang on Dec 01, 2019
  
  5e813b53
30 Sep, 2019 · 1 commit
- Improve elementwise operators performance in same dimensions. (#19763) · 425279a5
  Committed by danleifeng on Sep 30, 2019
```
Improve elementwise operators performance in same dimensions
```
  425279a5
28 Sep, 2019 · 2 commits

Enable users to create custom cpp op outside framework. (#19256) · 1a3eef02

Committed by qingqing01 on Sep 28, 2019

* Writing a custom op needs to follow the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.

1a3eef02

fix pool2d pool3d,support asymmetric padding and channel_last (#19739) · 24010472

Committed by liym27 on Sep 28, 2019

* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.

test=develop,test=document_preview

* fix 'tensors' to 'Tensors'. test=develop,test=document_preview

* add test for coverage of ValueError. test=develop,test=document_preview

* resolve conflict in test_pool2d. test=develop

24010472
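
The "SAME" and "VALID" padding algorithms and the asymmetric padding mentioned in this commit can be illustrated with a small sketch. It follows the commonly used convention for these two modes along one spatial dimension; whether the Paddle kernels use exactly these formulas is an assumption, and the function name `pool_output_size` is hypothetical.

```python
def pool_output_size(in_size, ksize, stride, padding_algorithm="VALID"):
    """Return (out_size, pad_before, pad_after) for one spatial dimension.

    Sketch of the usual SAME/VALID convention, not Paddle's actual code.
    """
    if padding_algorithm == "VALID":
        # No padding at all: the window must fit entirely inside the input.
        pad_before = pad_after = 0
    elif padding_algorithm == "SAME":
        # Output covers ceil(in_size / stride) positions.
        target = (in_size + stride - 1) // stride
        pad_total = max((target - 1) * stride + ksize - in_size, 0)
        pad_before = pad_total // 2
        # The two sides differ when pad_total is odd (asymmetric padding).
        pad_after = pad_total - pad_before
    else:
        raise ValueError("padding_algorithm must be 'SAME' or 'VALID'")
    out_size = (in_size + pad_before + pad_after - ksize) // stride + 1
    return out_size, pad_before, pad_after
```

With an input of length 7, a window of 2, and a stride of 2, "SAME" needs one extra element of padding, and it lands on one side only, which is why asymmetric padding support matters.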

14 Sep, 2019 · 1 commit
- Fix the definition issue when using mkl_scsrmm and mkl_dcsrmm functions. (#19774) · 0d6ea529
  Committed by Yihua Xu on Sep 13, 2019
```
test=develop
```
  0d6ea529
05 Sep, 2019 · 1 commit

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

Committed by Yiqun Liu on Sep 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

42b5bec6

02 Sep, 2019 · 1 commit
- fix the compilation issue on windows caused by mkl_CSRMM (#19533) · 84c72801
  Committed by zhouwei25 on Sep 02, 2019
  
  84c72801
20 Aug, 2019 · 1 commit

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

Committed by Yihua Xu on Aug 20, 2019

* Implement the operator with sparse matrix multiply

* Update the URL of mklml library.

test=develop

* Disable the MKLML implementation on non-Linux platforms.

test=develop

* Ignore the deprecated status for windows

test=develop

b9203958

12 Aug, 2019 · 1 commit
- add tensorrt support for windows (#19084) · 80b7ef6f
  Committed by wopeizl on Aug 12, 2019
```
* add tensorrt support for windows
```
  80b7ef6f
05 Aug, 2019 · 1 commit

fix warpctc.dll not found issue (#18761) · a43a763b

Committed by liuwei1031 on Aug 05, 2019

* fix warpctc.dll not found issue, test=develop

* revert the linux platform change, test=develop

* delete warpctc_lib_path.h.in, test=develop

* add SetPySitePackagePath function

* fix warpctc.dylib not found issue on Mac, test=develop

* improve the paddle lib path setting logic, test=develop

* fix mac ci issue caused by test_warpctc_op unittest, test=develop

* tweak code, test=develop

a43a763b

29 Jul, 2019 · 1 commit
- Try to modify external gflags to solve CI compilation (#18872) · 0d3f16f5
  Committed by Huihuang Zheng on Jul 29, 2019
  
  0d3f16f5
27 Jul, 2019 · 1 commit
- Merge cuda 9/10 dockerfile with root dockerfile (#18693) · cfce4994
  Committed by Huihuang Zheng on Jul 27, 2019
```
Also fix a dependency error which may cause compile error
```
  cfce4994
27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump for py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

03 Jun, 2019 · 2 commits
- revise the cudnn conv choose algorithm to improve the performance (mask rcnn benchmark) (#17753) · c10157a5
  Committed by wangchaochaohu on Jun 03, 2019
```
* revise conv layer cudnn algo choose test=develop

* update for code style test=develop

* update for code style test=develop
```
  c10157a5
- polish error doc (#17772) · 863c7516
  Committed by chengduo on Jun 03, 2019
```
test=develop
```
  863c7516
07 May, 2019 · 1 commit
- remove unused FLAGS_warpctc_dir (#17162) · ff1661f1
  Committed by Tao Luo on May 07, 2019
```
* remove unused FLAGS_warpctc_dir

test=develop

* remove FLAGS_warpctc_dir

test=develop
```
  ff1661f1
03 Apr, 2019 · 1 commit
- Revert "Model data cryption link all lib (#16555)" · 0b2aec14
  Committed by Chen Weihang on Apr 03, 2019
```
test=develop
This reverts commit c38c7c56.
```
  0b2aec14
02 Apr, 2019 · 1 commit

Model data cryption link all lib (#16555) · c38c7c56

Committed by Chen Weihang on Apr 02, 2019

* link the libwbaes.so into paddle

* polish detail, test=develop

* try fix mac_pr_ci error, test=develop

* add compile option, test=develop

* fix ci error, test=develop

* ignore failed to find mac lib, test=develop

* change cdn to bj, cdn can't get the latest version

* trigger ci, test=develop

* temporary delete win32 lib linking, test=develop

* change https to http, test=develop

* turn compile option on to off

* turn compile option off to on, test=develop

* try lib compiled by gcc4.8, test=develop

* update lib version, test=develop

* link other lib, test=develop

* add setup config

* delete false, test=develop

* delete no_soname, test=develop

* recover so name set

* fix, test=develop

* adjust make config, test=develop

* remove link to wbaes, test=develop

* remove useless define, test=develop

c38c7c56

04 Mar, 2019 · 2 commits

Committed by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

4449e855

Optimize gelu operation with mkl erf. · b48d56e8
Committed by Yihua Xu on Feb 26, 2019
```
test=develop
```
b48d56e8

27 Feb, 2019 · 1 commit

Committed by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

225c11a9

26 Feb, 2019 · 1 commit
- Optimize gelu operation with mkl erf. · 73967886
  Committed by Yihua Xu on Feb 26, 2019
```
test=develop
```
  73967886
22 Feb, 2019 · 2 commits

Revert 15770 develop a6910f90 gelu mkl opt (#15872) · ee2321de
Committed by tensor-tang on Feb 22, 2019
```
* Revert "Optimize Gelu with MKL Erf function (#15770)"

This reverts commit 676995c8.

* test=develop
```
ee2321de

Optimize Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on Feb 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add the comment of the macro for the NVCC compiler.

test=develop

676995c8
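
For context, the erf-based GELU that this commit speeds up can be written in a few lines. This is a minimal scalar sketch using Python's standard `math.erf` as a stand-in for the MKL routine named in the commit; it is not the kernel code itself, and the function name `gelu` here is only illustrative.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Because the cost is dominated by the erf evaluation, replacing a scalar erf with a vectorized library call is the natural place to optimize.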

28 Jan, 2019 · 1 commit
- add jit kernel hsum, hmax and softmax refer code · 81177258
  Committed by tensor-tang on Jan 25, 2019
```
test=develop
```
  81177258
26 Dec, 2018 · 1 commit
- add cuda dso support for windows · 1e7f83e6
  Committed by peizhilin on Dec 26, 2018
```
test=develop
```
  1e7f83e6
