Commits · 7bc4b095008058fdde05e3e9337e31845f1ce9b5 · Crayon鑫 / Paddle

05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

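Added a WITH_NCCL option to CMake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF whenever WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

Single-machine, single-GPU builds can disable NCCL compilation. Multi-GPU use needs NCCL left ON (the default); with NCCL disabled, only a single GPU can be used.

Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>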

7bc4b095
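
The effect of the PADDLE_WITH_NCCL definition can be pictured with a small C++ sketch. Only the macro name comes from the commit message; the helpers `NcclCompiledIn` and `MaxUsableGpus` are hypothetical names used for illustration and are not Paddle APIs. The sketch assumes, as the message states, that a build without NCCL is limited to a single GPU.

```cpp
#include <algorithm>
#include <cassert>

// True when the build defined PADDLE_WITH_NCCL (for example, CMake was
// configured with WITH_NCCL=ON). Hypothetical helper, not a Paddle function.
constexpr bool NcclCompiledIn() {
#ifdef PADDLE_WITH_NCCL
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: how many of the visible GPUs a job may use.
// Assumption taken from the commit message: without NCCL, only one card.
inline int MaxUsableGpus(int visible_gpus,
                         bool nccl_enabled = NcclCompiledIn()) {
  if (visible_gpus <= 0) return 0;
  return nccl_enabled ? visible_gpus : std::min(visible_gpus, 1);
}
```

Passing `nccl_enabled` explicitly makes the rule easy to exercise without rebuilding; by default it follows whatever the compile-time definition says.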

31 Jan, 2020 1 commit

[DNNL] Fix accuracy in INT8 FC (#22404) · 269db0d1

Committed by Michał Gallus on Jan 31, 2020

* Enable quantize to reorder to nchw as well

* Correct FC MKL-DNN input dim requirements to accept 3D

* Improve DNNL FC format, error and 3D input handling

test=develop

* Improve error checking in FC

test=develop

* Improve PADDLE_ENFORCE messages in fc-related files

* Remove data layout attribute from obligatory pass args

test=develop

* Fix message in fc_mkldnn_pass to be logically correct

test=develop

269db0d1

10 Jan, 2020 1 commit
- fix the bug of profile update (#22207) · 621d3e0b
  Committed by wangchaochaohu on Jan 11, 2020
```
* fix the bug of profile update test=develop
```
  621d3e0b
09 Jan, 2020 3 commits
- [Feature] Lite subgraph (#22114) · ad0dfb17
  Committed by 石晓伟 on Jan 09, 2020
  
  ad0dfb17
- Polish the PADDLE_ENFORCE in fusion_group pass related codes. (#22144) · 96980c22
  Committed by Yiqun Liu on Jan 09, 2020
```
* Polish the PADDLE_ENFORCE in fusion_group pass related codes.
test=develop

* Correct the unittest because of the change to relu_grad's formula.
test=develop
```
  96980c22
- add support for nested profiling event and printing in different level (#22061) · c3876cf8
  Committed by wangchaochaohu on Jan 09, 2020
```
* add support for nested profiling event and printing in different level
```
  c3876cf8
08 Jan, 2020 2 commits
- Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6
  Committed by zhaoyuchen2018 on Jan 08, 2020
```
stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy
reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
```
  3d4f2aa6
- fix allocator strategy comment, test=develop, test=document_fix (#22121) · 4c2df8e4
  Committed by Zeng Jinle on Jan 08, 2020
  
  4c2df8e4
07 Jan, 2020 2 commits
- Add explanation on conv grad for dims<3 (#22125) · 7872d06f
  Committed by bingyanghuang on Jan 07, 2020
  
  7872d06f
- replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#22109) · ba8414d3
  Committed by Chen Weihang on Jan 07, 2020
  
  ba8414d3
06 Jan, 2020 3 commits
- [MKL-DNN] Conv grad and Batch Norm grad NHWC support (#22088) · b0b27ff6
  Committed by Jacek Czaja on Jan 06, 2020
  
  b0b27ff6
- polish allocator strategy doc, test=develop, test=document_fix (#22095) · 95872494
  Committed by Zeng Jinle on Jan 06, 2020
  
  95872494
- ag allocator by default, test=develop (#21837) · d9f5d1eb
  Committed by Zeng Jinle on Jan 06, 2020
  
  d9f5d1eb
05 Jan, 2020 1 commit
- [MKL-DNN] Pool & LRN Grad Ops NHWC support (#21747) · ad8a9cb8
  Committed by Jacek Czaja on Jan 05, 2020
  
  ad8a9cb8
03 Jan, 2020 1 commit

Add the first implementation of fusion_group op (#19621) · d4832077

Committed by Yiqun Liu on Jan 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

01 Jan, 2020 1 commit
- polish default error msg & cublas error hint, test=develop (#22032) · 2e908225
  Committed by Chen Weihang on Jan 01, 2020
  
  2e908225
30 Dec, 2019 2 commits
- Add error message for cublas initialize failed (#21995) · 35ff1568
  Committed by Chen Weihang on Dec 30, 2019
  
  35ff1568
- fix no hint problem when use ENFORCE for cuda, test=develop (#21994) · fbb42173
  Committed by Chen Weihang on Dec 30, 2019
  
  fbb42173
15 Dec, 2019 1 commit
- Rename paddle throw error macro (#21657) · 1fd1f06f
  Committed by Chen Weihang on Dec 15, 2019
```
* rename paddle throw error macro, test=develop

* fix new error use case, test=develop
```
  1fd1f06f
10 Dec, 2019 1 commit

MKL-DNN 1.0 Update (#20162) · e81f0228

Committed by Adam on Dec 10, 2019

* MKLDNN v1.0 rebase to Paddle 1.6
test=develop

* Add hacky paddle::string::to_string() implementation

* vectorize<int64_t>() -> vectorize() cleanup
test=develop

* PADDLE_ENFORCE and void_cast fixes
test=develop

* Rebase changes
test=develop

* Cosmetics
test=develop

* Delete MKL from mkldnn.cmake
test=develop

* CMake debug commands
test=develop

* Delete MKLDNN_VERBOSE and rebase fixes
test=develop

* Rebase fixes
test=develop

* Temporarily disable int8 resnet101 vgg16 and vgg19 tests
test=develop

* Add libmkldnn.so.1 to python setup
test=develop

* Add libmkldnn.so.1 to inference_lib cmake after rebase
test=develop

* Post rebase fixes + FC int8 changes
test=develop

* Fix LRN NHWC
test=develop

* Fix NHWC conv3d
test=develop

* Windows build fix + next conv3d fix
test=develop

* Fix conv2d on AVX2 machines
test=develop

e81f0228

06 Dec, 2019 1 commit
- refine dev_ctx.Wait() exception throw, test=develop (#21600) · 97e76cb9
  Committed by Zeng Jinle on Dec 06, 2019
  
  97e76cb9
05 Dec, 2019 2 commits
- Refine a Warning Which Can Occur Not Only During Init (#21546) · b241c732
  Committed by Huihuang Zheng on Dec 05, 2019
```
As the title
```
  b241c732
- Add Branch to avoid CPU profiler warning print (#21556) · 932aca16
  Committed by wangchaochaohu on Dec 05, 2019
```
* fix profiler warning message in cpu profile mode test=develop
```
  932aca16
04 Dec, 2019 1 commit
- make config option DisableGlogInfo() able to mute all inference logs (#21318) · 122b37ce
  Committed by Pei Yang on Dec 04, 2019
```
* make DisableGlogInfo able to mute all logs in inference. 
```
  122b37ce
03 Dec, 2019 2 commits
- NV jetson(nano, tx2, xavier) inference compile support (#21393) · c5f0293c
  Committed by Zhaolong Xing on Dec 03, 2019
```
* add Jetson compile support
test=develop

* refine the cmake
test=develop
```
  c5f0293c
- Add warning message when initialize GLOG failed. (#21487) · a71f53d7
  Committed by Huihuang Zheng on Dec 03, 2019
```
Add warning message when initialize GLOG failed
```
  a71f53d7
02 Dec, 2019 1 commit

fix -Wno-error=sign-compare warning in gcc8 (#21434) · 01fa4ead

Committed by Tao Luo on Dec 02, 2019

* fix -Wno-error=sign-compare warning in gcc8

test=develop

* fix warning in distributed codes

test=develop

01fa4ead

01 Dec, 2019 1 commit
- nhwc optimization for batchnorm (#21090) · 5e813b53
  Committed by Jie Fang on Dec 01, 2019
  
  5e813b53
29 Nov, 2019 1 commit
- [MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375) · cd43c444
  Committed by Jacek Czaja on Nov 29, 2019
  
  cd43c444
28 Nov, 2019 2 commits
- Profile refine (#21258) · 8293f21a
  Committed by wangchaochaohu on Nov 28, 2019
```
* fix profile api high version test=develop
```
  8293f21a
- fix the profiling bug test=develop (#21396) · e0e205ea
  Committed by wangchaochaohu on Nov 28, 2019
  
  e0e205ea
25 Nov, 2019 1 commit
- remove warning LNK4006 and warning LNK4221 (#21226) · 345b67b5
  Committed by zhouwei25 on Nov 25, 2019
  
  345b67b5
24 Nov, 2019 1 commit
- optimize nhwc for tensor core in ConvOp and ConvGradOp (#20597) · ed2a1852
  Committed by gongweibao on Nov 24, 2019
  
  ed2a1852
18 Nov, 2019 2 commits

Fix warn of gcc8 (#21205) · cdb3d279

Committed by Zeng Jinle on Nov 18, 2019

* fix warnings of gcc 8 compilation, test=develop

* fix boost::bad_get, test=develop

* refine PADDLE_ENFORCE, test=develop

cdb3d279

fix sporadic hang issue on windows (#21201) · d8b6cf2b

Committed by liuwei1031 on Nov 18, 2019

cudaStreamSynchronize randomly hangs when used in a multi-threaded environment; replace it with the cudaStreamQuery API on Windows

d8b6cf2b
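
The replacement described in this commit, polling for completion instead of making one blocking call, can be sketched in plain C++ without a GPU. This only illustrates the pattern and is not the code from the commit: `QueryStatus`, `WaitByPolling`, and the `query` callable are hypothetical stand-ins. Here `query` plays the role of `cudaStreamQuery`, which in the CUDA runtime reports `cudaSuccess` once the stream's work has finished and `cudaErrorNotReady` while it is still pending.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for the result of a stream query. With CUDA these
// would map to cudaSuccess, cudaErrorNotReady, and any other error code.
enum class QueryStatus { kDone, kNotReady, kError };

// Polls `query` until the work is finished or an error is reported.
// Returns true on completion and false on error.
inline bool WaitByPolling(const std::function<QueryStatus()>& query) {
  for (;;) {
    const QueryStatus status = query();
    if (status == QueryStatus::kDone) return true;
    if (status == QueryStatus::kError) return false;
    std::this_thread::yield();  // let other threads run between polls
  }
}
```

A polling loop trades some CPU time for never parking the thread inside a single blocking synchronize call, which is the call the commit message reports as hanging.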

14 Nov, 2019 2 commits

Improve topk performance. (#21087) · b93870e6

Committed by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

Given 200000 elements to compute topk:
before opt: cost 1s
after opt: cost 0.0028s

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

b93870e6

change cuda enforce & add example (#21142) · b3a3e6f6

Committed by Chen Weihang on Nov 14, 2019

b3a3e6f6

13 Nov, 2019 1 commit
- add examples for resource exhausted error, test=develop (#21140) · 27fa9c10
  Committed by Chen Weihang on Nov 13, 2019
  
  27fa9c10
12 Nov, 2019 1 commit
- Further simplify the C++ error info stack (#21093) · edd6680a
  Committed by Chen Weihang on Nov 12, 2019
```
* simplify C++ error stack by rewrite Place, test=develop

* polish assignment overload func, test=develop
```
  edd6680a
08 Nov, 2019 1 commit

Add transpose2 INT8 for mkl-dnn (#19424) · 77c20835

Committed by joanna.wozna.intel on Nov 08, 2019

* Add transpose2 INT8 for mkl-dnn

test=develop

* Fix test_transpose_int8_mkldnn

test=develop

* Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"

This reverts commit 34011bdb, reversing
changes made to 2ce6473f.

* Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2""

This reverts commit 23754dd7.

* Add template to TransposeMKLDNNHandler

test=develop

* Resolve conflict

test=develop

* Restore get_size and refactor

test=develop

77c20835
