提交 · 2fe0758f67d1ae5de4d96d39403d24311782ac5a · BaiXuePrincess / Paddle

31 3月, 2020 1 次提交

fix nccl comm double free bug (#23344) · 0471476a

由 Yi Liu 提交于 3月 31, 2020

As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.

0471476a

30 3月, 2020 1 次提交
- 石
  
  supports thread-binding stream, test=develop (#23177) · 75ebb48a
  由石晓伟提交于 3月 30, 2020
  
  75ebb48a
05 2月, 2020 1 次提交

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

由 Wilber 提交于 2月 05, 2020

cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL

添加了PADDLE_WITH_NCCL定义

单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

7bc4b095

08 1月, 2020 1 次提交

Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6

由 zhaoyuchen2018 提交于 1月 08, 2020

stack's wait cost a lot of cpu time, use cuda kernel to do memory copy
will reduce cpu time.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

3d4f2aa6

10 12月, 2019 1 次提交

MKL-DNN 1.0 Update (#20162) · e81f0228

由 Adam 提交于 12月 10, 2019

* MKLDNN v1.0 rebase to Paddle 1.6
test=develop

* Add hacky paddle::string::to_string() implementation

* vectorize<int64-t>() -> vectorize() cleanup
test=develop

* PADDLE_ENFORCE and void_cast fixes
test=develop

* Rebase changes
test=develop

* Cosmetics
test=develop

* Delete MKL from mkldnn.cmake
test=develop

* CMake debug commands
test=develop

* Delete MKLDNN_VERBOSE and rebase fixes
test=develop

* Rebase fixes
test=develop

* Temporarily disable int8 resnet101 vgg16 and vgg19 tests
test=develop

* Add libmkldnn.so.1 to python setup
test=develop

* Add libmkldnn.so.1 to inference_lib cmake after rebase
test=develop

* Post rebase fixes + FC int8 changes
test=develop

* Fix LRN NHWC
test=develop

* Fix NHWC conv3d
test=develop

* Windows build fix + next conv3d fix
test=develop

* Fix conv2d on AVX2 machines
test=develop

e81f0228

06 12月, 2019 1 次提交
- Z
  
  refine dev_ctx.Wait() exception throw, test=develop (#21600) · 97e76cb9
  由 Zeng Jinle 提交于 12月 06, 2019
  
  97e76cb9
29 11月, 2019 1 次提交
- J
  
  [MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375) · cd43c444
  由 Jacek Czaja 提交于 11月 29, 2019
  
  cd43c444
18 11月, 2019 1 次提交

fix sporadically hang issue on windows(#21201) · d8b6cf2b

由 liuwei1031 提交于 11月 18, 2019

cudaStreamSynchronize randomly hang when used in multi-thread environment, replace it with cudaStreamQuery API on windows

d8b6cf2b

14 11月, 2019 1 次提交

Improve topk performance. (#21087) · b93870e6

由 zhaoyuchen2018 提交于 11月 13, 2019

* Improve topk performance.

give 200000 data to compute topk,
before opt: cost 1s
after opt: cost 0.0028s.

* Refine return value.
* Add cuda util funtions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

b93870e6

24 9月, 2019 1 次提交
- Z
  
  fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407
  由 Zeng Jinle 提交于 9月 24, 2019
  
  37f76407
22 9月, 2019 1 次提交

Add lock to cudnn handle calls (#19845) · c7f36e7c

由 Zeng Jinle 提交于 9月 22, 2019

* refine reallocate of workspace size, test=develop

* add lock to cudnn handle calls, test=develop

c7f36e7c

11 9月, 2019 1 次提交

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

由 Huihuang Zheng 提交于 9月 11, 2019

TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.

We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation

12542320

12 8月, 2019 1 次提交
- G
  Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  由 gongweibao 提交于 8月 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
  29d87812
11 7月, 2019 1 次提交

add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580) · 076f8331

由 Tao Luo 提交于 7月 11, 2019

* add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy

test=develop

* enhance MkldnnPostReset

test=develop

* add comments for mkldnn_cache_capacity field

test=develop

076f8331

08 7月, 2019 1 次提交

add mkldnn shapeblob cache clear strategy (#18513) · fe32879d

由 Tao Luo 提交于 7月 08, 2019

* add mkldnn shapeblob cache clear strategy

test=develop

* refine with comments

test=develop

* make cache clear strategy more safey

test=develop

* add lock for GetShapeBlobSize

test=develop

fe32879d

03 7月, 2019 1 次提交
- T
  add shape_blob for cache mkldnn primitive (#18454) · 3f3112ce
  由 Tao Luo 提交于 7月 03, 2019
```
test=develop
```
  3f3112ce
02 7月, 2019 1 次提交

rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca

由 Leo Zhao 提交于 7月 02, 2019

* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()

test=develop

* update session id definition and adjust logic for default behavior

test=develop

* reset logic in mkldnn reuse as most of cases work in default.

test=develop

8f5fffca

27 6月, 2019 1 次提交
- M
  Reset DeviceContext after quantization warmup (#18182) · 84096932
  由 Michał Gallus 提交于 6月 27, 2019
```
test=develop
```
  84096932
18 6月, 2019 1 次提交
- C
  Remove nccl dep when the number of GPU is 1 (#18158) · 4978db2c
  由 chengduo 提交于 6月 18, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop
```
  4978db2c
10 6月, 2019 1 次提交
- Z
  Remove attribute in Allocator::Allocate (#17878) · 3ece61f7
  由 Zeng Jinle 提交于 6月 10, 2019
```
* remove attribute in Allocator::Allocate, test=develop

* fix travis ci error, test=develop
```
  3ece61f7
07 6月, 2019 1 次提交
- Z
  Fix cuda/cudnn version detection error (#17853) · 3925bd81
  由 Zeng Jinle 提交于 6月 07, 2019
```
* fix cuda/cudnn version detection error, test=develop

* fix again, test=develop
```
  3925bd81
28 3月, 2019 1 次提交
- G
  
  Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  由 gongweibao 提交于 3月 28, 2019
  
  eb83abea
25 3月, 2019 1 次提交
- N
  fix ci bug: cudnn handler in multi card · a1d11bb1
  由 nhzlx 提交于 3月 25, 2019
```
test=develop
```
  a1d11bb1
20 3月, 2019 1 次提交
- N
  
  git cherry-pick from feature/anakin-engine: update anakin subgraph #16278 · 07dcf285
  由 nhzlx 提交于 3月 20, 2019
  
  07dcf285
19 3月, 2019 1 次提交
- Z
  add allocator flags · 22715487
  由 zhhsplendid 提交于 3月 19, 2019
```
test=develop
```
  22715487
16 3月, 2019 1 次提交
- Q
  Fix windows compiling (#16230) · 86e912c5
  由 qingqing01 提交于 3月 16, 2019
```
test=develop
```
  86e912c5
15 3月, 2019 1 次提交

Support sync batch norm. (#16121) · 8ad672a2

由 qingqing01 提交于 3月 15, 2019

* Support Sync Batch Norm.
* Note, do not enable it in one device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)

8ad672a2

22 2月, 2019 1 次提交
- S
  Change *(smart_ptr.get()) -> *smart_ptr · 74672d1a
  由 Sylwester Fraczek 提交于 2月 07, 2019
```
reason: dereferencing smart pointer is the same as the underlying pointer
test=develop
```
  74672d1a
19 2月, 2019 1 次提交
- S
  fix many warning · 209b3557
  由 sneaxiy 提交于 2月 19, 2019
```
test=develop
```
  209b3557
16 1月, 2019 1 次提交
- M
  
  Add single GPU support to imperative · 315b133e
  由 minqiyang 提交于 1月 16, 2019
  
  315b133e
11 1月, 2019 3 次提交

C
fix thread safe bug · c4eced98
由 chengduozh 提交于 1月 11, 2019
```
test=develop
```
c4eced98
C
Revert "Remove workspace_handle in conv_cudnn (#15186)" · 358e657f
由 chengduozh 提交于 1月 11, 2019
```
test=develop
This reverts commit 064512aa.
```
358e657f

Remove workspace_handle in conv_cudnn (#15186) · 064512aa

由 chengduo 提交于 1月 10, 2019

* remove workspace_handle in conv2d_cudnn
test=develop

* remove workspace_handle
test=develop

* fix bug
test=develop

* make test_conv2d_op SERIAL
test=develop

* save memory in conv_cudnn
test=develop

* enhance thread safety
test=develop

* enhance temporary allocator
test=develop

* Add excess fraction
test=develop

* follow comments
test=develop

* fix bug and code refine
test=develop

* fix memory size check
test=develop

* rename reuse_tmp_allocation_excess_fraction
test=develop

064512aa

08 1月, 2019 2 次提交
- S
  Revert "Revert "Remove op handle lock"" · ed409ac9
  由 sneaxiy 提交于 1月 08, 2019
```
test=develop
```
  ed409ac9
- Z
  Revert "Remove op handle lock" · dacfaaa9
  由 Zeng Jinle 提交于 1月 08, 2019
```
test=develop
```
  dacfaaa9
07 1月, 2019 1 次提交
- S
  
  fix_cudnn_compatible_check · 9793a0b6
  由 sneaxiy 提交于 1月 07, 2019
  
  9793a0b6
02 1月, 2019 1 次提交
- S
  remove_op_handle_lock · d0a8a1e9
  由 sneaxiy 提交于 1月 02, 2019
```
test=develop
```
  d0a8a1e9
29 12月, 2018 1 次提交
- S
  remove tensor core lock · d25395fc
  由 sneaxiy 提交于 12月 29, 2018
```
test=develop
```
  d25395fc
25 12月, 2018 1 次提交

Move GetTensor to tensor_util (#15011) · b9fb03cf

由 chengduo 提交于 12月 25, 2018

* refine tensor
test=develop

* refine tensor
test=develop

* fix device_context log
test=develop

b9fb03cf

21 12月, 2018 1 次提交

[Feature] Add Temporary Allocator (#14875) · 79bd6dfa

由 chengduo 提交于 12月 21, 2018

* Add Temporal Allocator

* add Temporay Allocator to DeviceContext
test=develop

* code refine
test=develop

* fix mean_iou
test=develop

* Add DeviceTemporaryAllocator
test=develop

* fix conv_op bug
test=develop

* small fix
test=develop

* code refine
test=develop

* log refine
test=develop

* fix unit test
test=develop

* move double check

* refine concat_and_split
test=develop

* add limit_of_temporary_allocation
test=develop

* fix name
test=develop

79bd6dfa

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致