Commits · 07a4d8f8d6383db637a164c264f35cefea696c35 · Crayon鑫 / Paddle

15 Aug, 2019 · 1 commit
- Add generalized Conv+Activation MKLDNN fuse pass creation (#19072) · b837689e
  Committed by Adam on Aug 15, 2019
```
test=develop
```
12 Aug, 2019 · 2 commits
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
- add tensorrt support for windows (#19084) · 80b7ef6f
  Committed by wopeizl on Aug 12, 2019
```
* add tensorrt support for windows
```
09 Aug, 2019 · 1 commit
- optimize error message for "embedding" and "cross_entropy" OP (#18765) · c2063217
  Committed by Zhang Ting on Aug 09, 2019
```
* optimize error message, test=develop

* optimize error message, test=develop
```
05 Aug, 2019 · 1 commit

fix warpctc.dll not found issue (#18761) · a43a763b

Committed by liuwei1031 on Aug 05, 2019

* fix warpctc.dll not found issue, test=develop

* revert the linux platform change, test=develop

* delete warpctc_lib_path.h.in, test=develop

* add SetPySitePackagePath function

* fix warpctc.dylib not found issue on Mac, test=develop

* improve the paddle lib path setting logic, test=develop

* fix mac ci issue caused by test_warpctc_op unittest, test=develop

* tweak code, test=develop

01 Aug, 2019 · 2 commits

Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950) · 08fa98f7

Committed by Zeng Jinle on Aug 01, 2019

* fix gpu_info, test=develop

* fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop

* fix bug again for reserving size, test=develop

- Removed passing X from FWD to GRAD via device context (#18911) · 5cf2d385

Committed by Jacek Czaja on Aug 01, 2019

test=develop

- Extracted key generation from FWD and GRAD into separate function

test=develop

- Compilation fix

test=develop

- another compilation fix

test=develop

31 Jul, 2019 · 1 commit
- GPU allocation uses fraction of available memory (#18896) · ea6ee76f
  Committed by Huihuang Zheng on Jul 31, 2019
```
GPU allocation uses fraction of available memory, also fix the GetUsed without lock
```
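The change described in this commit message can be pictured with a small sketch. This is not Paddle's allocator code: the function names, the `reserved_bytes` margin, and the memory figures are all hypothetical, chosen only to show why sizing a chunk from currently free memory is safer than sizing it from total device memory.

```python
def chunk_size_from_total(total_bytes, fraction):
    """Baseline for comparison: size the chunk from total device memory."""
    return int(total_bytes * fraction)


def chunk_size_from_available(available_bytes, fraction, reserved_bytes=0):
    """Size the chunk from memory that is free right now.

    reserved_bytes is a hypothetical safety margin, not a Paddle parameter.
    """
    usable = max(available_bytes - reserved_bytes, 0)
    return int(usable * fraction)


GIB = 1024 ** 3
total = 16 * GIB       # made-up card size
available = 10 * GIB   # made-up: another process already holds 6 GiB

from_total = chunk_size_from_total(total, 0.92)
from_available = chunk_size_from_available(available, 0.92)

# Sizing from total memory asks for more than is free; sizing from
# available memory cannot.
print(from_total > available, from_available <= available)
```

With a fraction of 1.0 the available-memory version still never requests more than what is free, which is the situation the later `fraction_of_gpu_memory_to_use=1.0` fix in this log is concerned with.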
30 Jul, 2019 · 1 commit
- [MKL-DNN] Fix int8 performance regression (#18758) · cfcb96d2
  Committed by Jacek Czaja on Jul 30, 2019
```
test=develop

- optimization of TID to string

test=develop
```
29 Jul, 2019 · 1 commit
- Try to modify external gflags to solve CI compilation (#18872) · 0d3f16f5
  Committed by Huihuang Zheng on Jul 29, 2019
27 Jul, 2019 · 1 commit
- Merge cuda 9/10 dockerfile with root dockerfile (#18693) · cfce4994
  Committed by Huihuang Zheng on Jul 27, 2019
```
Also fix a dependency error which may cause compile error
```
25 Jul, 2019 · 1 commit

change ComputeINT8 to template version to remove checking dst_datatype code (#18756) · 9ecd8ee7

Committed by lidanqing on Jul 25, 2019

* change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews

* reverse user_residual_memory_p and user_bias_memory_p declaration scope
test=develop

23 Jul, 2019 · 2 commits

[MKL-DNN] Extended LRN with reusing via Acquire API (#18675) · 95c1816e

Committed by Jacek Czaja on Jul 23, 2019

test=develop

- compilation fix

- Yet another compilation fix

- Even yet another compilation fix

- Surprise! Again compilation fix

- lint fixes

test=develop

- Fix to workspace acquire of LRN

test=develop

- Fix to hash of BWD LRN

test=develop

- fix to lrn BWD PD acquire

test=develop

- Fixing LRN PD creation

test=develop

- cosmetic fix in comment

test=develop

- Fixes after review

test=develop

Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
Committed by chengduo on Jul 23, 2019
```
* support sparse gradients
test=develop
```
19 Jul, 2019 · 1 commit
- MKL-DNN upgrade to 0.20 (#18370) · 0d8e6c9b
  Committed by Jacek Czaja on Jul 19, 2019
```
test=develop
```
18 Jul, 2019 · 2 commits

Optimize the content of error reporting information, print error code and... · 772e0956

Committed by zhouwei25 on Jul 18, 2019

Optimize the content of error reporting information, print error code and official document web sites (#18671)

optimize the error reporting information of cuda related API
index on develop: 130ac177 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

Feature/auto_growth_allocator (#18561) · ae58afc5

Committed by Zeng Jinle on Jul 18, 2019

* feature/auto_growth_allocator, test=develop

* add unittest of AlignedAllocator, test=develop

* try to turn on auto_growth to test on CI, test=develop

* fix segmentation fault in mixed_vector.h, test=develop

* add unittests, test=develop

16 Jul, 2019 · 2 commits

print out error code of cudaGetDeviceProperties if failed (#18643) · 75953096
Committed by liuwei1031 on Jul 16, 2019

[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585) · 71d883b8

Committed by Jacek Czaja on Jul 16, 2019

* - Added partial draft of pooling acquire

- Workspace support

- compilation fix

- Added draft of pooling backward reimplementation

- Segfault fix

- reverted 'any' for diff_dst creation in pooling

- Lint fixes

test=develop

- lint fixes

test=develop

- Further lint fixes

test=develop

* - Fixes after review

test=develop

* - Lint fixes

test=develop

* - Even more lint fixes

test=develop

11 Jul, 2019 · 2 commits
- add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580) · 076f8331
  Committed by Tao Luo on Jul 11, 2019
```
* add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy

test=develop

* enhance MkldnnPostReset

test=develop

* add comments for mkldnn_cache_capacity field

test=develop
```
- Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
  Committed by gongweibao on Jul 11, 2019
10 Jul, 2019 · 2 commits
- Clean unused code of dim and place (#18565) · be24e5b3
  Committed by Zeng Jinle on Jul 10, 2019
```
* clean code of dim and place, test=develop

* fix failed unittests, test=develop
```
- Activations MKLDNN ops refactoring (#18191) · 8869d7f7
  Committed by Jacek Czaja on Jul 10, 2019
09 Jul, 2019 · 2 commits
- Fix/gcc 4.8 ubt link error (#18558) · 667f88f9
  Committed by Jiabin Yang on Jul 09, 2019
```
* test=develop, fix docker with paddle nccl problem

* test=develop, fix/gcc_4.8_ubt_link_error

* test=develop, fix code format
```
- Add mkldnn int8 mul-op kernel (#17834) · 0caa08ea
  Committed by Physher on Jul 09, 2019
08 Jul, 2019 · 1 commit

add mkldnn shapeblob cache clear strategy (#18513) · fe32879d

Committed by Tao Luo on Jul 08, 2019

* add mkldnn shapeblob cache clear strategy

test=develop

* refine with comments

test=develop

* make cache clear strategy safer

test=develop

* add lock for GetShapeBlobSize

test=develop
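A capacity-bounded shape cache of the kind this commit describes might look roughly like the sketch below. It is an illustration only: the class name, method names, and the "evict the oldest shape" policy are assumptions, not a transcription of Paddle's C++ code. The commit message does not say which clearing policy was chosen.

```python
import threading
from collections import OrderedDict


class ShapeBlobCache:
    """Two-level cache: input shape -> {blob name -> cached object}."""

    def __init__(self, capacity):
        # capacity <= 0 means "unbounded" in this sketch (an assumption).
        self._capacity = capacity
        self._blobs = OrderedDict()
        self._lock = threading.Lock()

    def set_blob(self, shape_key, name, value):
        with self._lock:
            if shape_key not in self._blobs:
                # Assumed policy: drop the oldest shape once the cache is full.
                if self._capacity > 0 and len(self._blobs) >= self._capacity:
                    self._blobs.popitem(last=False)
                self._blobs[shape_key] = {}
            self._blobs[shape_key][name] = value

    def get_blob(self, shape_key, name):
        with self._lock:
            return self._blobs.get(shape_key, {}).get(name)

    def shape_blob_size(self):
        # The size query takes the same lock as the writers, mirroring the
        # "add lock for GetShapeBlobSize" step in the commit message.
        with self._lock:
            return len(self._blobs)


cache = ShapeBlobCache(capacity=2)
for shape in ("1x3x224x224", "8x3x224x224", "16x3x224x224"):
    cache.set_blob(shape, "conv_pd", "primitive for " + shape)
print(cache.shape_blob_size())
```

Bounding the number of cached shapes matters for inputs with varying dimensions, where an unbounded per-shape cache would grow with every new shape seen.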

04 Jul, 2019 · 1 commit
- Enhance execution error info (#18482) · 55baeced
  Committed by chengduo on Jul 04, 2019
```
* enhance execution error info
test=develop
```
03 Jul, 2019 · 1 commit
- add shape_blob for cache mkldnn primitive (#18454) · 3f3112ce
  Committed by Tao Luo on Jul 03, 2019
```
test=develop
```
02 Jul, 2019 · 2 commits

rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca

Committed by Leo Zhao on Jul 02, 2019

* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()

test=develop

* update session id definition and adjust logic for default behavior

test=develop

* reset logic in mkldnn reuse as most of cases work in default.

test=develop

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has four reduce types, we split them into four separate ops.
2. We also refined the collective op code: we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
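
The first item can be illustrated with a single-process sketch in which each reduce type gets its own named op instead of one op that switches on a `reduce_type` attribute. The op names and the `run_allreduce` helper are illustrative assumptions. Ranks are simulated as plain lists and no communication library is involved.

```python
from functools import reduce
import operator

# One entry per reduce type; each name maps to a single fixed behaviour.
# The names are illustrative, not taken from this commit.
ALLREDUCE_OPS = {
    "c_allreduce_sum": operator.add,
    "c_allreduce_max": max,
    "c_allreduce_min": min,
    "c_allreduce_prod": operator.mul,
}


def run_allreduce(op_name, per_rank_tensors):
    """Simulate an allreduce over equally sized per-rank lists."""
    combine = ALLREDUCE_OPS[op_name]
    # Reduce element-wise across ranks.
    reduced = [reduce(combine, column) for column in zip(*per_rank_tensors)]
    # Every rank ends up holding the same reduced tensor.
    return [list(reduced) for _ in per_rank_tensors]


ranks = [[1, 5, 2], [4, 3, 6]]
for name in ALLREDUCE_OPS:
    print(name, run_allreduce(name, ranks)[0])
```

With separate ops, the reduce type is visible from the op name alone, so a graph pass does not need to inspect attributes to know what an op computes.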

01 Jul, 2019 · 1 commit

Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964) · 4bc2987d

Committed by Brian Liu on Jul 01, 2019

* Fix bug in quantize kernel which cause crash in vgg16/19 model

test=develop

* refine the code to reduce verbose code; test=develop

* remove useless code; test=develop

28 Jun, 2019 · 1 commit

Fix potential mkldnn concat/pool/conv kernel issues (#18393) · 681d3553

Committed by Leo Zhao on Jun 28, 2019

1. some key generation method is not aligned with PR#17965
2. enlarge ptr lifetime to avoid memory release if SetBlob fails
   otherwise it will get core dump.

test=develop

27 Jun, 2019 · 4 commits

add dependency of collective_helper (#18365) · 9931bc64
Committed by HaoRen on Jun 27, 2019
```
* add dependency of collective_helper

* test=develop
fix dependency of collective_helper
```
Reset DeviceContext after quantization warmup (#18182) · 84096932
Committed by Michał Gallus on Jun 27, 2019
```
test=develop
```
supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump for py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5

Committed by Jacek Czaja on Jun 27, 2019

* - Reusing of reorder used in elementwise_add_mkldnn

- Added MKL-DNN sum prim reusing

test=develop

- Compilation fixes

test=develop

- Yet another compilation fix

test=develop

- Yet another compilation fix

test=develo

- Yet another linking fix

test=develop

- Final compilation fix

test=develop

- lint fixes

test=develop

- Lint fixes

test=develop

* - Fixes after review

test=develop

18 Jun, 2019 · 1 commit
- Remove nccl dep when the number of GPU is 1 (#18158) · 4978db2c
  Committed by chengduo on Jun 18, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop
```
14 Jun, 2019 · 1 commit
- Fix reinitialized ncclid error! (#18025) · f5caf344
  Committed by gongweibao on Jun 14, 2019
11 Jun, 2019 · 2 commits

[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965) · 84bb45c0

Committed by Jacek Czaja on Jun 11, 2019

* - removed is_reusing_

* - Added TID to keys for reusing apart from softmax PD

* - compilation fix

* - Yet another compilation fix

* - Batch Norm and Conv adapted

* - Fix to softmax MT

* - Fixes to MT code of MKL-DNN

* - Lint fixes

test=develop
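
The "Added TID to keys" step can be sketched as follows. The idea is that a cache key which includes the calling thread's id keeps two threads from picking up each other's cached primitive. The helper names and the key layout are hypothetical and only illustrate the technique. They are not the key format used in Paddle.

```python
import threading

_primitive_cache = {}
_cache_lock = threading.Lock()


def make_key(op_name, dims, thread_id=None):
    """Build a cache key; thread_id defaults to the calling thread's id."""
    tid = threading.get_ident() if thread_id is None else thread_id
    return "{}-{}-t{}".format(op_name, "x".join(map(str, dims)), tid)


def get_or_create(op_name, dims, factory):
    """Return this thread's cached object, creating it on first use."""
    key = make_key(op_name, dims)
    with _cache_lock:
        if key not in _primitive_cache:
            _primitive_cache[key] = factory()
        return _primitive_cache[key]


# Same op and shape, different threads: the keys differ, so nothing is shared.
print(make_key("conv2d", (1, 3, 224, 224), thread_id=1))
print(make_key("conv2d", (1, 3, 224, 224), thread_id=2))
```

The lock in this sketch protects only the dictionary itself. Isolation between threads comes from the key.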

Pipeline Concurrency (#17402) · 969e6378

Committed by hutuxian on Jun 11, 2019

Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of pipeline trainer and the test model is gnn
- Do not support win32 now
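
As a rough picture of pipeline concurrency in general, the sketch below runs each stage in its own worker thread and connects neighbouring stages with queues, so different stages can work on different items at the same time. It is a generic illustration, not the `pipeline_trainer` or `section_worker` implementation. All names here are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker to shut down


def section_worker(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(stages, inputs):
    """Run inputs through the stages, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=section_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

Because each stage has a single worker and the queues are FIFO, results come out in input order.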
