Commits · 3a59a7a11faecf9bbfcd0d4651e47c39d3e8eee2 · BaiXuePrincess / Paddle

07 Feb, 2020 · 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation of fusion_group op.

* Add unit-test for fusion_group op.

* Add a check of the results.

* Add a check for nvrtc in the unit test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make device-code compilation return a status instead of hanging.
test=develop

* Check whether the CUDA driver library is available, and avoid a core dump when a CUDA driver API call fails.

* Unify fusion_group_op's input and output names.
test=develop

* Add a check for the CUDA driver library in the unit test.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions, because the sorted subgraph is used directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from the formal parameter list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shapes of all inputs are the same.

* Add debug information.

* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038
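The commit above enables fusion_group_pass to detect subgraphs made of grad ops and to sort them before generating code. The Python sketch below is a rough illustration only and is not Paddle's SubgraphDetector. It groups connected ops whose types appear in a hypothetical `FUSIBLE` set, keeps groups that contain at least two ops, and orders each group topologically with Kahn's algorithm. The dict-based op format and the name `detect_subgraphs` are assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical set of op types this sketch treats as fusible (forward and grad).
FUSIBLE = {"elementwise_add", "elementwise_mul", "relu",
           "elementwise_add_grad", "elementwise_mul_grad", "relu_grad"}


def detect_subgraphs(ops, fusible=FUSIBLE, min_size=2):
    """Return groups of connected fusible ops, each in topological order.

    `ops` is a list of dicts with 'name', 'type', 'inputs' and 'outputs'
    (a hypothetical, simplified graph representation). Unlike a real pass,
    this sketch does not reject groups that would form a cycle through a
    non-fusible op."""
    candidates = [op for op in ops if op["type"] in fusible]
    producer = {}
    for op in candidates:
        for var in op["outputs"]:
            producer[var] = op["name"]

    # Producer -> consumer edges between fusible ops only.
    succ, pred = defaultdict(set), defaultdict(set)
    for op in candidates:
        for var in op["inputs"]:
            src = producer.get(var)
            if src is not None and src != op["name"]:
                succ[src].add(op["name"])
                pred[op["name"]].add(src)

    # Connected components, ignoring edge direction (iterative DFS).
    seen, groups = set(), []
    for op in candidates:
        if op["name"] in seen:
            continue
        stack, comp = [op["name"]], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend((succ[node] | pred[node]) - comp)
        seen |= comp
        if len(comp) >= min_size:
            groups.append(comp)

    # Kahn's algorithm inside each group; program order breaks ties.
    position = {op["name"]: i for i, op in enumerate(ops)}
    result = []
    for comp in groups:
        indegree = {n: len(pred[n] & comp) for n in comp}
        ready = deque(sorted((n for n in comp if indegree[n] == 0), key=position.get))
        ordered = []
        while ready:
            node = ready.popleft()
            ordered.append(node)
            for nxt in sorted(succ[node] & comp, key=position.get):
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        result.append(ordered)
    return result
```

For example, an `elementwise_add` that feeds a `relu` forms one group, a trailing `mul` is left out because it is not in the fusible set, and a `relu_grad` that feeds an `elementwise_add_grad` is grouped the same way.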

10 Jan, 2020 · 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Committed by Zhen Wang on Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some input and output bugs in the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add support for batch_size=1 in the bn-with-relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16
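The fuse_bn_act pass added here folds a batch_norm followed by relu into a single fused_batch_norm_act op. The minimal pattern-matching sketch below illustrates the rewrite on a simplified, dict-based program representation. That representation is hypothetical: Paddle itself matches patterns on its C++ ir::Graph. The sketch fuses only when the batch_norm output Y has exactly one consumer, and it omits the attribute and dtype assertions the commit mentions. The `act_type` attribute name is also an assumption.

```python
def fuse_bn_act(ops, act_types=("relu",)):
    """Replace each batch_norm whose Y output feeds exactly one activation op
    with a single fused_batch_norm_act op.

    `ops` is a topologically ordered list of dicts with 'type', 'inputs',
    'outputs' and optional 'attrs'. outputs[0] of batch_norm is taken to be Y
    (a simplifying assumption of this sketch)."""
    # Map every variable to the ops that read it.
    consumers = {}
    for op in ops:
        for var in op["inputs"]:
            consumers.setdefault(var, []).append(op)

    result, absorbed = [], set()
    for op in ops:
        if id(op) in absorbed:
            continue  # this activation was folded into a fused op
        if op["type"] == "batch_norm":
            users = consumers.get(op["outputs"][0], [])
            if len(users) == 1 and users[0]["type"] in act_types:
                act = users[0]
                result.append({
                    "type": "fused_batch_norm_act",
                    "inputs": list(op["inputs"]),
                    # The activation's output replaces Y; other BN outputs are kept.
                    "outputs": list(act["outputs"]) + list(op["outputs"][1:]),
                    "attrs": {**op.get("attrs", {}), "act_type": act["type"]},
                })
                absorbed.add(id(act))
                continue
        result.append(op)
    return result
```

The single-consumer check matters. If Y is also read by another op, removing it would break that consumer, so the pair is left unfused.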

13 Nov, 2019 · 1 commit

Add examples for error message writing specification - PreconditionNotMet,... · 8414575b

Committed by Chen Weihang on Nov 13, 2019

Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137)

* add examples for error spec, test=develop

* change ENFORCE to ENFORCE_**, test=develop

8414575b

16 Sep, 2019 · 1 commit

Fix warning info of build_strategy (#19805) · 82814970

Committed by chengduo on Sep 16, 2019

```
* fix warning info
test=develop

* fix bug of all_reduce_deps_pass
test=develop
```

82814970

13 Sep, 2019 · 1 commit

Open fuse all reduce option (#19765) · 056fdedd

Committed by chengduo on Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

056fdedd

11 Sep, 2019 · 2 commits

Open fuse broadcast option (#18833) · e506c99c

Committed by chengduo on Sep 11, 2019

```
* fix vlog level and fuse option type
test=develop
```

e506c99c

Enable fused_all_reduce_op_handle to support GPU and CPU gradients (#19418) · 5866a7a5

Committed by chengduo on Sep 11, 2019

```
* Enable fused_all_reduce_op_handle to support GPU and CPU gradients
```

5866a7a5

04 Sep, 2019 · 1 commit

Enable ngraph through build_strategy (#19266) · a3a4b6e5

Committed by baojun on Sep 04, 2019

* enable ngraph through build_strategy test=develop

* add unittest test=develop

* put use_ngraph unconditional test=develop

* remove paddle_enforce test=develop

* remove paddle_enforce test=develop

* fix copyright test=develop

* limit for ngraph only test=develop

a3a4b6e5

02 Aug, 2019 · 1 commit

Disable fuse optimization option (#18924) · e7da0940

Committed by chengduo on Aug 02, 2019

```
* Disable fuse optimization
test=develop
```

e7da0940

29 Jul, 2019 · 1 commit

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

Committed by Zeng Jinle on Jul 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

27 Jul, 2019 · 1 commit

Open fuse optimization ops (#18741) · 4140fe11

Committed by chengduo on Jul 27, 2019

```
* open fuse optimization ops
test=develop
```

4140fe11

26 Jul, 2019 · 1 commit

Feature/mem opt pass refactor (#18735) · a802da65

Committed by Zeng Jinle on Jul 26, 2019

* first version memory optimize pass, test=develop

* remove move_tensor_sharing_pass, test=develop

* refine code comments, add unittests, test=develop

* turn off memory_optimize by default, test=develop

* follow huihuang's comments, test=develop

* follow chengduoZH's comments, test=develop

* fix grammar error, add const qualifier, fix pass_test exception message, test=develop

* follow chengduoZH's comments 2nd, test=develop

a802da65

23 Jul, 2019 · 1 commit

Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c

Committed by chengduo on Jul 23, 2019

```
* support sparse gradients
test=develop
```

fd3aad6c

11 Jul, 2019 · 2 commits

Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748

Committed by gongweibao on Jul 11, 2019

c0a82748

Feature/buffer_shared_inplace (#17911) · d3003a16

Committed by Zeng Jinle on Jul 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

14 Jun, 2019 · 1 commit

Fix reinitialized ncclid error! (#18025) · f5caf344

Committed by gongweibao on Jun 14, 2019

f5caf344

06 Jun, 2019 · 1 commit

Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc

Committed by gongweibao on Jun 06, 2019

fbbdc9cc

27 May, 2019 · 1 commit

Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950

Committed by gongweibao on May 27, 2019

65bbf950
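The commit title says only that multiple NCCL communicators and a 2D all-reduce are supported. It does not describe the algorithm. The sketch below shows one common 2D scheme, offered as an assumption about what "2D" means here. Ranks are arranged in an R x C grid. The sketch first all-reduces within each row, then within each column, so every rank ends up with the global sum. Each phase would typically run on its own row or column communicator, which may be why several NCCL communicators are involved. The collectives are simulated with Python lists, and no NCCL calls are made.

```python
from typing import List

Vector = List[float]


def allreduce(vectors: List[Vector]) -> List[Vector]:
    """Simulated flat all-reduce: every participant receives the element-wise sum."""
    total = [sum(vals) for vals in zip(*vectors)]
    return [list(total) for _ in vectors]


def allreduce_2d(grid: List[List[Vector]]) -> List[List[Vector]]:
    """All-reduce over an R x C grid of ranks in two phases:
    first within each row, then within each column."""
    rows = [allreduce(row) for row in grid]
    n_rows, n_cols = len(rows), len(rows[0])
    result = [[None] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        # After the row phase, rank (r, c) holds its row's partial sum.
        column = allreduce([rows[r][c] for r in range(n_rows)])
        for r in range(n_rows):
            result[r][c] = column[r]
    return result
```

After the column phase, every rank holds the sum over all R * C ranks. Each individual collective, however, involves only C or R participants.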

20 May, 2019 · 1 commit

remove unused expected_kernel_cache_pass (#17486) · 32da5e9c

Committed by Tao Luo on May 20, 2019

```
test=develop
```

32da5e9c

14 May, 2019 · 1 commit

make parallel_executor support FLAGS_use_mkldnn (#17341) · 68ec0a6f

Committed by Tao Luo on May 14, 2019

* make parallel_executor support FLAGS_use_mkldnn

test=develop

* add a warning when mkldnn_enabled_op_types_ is set in a non-MKLDNN environment

test=develop

68ec0a6f

08 May, 2019 · 1 commit

Code Clean: Move all pass to paddle::framework::ir (#17228) · 04bd413a

Committed by chengduo on May 08, 2019

```
* move pass to ir

* polish code
test=develop

* fix dependency
test=develop
```

04bd413a

06 May, 2019 · 1 commit

Add use_cuda to inplace pass (#17205) · ee2028a1

Committed by Zeng Jinle on May 05, 2019

* add use_cuda to inplace pass,test=develop

* add softmax_with_xe_inplace test,test=develop

ee2028a1

23 Apr, 2019 · 1 commit

Add fuse momentum ops (#16745) · a2be4b4d

Committed by chengduo on Apr 23, 2019

```
* Add fuse momentum ops
```

a2be4b4d

21 Apr, 2019 · 1 commit

Refine model gpu memory (#16993) · 1202d3fc

Committed by Zeng Jinle on Apr 21, 2019

* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop

* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop

* follow comments
test=develop

1202d3fc

12 Apr, 2019 · 1 commit

Refine Fuse Optimize Ops (#16810) · e9409665

Committed by chengduo on Apr 12, 2019

```
* fix bug of fuse optimize ops
```

e9409665

11 Apr, 2019 · 1 commit

Add an option to enable the cache of expected kernel in train phase. (#16724) · 112f1614

Committed by Yiqun Liu on Apr 11, 2019

* Add an option to enable the cache of expected kernel in train phase.
test=develop

* Change the default value of cache_expected_kernel to true.

112f1614

08 Apr, 2019 · 2 commits

Fix DGC bug. (#16697) · 8b793d0e

Committed by gongweibao on Apr 08, 2019

8b793d0e

Enable the runtime_context_cache pass in train phase (#16640) · 3fe8cb0d

Committed by Yiqun Liu on Apr 08, 2019

```
* Try to enable the runtime_context_cache pass in train phase.

* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop
```

3fe8cb0d

03 Apr, 2019 · 1 commit

Fix the bug of AllReduceDepPass (#16393) · ea2a2f77

Committed by chengduo on Apr 02, 2019

ea2a2f77

28 Mar, 2019 · 2 commits

Fuse Adam And SGD ops (#15933) · 1096746c

Committed by chengduo on Mar 28, 2019

```
* fuse optimizer
```

1096746c
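A common idea behind fusing optimizer ops is to lay parameters and their gradients out in one contiguous buffer. A single optimizer op can then update all of them, rather than one op being launched per parameter. The toy `FusedSGD` class below is a hypothetical illustration of that idea for plain SGD only. It is not Paddle's fused op, and the commit body does not describe the buffer layout Paddle actually uses.

```python
from typing import Dict, List, Tuple


class FusedSGD:
    """Toy fused SGD: every parameter lives in one flat buffer, so a single
    loop (standing in for one fused op) updates all parameters at once."""

    def __init__(self, params: Dict[str, List[float]], lr: float):
        self.lr = lr
        self.offsets: Dict[str, Tuple[int, int]] = {}
        self.flat: List[float] = []
        for name, values in params.items():
            # Record where each parameter starts in the flat buffer and its size.
            self.offsets[name] = (len(self.flat), len(values))
            self.flat.extend(values)

    def step(self, grads: Dict[str, List[float]]) -> None:
        # Coalesce gradients into a buffer laid out exactly like the parameters.
        flat_grad = [0.0] * len(self.flat)
        for name, (start, size) in self.offsets.items():
            flat_grad[start:start + size] = grads[name]
        # One update over the whole buffer instead of one update per parameter.
        for i, g in enumerate(flat_grad):
            self.flat[i] -= self.lr * g

    def param(self, name: str) -> List[float]:
        start, size = self.offsets[name]
        return self.flat[start:start + size]
```

The result is identical to updating each parameter separately. Fusing changes only how many ops (and, on a GPU, kernel launches) the update takes.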

Fix the interface of Pass::Apply (#16484) · ed61d67c

Committed by chengduo on Mar 27, 2019

* modify the interface of Pass::Apply
test=develop

* Polish code
test=develop

* Fix Travis CI
test=develop

* fix Pass::Apply interface
test=develop

* Fix Travis CI
test=develop

ed61d67c

22 Mar, 2019 · 1 commit

[Speed]Refine ParallelExecutor (#16190) · a6a3b2fb

Committed by chengduo on Mar 22, 2019

* refine parallelExecutor
test=develop

* Polish op_handle
test=develop

* Remove unnecessary op_handle
test=develop

* Fix Travis CI
test=develop

* Fix fetch bug
test=develop

* Remove WaitInputVarGenerated

* Fix OpHandleBase::Run
test=develop

* debug
test=develop

* use origin fetch_op_handle
test=develop

* Revert op_handle_base.cc
test=develop

* Polish code
test=develop

* Fix OpHandleBase::Run
test=develop

* code refine

* test CI and CE
test=develop

* fix OpHandle::Run
test=develop

* refine AllReduceOpHandle
test=develop

* Polish code
test=develop

a6a3b2fb

20 Mar, 2019 · 1 commit

Fuse AllReduce (#15921) · f26ba5bd

Committed by chengduo on Mar 19, 2019

* fuse all_reduce
test=develop

* add fuse_parameter_groups_size
test=develop

* Polish code
test=develop

* Fix travis-ci
test=develop

* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
test=develop

* Add SetGroupAccordingToMemorySize
test=develop

* fix multi_devices_graph
test=develop

* reset params_grads
test=develop

* Polish code
test=develop

f26ba5bd
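This commit groups gradients before fusing their all-reduce calls. It mentions SetGroupAccordingToLayers, SetGroupAccordingToGroupSize, SetGroupAccordingToMemorySize, and a fuse_parameter_groups_size option. The Python functions below are hypothetical stand-ins for the last two strategies, written only to show the idea. One splits gradients into groups of a fixed count, and the other greedily packs consecutive gradients up to a byte budget. The actual C++ implementation may order and size groups differently.

```python
from typing import List, Tuple

Grad = Tuple[str, int]  # (gradient name, size in bytes)


def group_by_count(grads: List[Grad], group_size: int) -> List[List[str]]:
    """Put every `group_size` consecutive gradients into one fused all-reduce."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    names = [name for name, _ in grads]
    return [names[i:i + group_size] for i in range(0, len(names), group_size)]


def group_by_memory(grads: List[Grad], max_bytes: int) -> List[List[str]]:
    """Greedily pack consecutive gradients until a group would exceed max_bytes.
    A single gradient larger than max_bytes gets a group of its own."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, nbytes in grads:
        if current and used + nbytes > max_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        groups.append(current)
    return groups
```

Larger groups mean fewer all-reduce launches but a longer wait before a group is ready. The memory-based variant keeps each fused buffer bounded regardless of how unevenly parameter sizes are distributed.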

15 Mar, 2019 · 1 commit

Support sync batch norm. (#16121) · 8ad672a2

Committed by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note: do not enable it when running on a single device.

Usage:

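import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
# Synchronize batch-norm statistics across all devices.
build_strategy.sync_batch_norm = True
# Assumed: tp is the training Program and loss_mean its averaged loss Variable.
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)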

8ad672a2
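Synchronized batch norm computes the mean and variance over the whole global batch rather than over each device's shard. The sketch below shows one common way to do this. It is an assumption, not Paddle's kernel. Each device computes sum(x), sum(x^2), and its element count. Those three values are all-reduced, and the statistics are derived from the totals. Here the all-reduce is simulated by a plain sum, and only a single channel is shown.

```python
from typing import List, Tuple


def local_stats(shard: List[float]) -> Tuple[float, float, int]:
    """Per-device partial sums: sum(x), sum(x^2) and the element count."""
    return sum(shard), sum(v * v for v in shard), len(shard)


def sync_mean_var(shards: List[List[float]]) -> Tuple[float, float]:
    """Combine partial sums from all devices (the all-reduce step, simulated
    here by a plain sum) into the global mean and biased variance."""
    total, total_sq, count = 0.0, 0.0, 0
    for s, sq, n in map(local_stats, shards):
        total += s
        total_sq += sq
        count += n
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var
```

Because only three scalars per channel are exchanged, the result matches single-device statistics over the full batch even when shards have different sizes. Computing the variance as E[x^2] - E[x]^2 is simple, but it can lose precision when the mean is large relative to the spread, so a production kernel may use a more stable formulation.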

07 Mar, 2019 · 1 commit

fix compile problem · 446fdf95

Committed by Qiao Longfei on Mar 07, 2019

446fdf95

05 Mar, 2019 · 1 commit

code format test=develop · 4e218dab

Committed by Qiao Longfei on Mar 05, 2019

4e218dab

23 Feb, 2019 · 1 commit

refine code test=develop · 2b7931d5

Committed by Qiao Longfei on Feb 23, 2019

2b7931d5

22 Feb, 2019 · 2 commits

polish · 19d78f67

Committed by Xin Pan on Feb 22, 2019

```
test=develop
```

19d78f67

resolve conflicts · 32d5a160

Committed by Xin Pan on Feb 22, 2019

```
test=develop
```

32d5a160

21 Feb, 2019 · 1 commit

allow compiler to use graph · 26e32e09

Committed by Xin Pan on Jan 17, 2019

```
test=develop
```

26e32e09

BaiXuePrincess / Paddle is in sync with the upstream repository it was forked from.