Commits · 847aa172ae650a05087d0ced0260b5cb7229f8ca · BaiXuePrincess / Paddle

29 Dec, 2020 1 commit

[Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172

liuyuhui authored Dec 29, 2020

* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)

* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)

* add bkcl.so in whl for kunlun (#29947)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>

847aa172

27 Oct, 2020 1 commit

add Fuse bn add act pass (#28196) · fdc06f21

Zhang Ting authored Oct 27, 2020

* add fuse_bn_add_act pass

fdc06f21

24 Sep, 2020 1 commit

use iwyu clean include (#27267) · df43905f

wanghuancoder authored Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

21 Sep, 2020 1 commit

[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112) · aba759ba

Leo Chen authored Sep 21, 2020

* support use add instead of sum to do gradient accumulation

* add inplace addto pass

* add grad_add op and inplace addto pass

* remove debug code

* code refine

* fix bug when several sum ops are inserted at the same op_idx

* fix Flags type

* add addto attribute for conv3d

* fix ut

* code clean

* fix type

aba759ba
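
The first bullets of this commit mention using add instead of sum for gradient accumulation. The following is only a minimal sketch of that idea in plain Python, not Paddle's C++ pass; the function names `accumulate_with_sum` and `accumulate_with_addto` are hypothetical and chosen for illustration.

```python
def accumulate_with_sum(grads):
    """Accumulate with a single n-ary sum: the result is a NEW buffer."""
    return [sum(values) for values in zip(*grads)]


def accumulate_with_addto(grads):
    """Accumulate by adding each gradient into the first buffer in place.

    No extra output buffer is allocated; grads[0] is reused as the result.
    """
    accumulator = grads[0]
    for grad in grads[1:]:
        for index, value in enumerate(grad):
            accumulator[index] += value
    return accumulator


if __name__ == "__main__":
    partial_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(accumulate_with_sum(partial_grads))
    print(accumulate_with_addto(partial_grads))
```

Both functions return the same values; the difference is that the add-to variant overwrites its first input, which is why such a strategy has to be applied only where that buffer is safe to reuse.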

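23 Feb, 2020 1 commit

fix typo words (#22653) · d2ba91aa

tianshuo78520a authored Feb 23, 2020

d2ba91aa

11 Feb, 2020 1 commit

Compile without nccl deps. [1/2] (#22509) · a90fa540

Wilber authored Feb 11, 2020

Support compiling without the NCCL dependency. [1/2]

In a multi-card setup, if Paddle is compiled without the WITH_NCCL switch turned on, the cards cannot communicate with each other, so only one card can be selected for use.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>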

a90fa540
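
The fallback described in this commit message (several cards requested, but no NCCL support compiled in) can be sketched as below. This is an illustrative assumption, not Paddle's code: `select_usable_devices` is a hypothetical helper, and picking the first requested card is an arbitrary choice made for the sketch.

```python
def select_usable_devices(requested_devices, with_nccl):
    """Return the device ids that can actually take part in training.

    requested_devices: list of device ids the user asked for.
    with_nccl: whether the build has collective communication support
               (stands in for the WITH_NCCL compile switch).
    """
    if not requested_devices:
        raise ValueError("at least one device must be requested")
    if len(requested_devices) > 1 and not with_nccl:
        # Cards cannot communicate without NCCL, so use a single card.
        # Assumption: the first requested card is the one kept.
        return [requested_devices[0]]
    return list(requested_devices)


if __name__ == "__main__":
    print(select_usable_devices([0, 1, 2], with_nccl=False))
    print(select_usable_devices([0, 1, 2], with_nccl=True))
```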

07 Feb, 2020 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Yiqun Liu authored Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shapes of all inputs are the same.

* Add debug information.

* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

10 Jan, 2020 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Zhen Wang authored Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

26 Sep, 2019 1 commit

disable fuse_all_optimizer_ops (#19966) · 2450d15b

chengduo authored Sep 26, 2019

test=develop

2450d15b

13 Sep, 2019 1 commit

Open fuse all reduce option (#19765) · 056fdedd

chengduo authored Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

056fdedd

11 Sep, 2019 1 commit

Open fuse broadcast option (#18833) · e506c99c

chengduo authored Sep 11, 2019

* fix vlog level and fuse option type
test=develop

e506c99c

12 Aug, 2019 1 commit

open fuse_all_optimizer_ops (#19087) · e044e842

chengduo authored Aug 12, 2019

test=develop

e044e842

02 Aug, 2019 1 commit

Disable fuse optimization option (#18924) · e7da0940

chengduo authored Aug 02, 2019

* Disable fuse optimization
test=develop

e7da0940

29 Jul, 2019 1 commit

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

Zeng Jinle authored Jul 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

27 Jul, 2019 1 commit

Open fuse optimization ops (#18741) · 4140fe11

chengduo authored Jul 27, 2019

* open fuse optimization ops
test=develop

4140fe11

26 Jul, 2019 1 commit

Feature/mem opt pass refactor (#18735) · a802da65

Zeng Jinle authored Jul 26, 2019

* first version memory optimize pass, test=develop

* remove move_tensor_sharing_pass, test=develop

* refine code comments, add unittests, test=develop

* turn off memory_optimize by default, test=develop

* follow huihuang's comments, test=develop

* follow chengduoZH's comments, test=develop

* fix grammar error, add const qualifier, fix pass_test exception message, test=develop

* follow chengduoZH's comments 2nd, test=develop

a802da65

11 Jul, 2019 2 commits

Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748

gongweibao authored Jul 11, 2019

c0a82748

Feature/buffer_shared_inplace (#17911) · d3003a16

Zeng Jinle authored Jul 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

24 Jun, 2019 1 commit

Clean build strategy (#18148) · 5489216e

chengduo authored Jun 24, 2019

* clean build_strategy
test=develop

* DataBalanceOpHandle has been removed
test=develop

* debug

* update build_strategy.
test=develop

5489216e

14 Jun, 2019 1 commit

Fix reinitialized ncclid error! (#18025) · f5caf344

gongweibao authored Jun 14, 2019

f5caf344

06 Jun, 2019 1 commit

Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc

gongweibao authored Jun 06, 2019

fbbdc9cc

27 May, 2019 1 commit

Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950

gongweibao authored May 27, 2019

65bbf950

20 May, 2019 1 commit

remove unused expected_kernel_cache_pass (#17486) · 32da5e9c

Tao Luo authored May 20, 2019

test=develop

32da5e9c

14 May, 2019 1 commit

make parallel_executor support FLAGS_use_mkldnn (#17341) · 68ec0a6f

Tao Luo authored May 14, 2019

* make parallel_executor support FLAGS_use_mkldnn

test=develop

* add warning when set mkldnn_enabled_op_types_ in non-mkldnn env

test=develop

68ec0a6f

11 Apr, 2019 1 commit

Add an option to enable the cache of expected kernel in train phase. (#16724) · 112f1614

Yiqun Liu authored Apr 11, 2019

* Add an option to enable the cache of expected kernel in train phase.
test=develop

* Change the default value of cache_expected_kernel to true.

112f1614

10 Apr, 2019 1 commit

disable memory_optimize and inplace strategy by default, test=develop (#16760) · 2e07c19a

liuwei1031 authored Apr 10, 2019

2e07c19a

08 Apr, 2019 1 commit

Enable the runtime_context_cache pass in train phase (#16640) · 3fe8cb0d

Yiqun Liu authored Apr 08, 2019

* Try to enable the runtime_context_cache pass in train phase.

* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop

3fe8cb0d

02 Apr, 2019 1 commit

Add Stream for fetch op handle (#16600) · b75a69ba

chengduo authored Apr 02, 2019

* expose fuse broadcast ops

b75a69ba

28 Mar, 2019 2 commits

Fuse Adam And SGD ops (#15933) · 1096746c

chengduo authored Mar 28, 2019

* fuse optimizer

1096746c

Fix the interface of Pass::Apply (#16484) · ed61d67c

chengduo authored Mar 27, 2019

* modify the interface of Pass::Apply
test=develop

* Polish code
test=develop

* Fix Travis CI
test=develop

* fix Pass::Apply interface
test=develop

* Fix Travis CI
test=develop

ed61d67c

20 Mar, 2019 1 commit

Fuse AllReduce (#15921) · f26ba5bd

chengduo authored Mar 19, 2019

* fuse all_reduce
test=develop

* add fuse_parameter_groups_size
test=develop

* Polish code
test=develop

* Fix travis-ci
test=develop

* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
test=develop

* Add SetGroupAccordingToMemorySize
test=develop

* fix multi_devices_graph
test=develop

* reset params_grads
test=develop

* Polish code
test=develop

f26ba5bd

15 Mar, 2019 1 commit

Support sync batch norm. (#16121) · 8ad672a2

qingqing01 authored Mar 15, 2019

* Support Sync Batch Norm.
* Note: do not enable it when running on a single device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)

8ad672a2
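
For context on what the `sync_batch_norm` option changes: synchronized batch normalization computes the mean and variance over the samples of all devices instead of over each device's local mini-batch. The sketch below shows only that statistic in plain Python; it is not Paddle's kernel, and the names `local_batch_stats` and `synced_batch_stats` are hypothetical.

```python
def local_batch_stats(batch):
    """Mean and (biased) variance over one device's mini-batch."""
    count = len(batch)
    mean = sum(batch) / count
    variance = sum(x * x for x in batch) / count - mean * mean
    return mean, variance


def synced_batch_stats(per_device_batches):
    """Mean and (biased) variance over the batches of ALL devices.

    The three running totals play the role of values that would be
    summed across devices (an all-reduce) in a real implementation.
    """
    count = 0
    total = 0.0
    total_of_squares = 0.0
    for batch in per_device_batches:
        count += len(batch)
        total += sum(batch)
        total_of_squares += sum(x * x for x in batch)
    mean = total / count
    variance = total_of_squares / count - mean * mean
    return mean, variance


if __name__ == "__main__":
    batches = [[1.0, 3.0], [5.0, 7.0]]  # two devices, two samples each
    print(local_batch_stats(batches[0]))
    print(synced_batch_stats(batches))
```

With a single device the two functions give the same result, which is consistent with the commit's note that the option is meant for multi-device runs.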

06 Mar, 2019 1 commit

add IfElse test case for ir memory optimize (#15998) · 9cc6f400

liuwei1031 authored Mar 05, 2019

* add ir memory optimize test case for IfElse op, test=develop

* fix some unittest failures by forcing the use of the python memory_optimize, test=develop

* tweak comments, test=develop

* fix unittest, test=develop

* fix unittest, test=develop

9cc6f400

05 Mar, 2019 2 commits

code format test=develop · 4e218dab

Qiao Longfei authored Mar 05, 2019

4e218dab

add IfElse test case for ir memory optimize (#15998) · caadd058

liuwei1031 authored Mar 05, 2019

* add ir memory optimize test case for IfElse op, test=develop

* fix some unittest failures by forcing the use of the python memory_optimize, test=develop

* tweak comments, test=develop

* fix unittest, test=develop

* fix unittest, test=develop

caadd058

21 Feb, 2019 1 commit

allow compiler to use graph · 26e32e09

Xin Pan authored Jan 17, 2019

test=develop

26e32e09

11 Feb, 2019 1 commit

add details. test=develop · 04e9776a

dzhwinter authored Feb 11, 2019

04e9776a

31 Jan, 2019 1 commit

follow comments. test=develop · 0a63234c

dzhwinter authored Jan 31, 2019

0a63234c

22 Jan, 2019 1 commit

turn on remove_unnecessary_lock · d8568acd

sneaxiy authored Jan 22, 2019

test=develop

d8568acd

21 Jan, 2019 1 commit

squash commits. test=develop · 8f3b2523

dzhwinter authored Jan 21, 2019

8f3b2523
