Commits · dcfb603897c30cffd70e0ec6142e1cecbd64a6a9 · 机器未来 / Paddle

07 Feb, 2020 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation of fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shapes of all inputs are the same.

* Add debug information.

* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE messages.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

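Added a WITH_NCCL option to cmake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

NCCL compilation can be turned off for single-machine, single-GPU use. Multi-GPU use requires NCCL to be enabled (the default); with NCCL turned off, only a single GPU can be used.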
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
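
The option rule in this commit message can be modeled as a small truth table. The sketch below is in Python purely for illustration: the function names `resolve_with_nccl` and `max_usable_gpus` are hypothetical and are not part of Paddle's build scripts, which express this rule in CMake.

```python
def resolve_with_nccl(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Return the effective WITH_NCCL value (hypothetical helper).

    Models the rule from the commit message: WITH_NCCL defaults to ON,
    but is forced OFF whenever WITH_GPU is OFF.
    """
    return with_gpu and with_nccl


def max_usable_gpus(with_gpu: bool, with_nccl: bool, available_gpus: int) -> int:
    """Return how many GPUs a build could use under the stated rule (hypothetical helper)."""
    if not with_gpu:
        return 0  # CPU-only build
    if not resolve_with_nccl(with_gpu, with_nccl):
        return min(1, available_gpus)  # without NCCL, only a single GPU is usable
    return available_gpus


if __name__ == "__main__":
    for gpu in (True, False):
        for nccl in (True, False):
            print(f"WITH_GPU={gpu} WITH_NCCL={nccl} -> effective NCCL={resolve_with_nccl(gpu, nccl)}")
```

Under this model, a GPU build with NCCL disabled still works, but it is limited to one device.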

17 Jan, 2020 1 commit

integrated HALF_ASYNC to communicator (#21869) · 82bc814a

Committed by tangwei12 on Jan 17, 2020

* add half_async in the communicator
* fix DistributedStrategy

13 Jan, 2020 1 commit

Polish fetch error message of parallel executor (#22206) · fc0b21e1

Committed by Chen Weihang on Jan 13, 2020

* polish error message of parallel executor, test=develop

* change PADDLE_ENFORCE, test=develop

10 Jan, 2020 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Committed by Zhen Wang on Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

19 Dec, 2019 1 commit

fix batch_norm_grad infer shape=0 & add allreduce enforce shape, test=develop (#21801) · 17299b8d

Committed by WangXi on Dec 19, 2019

18 Dec, 2019 1 commit

Fix Backward Bugs in Conditional Block (#21809) · 557bce77

Committed by Huihuang Zheng on Dec 18, 2019

The fixed bugs:

1. The condition sub-graph is not pruned.
2. When the backward graph is extremely simple, all of the backward ops are pruned.

15 Dec, 2019 1 commit

fix std::min type in nan_inf, test=develop (#21725) · 8754cbd1

Committed by WangXi on Dec 15, 2019

12 Dec, 2019 1 commit

Rewrite check nan inf tools (#21076) · 8a0f611b

Committed by WangXi on Dec 12, 2019

11 Dec, 2019 1 commit

fix op_registry, add ignore op_function_impl.h, test=develop (#21654) · 6828f368

Committed by Zeng Jinle on Dec 11, 2019

06 Dec, 2019 1 commit

Polish op registry codes (#21561) · 0f888836

Committed by Zeng Jinle on Dec 06, 2019

* polish infer shape registry, test=develop

* modify some operators registry, test=develop

28 Nov, 2019 1 commit

Polish reference count pass (#21324) · 89966525

Committed by Zeng Jinle on Nov 28, 2019

* fix ref_cnt pass, test=develop

* add cpp unittests to reference_count_pass, test=develop

* follow comments, test=develop

25 Nov, 2019 1 commit

remove warning LNK4006 and warning LNK4221 (#21226) · 345b67b5

Committed by zhouwei25 on Nov 25, 2019

18 Nov, 2019 1 commit

Fix warn of gcc8 (#21205) · cdb3d279

Committed by Zeng Jinle on Nov 18, 2019

* fix warnings of gcc 8 compilation, test=develop

* fix boost::bad_get, test=develop

* refine PADDLE_ENFORCE, test=develop

13 Nov, 2019 2 commits

Add examples for error message writing specification - PreconditionNotMet,... · 8414575b

Committed by Chen Weihang on Nov 13, 2019

Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137)

* add examples for error spec, test=develop

* change ENFORCE to ENFORCE_**, test=develop

Add examples for error message writing specification - InvalidArgument (#21132) · 7e5f74b8

Committed by Chen Weihang on Nov 13, 2019

* add examples for error msg spec, test=develop

* change ENFORCE to ENFORCE_**, test=develop

* fix error, test=develop

12 Nov, 2019 1 commit

Fix dgc buffer illegal & reuse velocity (#21012) · de5d3ff6

Committed by WangXi on Nov 12, 2019

05 Nov, 2019 1 commit

Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5

Committed by Zeng Jinle on Nov 05, 2019

* support no need buffer vars in dygraph, test=develop

* fix inference compilation error, test=develop

* update no_need_buffer_vars_inference, test=develop

* add unittests for no_need_buffer_vars_context, test=develop

* refine no_need_buffer_vars by return ref, test=develop

* polish some codes, test=develop

01 Nov, 2019 2 commits

refine pe when exception raises, test=develop (#20894) · b0c0ffb9

Committed by Zeng Jinle on Nov 01, 2019

Optimize decay (#20816) · 20cdff0e

Committed by 123malin on Nov 01, 2019

* update pserver decay blocks

* update distributed notify handler

31 Oct, 2019 1 commit

GradMaker for dygraph (#19706) · 8c4573a3

Committed by hong on Oct 31, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix compile error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* optimize grad maker; test=develop

* optimize grad maker

* test

* grad make optim; test=develop

* fix unittest bugs; test=develop

* add dygraph grad op maker and split_op

* grad op maker refactor; test=develop

* add dygraph grad maker; test=develop

* fix op deformable_conv_v1_op bug; test=develop

* fix deformable_conv prroi pool bugs;

* fix new op grad op maker bug; test=develop

* fix split by ref bug; test=develop

* fix dygraph auto prune bug; test=develop

* fix test_trace bug; test=develop

* fix fused emb seq pool bug; test=develop

* remove useless code in op_desc file; test=develop

* remove useless code, StrVarBaseNode; test=develop

* fix review issues; test=develop

* fix rank_loss grad maker; test=develop

* remove flag in VarBase; test=develop

* fix distributed_notify_op compile bug ; test=develop

* fix reshape op double grad; test=develop

* fix expand as op; test=develop

* add imperative type_defs.h for demo_train; test=develop

* fix inference lib cmake; test=develop

* fix inference lib; test=develop

* fix inference_lib; test=develop

* fix inference cmake; test=develop

* fix inference lib; test=develop

* fix inference lib; test=develop

* remove condition dygraph grad maker, modify local name; test=develop

* fix split grad maker bug; test=develop

* fix pyramid_op bug; test=develop

* change travis time out limit; test=develop

* restore travis; test=develop

* change timeout limit; test=develop

28 Oct, 2019 1 commit

remove some unnecessary logs in pe, test=develop (#20848) · 98103d30

Committed by Zeng Jinle on Oct 28, 2019

18 Oct, 2019 2 commits

add support to gcc8, add docker env test=develop (#19807) · 9e594823

Committed by wopeizl on Oct 18, 2019

* add support to gcc8, add docker env test=develop

Fix dgc nan by stripping nccl from sparseReduce. (#20630) · 507afa8a

Committed by WangXi on Oct 17, 2019

14 Oct, 2019 2 commits

refine pe codes, test=develop (#20479) · a9c8bdad

Committed by Zeng Jinle on Oct 14, 2019

fix cuda dev_ctx by event, test=develop (#20553) · 76b32187

Committed by Zeng Jinle on Oct 14, 2019

30 Sep, 2019 1 commit

Add place deps for fused_all_reduce_op_handle (#20077) · bfa55c9d

Committed by chengduo on Sep 30, 2019

test=develop

27 Sep, 2019 1 commit

the integrated communicator (#19849) · 8f0b3c05

Committed by tangwei12 on Sep 27, 2019

* add a base class for the Communicator
* add AsyncCommunicator Impl for async distributed training

26 Sep, 2019 1 commit

disable fuse_all_optimizer_ops (#19966) · 2450d15b

Committed by chengduo on Sep 26, 2019

test=develop

24 Sep, 2019 1 commit

clean tensor array (#19930) · 55ce6969

Committed by chengduo on Sep 24, 2019

test=develop

23 Sep, 2019 1 commit

Delete local execution scopes (#19749) · d7251a8e

Committed by chengduo on Sep 23, 2019

* Add RecordHistoryLocalExecScopes
test=develop

20 Sep, 2019 1 commit

fix reduce and broadcast to avoid multi-stream, test=develop (#19889) · b754700f

Committed by Zeng Jinle on Sep 20, 2019

18 Sep, 2019 1 commit

[Bug fix] Disable memory reuse on feeded variables (#19835) · db26de83

Committed by Zeng Jinle on Sep 18, 2019

* fix memory reuse bug on feeding variables, test=develop

* add comments to reference count members, test=develop

16 Sep, 2019 1 commit

Fix warning info of build_strategy (#19805) · 82814970

Committed by chengduo on Sep 16, 2019

* fix warning info
test=develop

* fix bug of all_reduce_deps_pass
test=develop

13 Sep, 2019 1 commit

Open fuse all reduce option (#19765) · 056fdedd

Committed by chengduo on Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

11 Sep, 2019 3 commits

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used for allocating memory for cuDNN. Because it is a singleton, we can delete it for better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, so the singleton is no longer needed.

Also added data_feed_proto to operator to fix CI in CPU compilation.
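
The idea of freeing memory from a stream callback can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Paddle's implementation: `FakeStream` and `StreamScopedAllocator` are hypothetical stand-ins, and no CUDA API is called. It only shows the ordering guarantee: a buffer is released after the work queued on its stream has completed, without any global singleton.

```python
from collections import deque
from typing import Callable, Deque, List


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: an ordered queue of callbacks."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def add_callback(self, fn: Callable[[], None]) -> None:
        # Callbacks run in submission order, after earlier work on the stream.
        self._pending.append(fn)

    def synchronize(self) -> None:
        # Drain the queue, simulating the stream reaching each callback.
        while self._pending:
            self._pending.popleft()()


class StreamScopedAllocator:
    """Hypothetical per-stream allocator; each instance is tied to one stream."""

    def __init__(self, stream: FakeStream) -> None:
        self.stream = stream
        self.live: List[bytearray] = []  # buffers not yet released

    def allocate(self, size: int) -> bytearray:
        buf = bytearray(size)
        self.live.append(buf)
        return buf

    def release_when_done(self, buf: bytearray) -> None:
        # Defer the free until the stream has processed everything queued so far.
        self.stream.add_callback(lambda: self._free(buf))

    def _free(self, buf: bytearray) -> None:
        # Compare by identity: equal-sized zeroed buffers compare equal by value.
        self.live = [b for b in self.live if b is not buf]


if __name__ == "__main__":
    stream = FakeStream()
    allocator = StreamScopedAllocator(stream)
    workspace = allocator.allocate(1024)
    allocator.release_when_done(workspace)
    print("live before sync:", len(allocator.live))
    stream.synchronize()
    print("live after sync:", len(allocator.live))
```

Because each allocator instance belongs to one stream, its lifetime follows the device context rather than the whole process.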

Open fuse broadcast option (#18833) · e506c99c

Committed by chengduo on Sep 11, 2019

* fix vlog level and fuse option type
test=develop

Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418) · 5866a7a5

Committed by chengduo on Sep 11, 2019

* Enable fused_all_reduce_op_handle support GPU and CPU Gradients

10 Sep, 2019 2 commits

add logs to left var memory size, test=develop (#19722) · bb4f8dee

Committed by Zeng Jinle on Sep 10, 2019

merge empty lod tensor, test=develop (#19228) · 25dcd74d

Committed by wangguanzhong on Sep 10, 2019

* merge_empty_lod_tensor, test=develop

* fix multiclass_nms, test=develop

* refine API.spec, test=develop

* add unittest case for fetch, test=develop

* add lod tensor test, test=develop

* return index for multiclass_nms, test=develop

* add api for multiclass_nms2

* update API.spec, test=develop

* refine api doc, test=develop

* fix test_detection.py, test=develop

* polish code, test=develop

* add more unittest case, test=develop
