提交 · c727ec4ae62dca15afc2ac2a9149ec913bf8d355 · 兽拳 / Paddle

08 9月, 2021 1 次提交
- W
  
  [NPU] add get_float_status op and refine NPU check_nan_inf (#35274) · c727ec4a
  由 WangXi 提交于 9月 08, 2021
  
  c727ec4a
05 8月, 2021 1 次提交

remove boost::algorithm::ends_with ，boost macro and boost::lexical_cast apis (#34310) · bb7b4c0c

由 chentianyu03 提交于 8月 05, 2021

* replace boost::algorithm::ends_with with self define ends_with function

* remove BOOST macro in certain operators

* remove boost::lexical_cast

* add test for string_helper

* add more test case for string_helper

* modify join_string func and test case

* fix build_strategy_test failed bug

* remove string_helper_test from parallel_UT_rule.py

bb7b4c0c

29 7月, 2021 1 次提交

add fix op run order pass (#34427) · 79e758c6

由 Zeng Jinle 提交于 7月 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverge again and fix cpu test case

* follow some comments

79e758c6

28 7月, 2021 1 次提交
- J
  apply pass strategy to sub graph (#34158) · 5e27d16d
  由 jiangcheng 提交于 7月 28, 2021
```
When Graph has sub-graph, apply pass to it and all sub-graph. And add single test script .
```
  5e27d16d
15 7月, 2021 1 次提交
- A
  Upgrade Executor into ParallelExcutor to apply Graph Optimization in @to_static (#32283) · 2850391d
  由 Aurelius84 提交于 7月 15, 2021
```
* Refine Constructor logic of ParallelExecutor

* Replace executor into ParallelExecutor in run_program_op
```
  2850391d
22 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid framework for rocm (part3), test=develop (#31011) · 50967135
  由 Qi Li 提交于 2月 22, 2021
  
  50967135
18 1月, 2021 1 次提交
- L
  
  [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) · 843dc3cd
  由 liuyuhui 提交于 1月 18, 2021
  
  843dc3cd
12 1月, 2021 1 次提交

Fix/distributed proto (#29981) · 25f80fd3

由 tangwei12 提交于 1月 12, 2021

* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed

25f80fd3

04 1月, 2021 1 次提交
- W
  
  Optimization grad merge performance (#29784) · ee16006b
  由 WangXi 提交于 1月 04, 2021
  
  ee16006b
24 12月, 2020 1 次提交

[Feature] one ps (3/4) (#29604) · 032414ca

由 tangwei12 提交于 12月 24, 2020

* oneps (3/4)
Co-authored-by: NMrChengmo <cmchengmo@163.com>
Co-authored-by: Nmalin10 <malin10@baidu.com>
Co-authored-by: Nchengmo <chengmo@baidu.com>

032414ca

27 10月, 2020 1 次提交
- Z
  add Fuse bn add act pass (#28196) · fdc06f21
  由 Zhang Ting 提交于 10月 27, 2020
```
* add fuse_bn_add_act pass
```
  fdc06f21
21 9月, 2020 1 次提交

[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112) · aba759ba

由 Leo Chen 提交于 9月 21, 2020

* support use add instead of sum to do gradient accumulation

* add inplace addto pass

* add grad_add op and inplace addto pass

* remove debug code

* code refine

* fix bug when sereral sum ops inserts at same op_idx

* fix Flags type

* add addto attribute for conv3d

* fix ut

* code clean

* fix type

aba759ba

02 9月, 2020 1 次提交

Add FetchAsyncOpHandle, and use it in FastThreadedExecutor (#26643) · 2d2c31a6

由 wanghuancoder 提交于 9月 02, 2020

* optimized transformation form tensor to numpy, test=develop

* Modify fetch op handle, from memcpy Sync to memcpy Async, test=develop

* modify CUDAPinnedPlace to CPUPlace, test=develop

* modify CPUPlace to CUDAPinnedPlace, and set default inplace to false, test=develop

* revert fetch_op_handle, add fetch_async_op_handle, test=develop

* revert fetch_op_handle, add fetch_async_op_handle, test=develop

* fix error msg report, test=develop

* fix bug in cpuplace, test=develop

* fix bug in unmerge and tensorarray modle, test=develop

* fix bug, double copy gpu memory, test=develop

* fix chenweihang¡¯s review advice, test=develop

2d2c31a6

07 7月, 2020 1 次提交

catch bad alloc exception (#25140) · 70d7d07f

由 hong 提交于 7月 07, 2020

* cat bad alloc exception; test=develop

* add unitest; test=develop

* move bad alloc catch to the first place; test=develop

* polish error message; test=develop

* polish error message; test=develop

* add mutex header; test=develop

70d7d07f

14 4月, 2020 1 次提交

Correct reader device index (#23802) · c4979136

由 Zeng Jinle 提交于 4月 14, 2020

* correct reader device index, test=develop

* fix async executor scope var initialization, test=develop

c4979136

09 4月, 2020 1 次提交

Remove: NGraph engine from PDPD repository (#23545) · 3baaee9a

由 mozga-intel 提交于 4月 09, 2020

* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop

* Remove ngraph from:
1. Cmake file
2. Python file
test=develop

3baaee9a

01 4月, 2020 1 次提交
- Z
  
  add reader dependency pass, test=develop (#23301) · 3a21980b
  由 Zeng Jinle 提交于 4月 01, 2020
  
  3a21980b
20 3月, 2020 1 次提交

Reader sequential and inference partial feed (#22699) · acfc9b8a

由 Zeng Jinle 提交于 3月 20, 2020

* sequential reader stage 1, test=develop

* fix ut, test=develop

* fix iterable=False reset bug, add some logs and polish code, test=develop

* inference feed partial data, test=develop

* Turn on keep_order=True for test, test=develop

* enhance ut to test more cases, test=develop

* test commit for reverting

* Revert "test commit for reverting", test=develop

This reverts commit 80aef42e.

* add ut of merged and unmerged results, test=develop

* add more uts for coverages and add en doc of api, test=develop

* follow comments, test=develop

* change note style, test=develop

acfc9b8a

13 2月, 2020 1 次提交
- Y
  Disable fusion_group for windows and mac in build_strategy. (#22549) · 96770f51
  由 Yiqun Liu 提交于 2月 13, 2020
```
test=develop
```
  96770f51
07 2月, 2020 1 次提交

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

由 Yiqun Liu 提交于 2月 07, 2020

* Add the first implememtation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearange of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shape of all inputs are the same.

* Add debug information.

* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

10 1月, 2020 1 次提交

Add bn and relu fuse pass (#22048) · 46189b16

由 Zhen Wang 提交于 1月 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

12 12月, 2019 1 次提交
- W
  
  Rewrite check nan inf tools (#21076) · 8a0f611b
  由 WangXi 提交于 12月 12, 2019
  
  8a0f611b
28 11月, 2019 1 次提交

Polish reference count pass (#21324) · 89966525

由 Zeng Jinle 提交于 11月 28, 2019

* fix ref_cnt pass, test=develop

* add cpp unittests to reference_count_pass, test=develop

* follow comments, test=develop

89966525

25 11月, 2019 1 次提交
- Z
  
  remove warning LNK4006 and warning LNK4221 (#21226) · 345b67b5
  由 zhouwei25 提交于 11月 25, 2019
  
  345b67b5
30 9月, 2019 1 次提交
- C
  Add place deps for fused_all_reduce_op_handle (#20077) · bfa55c9d
  由 chengduo 提交于 9月 30, 2019
```
test=develop
```
  bfa55c9d
23 9月, 2019 1 次提交
- C
  Delete local execution scopes (#19749) · d7251a8e
  由 chengduo 提交于 9月 23, 2019
```
* Add RecordHistoryLocalExecScopes
test=develop
```
  d7251a8e
04 9月, 2019 1 次提交

Enable ngraph through build_strategy (#19266) · a3a4b6e5

由 baojun 提交于 9月 04, 2019

* enable ngraph throught build_strategy test=develop

* add unittest test=develop

* put use_ngraph unconditional test=develop

* remove paddle_enforce test=develop

* remove paddle_enforce test=develop

* fix copyright test=develop

* limit for ngraph only test=develop

a3a4b6e5

29 7月, 2019 1 次提交

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

由 Zeng Jinle 提交于 7月 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

26 7月, 2019 1 次提交

Feature/mem opt pass refactor (#18735) · a802da65

由 Zeng Jinle 提交于 7月 26, 2019

* first version memory optimize pass, test=develop

* remove move_tensor_sharing_pass, test=develop

* refine code comments, add unittests, test=develop

* turn off memory_optimize by default, test=develop

* follow huihuang's comments, test=develop

* follow chengduoZH's comments, test=develop

* fix grammar error, add const qualifier, fix pass_test exception message, test=develop

* follow chengduoZH's comments 2nd, test=develop

a802da65

23 7月, 2019 1 次提交
- C
  Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
  由 chengduo 提交于 7月 23, 2019
```
* support sparse gradients
test=develop
```
  fd3aad6c
11 7月, 2019 1 次提交

Feature/buffer_shared_inplace (#17911) · d3003a16

由 Zeng Jinle 提交于 7月 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

10 7月, 2019 1 次提交
- Z
  Clean unused code of dim and place (#18565) · be24e5b3
  由 Zeng Jinle 提交于 7月 10, 2019
```
* clean code of dim and place, test=develop

* fix failed unittests, test=develop
```
  be24e5b3
06 6月, 2019 1 次提交
- G
  
  Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc
  由 gongweibao 提交于 6月 06, 2019
  
  fbbdc9cc
08 5月, 2019 1 次提交
- C
  Code Clean: Move all pass to paddle::framework::ir (#17228) · 04bd413a
  由 chengduo 提交于 5月 08, 2019
```
* move pass to ir

* polish code
test=develop

* fix dependency
test=develop
```
  04bd413a
23 4月, 2019 1 次提交
- C
  Add fuse momenutum ops (#16745) · a2be4b4d
  由 chengduo 提交于 4月 23, 2019
```
* Add fuse momenutum ops
```
  a2be4b4d
21 4月, 2019 1 次提交

Refine model gpu memory (#16993) · 1202d3fc

由 Zeng Jinle 提交于 4月 21, 2019

* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop

* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop

* follow comments
test=develop

1202d3fc

18 4月, 2019 1 次提交
- G
  
  Polish DGC code (#16818) · cbdb8a17
  由 gongweibao 提交于 4月 18, 2019
  
  cbdb8a17
30 3月, 2019 1 次提交
- G
  Fix windows compilation error! (#16546) · fea91164
  由 gongweibao 提交于 3月 30, 2019
```
* fix compiled
test=develop

* follow comments test=develop
```
  fea91164
28 3月, 2019 2 次提交
- C
  Fuse Adam And SGD ops (#15933) · 1096746c
  由 chengduo 提交于 3月 28, 2019
```
* fuse optimizer
```
  1096746c
- G
  
  Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  由 gongweibao 提交于 3月 28, 2019
  
  eb83abea

兽拳 / Paddle 与 Fork 源项目一致

兽拳 / Paddle
与 Fork 源项目一致