Commits · 93d862b0adf224a0af547d1442c57fbd6d0e8efc · Crayon鑫 / Paddle

Aug 24, 2021 · 1 commit

Add auto completion module for auto parallel (#34813) · 93d862b0

Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
Co-authored-by: Nsandyhouse <lilong12@baidu.com>


Aug 23, 2021 · 1 commit

trt convert ut add dynamic_shape and int8, etc. (#35061) · 17188e8d

Committed by Wilber on Aug 23, 2021

Aug 20, 2021 · 2 commits

[hybrid performance] Grad fuse for gradient merge under pipeline mode (#35004) · 4d9b2d6d

Committed by Yuang Liu on Aug 20, 2021

fix set_lod in data_feed (#35000) · 4416c793

Committed by wangguanqun on Aug 20, 2021
```
* add trainer desc config to distributed strategy

* code style modified

* data_feed set lod
```

Aug 18, 2021 · 3 commits

code refactoring for new executor (#34970) · 40d4d834

Committed by wanghuancoder on Aug 18, 2021

* code refactoring, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop


[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16... · a9673b44

Committed by WangXi on Aug 18, 2021
```
[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16 param to the optimizer (#34965)
```

[CustomOp] Fix ext_tensor.cast failed bug (#34884) · 4d88cdb8

Committed by Chen Weihang on Aug 18, 2021

* fix ext_tensor.cast failed bug

* remove useless deps

* fix windows cmake failed

* try to fix windows make failed

* fix make error on windows


Aug 17, 2021 · 2 commits

Copy boost optional to Paddle (#34780) · 9be41447

Committed by chentianyu03 on Aug 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directories

* del fluid/utils

* modify .hpp to .h

* move directories

* modify to paddle::optional

* add modification description

* format code style for the files in paddle/utils

* format code style


Add some passes which can be applied to Program (#34730) · 8046e33d

Committed by Zeng Jinle on Aug 17, 2021

* add inplace passes and tests

* update

* fix use_cuda undefined
fix compile error of op compat

* add more ut

* fix CPU CI error

* check adam unique

* fix mac/windows ci, improve coverage

* fix ci error

* follow weihang's comment

* fix BlockDesc::MoveFrom

* follow qiuliang's comment

* update

* follow huihuang's comments


Aug 16, 2021 · 2 commits

[CPU-PSLIB] Add config for scale_sparse_grad in config_fleet.py,test=develop (#34893) · d028214d

Committed by Fan Zhang on Aug 16, 2021

Fix elementwise_add quantization (#34820) · ae80df91

Committed by joanna.wozna.intel on Aug 16, 2021
```
* Remove force_fp32_output from elementwise_add quantization

* Fix cpu_quantize_placement test

* Review related changes
```

Aug 13, 2021 · 2 commits

Bug fix: Can't load multiple modules of custom c++ op (#34505) · fc6b4a50

Committed by zyfncg on Aug 13, 2021
```
* Fix a bug: can't load more than one custom op module

* Fix a bug: can't load more than one custom op module

* add test for load multiple modules of custom c++ op

* add config for Coverage CI
```

fix generator thread safety bug (#34888) · f421741c

Committed by Zeng Jinle on Aug 13, 2021

Aug 11, 2021 · 4 commits

[Paddle TRT]fix_fc_int8_convert; fix_reshape_convert (#34787) · 3429c04b

Committed by Wangzheee on Aug 11, 2021
```
* fix_fc_reshape_convert

* fix
```

Add ext_tensor.slice() API (#34227) · 3f011d82

Committed by Hao Lin on Aug 11, 2021

* Add ext_tensor.slice() API, test=develop

* Call Tensor::mutable_data first to fix bugs and add test for writing to sliced tensor

* Fix unit test bug

* Fix code format problem, test=develop

* Fix code format problem

* Fix code format problem

* strengthen unit test

* Use CustomTensorUtils::ShareDataFrom to simplify codes


add the basic apis for auto_parallel (#33804) · 3f962e77

Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```

Add no need output to gc check list (#34754) · 17c1dae9

Committed by hong on Aug 11, 2021

* add not used output var to gc_check_list; test=develop

* add useless output to gc check list; test=develop


Aug 10, 2021 · 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include directory

* copy utils to right directories


Aug 6, 2021 · 3 commits

zero_copy_tensor unittest: support XPU. (#34670) · 52e38a00

Committed by houj04 on Aug 6, 2021

support kunlun black list and add kl1 op (#34605) · 21beef91

Committed by QingshuChen on Aug 6, 2021
```
* support kunlun black list and add kl1 op

* xpu_op_list add device_context dependence
```

fix npu compile error, test=develop (#34656) · c16421c2

Committed by Qi Li on Aug 6, 2021

Aug 5, 2021 · 3 commits

New executor dev (#34407) · 012d12b5

Committed by hong on Aug 5, 2021

* first test version

* add test exec;

* add data transfer; test=develop

* add new exec head;

* add memcpy; test=develop

* add python fetch

* add new test

* add graph node; test=develop

* remove useless new executor test; test=develop

* remove gperf dependency; test=develop

* fix compile bugs; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* add uni test; test=develop

* polish code; test=develop

* polish code; test=develop

* add interpreter cmakefile; test=develop

* remove useless code; test=develop


remove boost::algorithm::ends_with, boost macro and boost::lexical_cast apis (#34310) · bb7b4c0c

Committed by chentianyu03 on Aug 5, 2021

* replace boost::algorithm::ends_with with self define ends_with function

* remove BOOST macro in certain operators

* remove boost::lexical_cast

* add test for string_helper

* add more test case for string_helper

* modify join_string func and test case

* fix build_strategy_test failed bug

* remove string_helper_test from parallel_UT_rule.py

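Commit bb7b4c0c replaces `boost::algorithm::ends_with` with a self-defined `ends_with` function. As an illustration only, here is a Boost-free suffix check on `std::string` built on `std::string::compare`. The name and the two-argument signature are assumptions. They are not taken from Paddle's `string_helper`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for boost::algorithm::ends_with on std::string.
// The name and signature are illustrative, not Paddle's actual helper.
inline bool ends_with(const std::string& str, const std::string& suffix) {
  // A suffix longer than the string can never match.
  if (suffix.size() > str.size()) return false;
  // Compare the last suffix.size() characters of str with suffix.
  return str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

For example, `ends_with("model.pdmodel", ".pdmodel")` is true, and an empty suffix always matches. C++20 adds `std::string::ends_with`, but a hand-written helper like this also works under older language standards.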

[pass_enhance]fix the mkldnn model performance drop problem. test=develop (#34625) · e47d8a57

Committed by 王明冬 on Aug 5, 2021

Aug 4, 2021 · 2 commits

Revert pull request 34212 (#34558) · 09892118

Committed by 李季 on Aug 4, 2021
```
* revert commit id 34212
```

[NPU] Support npu kernel for assign_value op (#34568) · f39c3a5a

Committed by Sing_chan on Aug 4, 2021

* [NPU] Support npu kernel for assign_value op

* move test_assign_value_op_npu.py into unittests/npu folder

* correct copyright year; add TestAssignApi class using NPUplace in test files


Aug 3, 2021 · 2 commits

support Kunlun2 (#34459) · 2d0f3d9b

Committed by QingshuChen on Aug 3, 2021
```
* support Kunlun2

* support KL2

* support KL2
```

polish sccahce (#34350) · 61e51c18

Committed by zhouweiwei2014 on Aug 3, 2021

Aug 2, 2021 · 2 commits

Add basic functions of Program Pass (#34524) · 145cdb5a

Committed by Zeng Jinle on Aug 2, 2021

* add basic APIs

* add attr_types

* follow comments

* change pass attr types

* add set pass attribute codes

* refine PADDLE_THROW


Fix Inference CE Error by Topo Order (#34521) · 508b40ec

Committed by Huihuang Zheng on Aug 2, 2021

The commit background message is too long; see details at https://github.com/PaddlePaddle/Paddle/pull/34521


Jul 30, 2021 · 3 commits

Revert of PR34452 (#34516) · 72a9c8ff

Committed by Huihuang Zheng on Jul 30, 2021

Added reshape, reshape2, squeeze and squeeze2 BF16/FP32 FWD/BWD kernels (#34219) · 22c4c189

Committed by jakpiase on Jul 30, 2021

* test version of matmul_v2

* added matmul_v2 grad kernel

* minor changes

* minor changes

* minor change for CI approval

* CI fix

* CI fix

* added squeeze and squeeze2 kernels

* CI fix

* CI fix

* CI fix

* disabled tests when compiled with cuda

* added setting format_tag by strides

* added sigmoid BF16 FWD/BWD and gelu BF16 BWD

* changes after review

* Revert "added sigmoid BF16 FWD/BWD and gelu BF16 BWD"

This reverts commit 6e3f76720b545abfcff9f6052b46b73a1e745cae.

* Revert "Merge branch 'matmul_v2_grad' into squeeze2_op"

This reverts commit 06fcf67843a4a7884eccdf67a02a03575e1d4cb8, reversing
changes made to 6e3f76720b545abfcff9f6052b46b73a1e745cae.

* minor change

* added reshape1/2 kernels

* moved some functions into private block

* CI fix

* CI fix

* CI fix


add trainer desc config to distributed strategy (#34457) · e6aacd1e

Committed by wangguanqun on Jul 30, 2021
```
* add trainer desc config to distributed strategy

* code style modified
```

Jul 29, 2021 · 5 commits

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021
```
* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverage again and fix cpu test case

* follow some comments
```

Fix allreduce_sum potential bugs on NPU. (#34462) · 02cc3c5e

Committed by gongweibao on Jul 29, 2021

fix the allreduce fused bug, test=develop (#34446) · b56dbe08

Committed by Yuang Liu on Jul 29, 2021

Enable FLAGS_convert_all_blocks (#34452) · 76f94f88

Committed by Huihuang Zheng on Jul 29, 2021
```
As the title
```

[NPU] Avoid cpu tensor freed before copying to npu completed (#34475) · d71b9ba7

Committed by Leo Chen on Jul 29, 2021

Jul 28, 2021 · 2 commits

graph_to_program topology sort (#33949) · 167523e7

Committed by jiangcheng on Jul 28, 2021
```
See https://github.com/PaddlePaddle/Paddle/pull/33949 for details
```

graph_to_program save parameter and stop_gradient information (#33771) · 8a7dee31

Committed by jiangcheng on Jul 28, 2021
```
This PR adds optional boolean fields is_parameter and stop_gradient to the VarDesc proto and removes them during save_inference_model
```
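The title of commit 167523e7 says that graph_to_program now uses a topology sort. The commit body only links to the PR. The sketch below is a generic version of Kahn's algorithm with a deterministic tie-break (smallest node id first) and is not Paddle's implementation. The adjacency-list representation and the `TopologicalSort` name are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Kahn's algorithm on a directed graph given as adjacency lists:
// graph[u] lists the nodes that must come after u.
// Ready nodes are popped smallest id first, so the order is deterministic.
// Returns an empty vector if the graph contains a cycle.
std::vector<int> TopologicalSort(const std::vector<std::vector<int>>& graph) {
  const int n = static_cast<int>(graph.size());

  // Count incoming edges for every node.
  std::vector<int> in_degree(n, 0);
  for (const auto& successors : graph)
    for (int v : successors) ++in_degree[v];

  // Min-heap of nodes whose predecessors have all been emitted.
  std::priority_queue<int, std::vector<int>, std::greater<int>> ready;
  for (int u = 0; u < n; ++u)
    if (in_degree[u] == 0) ready.push(u);

  std::vector<int> order;
  order.reserve(n);
  while (!ready.empty()) {
    int u = ready.top();
    ready.pop();
    order.push_back(u);
    // Emitting u removes its outgoing edges; successors may become ready.
    for (int v : graph[u])
      if (--in_degree[v] == 0) ready.push(v);
  }

  // Nodes left unemitted lie on a cycle.
  if (static_cast<int>(order.size()) != n) return {};
  return order;
}
```

With edges 0→2, 1→2 and 2→3, the sketch returns 0, 1, 2, 3. For a two-node cycle it returns an empty vector. Breaking ties by node id makes the result independent of the order in which nodes become ready. This is one common way to keep a graph-to-program conversion reproducible across runs.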

Crayon鑫 / Paddle is up to date with the source repository it was forked from
