Commits · a647b80afcd1918ab32f89ed1316d7f599a43627 · 机器未来 / Paddle

01 Sep, 2021 · 4 commits
- [HeterPs] merge dense && data norm && g2sum (#35029) · a647b80a
  Committed by Thunderbrook on Sep 01, 2021
```
* merge dense

* log level

* tensor copy sync

* format
```
- [HybridParallel]Support finetinue model for PipelineParallel (#35287) · 264ff9ef
  Committed by ShenLiang on Sep 01, 2021
```
* add cache for send_recv

* add eval_batch for pipeline

* add eval batch for pipelineparallel

* add style code
```
- modify fetch logic, use D2H Stream (#35191) · c56d6978
  Committed by wanghuancoder on Sep 01, 2021
```
* modify fetch logic, use D2H Stream, test=develop

* refine, test=develop
```
- support KL label smooth (#35177) · 7ca28bb6
  Committed by QingshuChen on Sep 01, 2021
```
* support KL label smooth

* update UT for KL label_smooth
```
31 Aug, 2021 · 2 commits
- Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a
  Committed by Aurelius84 on Aug 31, 2021
```
* polish code

* fix unittest on windows

* refine pybind interface

* support statistic MemSize of AllocatorPool

* Replace mutex into atomic
```
- fix the pass compat check position error, test=develop (#35272) · 54f07019
  Committed by 王明冬 on Aug 31, 2021
30 Aug, 2021 · 3 commits
- fix using boost::none as the init value when using paddle::optional (#35215) · e864667b
  Committed by chentianyu03 on Aug 30, 2021
- [paddle-TRT]support matmul set to int8 in multihead (#34917) · 0043fa8c
  Committed by ceci3 on Aug 30, 2021
```
* update ernie int8
```
- Abstract GenerateDeviceEventFlag to shield platforms (#35219) · 20cfa8ba
  Committed by Aurelius84 on Aug 30, 2021
```
* Abstract GenerateDeviceEventFlag to shield platforms

* Remove get_cuda_flags
```
27 Aug, 2021 · 2 commits

Add fusion_gru and multi_gru to PTQ (Post-Training Quantization) (#33749) · 7debae3a

Committed by joanna.wozna.intel on Aug 27, 2021

* Add calculation for gru op

* Correct the types

* Remove mkldnn only

* Correct mkldnn ifdef

* Remove mkldnn ifdef

* Separate mkldnn quantizer test

* Correct Windows test

* Check different cmake fix

* Revert cmake change

* Cmake change 2

* Cmake change 3

Polish DeviceEvent interface and Remove #ifdef in InterpreterCore (#35196) · 48bf7cbf
Committed by Aurelius84 on Aug 27, 2021
```
* add CPUDeiveEvent

* Polish DeviceEvent code

* Add DEVICE_EVENT_LIBS
```
26 Aug, 2021 · 5 commits

gc for newexecutor (#35085) · f1472039

Committed by wanghuancoder on Aug 26, 2021

* gc for newexecutor, test=develop

* refine, test=develop

* add interpretercore_gc_helper.h,test=develop

* backup

* gc whit thread and device_event, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* fix bug, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* add CheckGC, test=develop

Support Multi-Stream, Single-Thread in New Executor (#35024) · 678a259a

Committed by Aurelius84 on Aug 26, 2021

* Modify into QueueSync QueueAsync

* fix complie on MacOS

* fix pointer

* fix conflict

* polish unittest

* fix windows fetch error

* polish code according reviewer

* fix device_guard on CPU place

[Inference] Replace unordered_map with map to support subgraph stability (#35147) · a1aae040
Committed by Wilber on Aug 26, 2021

add temporary MultiThreadedWorkQueue (#35158) · e4a8815d
Committed by liutiexing on Aug 26, 2021

fix the bug of channel-wise quantization for ernie (#34948) · c71025eb
Committed by XGZhang on Aug 26, 2021

25 Aug, 2021 · 2 commits
- fix cmaklist for new executor (#35137) · 03cb3132
  Committed by wanghuancoder on Aug 25, 2021
```
* fix cmaklist for new executor, test=develop

* refine, test=develop

* refine, test=develop
```
- high-performance SingleThreadedWorkQueue (#35086) · 751a7942
  Committed by liutiexing on Aug 25, 2021
24 Aug, 2021 · 5 commits

add fetch, test=develop (#35019) · a5060b55

Committed by wanghuancoder on Aug 24, 2021

* add fetch, test=develop

* fix fetch2op, test=develop

* fix fetch2op, test=develop

* refine, test=develop

* fix fetch ctx, test=develop

* add wait, test=develop

* rename fetch2 to fetch_v2, test=develop

* merge, test=develop

cache runtime ctx for executor, test=develop (#35108) · 3b0d8a7b
Committed by wanghuancoder on Aug 24, 2021

add the extra and quantization for op def, test=develop (#35076) · cb28753c
Committed by 王明冬 on Aug 24, 2021

add scope guard (#35103) · b0a1d122
Committed by Zeng Jinle on Aug 24, 2021

Add auto completion module for auto parallel (#34813) · 93d862b0

Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* resotre op_desc

* restore type_defs

* update var_desc

* remove dimss_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the uint test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is great than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Imporve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
Co-authored-by: sandyhouse <lilong12@baidu.com>

23 Aug, 2021 · 1 commit
- trt convert ut add dynamic_shape and int8, etc. (#35061) · 17188e8d
  Committed by Wilber on Aug 23, 2021
20 Aug, 2021 · 2 commits
- [hybrid performance] Grad fuse for gradient merge under pipeline mode (#35004) · 4d9b2d6d
  Committed by Yuang Liu on Aug 20, 2021
- fix set_lod in data_feed (#35000) · 4416c793
  Committed by wangguanqun on Aug 20, 2021
```
* add trainer desc config to distributed strategy

* code style modified

* data_feed set lod
```
18 Aug, 2021 · 3 commits

code refactoring for new executor (#34970) · 40d4d834

Committed by wanghuancoder on Aug 18, 2021

* code refactoring, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16... · a9673b44
Committed by WangXi on Aug 18, 2021
```
[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16 param to the optimizer (#34965)
```
[CustomOp] Fix ext_tensor.cast failed bug (#34884) · 4d88cdb8

Committed by Chen Weihang on Aug 18, 2021

* fix ext_tensor.cast failed bug

* remove useless deps

* fix windows cmake failed

* try to fix windows make failed

* fix make error on windwos

17 Aug, 2021 · 2 commits

Copy boost optional to Paddle (#34780) · 9be41447

Committed by chentianyu03 on Aug 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directions

* del fluid/utils

* modify .hpp to .h

* move directions

* modify to paddle::optional

* add modification description

* format code stype for the files in paddle/utils

* format code stype

Add some passes which can be applied to Program (#34730) · 8046e33d

Committed by Zeng Jinle on Aug 17, 2021

* add inplace passes and tests

* update

* fix use_cuda undefined
fix compile error of op compat

* add more ut

* fix CPU CI error

* check adam unique

* fix mac/windows ci, improve coverage

* fix ci error

* follow weihang's comment

* fix BlockDesc::MoveFrom

* follow qiuliang's comment

* update

* follow huihuang's comments

16 Aug, 2021 · 2 commits
- [CPU-PSLIB] Add config for scale_sparse_grad in config_fleet.py,test=develop (#34893) · d028214d
  Committed by Fan Zhang on Aug 16, 2021
- Fix elementwise_add quantization (#34820) · ae80df91
  Committed by joanna.wozna.intel on Aug 16, 2021
```
* Remove force_fp32_output from elementwise_add quantization

* Fix cpu_quantize_placement test

* Review related changes
```
13 Aug, 2021 · 2 commits
- Bug fix : Can't load multiple modules of custom c++ op (#34505) · fc6b4a50
  Committed by zyfncg on Aug 13, 2021
```
* Fix a bug : can't load more than one custom op module

* Fix a bug : can't load more than one custom op module

* add test for load multiple modules of custom c++ op

* add config for Coverage CI
```
- fix generator thread safety bug (#34888) · f421741c
  Committed by Zeng Jinle on Aug 13, 2021
11 Aug, 2021 · 4 commits

[Paddle TRT]fix_fc_int8_convert; fix_reshape_convert (#34787) · 3429c04b
Committed by Wangzheee on Aug 11, 2021
```
* fix_fc_reshape_convert

* fix
```
Add ext_tensor.slice() API (#34227) · 3f011d82

Committed by Hao Lin on Aug 11, 2021

* Add ext_tensor.slice() API, test=develop

* Call Tensor::mutable_data first to fix bugs and add test for writing to sliced tensor

* Fix unit test bug

* Fix code format problem, test=develop

* Fix code format problem

* Fix code format problem

* strengthen unit test

* Use CustomTensorUtils::ShareDataFrom to simplify codes

add the basic apis for auto_parallel (#33804) · 3f962e77
Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```
Add no need output to gc check list (#34754) · 17c1dae9

Committed by hong on Aug 11, 2021

* add not used output var to gc_check_list; test=develop

* add useless output to gc check list; test=develop

10 Aug, 2021 · 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include direction

* copy utils to right directions
