Commits · b77fa1d91137af942c0788232ca5ef54cbccc7b6 · BaiXuePrincess / Paddle

07 Sep, 2022 1 commit

[Auto Parallel] Support Iterable dataset for auto parallel (#45518) · b77fa1d9

Committed by caozhou on Sep 07, 2022

* support iterable dataset for auto parallel

* add split_data proto

* fix unittest bug

* fix recompute bug

* update cmake

b77fa1d9

01 Sep, 2022 1 commit

ps optimizer default config (#45563) · ae217373

Committed by wangguanqun on Sep 01, 2022

* config

* fix unittest

* zero init & cache & patch config

* add barrier to save and load

* add unittest

ae217373

23 Aug, 2022 1 commit

[AutoParallel] Add Quant Pass (#44877) · 61bc016c

Committed by zhaoyingli on Aug 23, 2022

* add quant pass

61bc016c

13 Aug, 2022 1 commit

fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f

Committed by ziyoujiyi on Aug 13, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* write trainer_desc file

* support split sparse params in local & remote

* fix import paddle.fluid.core.PSGPU

* fix import paddle.fluid.core.PSGPU

* add remote_sparse & local_sparse config

* fix unittest

* fix test_dist_fleet_geo table error

* fix PADDLE_ENFORCE error

* fix other's pr conflict

3f5c405f

01 Aug, 2022 1 commit

GPUGraph merge to develop (#44594) · 798670bb

Committed by danleifeng on Aug 01, 2022

798670bb

29 Jul, 2022 1 commit

[Auto parallel] Optimization Tuning (#43782) · 72f2ed43

Committed by JZ-LIANG on Jul 29, 2022

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

72f2ed43

26 Jul, 2022 1 commit

add horizontal federation learning ps feature (#44327) · 4bc22b69

Committed by ziyoujiyi on Jul 26, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

4bc22b69

20 Jul, 2022 1 commit

【GPUPS】Adam accessor (#43919) · b8d106e1

Committed by danleifeng on Jul 20, 2022

* add adam/sharedadam optimzier for gpups;edit optimizer struct;test=develop

b8d106e1

02 Jun, 2022 1 commit

add federated learning parameter server(fl-ps) mode (#42682) · d999049f

Committed by ziyoujiyi on Jun 02, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

d999049f

01 Jun, 2022 1 commit

Make fuse_gemm_epilogue support transpose_x and transpose_y (#40558) · 048b0013

Committed by sneaxiy on Jun 01, 2022

* support weight transpose

* add ut

* add template

* fix transpose error

* fix transpose_comment

* add api tests

* add skipif

* add doc

048b0013

19 Apr, 2022 1 commit

double accessor and show_scale (#41943) · 8113c913

Committed by wangguanqun on Apr 19, 2022

* double accessor and show_scale

* double accessor and show_scale

* rename

* fix bug in pslib config

* add unittest

8113c913

13 Apr, 2022 1 commit

the one ps proto (#41659) · b12af9e1

Committed by wangguanqun on Apr 13, 2022

* the one ps proto

* the one ps proto

* fix

* fix

* fix

* fix windows ci

* fix windows ci

* add dependency

* add dependency

b12af9e1

31 Mar, 2022 1 commit

fix load bug and add distributed strategy from pslib (#40883) · 47383dca

Committed by wangguanqun on Mar 31, 2022

* fix load bug and add distributed strategy from pslib

* add unittest

* use cvm config

* trainer and worker config

* add unittest

* add unittest

* add test

* code style

47383dca

28 Mar, 2022 1 commit

[Auto parallel] Mixed Precision FP16 Pass (#40615) · b99c1d07

Committed by JZ-LIANG on Mar 28, 2022

* add FP16 Pass

* Support the auto completion of while_op

* acc aligned

b99c1d07

17 Jan, 2022 1 commit

Add NoReduce mode for ParallelExecutor (#38969) · e50d883e

Committed by sneaxiy on Jan 17, 2022

* add no reduce mode for pe

* add NoReduce ut

e50d883e

29 Dec, 2021 1 commit

[Auto Parallel] Sharding Pass (#38502) · e3faf345

Committed by JZ-LIANG on Dec 29, 2021

* auto parallel sharding base

* chmod

* add unitest

* set unitest cmake dist label

* revise code according to rewiew

* chmod

e3faf345

09 Dec, 2021 1 commit

default accessor and multi table config (#37714) · a9e0d28c

Committed by wangguanqun on Dec 09, 2021

* default accessor and multi table config

* add unittest

* add unittest

* delete print

a9e0d28c

06 Dec, 2021 1 commit

heter for collective (#37613) · 1bdb8578

Committed by kuizhiqing on Dec 06, 2021

1bdb8578

30 Nov, 2021 1 commit

pscore global shuffle&default accessor config (#37626) · 1514eec6

Committed by zhaocaibei123 on Nov 30, 2021

1514eec6

24 Nov, 2021 1 commit

Adapt auto search (#37490) · 025053b4

Committed by zhaoyingli on Nov 24, 2021

* adapt auto search

* adapt auto search

* fix matmulv2 compatible

* del debug

025053b4

01 Nov, 2021 1 commit

memory sparse table & brpc communication upgrade dependency (#36734) · 29c6bcbf

Committed by zhaocaibei123 on Nov 01, 2021

29c6bcbf

14 Oct, 2021 1 commit

[hybrid enhance] add flag to control the avg position for grad merge under pipeline mode (#36384) · 03d8304f

Committed by Yuang Liu on Oct 14, 2021

03d8304f

08 Oct, 2021 1 commit

Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1

Committed by Zeng Jinle on Oct 08, 2021

* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage

f9591bb1

15 Sep, 2021 1 commit

[hybrid] out data parallel as optimizer sharding parallel (#35593) · 78465703

Committed by WangXi on Sep 15, 2021

78465703

08 Sep, 2021 1 commit

[Auto Parallel] Integrate all modules (#35483) · 12155358

Committed by Yulong Ao on Sep 08, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* resotre op_desc

* restore type_defs

* update var_desc

* remove dimss_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the uint test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is great than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Imporve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unitest

* chmod

* chmod

* chmod

* update unitest

* add unitest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode

* [Auto Parallel] Integrate all parts with the newest code

* Integrate all parts of auto parallel and improve codes

* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"

* Modify distributed_strategy.proto to conform the main stream

* Restore parts of distributed_strategy to conform the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

12155358

01 Sep, 2021 1 commit

[HybridParallel]Support finetinue model for PipelineParallel (#35287) · 264ff9ef

Committed by ShenLiang on Sep 01, 2021

* add cache for send_recv

* add eval_batch for pipeline

* add eval batch for pipelineparallel

* add style code

264ff9ef

20 Aug, 2021 1 commit

[hybrid performance] Grad fuse for gradient merge under pipeline mode (#35004) · 4d9b2d6d

Committed by Yuang Liu on Aug 20, 2021

4d9b2d6d

18 Aug, 2021 1 commit

[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16 param to the optimizer (#34965) · a9673b44

Committed by WangXi on Aug 18, 2021

a9673b44

04 Aug, 2021 1 commit

Revert pull request 34212 (#34558) · 09892118

Committed by 李季 on Aug 04, 2021

* revert commit id 34212

09892118

30 Jul, 2021 1 commit

add trainer desc config to distributed strategy (#34457) · e6aacd1e

Committed by wangguanqun on Jul 30, 2021

* add trainer desc config to distributed strategy

* code style modified

e6aacd1e

29 Jul, 2021 2 commits

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverge again and fix cpu test case

* follow some comments

79e758c6

fix the allreduce fused bug, test=develop (#34446) · b56dbe08

Committed by Yuang Liu on Jul 29, 2021

b56dbe08

19 Jul, 2021 1 commit

set the fuse_all_reduce_ops defalut false (#34212) · 2d5d5f37

Committed by 李季 on Jul 19, 2021

2d5d5f37

08 Jul, 2021 1 commit

Distributed Automatic SParsity with Fleet (#33558) · 86cb3fb8

Committed by Ming-Xu Huang on Jul 08, 2021

86cb3fb8

01 Jul, 2021 2 commits

gradient scale (#33862) · 57aabbab

Committed by Yuang Liu on Jul 01, 2021

57aabbab

Dygraph/sharding (#33633) · f33f2444

Committed by JZ-LIANG on Jul 01, 2021

* dygraph sharding

* update unitest hybrid_parallel_communicate_group

f33f2444

21 Jun, 2021 1 commit

add sync calc stream and add ut for fuse on gpu (#33580) · e0e0c0fa

Committed by Yuang Liu on Jun 21, 2021

e0e0c0fa

10 Jun, 2021 1 commit

dp c_allreduce_sum_fusion op (#33169) · 003b4616

Committed by Baibaifan on Jun 10, 2021

003b4616

17 May, 2021 1 commit

[HybridParallel]Fix precision problem of model parallel (#32897) · c809530e

Committed by ShenLiang on May 17, 2021

* fix precision of mp

* fix bug of seed

* fix dp

* print group

c809530e

11 May, 2021 1 commit

Support control flow in DataParallel (#32826) · 298f210d

Committed by ShenLiang on May 11, 2021

* fix find_unused_parameters default value

298f210d

BaiXuePrincess / Paddle is in sync with the fork's source project
