Commits · a16ef9f1ea82b62c15ea314bf4286b049b0bef91 · PaddlePaddle / Paddle

Oct 26, 2022 · 1 commit
- fix a bug that print log twice (#47336) (#47343) · a16ef9f1
  Committed by Roc on Oct 26, 2022
Oct 24, 2022 · 3 commits

Fix virtualpp with mp/recompute bugs (#47242) (#47249) · 9780eb72
Committed by Yuang Liu on Oct 24, 2022

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

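The reduce_sum fix above only states that the result is wrong once `input.numel()` exceeds `INT32_MAX`; the commit message does not say where the overflow occurs. As a hedged illustration of this class of bug, the sketch below assumes the element count is held in a 32-bit signed integer and mimics that with Python's `ctypes.c_int32`, which truncates without an overflow check. It is not Paddle code, and `numel_as_int32` / `numel_as_int64` are hypothetical helpers.

```python
import ctypes

INT32_MAX = 2**31 - 1


def numel_as_int32(shape):
    """Hypothetical helper: multiply the dims while keeping the running
    count in a 32-bit signed integer, as an int32 kernel variable would."""
    count = ctypes.c_int32(1)
    for dim in shape:
        # ctypes performs no overflow check, so the product is truncated
        # to 32 bits (two's-complement wrap-around).
        count = ctypes.c_int32(count.value * dim)
    return count.value


def numel_as_int64(shape):
    """Reference element count using Python's arbitrary-precision ints."""
    count = 1
    for dim in shape:
        count *= dim
    return count


shape = (2**16, 2**15 + 1)  # 2**31 + 2**16 elements, just above INT32_MAX
print(numel_as_int64(shape) > INT32_MAX)  # True
print(numel_as_int64(shape))              # 2147549184
print(numel_as_int32(shape))              # -2147418112
```

A count stored this way turns negative, so any loop bound or offset derived from it is wrong. Widening the type to 64 bits is the usual remedy, though the commit does not state which approach Paddle took.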

fix send for old dygraph mode by passing use_calc_stream to the send op (#47110) (#47201) · 82f1e1b7
Committed by Roc on Oct 24, 2022

Oct 21, 2022 · 1 commit
- support qat in sharding stage2 (#47169) (#47240) · 281891c5
  Committed by Haohongxiang on Oct 21, 2022
Oct 20, 2022 · 1 commit
- Fix cannot import `paddle.distributed` in python 3.6 on release/2.4 (#47141) · c894d91d
  Committed by Wen Sun on Oct 20, 2022
```
* fix: fix incorrect import

* fix: fix incorrect usage
```
Oct 19, 2022 · 2 commits

[Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790

Committed by zhaoyingli on Oct 19, 2022

* [Auto Parallel] Make Engine class callable (#46416)

* [Auto Parallel] Improve the user-defined fetches and logging

* [Auto Parallel] Make Engine class callable

* [Auto Parallel] Update the data loading of tuner

* Print IPS in auto parallel Engine (#46554)

* [AutoParallel] fix dist_split (#46505)

* [AutoParallel] fix dist_split

* add unittest

* update cmakelist

* [AutoParallel] fix sharding (#46572)

* [AutoParallel] fix process_mesh (#46583)

* [AutoParallel] fix reshard when train with eval (#46605)

* [AutoParallel] fix reshard when train with eval

* fix mppp

* [AutoParallel] fix amp when predict (#46637)

* [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)

* update comp cost and completion for gpt auto search

* add unittest

* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Improve the fine-grained APIs (#46552)

* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different user cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports

* bugfix (#46921)

* [Auto Parallel] Fix the bug for None labels (#46987)

* [AutoParallel] adapt for gpt-gen (#46771)

* for gpt-gen

* fix reshard

* adapt assign and shape op

* add dist_assign & unittest

* add conditional block unittest

* rename unittest

* [Auto Parallel] Fix the bug of completion (#47056)

* [Auto Parallel] Fix the bug for None labels

* [Auto Parallel] Fix the completion bug

* [AutoParallel] add callbacks (#47014)

* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist

* [Auto Parallel] Add cost interface (#47043)

* add cost interface

* update interface and add unittest

* update unittest

* update interface

* [Auto Parallel]Add parallel tuner (#46189)

* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>


Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Support allow_partial switch, which can be configured in
pipeline_configs. If the sent tensors are not the same across
different hosts, they shouldn't be sent partially and
then concatenated into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

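The `enable_partial_send_recv` commit above describes a pipeline hop in which each sending host transmits only a slice of the tensor and the receiver concatenates the slices. The plain-Python simulation below is a sketch under that reading, not Paddle's implementation: `partial_send_recv` and `split_even` are hypothetical names, and tensors are modeled as flat lists. It shows why the switch matters: when the senders' copies differ, the concatenated result matches none of them.

```python
from typing import List


def split_even(tensor: List[float], num_parts: int) -> List[List[float]]:
    """Split a flat tensor into num_parts equal, contiguous slices."""
    assert len(tensor) % num_parts == 0, "length must divide evenly"
    size = len(tensor) // num_parts
    return [tensor[i * size:(i + 1) * size] for i in range(num_parts)]


def partial_send_recv(per_rank_tensors: List[List[float]],
                      enable_partial_send_recv: bool) -> List[float]:
    """Hypothetical simulation of one pipeline send/recv hop.

    per_rank_tensors[r] is the copy of the tensor held by sender rank r.
    With partial send/recv, rank r sends only slice r and the receiver
    concatenates the slices; otherwise one full copy (rank 0's) is sent.
    """
    if not enable_partial_send_recv:
        return list(per_rank_tensors[0])
    num_ranks = len(per_rank_tensors)
    received: List[float] = []
    for rank, tensor in enumerate(per_rank_tensors):
        # Receiver gets slice `rank` from sender `rank` and appends it.
        received.extend(split_even(tensor, num_ranks)[rank])
    return received


identical = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
different = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(partial_send_recv(identical, True))   # [1.0, 2.0, 3.0, 4.0]
print(partial_send_recv(different, True))   # [1.0, 2.0, 7.0, 8.0]
print(partial_send_recv(different, False))  # [1.0, 2.0, 3.0, 4.0]
```

In this sketch, turning the switch off falls back to sending one full copy, which stays consistent at the cost of more traffic per hop.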

Oct 18, 2022 · 2 commits

Cherry pick for sharding (#47061) · 5b642140

Committed by Yuang Liu on Oct 18, 2022

* [dygraph sharding] Overlap the reduce and the calculation for sharding stage 2. (#46495)

* [dygraph sharding stage 2] sharding broadcast overlap (#46656)

* Multi groups for broadcast of sharding stage 2 (#46894)


[cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90

Committed by Haohongxiang on Oct 18, 2022

* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update


Oct 17, 2022 · 1 commit

[Cherry-pick] Collective communication APIs (#46922) · 5fba2a98

Committed by Wen Sun on Oct 17, 2022

* Support both use_calc_stream and sync_op in send recv APIs (#46023)

* Support both use_calc_stream and sync_op in allgather API (#46295)

* Support both use_calc_stream and sync_op in collective communication API (#46761)

* Move group and all reduce from collective to communication (#45848)

* Completes bfloat16 dtype for collective api in eager mode (#45844)

* Fix collective APIs cannot be recognized when building docs (#46962)
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>


Oct 11, 2022 · 1 commit

Cherry pick for dygraph pp (#46876) · 9cc3f69f

Committed by Yuang Liu on Oct 11, 2022

* bug fix for virtual pipeline parallel (#45922)

* don't wait for send op under dygraph pp (#46209)

* [interleave pp] sync recv for 1f1b (#46399)

* [dygraph pp] all sync for allgather partial (#46483)


Sep 27, 2022 · 2 commits
- [AutoParallel] fix amp o1 (#46391) (#46481) · 5dab0b0d
  Committed by zhaoyingli on Sep 27, 2022
- change use_calc_stream to sync_op (#46182) (#46493) · 8089a1fb
  Committed by LiYuRio on Sep 27, 2022
Sep 26, 2022 · 1 commit

cherry-pick V2.4 (#46358) · 536d9d8c

Committed by ziyoujiyi on Sep 26, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* fix gloo compile warning

* adapt for nn fl-ps

* flps del fake-init op

* add learning_rate_0 initializer op

* bug fix

* .

* .


Sep 22, 2022 · 3 commits
- logger manager (#45909) (#46087) · 7eb046c7
  Committed by Roc on Sep 22, 2022
```
Unify the logger manager in FleetAPI.
Hide APIs under distributed/utils that users don't need.
```
- [Dygraph] Fix bugs of mp in eager mode (#46303) (#46396) · 372505be
  Committed by Haohongxiang on Sep 22, 2022
```
* fix bugs of mp

* fix bugs of mp

* update

* update

* fix bug
```
- [Auto Parallel] fix lazyinit (#46355) (#46382) · 083853cd
  Committed by zhaoyingli on Sep 22, 2022
Sep 20, 2022 · 3 commits

cherry-pick V2.4 (#46294) · 3e8b3220

Committed by ziyoujiyi on Sep 20, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* fix gloo compile warning

* adapt for nn fl-ps

* flps del fake-init op

* add learning_rate_0 initializer op

[PolishComments] Polish some code comments (#46032) (#46261) · 42e56f65
Committed by HongyuJia on Sep 20, 2022
```
* polish code comments

* polish data_device_transform.cc
```

[Cherry-Pick][AutoParallel] change import way and fix strategy (#46270) · c43ebfcf

Committed by zhaoyingli on Sep 20, 2022

* [Auto Parallel] Change the import way of Auto Parallel (#46115)

* fix strategy (#46256)

* [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180)

* remove no need grad allreduce communication when sharding-dp

* remove no need grad allreduce communication when sharding-dp

* bugfix

* bugfix

* bugfix
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>


Sep 19, 2022 · 6 commits

Recompute unify incubate (#46073) (#46210) · 4bced24a
Committed by wuhuachaocoding on Sep 19, 2022


[cherry-pick] add abs,mean,sum,ge,gt,pow,etc higher-order differentiation operators (#46184) · ad8beaaf

Committed by Xiaoxu Chen on Sep 19, 2022

* [cherry-pick] extend reduce_sum,reduce_sum,eq,ne,ge,abs,pow,etc higher order operators

* add reduce_mean,reduce_sum primitive ops
* add ne_p gt_p primitive operators
* add ge_p abs_p primitive operators
* add cast primitive operators
* add pow,square prim2orig rules
* add elementwise_div orig2prim rule

* [cherry-pick] add mean,sum,ge,gt,ne,abs,etc higher-order differentiation operators(#45888)

* add reduce_mean,reduce_sum primitive ops

* add ne_p gt_p primitive operators

* add ge_p abs_p primitive operators


refactor mp. (#45803) (#46121) · e5dc9d61

Committed by wuhuachaocoding on Sep 19, 2022

* refactor mp.

* update setup.py.

* update mp_layers.py for compatibility.

* add documents for mp_layers.py

* update init.py

* update collective.py.

* update.

* update mp_ops.py

* update.

* update code style.

* update code style.


[Cherry-pick][Auto Parallel] Improve the APIs (#46164) · c5cc4278

Committed by Yulong Ao on Sep 19, 2022

* [AutoParallel] adapt gradient merge pass (#45915)

* adapt gradient merge

* fix op_role

* fix strategy

* [Auto Parallel] Gradient Fuse Allreduce (#45643)

* bugfix (#45332)

* dist embedding support lookup table v1

* add unittest

* customize wait_comm

* group gradients

* bugfix

* update program

* [Auto Parallel] Improve the APIs (#45776)

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Add the serialization process for dist attrs

* [Auto Parallel] Remove unnecessary comments

* [Auto Parallel] Fix some bugs

* [Auto Parallel] Fix the code style

* [Auto Parallel] Remove unnecessary impls

* [Auto Parallel] Fix the importing error

* [Auto Parallel] Fix the copy from bugs of op dist attr

* [Auto Parallel] Replace the use of constexpr if

* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh

* [Auto Parallel] Change API of the completion unittest

* [Auto Parallel] Fix the bug when set_attr an int

* [Auto Parallel] Add the unittest for the serialization

* [Auto Parallel] Add some unit tests

* [Auto Parallel] Unify the strategy

* [Auto Parallel] Improve the engine api

* [Auto Parallel] Reset the changes made to the framework

* [Auto Parallel] Change the engine unittest

* [Auto Parallel] Update API of the completion and partitioner

* [Auto Parallel] Update unit tests using engine api

* update shard annotation

* [Auto Parallel] Remove the modifications of other modules

* [Auto Parallel] Add docs for APIs

* add new strategy

* [Auto Parallel] Replace the logger

* [Auto Parallel] Restore the test_program.py

* [Auto Parallel] Change the import rules

* [Auto Parallel] Add the examples for Engine

* [Auto Parallel] Do some minor changes

* [Auto Parallel] Remove yaml dependency

* [Auto Parallel] Fix the unittests

* add valid after train

* bug fix
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

* [Auto Parallel] Bugfix allreduce fuse for MP (#46086)

* bugfix

* bugfix

* typos fixed

* update strategy (#46138)
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

Revert "Simplify size op impl (#45808)" (#46168) · dabb8f23
Committed by Chen Weihang on Sep 19, 2022
```
This reverts commit c252b1de.
```

rename fleetx, develop=document_fix (#46141) · 7a6db0a3
Committed by ShenLiang on Sep 19, 2022


Sep 17, 2022 · 1 commit

V2.4 - cherry-pick (#46126) · a76fa414

Committed by ziyoujiyi on Sep 17, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* fix gloo compile warning

* adapt for nn fl-ps


Sep 15, 2022 · 1 commit
- fix distributed bug caused by fill_any_like (#45978) (#46041) · 9012e8bc
  Committed by Charles-hit on Sep 15, 2022
Sep 9, 2022 · 3 commits
- [AutoParallel] adapt lazyinit & fix pass (#45840) · bc2265f8
  Committed by zhaoyingli on Sep 9, 2022
```
* adapt lazy init and fix pass

* add unittest

* update comment

* fix amp and sharding

* remove clip_by_norm
```
- Simplify size op impl (#45808) · c252b1de
  Committed by Chen Weihang on Sep 9, 2022
```
* simplify size op

* trans to cuda manually

* fix copy error
```
- fix dygraph pp + mp nan after async send/recv (#45869) · 5d7e1c91
  Committed by Yuang Liu on Sep 9, 2022
Sep 8, 2022 · 1 commit
- add group argument (#44758) · bb725e3a
  Committed by LiYuRio on Sep 8, 2022
Sep 7, 2022 · 2 commits
- [dygraph hybrid pp for interleave] Save/Load for interleaved pipeline. (#45797) · a9cc0274
  Committed by Yuang Liu on Sep 7, 2022
- [Auto Parallel] Support Iterable dataset for auto parallel (#45518) · b77fa1d9
  Committed by caozhou on Sep 7, 2022
```
* support iterable dataset for auto parallel

* add split_data proto

* fix unittest bug

* fix recompute bug

* update cmake
```
Sep 6, 2022 · 2 commits
- [dygraph hybrid pp for interleave] The interleave scheduler for pipeline parallel (#45497) · 72b5b5bf
  Committed by Yuang Liu on Sep 6, 2022
- Completes basic dtypes for collective api in eager mode (#45574) · 7a92e74b
  Committed by Wen Sun on Sep 6, 2022
Sep 5, 2022 · 1 commit
- [AutoParallel] dist_matmul trans_x or trans_y (#45678) · 4a9895b1
  Committed by zhaoyingli on Sep 5, 2022
```
* dist_matmul trans

* update unittest

* update cmakelist
```
Sep 2, 2022 · 2 commits
- [Auto Parallel] DP Calc-Comm Overlapping Support Weight Sharing (#45443) · 9b5e0154
  Committed by JZ-LIANG on Sep 2, 2022
```
* bugfix (#45332)

* customize wait_comm
```
- update some input for pp and moe about recompute. (#45628) · 4c780311
  Committed by wuhuachaocoding on Sep 2, 2022
