Commits · 4bacf2abd4ca58515288396dcf8fff910dff89d0 · PaddlePaddle / Paddle

31 Oct, 2022 1 commit

2.4/fix engine build (#47462) · 4b3589fb

Committed by zhaoyingli on Oct 31, 2022

* update codestyle

* [AutoParallel] fix fp16 for subblock (#47189)

* [AutoParallel] fix fp16 for subblock

* fix engine

* fix comment

* [AutoParallel] fix engine _build and cost method (#47263)

* fix engine build method

* fix import

* update engine cost

* update raise error

* update cmakelist

* revert optimizer

* revert optimizer

* fix unittest

* fix unittest

Co-authored-by: caozhou <caozhou@radi.ac.cn>

4b3589fb

19 Oct, 2022 1 commit

[Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790

Committed by zhaoyingli on Oct 19, 2022

* [Auto Parallel] Make Engine class callable (#46416)

* [Auto Parallel] Improve the user-defined fetches and logging

* [Auto Parallel] Make Engine class callable

* [Auto Parallel] Update the data loading of tuner

* Print IPS in auto parallel Engine (#46554)

* [AutoParallel] fix dist_split (#46505)

* [AutoParallel] fix dist_split

* add unittest

* update cmakelist

* [AutoParallel] fix sharding (#46572)

* [AutoParallel] fix process_mesh (#46583)

* [AutoParallel] fix reshard when train with eval (#46605)

* [AutoParallel] fix reshard when train with eval

* fix mppp

* [AutoParallel] fix amp when predict (#46637)

* [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)

* update comp cost and completion for gpt auto search

* add unittest

* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Improve the fine-grained APIs (#46552)

* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different user cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports

* bugfix (#46921)

* [Auto Parallel] Fix the bug for None labels (#46987)

* [AutoParallel] adapt for gpt-gen (#46771)

* for gpt-gen

* fix reshard

* adapt assign and shape op

* add dist_assign & unittest

* add conditional block unittest

* rename unittest

* [Auto Parallel] Fix the bug of completion (#47056)

* [Auto Parallel] Fix the bug for None labels

* [Auto Parallel] Fix the completion bug

* [AutoParallel] add callbacks (#47014)

* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist

* [Auto Parallel] Add cost interface (#47043)

* add cost interface

* update interface and add unittest

* update unittest

* update interface

* [Auto Parallel]Add parallel tuner (#46189)

* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests

Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

90b31790

20 Sep, 2022 1 commit

[Cherry-Pick][AutoParallel] change import way and fix strategy (#46270) · c43ebfcf

Committed by zhaoyingli on Sep 20, 2022

* [Auto Parallel] Change the import way of Auto Parallel (#46115)

* fix strategy (#46256)

* [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180)

* remove no need grad allreduce communication when sharding-dp

* remove no need grad allreduce communication when sharding-dp

* bugfix

* bugfix

* bugfix

Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

c43ebfcf

19 Sep, 2022 1 commit

[Cherry-pick][Auto Parallel] Improve the APIs (#46164) · c5cc4278

Committed by Yulong Ao on Sep 19, 2022

* [AutoParallel] adapt gradient merge pass (#45915)

* adapt gradient merge

* fix op_role

* fix strategy

* [Auto Parallel] Gradient Fuse Allreduce (#45643)

* bugfix (#45332)

* dist embedding support lookup table v1

* add unittest

* customize wait_comm

* group gradients

* bugfix

* update program

* [Auto Parallel] Improve the APIs (#45776)

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Add the serialization process for dist attrs

* [Auto Parallel] Remove unnecessary comments

* [Auto Parallel] Fix some bugs

* [Auto Parallel] Fix the code style

* [Auto Parallel] Remove unnecessary impls

* [Auto Parallel] Fix the importing error

* [Auto Parallel] Fix the copy from bugs of op dist attr

* [Auto Parallel] Replace the use of constexpr if

* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh

* [Auto Parallel] Change API of the completion unittest

* [Auto Parallel] Fix the bug when set_attr an int

* [Auto Parallel] Add the unittest for the serialization

* [Auto Parallel] Add some unit tests

* [Auto Parallel] Unify the strategy

* [Auto Parallel] Improve the engine api

* [Auto Parallel] Reset the changes made to the framework

* [Auto Parallel] Change the engine unittest

* [Auto Parallel] Update API of the completion and partitioner

* [Auto Parallel] Update unit tests using engine api

* update shard annotation

* [Auto Parallel] Remove the modifications of other modules

* [Auto Parallel] Add docs for APIs

* add new strategy

* [Auto Parallel] Replace the logger

* [Auto Parallel] Restore the test_program.py

* [Auto Parallel] Change the import rules

* [Auto Parallel] Add the examples for Engine

* [Auto Parallel] Do some minor changes

* [Auto Parallel] Remove yaml dependency

* [Auto Parallel] Fix the unittests

* add valid after train

* bug fix

Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

* [Auto Parallel] Bugfix allreduce fuse for MP (#46086)

* bugfix

* bugfix

* typos fixed

* update strategy (#46138)

Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

c5cc4278

09 Sep, 2022 1 commit

[AutoParallel] adapt lazyinit & fix pass (#45840) · bc2265f8

Committed by zhaoyingli on Sep 09, 2022

* adapt lazy init and fix pass

* add unittest

* update comment

* fix amp and sharding

* remove clip_by_norm

bc2265f8

07 Sep, 2022 1 commit

[Auto Parallel] Support Iterable dataset for auto parallel (#45518) · b77fa1d9

Committed by caozhou on Sep 07, 2022

* support iterable dataset for auto parallel

* add split_data proto

* fix unittest bug

* fix recompute bug

* update cmake

b77fa1d9

23 Aug, 2022 2 commits
- bugfix (#45332) · 257438f3
  Committed by JZ-LIANG on Aug 23, 2022
  
  257438f3
- [Auto Parallel] Data Parallel Comm & Calc Overlap Optimization (#45173) · 229befc8
  Committed by JZ-LIANG on Aug 23, 2022
```
* bugfix

* remove scaling

* support rescale_grad opt

* add unittest
```
  229befc8
18 Aug, 2022 1 commit
- [AutoParallel] support ClipGradByGlobalNorm (#45205) · bb6bd223
  Committed by zhaoyingli on Aug 18, 2022
```
* add clip_grad

* fix comments

* add unittest

* update logger
```
  bb6bd223
15 Aug, 2022 1 commit
- [AutoParallel] add collate_fn for dist_loader (#45053) · 3649099f
  Committed by zhaoyingli on Aug 15, 2022
```
* add collate_fn

* fix number of inputs
```
  3649099f
03 Aug, 2022 1 commit
- [Auto Parallel] Unify gradient synchronization procedure of data parallel (#44815) · 70770d0d
  Committed by JZ-LIANG on Aug 03, 2022
  
  70770d0d
29 Jul, 2022 1 commit

[Auto parallel] Optimization Tuning (#43782) · 72f2ed43

Committed by JZ-LIANG on Jul 29, 2022

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

72f2ed43

25 Jul, 2022 1 commit
- [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine (#44513) · 243acdb4
  Committed by Aurelius84 on Jul 25, 2022
```
* [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine

* refine code
```
  243acdb4
18 Jul, 2022 1 commit
- [auto parallel] remove comm init control (#44385) · 876e2ff1
  Committed by caozhou on Jul 18, 2022
  
  876e2ff1
13 Jul, 2022 2 commits
- [Auto parallel] Accelerate procedure of partitioning and generating dist graphs (#44224) · 07f33da9
  Committed by JZ-LIANG on Jul 13, 2022
```
* avoid sync with cpp in partition op

* delay eval & predict mode

* bugfix for gradient merge pass
```
  07f33da9
- [Auto Parallel] Add comm init control by socket (#44148) · 7dc7fc4b
  Committed by caozhou on Jul 13, 2022
```
* add comm init control by socket

* avoid single card instance failure
```
  7dc7fc4b
11 Jul, 2022 1 commit
- [AutoParallel] add 'to_static' in engine api (#44202) · 13a250a2
  Committed by zhaoyingli on Jul 11, 2022
```
* add 'to_static' in engine api

* fix cmakelist
```
  13a250a2
07 Jul, 2022 1 commit
- [AutoParallel] fix 'op_role' for gradient merge & recompute (#44138) · db2c71a4
  Committed by zhaoyingli on Jul 07, 2022
```
* fix op_role

* fix engine

* update op_role
```
  db2c71a4
29 Jun, 2022 1 commit
- [Auto parallel] Bug fixed for GPT3 benchmark (#43793) · 74c9b57b
  Committed by JZ-LIANG on Jun 29, 2022
```
* fixed bug for pass & engine

* fixed bug for benchmark GPT-3
```
  74c9b57b
24 Jun, 2022 1 commit

[Auto Parallel] Use a fast completion for data parallelism (#43585) · e64823c1

Committed by Yulong Ao on Jun 24, 2022

* [Auto Parallel] Use a fast completion for data parallelism

* remove unused cuSparse function

* [Auto Parallel] Fix some bugs of the fast dp completion

* [Auto Parallel] Add the cmake statements

* [Auto Parallel] Make the unittest adapt to the new interface

* [Auto Parallel] Modify the timeout of the unittest

* [Auto Parallel] Remove unnecessary comments

Co-authored-by: zhouwei25 <zhouwei25@baidu.com>

e64823c1

13 Jun, 2022 1 commit
- [AutoParallel] fix fetch list (#43412) · 562b184c
  Committed by zhaoyingli on Jun 13, 2022
```
* fix fetch list

* fix unittest
```
  562b184c
08 Jun, 2022 1 commit
- [AutoParallel] add fetch_list in engine api (#43312) · 971e4791
  Committed by zhaoyingli on Jun 08, 2022
```
* add fetch_list

* fix evaluate log

* tiny fix
```
  971e4791
05 Jun, 2022 1 commit

[code format check upgrade] step2: yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all python file

* yapf exclude two unittests file for they rely on writing and reading file, and format will break them

* disable diff_py_file because too many diff files cause command following failed

a072fca8

02 Jun, 2022 1 commit
- [AutoParallel] engine.prepare only once (#43093) · 8c7cb3d6
  Committed by zhaoyingli on Jun 02, 2022
```
* prepare only once
```
  8c7cb3d6
01 Jun, 2022 1 commit

[Auto Parallel] Add miscellaneous improvements (#43108) · 010aba33

Committed by Yulong Ao on Jun 01, 2022

* [Auto Parallel] Add the parallel tuner

* [Auto Parallel] Improve the parallel tuner and fix some bugs

* update cost model

* update import Resharder by dist op

* update cost model

* fix comp cost bug

* update cost model

* [Auto Parallel] Amend the dist attr for #processes=1

* update cost model and tuner

* update cost model and tuner

* update cost model and tuner

* update cluster

* update reshard

* [Auto Parallel] Add the estimation from the cost model

* [Auto Parallel] Reimplement the backup and restore functions

* [Auto Parallel] Fix the bugs of the parallel tuner

* [Auto Parallel] Update the engine api and dist context

* [Auto Parallel] Work around the high order grad problem

* [Auto Parallel] Add some miscellaneous improvements

* [Auto Parallel] Add a unittest for DistributedContext

Co-authored-by: caozhou <caozhou@radi.ac.cn>

010aba33

19 May, 2022 1 commit
- [AutoParallel] split data in dataloader (#42838) · df470954
  Committed by zhaoyingli on May 19, 2022
```
* slice data in dist_loader & flag to scale grad

* bug fix

* update unittest

* enable static
```
  df470954
10 May, 2022 1 commit

[Auto Parallel] Refactor the engine api and parallelizer (#42576) · 83a4b26a

Committed by Yulong Ao on May 10, 2022

* [Auto Parallel] Refactor the engine api and parallelizer

* [Auto Parallel] Fix the default dist op for the slice op

* [Auto Parallel] Fix the format of planer.py

* [Auto Parallel] Fix a bug

83a4b26a

07 May, 2022 1 commit

[Auto Parallel] Improve the codes of the completion and distributed context (#40671) · bed9aaea

Committed by Yulong Ao on May 07, 2022

* [Auto Parallel] Replace the old planner by the new partition tuner

* [Auto Parallel] Improve the completion and distributed context

* [Auto Parallel] Fix some bugs of the compatible check of some dist ops

* [Auto Parallel] Fix some bugs

bed9aaea

06 May, 2022 1 commit

[AutoParallel] adapt for 2d laplace (#41601) · c043a21b

Committed by zhaoyingli on May 06, 2022

* add default_ctx in backward.py

* record grad_var_to_var with grad_times

* fix backward

* update annotation

* add complete_high_order_grad in complete_forward

* add dist slice op

* update grad_var_to_var type

* update partition_block init mapping before loss op

* update compatible for 'XShape' & update 'allreduce_vars'

* add dist reshape op when input dim equal to output dim

* update 'set_grad_var_shape' with grad_var_to_var

* fix dist slice

* fix set_grad_var_shape

* add dist pnorm op

* fix dist pnorm dist_attr

* fix engine startprogram & adapt highorder grad

* fix set_grad_var_shape when mp

* update unittest

* update cmakelist

* default strategy in engine: dp

* bug fix

* tiny fix

* flatten outputs

* fix default strategy

* init default ctx

* tiny fix

* test=allcase

c043a21b

18 Apr, 2022 1 commit

[Auto parallel] Transformer MHA & FFN Fused Dist op (#41163) · ceef73c9

Committed by JZ-LIANG on Apr 18, 2022

* adapt dist op

* [Auto Parallel] Support the auto completion of while_op

* add dist_fill_constant_batch_size_like

* align infer accuracy

ceef73c9

28 Mar, 2022 1 commit
- [Auto Parallel] Update reshard (#40865) · d101334c
  Committed by caozhou on Mar 28, 2022
```
* fix code style

* update unittest
```
  d101334c
23 Mar, 2022 1 commit
- [AutoParallel] engine & dist_saver (#40528) · 3980e222
  Committed by zhaoyingli on Mar 23, 2022
```
* add dist_saver and update engine

* add dist_saver and update engine
```
  3980e222
16 Mar, 2022 1 commit

[Auto Parallel] Add the support for the auto completion of while_op (#39939) · ec6b8fbd

Committed by Yulong Ao on Mar 16, 2022

* [Auto Parallel] Support the auto completion of while_op

* [Auto Parallel] Improve the completion algorithms

* [Auto Parallel] Fix bugs for ernie inference

* [Auto Parallel] Remove attrs which cannot be pickled

* [Auto Parallel] make the dims_mappings of LodTensorArray vars empty

* [Auto Parallel] Fix bugs for the ernie inference in the pipeline parallel

* [Auto Parallel] Remove unnecessary comments

* [Auto Parallel] Fix a bug of the CMakeLists

* [Auto Parallel] Use the newest APIs to write the unit test

* [Auto Parallel] Remove unnecessary statements

ec6b8fbd

07 Mar, 2022 1 commit

[AutoParallel]engine support pp (#40084) · 71cb016c

Committed by zhaoyingli on Mar 07, 2022

* engine support pp

* fix format

* avoid multi print

* fix convert

* bug fix

* add pp unittest

71cb016c

24 Feb, 2022 1 commit
- fix bug for block state (#39854) · 5fd7b5c3
  Committed by JZ-LIANG on Feb 24, 2022
  
  5fd7b5c3
22 Feb, 2022 1 commit
- [Auto Parallel] Add the high-level Engine API (#39709) · 5595fdbb
  Committed by Yulong Ao on Feb 22, 2022
```
* [Auto Parallel] Add the high-level Engine API

* Update the test cmakefile
```
  5595fdbb

PaddlePaddle / Paddle · mirror synced successfully more than 1 year ago