Commits · 9780eb721bb699abaeb6c062ebc6b44dc1523af7 · BaiXuePrincess / Paddle

19 Oct, 2022 1 commit

[Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790

Committed by zhaoyingli on Oct 19, 2022

* [Auto Parallel] Make Engine class callable (#46416)

* [Auto Parallel] Improve the user-defined fetches and logging

* [Auto Parallel] Make Engine class callable

* [Auto Parallel] Update the data loading of tuner

* Print IPS in auto parallel Engine (#46554)

* [AutoParallel] fix dist_split (#46505)

* [AutoParallel] fix dist_split

* add unittest

* update cmakelist

* [AutoParallel] fix sharding (#46572)

* [AutoParallel] fix process_mesh (#46583)

* [AutoParallel] fix reshard when train with eval (#46605)

* [AutoParallel] fix reshard when train with eval

* fix mppp

* [AutoParallel] fix amp when predict (#46637)

* [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)

* update comp cost and completion for gpt auto search

* add unittest

* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Improve the fine-grained APIs (#46552)

* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different use cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports

* bugfix (#46921)

* [Auto Parallel] Fix the bug for None labels (#46987)

* [AutoParallel] adapt for gpt-gen (#46771)

* for gpt-gen

* fix reshard

* adapt assign and shape op

* add dist_assign & unittest

* add conditional block unittest

* rename unittest

* [Auto Parallel] Fix the bug of completion (#47056)

* [Auto Parallel] Fix the bug for None labels

* [Auto Parallel] Fix the completion bug

* [AutoParallel] add callbacks (#47014)

* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist

* [Auto Parallel] Add cost interface (#47043)

* add cost interface

* update interface and add unittest

* update unittest

* update interface

* [Auto Parallel]Add parallel tuner (#46189)

* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

90b31790

11 Jul, 2022 1 commit

fix undefined-variable (#44187) · ee5cb5f2

Committed by Leo Chen on Jul 10, 2022

ee5cb5f2

23 Jun, 2022 1 commit

fix paddle.vision.datasets.* en docs (#43649) · 1fca8f33

Committed by Nyakku Shigure on Jun 23, 2022

* rewrite all code examples, test=document_fix

* refine arguments, test=document_fix

* fix desc format error, test=document_fix

* capitalize the first letter, test=document_fix

* refine api desc, test=document_fix

* fix wrong COPY-FROM label in Model docs, test=document_fix

* refine returns, test=document_fix

* refine returns, test=document_fix

* add a blank line in code block, test=document_fix

1fca8f33

17 Jun, 2022 1 commit

fix paddle.Model en docs (#43537) · 4c3969fa

Committed by Nyakku Shigure on Jun 17, 2022

* add copy-from label for code examples, test=document_fix

* refine docs, test=document_fix

* add some output for code example, test=document_fix

* add `optional`, test=document_fix

* add missing parameters, test=document_fix

* add missing links for `ProgBarLogger` and `ModelCheckpoint`, test=document_fix

* update eval_batch example, test=document_fix

* fix typos in stack_outputs, test=document_fix

* np.random -> paddle.random, test=document_fix

4c3969fa

05 Jun, 2022 1 commit

【code format check upgrade】 step2：yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all python files

* exclude two unittest files from yapf because they rely on writing and reading files, and formatting would break them

* disable diff_py_file because too many diff files cause the following command to fail

a072fca8

13 May, 2022 1 commit

[Eager] Support test_dist_hapi_model under eager mode (#42702) · 9840fb70

Committed by Weilong Wu on May 13, 2022

* [Eager] Support test_dist_hapi_model under eager mode

* [Eager] Polish code

* Fix code-format issue, coverage-ci issue

9840fb70

12 May, 2022 1 commit

Fix some typos in paddle/. (#42408) · 2012672c

Committed by Shuangchi He on May 12, 2022

2012672c

25 Mar, 2022 1 commit

Refactor Dygraph Flags (#40786) · 3085d5e4

Committed by Jiabin Yang on Mar 25, 2022

* refactor eager flags

* fix flags error when we switch from eager to dygraph

* fix ci problem

* fix ci

* fix ci

* merge develop and fix code style

* merge develop and fix code style

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* merge develop

3085d5e4

21 Mar, 2022 1 commit

Merge some test bug (#40543) · 56c43ccd

Committed by hong on Mar 21, 2022

* switch eager mode and change it

* set default is eager

* set default is eager

* fix error; test=develop

* fix some error; test=develop

* update

* upd

* update code; test=develop

* update

* fix some bug; test=develop

* fix bug; test=develop

* fix bug; test=develop

* fix bug; test=develop

* fix error; test=develop

* format; test=develop
Co-authored-by: JiabinYang <360788950@qq.com>

56c43ccd

22 Oct, 2021 1 commit

[hapi] support dygraph amp O2 (#36441) · 08248db0

Committed by Leo Chen on Oct 22, 2021

* [hapi] support dygraph amp O2

* fix problem of static pure fp16 in hapi

* fix bug

* fix format

* fix ut

* follow comments

* update ut

* update amp save/load

* fix ut

* refine code format

08248db0

29 Jul, 2021 1 commit

add parameter of input in model.summary (#34165) · 40bd7a7a

Committed by wangna11BD on Jul 29, 2021

* add input option in model.summary

40bd7a7a

23 Jul, 2021 1 commit

fix bug for num_iters in fit/evaluate (#34059) · 08c5b1d1

Committed by shangliang Xu on Jul 23, 2021

08c5b1d1

22 Jul, 2021 1 commit

fix hapi fleet bug in static mode (#34311) · 13991b5e

Committed by Jiaqi Liu on Jul 22, 2021

13991b5e

08 Jul, 2021 1 commit

add num_iters in fit/evaluate (#33986) · 97faf90e

Committed by shangliang Xu on Jul 08, 2021

* add num_iters in fit/evaluate, test=develop

97faf90e

28 Jun, 2021 8 commits

update docs · 9a73283b

Committed by lyuwenyu on Jun 22, 2021

9a73283b

update docs · 01474d3b

Committed by lyuwenyu on Jun 22, 2021

01474d3b

fix doc, last iter, and test for amp · a8fec662

Committed by lyuwenyu on Jun 22, 2021

a8fec662

update · 77eae14a

Committed by lyuwenyu on Jun 17, 2021

77eae14a

update in static mode · f9f21a5c

Committed by lyuwenyu on Jun 17, 2021

f9f21a5c

update docs · 2e933629

Committed by lyuwenyu on Jun 17, 2021

2e933629

update · ae2b2185

Committed by lyuwenyu on Jun 17, 2021

ae2b2185

add gradient accumulate for dygraph · 87eb929f

Committed by lyuwenyu on Jun 17, 2021

87eb929f

21 Jun, 2021 1 commit

Del six.PY code2 (#33607) · 0f7187af

Committed by tianshuo78520a on Jun 21, 2021

* del py2 code2

* fix test timeout

0f7187af

11 Jun, 2021 1 commit

update 2.0 public api in vision (#33308) · 2de737eb

Committed by zhiboniu on Jun 11, 2021

* update 2.0 public api in vision

* fix some flake8 errors

2de737eb

09 Jun, 2021 1 commit

Add option "verbose" for predict api (#33405) · e08fdd16

Committed by LielinJiang on Jun 09, 2021

* add option verbose for predict api

e08fdd16

07 Jun, 2021 1 commit

fix undefined-variable (#33355) · 443cf71a

Committed by zhangchunle on Jun 07, 2021

443cf71a

29 Apr, 2021 1 commit

update 2.0 public api in hapi (#32650) · 243b4326

Committed by zhiboniu on Apr 29, 2021

243b4326

26 Apr, 2021 2 commits

[2.1 API] Modified params of some APIs to support tuple and list. (#32528) · 400c3aa7

Committed by xiemoyuan on Apr 26, 2021

* Modified params of some APIs to support tuple and list.

* fixed bug.

400c3aa7

fix acc typo and shape error, and remove 'users' subjects in amp doc, test=document_fix (#32476) · ab3d2bf0

Committed by Jiaqi Liu on Apr 26, 2021

ab3d2bf0

23 Apr, 2021 1 commit

solve hccl communicate conflict (#32447) · 0e74eea2

Committed by Baibaifan on Apr 23, 2021

solve hccl communicate conflict (#32447)

0e74eea2

21 Apr, 2021 1 commit

【NPU】Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

15 Apr, 2021 1 commit

Update hapi to support AMP (#31417) · fabdb43c

Committed by Jiaqi Liu on Apr 15, 2021

* make hapi support amp, and add unittest

* make unittest only support GPU

* update parameters for amp in hapi.Model

* update hapi.Model.prepare interface, and update unittest

* fix test_model.py unittest bug

* add grad clear in dygraph

* use_fp16_guard defaults to True, which could avoid nan

* add input check, and add internal doc link to low level api

* update doc, and decrease the sample num of dataset to avoid timeout

* make hapi amp param support str 'O1' or 'O2'

* resume calling, modify the code of the check part

* upgrade the usage of Fleet API, and disable 'pure_fp16' param

fabdb43c

11 Jan, 2021 1 commit

Delete incorrect warning message (#30196) · e6a1e875

Committed by LielinJiang on Jan 11, 2021

* fix warning and no grad

e6a1e875

07 Jan, 2021 1 commit

refine the paddle place support using str (#28769) · 7dd551e0

Committed by wangchaochaohu on Jan 07, 2021

7dd551e0

27 Nov, 2020 2 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine event for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the unittest, compile problem and ddp initialize problem

* fix unittest for mac & add some comments & solve the repeated param in sublayers

* fix unittest for windows & fix document

e2d01eb6

Enhance logger callback for benchmark (#29106) · 9f53f3d0

Committed by LielinJiang on Nov 27, 2020

* enhance logger callback for benchmark

9f53f3d0

25 Nov, 2020 1 commit

Fix doc format for callbacks, metrics and Model (#28638) · 8bbedc23

Committed by qingqing01 on Nov 25, 2020

* Fix doc format for callbacks, metrics and Model

* Fix code sample and doc

8bbedc23

23 Nov, 2020 3 commits

Add EarlyStopping (#28691) · 70385518

Committed by LiuChiachi on Nov 23, 2020

* add early stopping

* add doc for early stopping

* fix sample code bugs

* update infer of mode, update doc, add unittests to increase coverage rate

* fix sample code for early stopping

* update sample code and unittests

* reduce time cost of test_callbacks unittest

* fix model.py code style error

70385518

Update path name of saving in hapi (#28462) · 8c8b42f2

Committed by LiuChiachi on Nov 23, 2020

* update hapi save_inference_model output pathname

* update hapi save_inference_model output pathname

* use new 2.0-api paddle.static.io.load_inference_model

* add unittests to increase coverage rate

8c8b42f2

Add lr scheduler callback for high level api (#28737) · 00e55ded

Committed by LielinJiang on Nov 23, 2020

* add lr scheduler

00e55ded

BaiXuePrincess / Paddle is in sync with the fork's source project
