- 30 May, 2023 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Reorganize the fold structure
* [Auto Parallel] Fix some import errors
-
- 17 April, 2023 1 commit
-
-
Committed by Yulong Ao
-
- 12 April, 2023 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Speedup the completion process
* [Auto Parallel] Skip the property of dist_context when deepcopying
* [Auto Parallel] Remove the unnecessary print
* [Auto Parallel] Move some changes from 2.4 branch to develop
* Update engine.py
* [Auto Parallel] Fix a bug
-
- 10 April, 2023 1 commit
-
-
Committed by JZ-LIANG
* unique id for mesh
* rng ctrl
* support dropout
* register op
* adopt for recompute
* update unittest
* support pp
-
- 16 March, 2023 1 commit
-
-
Committed by JZ-LIANG
* update env setting
* update pass logic
* dist op support bf16
* backward cast update
* update setting
* update backward
* revert amp pass
* update fp16 backward logic
* register c_embedding bf16
* revert engine
* add unittest
* add unittest
* update unittest
* update cmake
* update math
* update math.py
* update unittest
* update unittest
* revise unittest
* revise unittest
* update unittest
* update unittest
* update unittest
-
- 11 January, 2023 1 commit
-
-
Committed by yuehuayingxueluo
* add FusedLinear pass
* add fused_op_list and rename PASSES to OP_FUSION
* add fused_passes_list to constants.py
* add test_passes.py
* fix test_fused_passes.py
* fix add `if float(paddle.version.cuda()) >= 11.6:`
* renamed test_fused_passes.py
* fix CMakeList.txt
-
- 10 January, 2023 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Remove some fluid APIs
* [Auto Parallel] Fix the wrong import
* [Auto Parallel] Remove unnecessary comments
* [Auto Parallel] Fix the importing bug
-
- 04 January, 2023 1 commit
-
-
Committed by JZ-LIANG
* remove deps and prior comm
* grad comm fuse
* add deps for amp&global norm
* stage2 broadcast prior deps
* stage2 grad overlap
* stream_analyzer bugfix
* overlap enable
* dep op namescope
* depend support multiple inputs
* check finite deps
* stage2 param comm overlap
* Set kD2HStream
* grad comm hierarchical
* grad comm hierarchical
* new unittest

Co-authored-by: chenruibiao <chenruibiao@baidu.com>
-
- 29 December, 2022 1 commit
-
-
Committed by xu98bin
* auto parallel bf16
-
- 27 December, 2022 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] quantization pass support export
* support subgraph
* move_presist_var_to_global_block
* update unittest
* fix ci-coverage
* fix codestyle
* fix fake_dequantize_op
* remove unused var
* fix ci error and approval error
* add unittest for fp16 in test_dequant_linear
* replace mutable data
* fix unittest in non-cuda-core
* fix unittest

Co-authored-by: carryyu <569782149@qq.com>
Co-authored-by: wufeisheng <wfs1997@163.com>
-
- 29 November, 2022 1 commit
-
-
Committed by Nyakku Shigure
* isort all files
* revert conflicting files
* revert conflicting files
* revert conflicting files
-
- 23 October, 2022 1 commit
-
-
Committed by Nyakku Shigure
* update config
* re-blacken python code
* temporarily disable date and diff_py_file
* skip a format
-
- 18 October, 2022 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] add callbacks
* fix unittest
* fix dist_context
* fix engine
* fix cmakelist
* fix unittest's returns
* fix cmakelist
-
- 12 October, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle][F401] remove unused import in python/paddle/distributed
* remove pass
* empty commit
* Fix ValueError: list.remove(x): x not in list for meta_optimizer_names.
* Fix split import.
* add noqa after meta_optimizers in factory
* restore collective ops
* expand `import *`
* add noqa after required imports
* try to fix APIs without core.ops
* Revert "try to fix APIs without core.ops" (this reverts commit 6172beaf601e84bf61f2490c12c4739f0edaa5eb)
* fix an increment
* empty commit
* add noqa after required imports
* expand `import *`, fix ci error

Co-authored-by: Shuangchi He <34329208+Yulv-git@users.noreply.github.com>
-
- 28 September, 2022 1 commit
-
-
Committed by JZ-LIANG
* support input mask
-
- 22 September, 2022 1 commit
-
-
Committed by zhaoyingli
-
- 15 September, 2022 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Add the serialization process for dist attrs
* [Auto Parallel] Remove unnecessary comments
* [Auto Parallel] Fix some bugs
* [Auto Parallel] Fix the code style
* [Auto Parallel] Remove unnecessary impls
* [Auto Parallel] Fix the importing error
* [Auto Parallel] Fix the copy from bugs of op dist attr
* [Auto Parallel] Replace the use of constexpr if
* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
* [Auto Parallel] Change API of the completion unittest
* [Auto Parallel] Fix the bug when set_attr an int
* [Auto Parallel] Add the unittest for the serialization
* [Auto Parallel] Add some unit tests
* [Auto Parallel] Unify the strategy
* [Auto Parallel] Improve the engine api
* [Auto Parallel] Reset the changes made to the framework
* [Auto Parallel] Change the engine unittest
* [Auto Parallel] Update API of the completion and partitioner
* [Auto Parallel] Update unit tests using engine api
* update shard annotation
* [Auto Parallel] Remove the modifications of other modules
* [Auto Parallel] Add docs for APIs
* add new strategy
* [Auto Parallel] Replace the logger
* [Auto Parallel] Restore the test_program.py
* [Auto Parallel] Change the import rules
* [Auto Parallel] Add the examples for Engine
* [Auto Parallel] Do some minor changes
* [Auto Parallel] Remove yaml dependency
* [Auto Parallel] Fix the unittests
* add valid after train
* bug fix

Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
-
- 09 September, 2022 2 commits
-
-
Committed by zhaoyingli
* adapt gradient merge
* fix op_role
* fix strategy
-
Committed by zhaoyingli
* adapt lazy init and fix pass
* add unittest
* update comment
* fix amp and sharding
* remove clip_by_norm
-
- 31 August, 2022 1 commit
-
-
Committed by zhaoyingli
* add grad_clip pass
* add unittest
* add notes
* update func
* add dist_attr for new op
-
- 23 August, 2022 1 commit
-
-
Committed by zhaoyingli
* add quant pass
-
- 18 August, 2022 1 commit
-
-
Committed by zhaoyingli
* add clip_grad
* fix comments
* add unittest
* update logger
-
- 15 August, 2022 1 commit
-
-
Committed by zhaoyingli
* add collate_fn
* fix number of inputs
-
- 12 August, 2022 1 commit
-
-
Committed by JZ-LIANG
* bugfix
* remove scaling
* support rescale_grad opt
-
- 13 July, 2022 1 commit
-
-
Committed by JZ-LIANG
* avoid sync with cpp in partition op
* delay eval & predict mode
* bugfix for gradient merge pass
-
- 11 July, 2022 1 commit
-
-
Committed by zhaoyingli
* add 'to_static' in engine api
* fix cmakelist
-
- 29 June, 2022 1 commit
-
-
Committed by JZ-LIANG
* fixed bug for pass & engine
* fixed bug for benchmark GPT-3
-
- 06 June, 2022 1 commit
-
-
Committed by zhaoyingli
* fix gradient merge
* bug fix
* update annotation
-
- 05 June, 2022 1 commit
-
-
Committed by Sing_chan
* use yapf to format all Python files
* exclude two unittest files from yapf because they rely on writing and reading files, and formatting would break them
* disable diff_py_file because too many diff files cause the following command to fail
-
- 01 June, 2022 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Add the parallel tuner
* [Auto Parallel] Improve the parallel tuner and fix some bugs
* update cost model
* update import Resharder by dist op
* update cost model
* fix comp cost bug
* update cost model
* [Auto Parallel] Amend the dist attr for #processes=1
* update cost model and tuner
* update cost model and tuner
* update cost model and tuner
* update cluster
* update reshard
* [Auto Parallel] Add the estimation from the cost model
* [Auto Parallel] Reimplement the backup and restore functions
* [Auto Parallel] Fix the bugs of the parallel tuner
* [Auto Parallel] Update the engine api and dist context
* [Auto Parallel] Work around the high order grad problem
* [Auto Parallel] Add some miscellaneous improvements
* [Auto Parallel] Add a unittest for DistributedContext

Co-authored-by: caozhou <caozhou@radi.ac.cn>
-
- 30 May, 2022 1 commit
-
-
Committed by zhaoyingli
* use original id in dist_op_context.grad_op_id_to_op_id
* del assert
* remove redundant map
-
- 19 May, 2022 1 commit
-
-
Committed by zhaoyingli
* slice data in dist_loader & flag to scale grad
* bug fix
* update unittest
* enable static
-
- 10 May, 2022 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Refactor the engine api and parallelizer
* [Auto Parallel] Fix the default dist op for the slice op
* [Auto Parallel] Fix the format of planer.py
* [Auto Parallel] Fix a bug
-