- 10 Aug, 2023 1 commit
-
-
Committed by LiYuRio
-
- 24 Jul, 2023 1 commit
-
-
Committed by Chen Weihang
* add shard tensor api * add DistAttr api * add unittest for coverage * fix process mesh sample code * fix checking error
-
- 07 Jul, 2023 1 commit
-
-
Committed by Leo Chen
-
- 29 Jun, 2023 1 commit
-
-
Committed by zhaoyingli
* add skip_gc_vars for 1f1b schedule mode * add pp_degree and pp_stage
-
- 25 Jun, 2023 1 commit
-
-
Committed by zhaoyingli
* auto parallel support pipeline scheduler with standalone executor * rm check_fetch * update cmakelist and flags env * rm set micro batch id * rm import * update utils func * raise error when merge tensor for return_numpy is False * fix _pipeline_opt * fix unittest
-
- 20 Jun, 2023 1 commit
-
-
Committed by Azure
* add auto tuner * compare and record module * revert launch main * add prune rule * add unit test * add auto tuner * revert launch main * add prune rule * modify unit test script * fix bug for dump nodes; fix bug for checking log file * fix bug --------- Co-authored-by: caozhou <caozhou@radi.ac.cn>
-
- 14 Jun, 2023 2 commits
-
-
Committed by caozhou
* add auto tuner * fix prune * fix sharding prune and mbs candidates * fix cfg * fix launch * fix launch * add unittest * fix code style
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * Remove climits. * Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in cuda12. * Fix problem of TimeOut of distributed testcases under cuda12. * Remove useless modification. * Remove useless modification.
-
- 13 Jun, 2023 2 commits
- 02 Jun, 2023 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] add 1F1B * rm amp
-
- 01 Jun, 2023 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] update while control_flow with pipeline * update process group instantiate * fix micro_bsz for reshard * update api for micro batch size * add strategy for dp optimization
-
- 26 Apr, 2023 1 commit
-
-
Committed by zhenhailiu
* polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish
-
- 17 Apr, 2023 2 commits
-
-
Committed by tianshuo78520a
* mv ps distributed dir * fix * add del auto_parallel * add auto_parallel * fix ps * fix bug * fix test bug * fix test bug * merge develop fix error * merge develop fix error * merge develop fix error
-
Committed by caozhou
* add o2 tune * add unittest * fix error * set unittest timeout
-
- 10 Apr, 2023 1 commit
-
-
Committed by JZ-LIANG
* unique id for mesh * rng ctrl * support dropout * register op * adopt for recompute * update unittest * support pp
-
- 29 Mar, 2023 1 commit
-
-
Committed by yuehuayingxueluo
* add fuse adamw pass * fix some bugs * fix CI bug * change chunk_size * fix CI bug * rm test_fused_adam_op.py * fix CI bugs * fix fuse_adamw_op_pass.cc * change code style * fix CI bug * fix ut bug and use_adamw_op_pass.cc * fix test_fuse_adamw_pass.py * fix CI bug * remove fluid * fix CI bug * fix CI bug
-
- 23 Mar, 2023 1 commit
-
-
Committed by caozhou
* add patterns * update rule based tuner * add forward sub program completion * add unittest * add bwd sub program completion
-
- 22 Mar, 2023 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] support bloom * fix import * align amp and bf16 * update func name * clipbyglobalnorm and add_n support bf16 * upgrade amp strategy api * update bf16 unittest * fix static clip --------- Co-authored-by: liangjianzhong <liangjianzhong@baidu.com> Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
-
- 16 Mar, 2023 1 commit
-
-
Committed by JZ-LIANG
* update env setting * update pass logic * dist op support bf16 * backward cast update * update setting * update backward * revert amp pass * update fp16 backward logic * register c_embedding bf16 * revert engine * add unittest * add unittest * update unittest * update cmake * update math * update math.py * update unittest * update unittest * revise unittest * revise unittest * update unittest * update unittest * update unittest
-
- 14 Mar, 2023 1 commit
-
-
Committed by zhaoyingli
-
- 15 Feb, 2023 1 commit
-
-
Committed by xu98bin
* auto parallel align tool * modify function get_var's return * add save and load in align_tool * modify load function and save function * add finding different ops in align tool * full auto parallel align tool; add test file for auto parallel align tool; set timeout for test; modify get_backward_tmp_var function; add annotation for align tool; modify test file; modify code to restart CI; remove timeout * set timeout
-
- 09 Feb, 2023 1 commit
-
-
Committed by yuehuayingxueluo
* fix the processing order of passes in pass_base.py * fix processing order * add _PASS_PROCESS_ORDER_LIST * delete some pass in _PASS_PROCESS_ORDER_LIST * add assert in pass_base.py * remove fuse_optimizer * add _fusion_opt_list_rule * add test_pass_base_list.py * fix some bug * add fused_attention * add some passes to list * fix ci bug * fix ci bug
-
- 11 Jan, 2023 1 commit
-
-
Committed by yuehuayingxueluo
* add FusedLinear pass * add fused_op_list and rename PASSES to OP_FUSION * add fused_passes_list to constants.py * add test_passes.py * fix test_fused_passes.py * fix add if float(paddle.version.cuda()) >= 11.6: * renamed test_fused_passes.py * fix CMakeList.txt
-
- 29 Dec, 2022 1 commit
-
-
Committed by xu98bin
* auto parallel bf16
-
- 27 Dec, 2022 2 commits
-
-
Committed by zhaoyingli
* fix input order * add unittest * update cmakelist
-
Committed by zhaoyingli
* [AutoParallel] quantization pass support export * support subgraph * move_presist_var_to_global_block * update unittest * fix ci-coverage * fix codestyle * fix fake_dequantize_op * remove unused var * fix ci error and approval error * add unittest for fp16 in test_dequant_linear * replace mutable data * fix unittest in non-cuda-core * fix unittest Co-authored-by: carryyu <569782149@qq.com> Co-authored-by: wufeisheng <wfs1997@163.com>
-
- 14 Dec, 2022 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] recompute tuning * fix conflict * update comment * bug fix * update rc algo * tiny fix * fix clear process_group * remove comment * update segment print * fix import OpRole * adapt amp pass and grad_clip pass for opt_tuner * update tuning config * fix import * annotate recompute info on ops and upgrade recompute pass * add op_namescope for seed op * record reserved vars * fix recompute var's dist_attr * fix strategy unittest * adapt for fp16 * update unittest * revert copy opt * update unittest * rename set_recompute_segments * fix unittest
-
- 08 Dec, 2022 1 commit
-
-
Committed by Jianghai
* add cluster_partition and device_meshes to process_meshes funcs * add unittest
-
- 29 Nov, 2022 1 commit
-
-
Committed by caozhou
* add pattern match * add unittest
-
- 28 Nov, 2022 1 commit
-
-
Committed by caozhou
* add pattern for auto search * add unittest
-
- 22 Nov, 2022 1 commit
-
-
Committed by caozhou
-
- 18 Nov, 2022 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] selective recompute * add cmakelist
-
- 07 Nov, 2022 1 commit
-
-
Committed by zhaoyingli
* fp16 pass support assign op * choose assign op exec mode * add unittest * add cmakelist
-
- 31 Oct, 2022 1 commit
-
-
Committed by Yulong Ao
* [Auto Parallel] Improve the c++ dist attr * [Auto Parallel] Modify test_program.py * [Auto Parallel] Add the missing import
-
- 28 Oct, 2022 1 commit
-
-
Committed by zhaoyingli
* fix engine build method * fix import * update engine cost * update raise error * update cmakelist * revert optimizer * revert optimizer * fix unittest * fix unittest Co-authored-by: caozhou <caozhou@radi.ac.cn>
-
- 18 Oct, 2022 2 commits
-
-
Committed by caozhou
* add parallel tuner * add unittest * fix unittest * set timeout of unittest * set unittest timeout * fix auto_mode setting * update unittest * sync from develop and update unittest * remove unused import * update unittest * update cmakelist * add unittests
-
Committed by zhaoyingli
* [AutoParallel] add callbacks * fix unittest * fix dist_context * fix engine * fix cmakelist * fix unittest's returns * fix cmakelist
-
- 14 Oct, 2022 1 commit
-
-
Committed by zhaoyingli
* for gpt-gen * fix reshard * adapt assign and shape op * add dist_assign & unittest * add conditional block unittest * rename unittest
-
- 28 Sep, 2022 1 commit
-
-
Committed by zhaoyingli
* [AutoParallel] fix dist_split * add unittest * update cmakelist
-