- 14 Jun 2023: 1 commit

Committed by Ghost Screaming
* Fix a bug in the reduce_sum op: its result was wrong when input.numel() > INT32_MAX.
* Remove climits.
* Fix the pickle and NCCL_P2P_DISABLE problems in the distributed test cases under CUDA 12.
* Fix timeouts of the distributed test cases under CUDA 12.
* Remove useless modification.
* Remove useless modification.
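The commit message does not say what caused the reduce_sum bug or how it was fixed. A common cause of bugs that appear only when numel() exceeds INT32_MAX is storing an element count or offset in a signed 32-bit integer. The Python sketch below is a hedged illustration of that failure mode only; it is not Paddle code, and `to_int32` is a hypothetical helper that emulates 32-bit two's-complement wraparound.

```python
# Illustrative sketch only: this is not Paddle's reduce_sum kernel.
# It shows what happens to an element count larger than INT32_MAX when it
# is stored in a signed 32-bit integer (two's-complement wraparound).

INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Hypothetical helper: wrap a Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


if __name__ == "__main__":
    numel = INT32_MAX + 10  # a tensor with 2,147,483,657 elements
    print(to_int32(numel))  # -2147483639: the count has become negative
    print(to_int32(1024))   # 1024: small counts are unaffected
```

A loop bound or offset derived from the wrapped value would cover the wrong number of elements; keeping such counts in 64-bit integers (for example `int64_t` in C++) avoids the wrap. Whether the actual fix took this form is not stated in the commit.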

- 13 Jun 2023: 2 commits
- 02 Jun 2023: 1 commit

Committed by zhaoyingli
* [AutoParallel] add 1F1B
* rm amp

- 01 Jun 2023: 1 commit

Committed by zhaoyingli
* [AutoParallel] update while control_flow with pipeline
* update process group instantiation
* fix micro_bsz for reshard
* update API for micro batch size
* add strategy for dp optimization

- 26 Apr 2023: 1 commit

Committed by zhenhailiu
* polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish

- 17 Apr 2023: 2 commits

Committed by tianshuo78520a
* mv ps distributed dir
* fix
* add del auto_parallel
* add auto_parallel
* fix ps
* fix bug
* fix test bug
* fix test bug
* merge develop fix error
* merge develop fix error
* merge develop fix error

Committed by caozhou
* add o2 tune
* add unittest
* fix error
* set unittest timeout

- 10 Apr 2023: 1 commit

Committed by JZ-LIANG
* unique id for mesh
* rng ctrl
* support dropout
* register op
* adapt for recompute
* update unittest
* support pp

- 29 Mar 2023: 1 commit

Committed by yuehuayingxueluo
* add fuse adamw pass
* fix some bugs
* fix CI bug
* change chunk_size
* fix CI bug
* rm test_fused_adam_op.py
* fix CI bugs
* fix fuse_adamw_op_pass.cc
* change code style
* fix CI bug
* fix ut bug and use_adamw_op_pass.cc
* fix test_fuse_adamw_pass.py
* fix CI bug
* remove fluid
* fix ci bug
* fix CI bug

- 23 Mar 2023: 1 commit

Committed by caozhou
* add patterns
* update rule based tuner
* add forward sub program completion
* add unittest
* add bwd sub program completion

- 22 Mar 2023: 1 commit

Committed by zhaoyingli
* [AutoParallel] support bloom
* fix import
* align amp and bf16
* update func name
* clipbyglobalnorm and add_n support bf16
* upgrade amp strategy api
* update bf16 unittest
* fix static clip

---------

Co-authored-by: liangjianzhong <liangjianzhong@baidu.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

- 16 Mar 2023: 1 commit

Committed by JZ-LIANG
* update env setting
* update pass logic
* dist op support bf16
* backward cast update
* update setting
* update backward
* revert amp pass
* update fp16 backward logic
* register c_embedding bf16
* revert engine
* add unittest
* add unittest
* update unittest
* update cmake
* update math
* update math.py
* update unittest
* update unittest
* revise unittest
* revise unittest
* update unittest
* update unittest
* update unittest

- 14 Mar 2023: 1 commit

Committed by zhaoyingli

- 15 Feb 2023: 1 commit

Committed by xu98bin
* auto parallel align tool
* modify function get_var's return
* add save and load in align_tool
* modify load function and save function
* add finding different ops in align tool
* full auto parallel align tool
  add test file for auto parallel align tool
  set timeout for test
  modify get_backward_tmp_var function
  add annotation for align tool
  modify test file
  modify code to restart CI
  remove timeout
* set timeout

- 09 Feb 2023: 1 commit

Committed by yuehuayingxueluo
* fix the processing order of passes in pass_base.py
* fix processing order
* add _PASS_PROCESS_ORDER_LIST
* delete some passes in _PASS_PROCESS_ORDER_LIST
* add assert in pass_base.py
* remove fuse_optimizer
* add _fusion_opt_list_rule
* add test_pass_base_list.py
* fix some bugs
* add fused_attention
* add some passes to list
* fix ci bug
* fix ci bug

- 11 Jan 2023: 1 commit

Committed by yuehuayingxueluo
* add FusedLinear pass
* add fused_op_list and rename PASSES to OP_FUSION
* add fused_passes_list to constants.py
* add test_passes.py
* fix test_fused_passes.py
* fix add if float(paddle.version.cuda()) >= 11.6:
* renamed test_fused_passes.py
* fix CMakeList.txt

- 29 Dec 2022: 1 commit

Committed by xu98bin
* auto parallel bf16

- 27 Dec 2022: 2 commits

Committed by zhaoyingli
* fix input order
* add unittest
* update cmakelist

Committed by zhaoyingli
* [AutoParallel] quantization pass support export
* support subgraph
* move_presist_var_to_global_block
* update unittest
* fix ci-coverage
* fix codestyle
* fix fake_dequantize_op
* remove unused var
* fix ci error and approval error
* add unittest for fp16 in test_dequant_linear
* replace mutable data
* fix unittest in non-cuda-core
* fix unittest

Co-authored-by: carryyu <569782149@qq.com>
Co-authored-by: wufeisheng <wfs1997@163.com>

- 14 Dec 2022: 1 commit

Committed by zhaoyingli
* [AutoParallel] recompute tuning
* fix conflict
* update comment
* bug fix
* update rc algo
* tiny fix
* fix clear process_group
* remove comment
* update segment print
* fix import OpRole
* adapt amp pass and grad_clip pass for opt_tuner
* update tuning config
* fix import
* annotate recompute info on ops and upgrade recompute pass
* add op_namescope for seed op
* record reserved vars
* fix recompute var's dist_attr
* fix strategy unittest
* adapt for fp16
* update unittest
* revert copy opt
* update unittest
* rename set_recompute_segments
* fix unittest

- 08 Dec 2022: 1 commit

Committed by Jianghai
* add cluster_partition and device_meshes to process_meshes funcs
* add unittest

- 29 Nov 2022: 1 commit

Committed by caozhou
* add pattern match
* add unittest

- 28 Nov 2022: 1 commit

Committed by caozhou
* add pattern for auto search
* add unittest

- 22 Nov 2022: 1 commit

Committed by caozhou

- 18 Nov 2022: 1 commit

Committed by zhaoyingli
* [AutoParallel] selective recompute
* add cmakelist

- 07 Nov 2022: 1 commit

Committed by zhaoyingli
* fp16 pass support assign op
* choose assign op exec mode
* add unittest
* add cmakelist

- 31 Oct 2022: 1 commit

Committed by Yulong Ao
* [Auto Parallel] Improve the c++ dist attr
* [Auto Parallel] Modify test_program.py
* [Auto Parallel] Add the missing import

- 28 Oct 2022: 1 commit

Committed by zhaoyingli
* fix engine build method
* fix import
* update engine cost
* update raise error
* update cmakelist
* revert optimizer
* revert optimizer
* fix unittest
* fix unittest

Co-authored-by: caozhou <caozhou@radi.ac.cn>

- 18 Oct 2022: 2 commits

Committed by caozhou
* add parallel tuner
* add unittest
* fix unittest
* set timeout of unittest
* set unittest timeout
* fix auto_mode setting
* update unittest
* sync from develop and update unittest
* remove unused import
* update unittest
* update cmakelist
* add unittests

Committed by zhaoyingli
* [AutoParallel] add callbacks
* fix unittest
* fix dist_context
* fix engine
* fix cmakelist
* fix unittest's returns
* fix cmakelist

- 14 Oct 2022: 1 commit

Committed by zhaoyingli
* for gpt-gen
* fix reshard
* adapt assign and shape op
* add dist_assign & unittest
* add conditional block unittest
* rename unittest

- 28 Sep 2022: 1 commit

Committed by zhaoyingli
* [AutoParallel] fix dist_split
* add unittest
* update cmakelist

- 15 Sep 2022: 1 commit

Committed by Yulong Ao
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Add the serialization process for dist attrs
* [Auto Parallel] Remove unnecessary comments
* [Auto Parallel] Fix some bugs
* [Auto Parallel] Fix the code style
* [Auto Parallel] Remove unnecessary impls
* [Auto Parallel] Fix the importing error
* [Auto Parallel] Fix the copy from bugs of op dist attr
* [Auto Parallel] Replace the use of constexpr if
* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
* [Auto Parallel] Change API of the completion unittest
* [Auto Parallel] Fix the bug when set_attr an int
* [Auto Parallel] Add the unittest for the serialization
* [Auto Parallel] Add some unit tests
* [Auto Parallel] Unify the strategy
* [Auto Parallel] Improve the engine api
* [Auto Parallel] Reset the changes made to the framework
* [Auto Parallel] Change the engine unittest
* [Auto Parallel] Update API of the completion and partitioner
* [Auto Parallel] Update unit tests using engine api
* update shard annotation
* [Auto Parallel] Remove the modifications of other modules
* [Auto Parallel] Add docs for APIs
* add new strategy
* [Auto Parallel] Replace the logger
* [Auto Parallel] Restore the test_program.py
* [Auto Parallel] Change the import rules
* [Auto Parallel] Add the examples for Engine
* [Auto Parallel] Do some minor changes
* [Auto Parallel] Remove yaml dependency
* [Auto Parallel] Fix the unittests
* add valid after train
* bug fix

Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

- 07 Sep 2022: 1 commit

Committed by caozhou
* support iterable dataset for auto parallel
* add split_data proto
* fix unittest bug
* fix recompute bug
* update cmake

- 05 Sep 2022: 1 commit

Committed by zhaoyingli
* dist_matmul trans
* update unittest
* update cmakelist

- 31 Aug 2022: 2 commits

Committed by JZ-LIANG
* bugfix (#45332)
* dist embedding support lookup table v1
* add unittest
* update unittest cmake

Committed by zhaoyingli
* add grad_clip pass
* add unittest
* add notes
* update func
* add dist_attr for new op

- 23 Aug 2022: 1 commit

Committed by zhaoyingli
* add quant pass

- 18 Aug 2022: 1 commit

Committed by zhaoyingli
* add clip_grad
* fix comments
* add unittest
* update logger