1. 04 6月, 2022 1 次提交
  2. 17 4月, 2022 1 次提交
    • F
      XPUPS Adaptation (#40991) · 0ef3ef28
      Fan Zhang 提交于
      * Adapt XPUPS - 1st version - 3.24
      
      * Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24
      
      * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25
      
      * refactor heter comm kernel
      
      * update. test=develop
      
      * Adapt XPUPS - modify by compilation - 4th version - 3.27
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * heter_comm update
      
      * heter_comm update
      
      * update calc_shard_offset. test=develop
      
      * heter_comm update
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30
      
      * update. test=develop
      
      * update pslib.cmake
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 6th version - 3.30
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * used by minxu
      
      * update heter_comm_inl
      
      * fix. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 7th version - 3.30
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 3.31 update
      
      * Adapt XPUPS - update kp compilation path  - 8th version - 3.31
      
      * add optimizer kernel. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm.h 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 9th version - 4.1
      
      * update hashtable. test=develop
      
      * fix. test=develop
      
      * update hashtable 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 10th version - 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1 19:30
      
      * fix. test=develop
      
      * update ps_gpu_wrapper.kps 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 11th version - 4.1
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 12nd version - 4.2
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.2
      
      * 4.2 update
      
      * fix. test=develop
      
      * template init. test=develop
      
      * update 4.6
      
      * fix. test=develop
      
      * template init. test=develop
      
      * 4.6 modify by compilation
      
      * hashtable template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 13nd version - 4.7
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * update by pre-commit
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.12 update
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 14th version - 4.13
      
      * 4.13 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 modify by merged latest compilation
      
      * retry CI 4.14
      
      * 4.15 pass static check
      
      * 4.15 modify by gpups CI
      
      * 3.16 update by gpups CI - modify ps_gpu_wrapper.h
      
      * 4.16 update
      
      * 4.16 pass xpu compile
      
      * 4.16 retry CI
      
      * 4.16 update
      Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>
      0ef3ef28
  3. 28 1月, 2022 1 次提交
    • F
      [PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886
      Fan Zhang 提交于
      * [PSLIB] Add Metrics Module, Support User-defined Add Metric
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI Coverage
      
      * modify role_maker
      
      * update CMakeLists.txt
      2e6be886
  4. 27 1月, 2022 1 次提交
  5. 31 12月, 2021 1 次提交
  6. 07 9月, 2021 1 次提交
  7. 10 5月, 2021 1 次提交
  8. 21 4月, 2021 1 次提交
    • Z
      【NPU】Merge NPU ccl code (#32381) · c3158527
      zhang wenhui 提交于
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish run all ccl op test exclude send/recv
      
      * all all op and test exclude send/recv
      
      * send_v2_npu.cc recv_v2_npiu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      Co-authored-by: Nxiayanming <41795079@qq.com>
      c3158527
  9. 15 4月, 2021 1 次提交
    • T
      heterps support pscore (#32093) · 9f8c8f96
      Thunderbrook 提交于
      * pscore support heterps
      
      * fleet cmake
      
      * fleet wrapper
      
      * macro
      
      * solve conflict
      
      * solve conflict
      
      * add unitest
      
      * paddle enforce
      
      * unitest
      
      * unitest
      
      * unitest
      9f8c8f96
  10. 07 4月, 2021 1 次提交
    • Z
      【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui 提交于
      * Ascend rc (#30483)
      
      * Fix compilcation on CANN20.1 and older (#30494)
      
      Fix compilcation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build praser for Hcom* operators (#30627)
      
      Build praser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
      Co-authored-by: Ndingsiyu <18369187719@163.com>
      Co-authored-by: NOleNet <olenet@126.com>
      8c7c53b3
  11. 23 2月, 2021 1 次提交
  12. 18 1月, 2021 1 次提交
  13. 25 12月, 2020 1 次提交
  14. 23 12月, 2020 1 次提交
  15. 18 8月, 2020 1 次提交
  16. 07 8月, 2020 1 次提交
  17. 06 8月, 2020 1 次提交
    • T
      add heter ps mode (#25682) · 0cb60c70
      Thunderbrook 提交于
      * add heter ps mode
      
      * code style
      test=develop
      
      * add with_pslib
      test=develop
      
      * unitest
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * test monitor
      test=develop
      
      * prepare trainer
      test=develop
      
      * code style
      test=develop
      0cb60c70
  18. 25 2月, 2020 1 次提交
    • H
      PaddleBox Framework Part2 (#22466) · 175954d8
      hutuxian 提交于
      * Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
      * Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they are not be distributed evenly.
      * Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
      * Fix some known issues: such as copying persistable vars after one epoch running.
      175954d8
  19. 05 2月, 2020 1 次提交
  20. 14 1月, 2020 1 次提交
  21. 31 8月, 2019 1 次提交
    • H
      Paddlebox Framework (#18982) · c756b5d2
      hutuxian 提交于
      * Support looking up embeddings from BoxPS.
      * Add a _pull_box_sparse op, for now this op is not exposed to users.
      * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
      * Add 'BoxPSDataset' in python code.
      * Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
      * Add UT.
      * More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
      c756b5d2
  22. 16 4月, 2019 1 次提交
  23. 29 3月, 2019 4 次提交