1. 21 4月, 2021 1 次提交
    • Z
      【NPU】Merge NPU ccl code (#32381) · c3158527
      zhang wenhui 提交于
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish run all ccl op test exclude send/recv
      
      * all all op and test exclude send/recv
      
      * send_v2_npu.cc recv_v2_npiu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      Co-authored-by: Nxiayanming <41795079@qq.com>
      c3158527
  2. 19 4月, 2021 1 次提交
  3. 15 4月, 2021 1 次提交
    • T
      heterps support pscore (#32093) · 9f8c8f96
      Thunderbrook 提交于
      * pscore support heterps
      
      * fleet cmake
      
      * fleet wrapper
      
      * macro
      
      * solve conflict
      
      * solve conflict
      
      * add unitest
      
      * paddle enforce
      
      * unitest
      
      * unitest
      
      * unitest
      9f8c8f96
  4. 09 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen 提交于
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, bug failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocatot test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: Nxiayanming <41795079@qq.com>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
  5. 08 4月, 2021 1 次提交
  6. 07 4月, 2021 2 次提交
    • Z
      【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui 提交于
      * Ascend rc (#30483)
      
      * Fix compilcation on CANN20.1 and older (#30494)
      
      Fix compilcation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build praser for Hcom* operators (#30627)
      
      Build praser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
      Co-authored-by: Ndingsiyu <18369187719@163.com>
      Co-authored-by: NOleNet <olenet@126.com>
      8c7c53b3
    • J
      [3D-parallelism] Hybrid Model Parallelism (#32074) · 1e60a0c4
      JZ-LIANG 提交于
      1e60a0c4
  7. 02 4月, 2021 1 次提交
  8. 31 3月, 2021 1 次提交
  9. 26 3月, 2021 1 次提交
  10. 22 3月, 2021 1 次提交
  11. 10 3月, 2021 1 次提交
  12. 05 2月, 2021 1 次提交
  13. 01 2月, 2021 1 次提交
  14. 20 1月, 2021 1 次提交
  15. 18 1月, 2021 1 次提交
  16. 12 1月, 2021 2 次提交
  17. 08 1月, 2021 1 次提交
  18. 05 1月, 2021 1 次提交
  19. 25 12月, 2020 1 次提交
  20. 24 12月, 2020 1 次提交
  21. 17 12月, 2020 1 次提交
  22. 11 12月, 2020 1 次提交
  23. 30 11月, 2020 1 次提交
  24. 26 11月, 2020 2 次提交
    • J
      [sharding] doc, api, bug fixed (#28983) · 0dadacc4
      JZ-LIANG 提交于
      * add lars to fleet meta optimizer
      
      * add lamb to proto
      
      * add lamb to fleet meta optimizer
      
      * fixed syntax bug
      
      * fixed syntax bug
      
      * fixed syntax error in lamb, add config setter of lamb in distributed_strategy
      
      * trigger unitest to rerun
      
      * add new unitest func for lamb
      
      * revise unitest for lars and lamb
      
      * revise dgc meta unitest
      
      * revise lars document in distribute_strategy
      
      * revise lars lamb document in distributed_strategy.py
      
      * revise lars lamb document in distributed_strategy.py
      
      * add weight decay exclude logic to lars
      
      * restore optimzier.py
      
      * restore optimizer.py as develop except lars
      
      * add epsilon and exclude fn to distributed_sttrategy
      
      * add lars epsilon
      
      * revise unitest for fleet lars and lamb
      
      * revise lars lamb unitest for CI coverage
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise api doc of lars
      
      * fix op role
      
      * add sharding save and add_sync_comm_for_test function
      
      * add comm_analyse to utlis
      
      * revise sharding_utils
      
      * add sharding saving unittest
      
      * revise sharding utils for unittest
      
      * revise sharding en doc
      
      * update sharding utils api
      
      * add doc for sharding
      
      * fixed bug in sharding var size count
      
      * update varsize count in sharding
      
      * fix sharding num_nccl_comm
      
      * Revert "fix sharding num_nccl_comm"
      
      This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
      0dadacc4
    • W
      Fix multi nccl comm & wait server ready (#28663) · e931c7ba
      WangXi 提交于
      e931c7ba
  25. 24 11月, 2020 1 次提交
    • L
      Upgrade string literals to raw string (#28989) · 3815d7aa
      Leo Chen 提交于
      * upgrade comment string to raw string
      
      * fix string in
      
      * fix string with ' '
      
      * revert update on comments
      
      * upgrade only necessary
      
      * fix sample code checker
      
      * fix comments with '''
      3815d7aa
  26. 23 11月, 2020 1 次提交
  27. 18 11月, 2020 1 次提交
    • J
      [Sharding] add new features (#28568) · 5a9f6889
      JZ-LIANG 提交于
      * add lars to fleet meta optimizer
      
      * add lamb to proto
      
      * add lamb to fleet meta optimizer
      
      * fixed syntax bug
      
      * fixed syntax bug
      
      * fixed syntax error in lamb, add config setter of lamb in distributed_strategy
      
      * trigger unitest to rerun
      
      * add new unitest func for lamb
      
      * revise unitest for lars and lamb
      
      * revise dgc meta unitest
      
      * revise lars document in distribute_strategy
      
      * revise lars lamb document in distributed_strategy.py
      
      * revise lars lamb document in distributed_strategy.py
      
      * add weight decay exclude logic to lars
      
      * restore optimzier.py
      
      * restore optimizer.py as develop except lars
      
      * add epsilon and exclude fn to distributed_sttrategy
      
      * add lars epsilon
      
      * revise unitest for fleet lars and lamb
      
      * revise lars lamb unitest for CI coverage
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise api doc of lars
      
      * fix op role
      
      * add sharding save and add_sync_comm_for_test function
      
      * add comm_analyse to utlis
      
      * revise sharding_utils
      
      * add sharding saving unittest
      
      * revise sharding utils for unittest
      5a9f6889
  28. 28 10月, 2020 1 次提交
  29. 26 10月, 2020 1 次提交
  30. 16 10月, 2020 1 次提交
  31. 14 10月, 2020 1 次提交
  32. 13 10月, 2020 2 次提交
  33. 12 10月, 2020 1 次提交
  34. 25 9月, 2020 1 次提交
  35. 20 9月, 2020 1 次提交
    • T
      【paddle.fleet】Fix/role maker api fix (#27326) · d6b54de4
      tangwei12 提交于
      * fix fleet util and gloo
      
      * fix worker endpoints
      
      * fix
      
      * fix UT
      
      * fix gloo
      
      * fix gloo
      
      * update gloo
      
      * update gloo
      
      * update gloo
      
      * update gloo
      
      * update gloo
      
      * fix gloo wrapper for hdfs
      
      * add file gloo and UT
      
      * fix UT
      
      * fix UT
      
      * fix UT
      
      * hide public method of RoleMaker
      
      * fix UT
      
      * GPU fleetrun support gloo
      
      * parameterserver fleetrun support gloo
      
      * add UT
      
      * add UT
      
      * fix UT
      
      * fix get server endpoint
      
      * fix get server endpoint
      
      * fix UT
      
      * hide public method of rolemaker
      
      * hide public method of rolemaker
      
      * hide public method of rolemaker
      
      * Update test_fleet_rolemaker_new.py
      
      * hide public method of rolemaker
      
      * hide public method of rolemaker
      d6b54de4
  36. 17 9月, 2020 1 次提交