1. 12 Jan, 2022: 2 commits
    • the_one_ps dirs reconstruct (#38804) · 50609214
      Committed by ziyoujiyi
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
    • [Dist Pass] Amp Pass (#38764) · cc24427e
      Committed by JZ-LIANG
      * auto parallel sharding base
      
      * chmod
      
      * add unittest
      
      * set unittest cmake dist label
      
      * revise code according to review
      
      * chmod
      
      * bugfix for grad_clip and param broadcast
      
      * chmod
      
      * update unittest
      
      * chmod
      
      * add clip
      
      * chmod
      
      * add amp pass
      
      * chmod
      
      * add unittest
      
      * remove grad update
      
      * fixed bug
      
      * fixed bug
      
      * fixed typos
      
      * fixed typos
  2. 11 Jan, 2022: 1 commit
  3. 06 Jan, 2022: 3 commits
  4. 31 Dec, 2021: 1 commit
  5. 30 Dec, 2021: 2 commits
  6. 29 Dec, 2021: 1 commit
  7. 24 Dec, 2021: 1 commit
  8. 22 Dec, 2021: 1 commit
  9. 21 Dec, 2021: 3 commits
  10. 20 Dec, 2021: 2 commits
  11. 19 Dec, 2021: 1 commit
  12. 17 Dec, 2021: 3 commits
  13. 14 Dec, 2021: 2 commits
  14. 12 Dec, 2021: 1 commit
    • Dist op compatible (#37994) · 89bced5e
      Committed by 沉潜的鱼儿
      * dist matmul op compatible
      
      * dist op unittest
      
      * modify dist matmul
      
      * modify dist reshape
      
      * modify dist reshape
      
      * add a space
      
      * add a space
      
      * delete dist matmul op
      
      * modify reshape
      
      * add dist op unittest
      
      * modify dist op unittest
  15. 10 Dec, 2021: 1 commit
  16. 09 Dec, 2021: 2 commits
  17. 08 Dec, 2021: 1 commit
  18. 07 Dec, 2021: 2 commits
    • Bug fix for reset grad inplace version (#37811) · cf586021
      Committed by Zhanlue Yang
      * Debug
      
      * Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation
      
      * Rearranged interfaces
      
      * Fixed ci issues
    • [Auto para] Relaunch with auto mapping function (#37326) · 506e79d1
      Committed by Yulong Ao
      * [Auto Parallel] Add the unified cluster representation
      
      * [Auto Parallel] Add the graph class for physical mapping
      
      * [Auto Parallel] Add the simple physical mapper
      
      * Set the timeout of the mapper
      
      * Merge the upstream develop unittests cmake files
      
      * Fix a bug of the process group
      
      * Remove mapper unittest from non-GPU platforms
      
      * Move the instantiation of process group after resharding
      
      * Add the local id for devices
      
      * Update the rank mapping format
      
      * [Auto Parallel] Relaunch with the rank mapping file
      
      * Remove the unnecessary json file
      
      * Avoid entering get_device_proc_info for auto mapping
      
      * Correct the mapper unit test
      
      * Add some comments
      
      * Remove the related files about mapping
      
      * Update the unittest for auto mapping
      
      * Remove unused rank_mapping unittest
      
      * Improve the unittest coverage
      
      * Improve the unittest coverage
      
      * Improve the unittest of relaunch
      
      * Fix the unittest problem in CI
      
      * Improve the unittest of relaunch
      
      * Remove unnecessary statements
      
      * Update the unittest cmakefile
      
      * Correct the cmakefile of auto parallel unittests
      
      * Modify codes based on the new elastic change
      
      * Use the GPUs exclusively in the unittest
      
      * Correct the cmakefile
      
      * Set the timeout of the unittest
  19. 06 Dec, 2021: 2 commits
  20. 02 Dec, 2021: 2 commits
  21. 01 Dec, 2021: 1 commit
  22. 30 Nov, 2021: 3 commits
    • [Auto Parallel] elastic support auto parallel re-launch (#37523) · 5440d2f9
      Committed by xiayanming
      * [Auto Parallel] elastic support auto parallel re-launch
      
      * [Auto Parallel] elastic support auto parallel re-launch
      
      * fix ci issue
      
      * fix ci issue
      
      * fix rank mapping unittest
      
      * fix rank mapping unittest
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
      
      * fix ci issue
    • 1514eec6
    • [Auto Parallel] Do the physical mapping between the process graph and the cluster graph (#37094) · b0dff05d
      Committed by Yulong Ao
      * [Auto Parallel] Add the unified cluster representation
      
      * [Auto Parallel] Add the graph class for physical mapping
      
      * [Auto Parallel] Add the simple physical mapper
      
      * Set the timeout of the mapper
      
      * Merge the upstream develop unittests cmake files
      
      * Fix a bug of the process group
      
      * Remove mapper unittest from non-GPU platforms
      
      * Move the instantiation of process group after resharding
      
      * Add the local id for devices
      
      * Update the rank mapping format
      
      * Add some comments
      
      * Remove the related files about mapping
      
      * Update the unittest for auto mapping
      
      * Remove unused rank_mapping unittest
      
      * Improve the unittest coverage
      
      * Improve the unittest coverage
  23. 29 Nov, 2021: 2 commits