1. 27 Feb, 2020 2 commits
    • MPI 3.x support via mpi4py (#107) · 7e813283
      Jeff Rasley committed
      * add mpirun support for openmpi 4.0
      
      * add master addr support from args
      
      * switch mpi detection to use mpi4py
      
      * set constant for default distributed port
      
      * Make sure deepspeed_mpi exists in args
    • Init distributed torch only if needed (#108) · 5aa58b38
      Jeff Rasley committed
      * add auto-detect to torch dist init
      
      * update tests to infer distributed init status
      
      * prevent crash if dist_init_required is True but already initialized
      
      * only init if safe to do so (forgot to add this file in prev commit)
  2. 25 Feb, 2020 1 commit
  3. 24 Feb, 2020 1 commit
  4. 22 Feb, 2020 1 commit
  5. 21 Feb, 2020 1 commit
  6. 20 Feb, 2020 2 commits
  7. 15 Feb, 2020 1 commit
  8. 10 Feb, 2020 1 commit
  9. 08 Feb, 2020 1 commit
    • Samyamr/batchconfig (#33) · 5a0abc65
      Samyam Rajbhandari committed
      * simplifying the batch config, using a single assert to test for validity and allowing for specifying only the micro batch size
      
      * Simplifying Batch Config, Adding ability to specify batch using just micro_batch, and adding a bunch of unit tests
      
      * ran formatting
      
      * Typo fixes and added the config file
      
      * reformatting
      
      * path fixes
      
      * removing print statements
  10. 07 Feb, 2020 1 commit
  11. 06 Feb, 2020 1 commit
  12. 04 Feb, 2020 6 commits
  13. 01 Feb, 2020 7 commits