1. 14 Aug, 2023 1 commit
    • [AutoTuner] Add GBS search, gpu memory usage (#55466) · 4c0c458a
      Committed by Azure
      * temp commit
      
      * distribute best cfg
      
      * update metric extracting
      
      * fix bugs of prune and reading log
      
      * fix adding cfg bug
      
      * reset status
      
      * remove alarm and set logdir
      
      * deepcopy ctx
      
      * change alarm
      
      * fix restart bug
      
      * best no need alarm
      
      * add gbs search, add gpu memory to history csv, add memory detect
      
      * fix bug
      
      * fix memory read bug; fix etcd connection bug
      
      * fix memory read bug, add oom detection for all ranks
      
      * fix read log and oom detection, add error code for read log
      
      * add unit test
      
      * Update master.py
      
      ---------
      Co-authored-by: Ncaozhou <caozhou@radi.ac.cn>
  2. 14 Jun, 2023 1 commit