    [AutoTuner] Add GBS search, gpu memory usage (#55466) · 4c0c458a
    Committed by Azure
    * temp commit
    
    * distribute best cfg
    
    * update metric extracting
    
    * fix bugs of prune and reading log
    
    * fix adding cfg bug
    
    * reset status
    
    * remove alarm and set logdir
    
    * deepcopy ctx
    
    * change alarm
    
    * fix restart bug
    
    * best no need alarm
    
    * add gbs search, add gpu memory to history csv, add memory detect
    
    * fix bug
    
    * fix memory read bug; fix etcd connection bug
    
    * fix memory read bug, add oom detection for all ranks
    
    * fix read log and oom detection, add error code for read log
    
    * add unit test
    
    * Update master.py
    
    ---------
    Co-authored-by: Ncaozhou <caozhou@radi.ac.cn>
recorder.py 2.8 KB