    [1.1] [project] train imagenet using large batch size (#13766) · 26200f2e
    Committed by Wu Yi
    * fix nccl2 lars dist support
    
    * put lars in momentum op
    
    * add tests lars
    
    * fix ci
    
    * fix cpu kernel
    
    * soft warning
    
    * remove lars in test_recognize_digits.py
    
    * move to another op
    
    * add file
    
    * update api.spec test=develop
    
    * update test=develop
    
    * fix api.spec test=develop
    
    * wip
    
    * wip, finish grad merge ops
    
    * wip, finish graph build
    
    * wip test running
    
    * work on 1 gpu
    
    * workable version
    
    * update
    
    * fix tests
    
    * fuse broadcast op
    
    * fix compile failed
    
    * refine
    
    * add batch merge test mnist
    
    * fix CI test=develop
    
    * fix build
    
    * use independent bn params for batch merge test=develop
    
    * update api.spec
    
    * follow comments and for test
    
    * wip
    
    * refine tests test=develop
    
    * follow comments test=develop
    
    * remove startup bn modify test=develop
    
    * follow comments test=develop
    
    * fix merge test=develop
multi_devices_graph_pass.h 3.5 KB