    supports collective communication training (#18175) · b7128bac
    Committed by HaoRen
    * fix redundant code in prepare context; optimize executor by caching create_variables
    test=develop
    
    * supports collective training in executor
    
    * make fetch_list runnable with variables, add more unittests for use_program_cache
    test=develop
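
    A minimal sketch of the fetch_list change above, written against the fluid 1.x API of that period: fetch_list entries can be Variable objects rather than name strings, and repeated runs can reuse the cached program via use_program_cache. The tiny network here is illustrative, not the unittest added by the commit.

    ```python
    import numpy as np
    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[1], dtype='float32')
    y = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(y)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # fetch_list takes the Variable itself; use_program_cache lets the
    # executor reuse its prepared context across repeated run() calls.
    out, = exe.run(fluid.default_main_program(),
                   feed={'x': np.random.rand(4, 1).astype('float32')},
                   fetch_list=[loss],
                   use_program_cache=True)
    ```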
    
    * fix comment
    test=develop
    
    * use unique name for nccl_id
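
    A sketch of how a unique nccl_id variable can be named so repeated transpilations never collide; fluid.unique_name.generate exists in fluid 1.x, while the persistable/RAW details below are assumptions rather than this commit's exact code.

    ```python
    import paddle.fluid as fluid

    # Each call yields a fresh name such as "nccl_id_0", "nccl_id_1", ...
    name = fluid.unique_name.generate("nccl_id")
    block = fluid.default_main_program().global_block()
    nccl_id_var = block.create_var(
        name=name,
        persistable=True,
        type=fluid.core.VarDesc.VarType.RAW)
    ```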
    
    * supports output to stream in program_to_code
    
    * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
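
    The point of the sync op is to make later ops (regularization here) wait until the communication stream has finished writing the gradients. A rough sketch of appending such an op; the op name c_sync_comm_stream and its X/Out/ring_id interface are assumptions based on Paddle's collective ops of that era.

    ```python
    def insert_sync_comm_stream(block, grad_vars, ring_id=0):
        # Block until the NCCL communication stream has produced grad_vars,
        # so the regularization ops that follow read complete values.
        block.append_op(
            type='c_sync_comm_stream',
            inputs={'X': grad_vars},
            outputs={'Out': grad_vars},
            attrs={'ring_id': ring_id})
    ```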
    
    * set op role in collective training
    
    * add collective op role
    
    * remove orig file
    
    * add build optimizer by strategy
    
    * add collective strategy
    
    * refine collective strategy
    
    * add multi-process role maker
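
    A hypothetical sketch of what a multi-process role maker does: derive this process's rank and the peer list from environment variables set by the launcher. The class and method names are invented for illustration; the env var names follow Paddle's launch conventions but should be treated as assumptions.

    ```python
    import os

    class MultiProcessRoleMaker(object):
        def __init__(self):
            # One trainer process per GPU; the launcher exports these vars.
            self._rank = int(os.getenv("PADDLE_TRAINER_ID", "0"))
            self._endpoints = os.getenv("PADDLE_TRAINER_ENDPOINTS", "").split(",")
            self._current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT", "")

        def worker_index(self):
            return self._rank

        def worker_num(self):
            return len(self._endpoints)
    ```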
    
    * refine strategy building factory so that we can easily plugin more strategy
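
    The "plug in more strategies" idea can be pictured as a small registry-based factory; everything below is an invented illustration, not the actual fleet API.

    ```python
    class CollectiveStrategy(object):
        def __init__(self, nccl_comm_num=1):
            self.nccl_comm_num = nccl_comm_num

    class StrategyFactory(object):
        _builders = {}

        @classmethod
        def register(cls, name, builder):
            # New strategies plug in by registering a builder callable,
            # without touching the factory itself.
            cls._builders[name] = builder

        @classmethod
        def create(cls, name, **kwargs):
            return cls._builders[name](**kwargs)

    StrategyFactory.register("collective", CollectiveStrategy)
    strategy = StrategyFactory.create("collective", nccl_comm_num=2)
    ```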
    
    * scale loss grad in collective sgd transpiler
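
    Scaling the loss gradient makes the later allreduce average gradients instead of summing them. A sketch of the idea using fluid's @GRAD naming convention and the scale op; the helper name and exact insertion point are assumptions.

    ```python
    def scale_loss_grad(main_program, loss, num_trainers):
        # Divide loss@GRAD by the trainer count so that summing gradients
        # across trainers with allreduce yields their average.
        block = main_program.global_block()
        loss_grad = block.var(loss.name + "@GRAD")
        block.append_op(
            type="scale",
            inputs={"X": loss_grad},
            outputs={"Out": loss_grad},
            attrs={"scale": 1.0 / num_trainers})
    ```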
    
    * add support for distributed fc
    
    * code format
    
    * revert some features for dist fc
    
    * add support for distributed fc training
    
    * test=develop
    add collective op unittest standard
    
    * test=develop
    remove the test_collective directory
    
    * remove slicegather test
    
    * code format for reducescatter
    
    * update attr of shard_index_op
    
    * Modify macro nccl_helper
    
    * remove test without distribute
    
    * macro collective_helper
    
    * macro update
    
    * test=develop
    update to support Python 3.5
    
    * test=develop change GPU memory use to 0.1 when testing
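
    If this refers to the GPU memory fraction flag (an assumption), the unittests would cap each process at roughly 10% of device memory like so:

    ```python
    import os

    # Must be set before paddle.fluid is imported in the test process.
    os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.1"
    ```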
    
    * test=develop
    update the unittest equality-check function
    
    * test=develop
    set flags to 1.5
    
    * test=develop fix pickle dump on py35
    
    * test=develop
    fix division in slice and add sync_comm_stream
    update atol and rtol to 1e-05
    remove shard_index op and its test
    change input from reading a file to reading from memory
    remove origin_program in framework and add I/O in c_sync_calc_stream
    
    * test=develop update unittest sync operator I/O