    [Kunlun] Modify some legacy code on distributed training (#55515) · 806f8d2b
    Committed by XiaociZhang
    * [Kunlun] Modify some legacy code on distributed training
    
    XPUs previously had limitations: concat/split was not supported, and
    c_broadcast supported only fp32. These limitations have recently been
    lifted.
    
    This PR also enables multi-device profiling on XPU. Without it, devices
    that enable profiling issue a broadcast that hangs, eventually leading
    to a kernel timeout error.
    
    * fix typo
parallel_executor.cc 72.7 KB