[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (#35745) (#36605)

* User specified backend (#35745)
* remove tensordot