Re-sort .cu headers; set clang-format to not sort include blocks and to treat .cu files as main source files (#43633)
* move part sum op kernel
* remove deprecated names