Forked from PaddlePaddle / Paddle
Re-sort .cu headers; set clang-format to not sort include blocks and to treat .cu as a main source file (#43633)
* move gather_nd/scatter/scatter_nd_add
* fix npu/xpu ci
* follow comments
* small fix
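
As a rough illustration of the clang-format change named in the title, a `.clang-format` fragment along these lines would leave include blocks as the author grouped them and let `.cu` files count as main source files. The option names are existing clang-format style options (`IncludeIsMainSourceRegex` requires clang-format 10 or later), but the values shown are assumptions and are not copied from this commit.

```yaml
# Hypothetical sketch: values are assumed, not taken from the commit.
# Keep include blocks as written; includes are sorted within each block only.
IncludeBlocks: Preserve
# Also treat files ending in .cu as "main" source files, so the matching
# header can be recognized as the main include.
IncludeIsMainSourceRegex: '(\.cu)$'
```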