* refine structure for cuda and rocm
* update
* update
* update
* update
This reverts commit aef291f4.
* add reference to global_gather and global_scatter operators
* upload files related to the global_scatter and global_gather operators