Created by: jeng1220
PR types
Performance optimization
PR changes
OPs
Describe
This patch removes an unnecessary barrier on the data transfer of the needed offset, so the transfer can overlap with GPU kernel execution. It also fixes the incorrect name of the slice plugin, replacing "skip_layernorm" with "slice".
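
To illustrate the idea behind removing the barrier, here is a minimal host-only sketch. It is not the plugin code from this patch and does not use CUDA: std::async stands in for an asynchronous transfer, and every name in it (copy_offsets, independent_kernel, slice_kernel, run_overlapped, Result) is hypothetical. The point it shows is that waiting on the transfer is deferred until the offsets are actually consumed, so independent work can run while the copy is in flight.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the offset transfer (host to "device").
std::vector<int> copy_offsets(const std::vector<int>& host_offsets) {
    return std::vector<int>(host_offsets.begin(), host_offsets.end());
}

// Hypothetical stand-in for a kernel that does not need the offsets.
long long independent_kernel(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Hypothetical stand-in for the kernel that consumes the offsets.
std::vector<int> slice_kernel(const std::vector<int>& data,
                              const std::vector<int>& offsets) {
    std::vector<int> out;
    out.reserve(offsets.size());
    for (int off : offsets) {
        out.push_back(data.at(static_cast<std::size_t>(off)));
    }
    return out;
}

struct Result {
    long long sum;
    std::vector<int> sliced;
};

Result run_overlapped(const std::vector<int>& data,
                      const std::vector<int>& host_offsets) {
    // Start the transfer and return immediately. Calling get() right here
    // would act as the barrier and serialize the copy with the work below.
    std::future<std::vector<int>> pending = std::async(
        std::launch::async,
        [&host_offsets] { return copy_offsets(host_offsets); });

    // Independent work proceeds while the transfer is in flight.
    long long sum = independent_kernel(data);

    // Wait only at the point where the offsets are first needed.
    std::vector<int> offsets = pending.get();
    return Result{sum, slice_kernel(data, offsets)};
}
```

Calling run_overlapped with data {10, 20, 30, 40, 50} and offsets {4, 0, 2} gives a sum of 150 and the sliced values {50, 10, 30}. On a GPU the same ordering is normally expressed with streams and events rather than futures; this sketch only mirrors the dependency structure.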