Created by: cryoco
PR types
Performance optimization
PR changes
Others
Describe
This patch removes an unnecessary barrier on the transfer of the required offset data, so the data transfer can overlap with GPU kernel execution.
This patch also fixes the incorrect name of the slice plugin by replacing "layernorm" with "slice".
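
For reviewers, the barrier removal can be illustrated with a minimal CUDA sketch. It is not the plugin code: the kernel `slice_kernel`, the helper `slice_first_element`, and the buffer sizes are hypothetical, and it assumes the removed barrier was a stream synchronization placed between an asynchronous offset copy and the kernel launch. Because both operations are enqueued on the same stream, the kernel is already ordered after the copy, so the host does not need to wait in between.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: out[i] = in[offset + i], with the offset read from device memory.
__global__ void slice_kernel(const float* in, const int* d_offset, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[*d_offset + i];
}

// Hypothetical helper: slices 256 elements starting at `offset` out of 1024
// and returns the first one, or -1.0f if the stream reported an error.
// Caller must keep offset + 256 <= 1024.
float slice_first_element(int offset) {
  const int total = 1024, n = 256;
  std::vector<float> h_in(total);
  for (int i = 0; i < total; ++i) h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  int *d_offset = nullptr, *h_offset = nullptr;
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudaMalloc(&d_in, total * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  cudaMalloc(&d_offset, sizeof(int));
  cudaMallocHost(&h_offset, sizeof(int));  // pinned host memory for the async copy
  *h_offset = offset;
  cudaMemcpy(d_in, h_in.data(), total * sizeof(float), cudaMemcpyHostToDevice);

  // Enqueue the offset transfer, then the kernel, on the same stream.
  // No cudaStreamSynchronize() between them: stream order already guarantees
  // the kernel sees the copied offset, and the host is free to queue more work.
  cudaMemcpyAsync(d_offset, h_offset, sizeof(int), cudaMemcpyHostToDevice, stream);
  slice_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_offset, d_out, n);

  // Synchronize once, only when the host actually needs the result.
  float first = -1.0f;
  cudaMemcpyAsync(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost, stream);
  cudaError_t err = cudaStreamSynchronize(stream);

  cudaFreeHost(h_offset);
  cudaFree(d_offset);
  cudaFree(d_out);
  cudaFree(d_in);
  cudaStreamDestroy(stream);
  return err == cudaSuccess ? first : -1.0f;
}
```

One caveat with this pattern: without the intermediate barrier, the pinned host buffer must not be modified or freed until the copy has completed, which is why the sketch releases it only after the final synchronization.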