Refactor the organization of layer_norm cuda impl so that it can be reused in fused attention op. Extract the layer_norm cuda impl form layer_norm_op.cu to layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.