Created by: yihuaxu
To meet performance requirements, this change removes some temporary tensors from the fused_emb_seq_pool operator.
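
The kernel change itself is not shown in this description. As a rough illustration of the general idea only, the sketch below contrasts an embedding lookup that materializes a temporary buffer before pooling with one that accumulates directly into the output. Everything in it is an assumption made for illustration: the function names, the use of sum pooling, and the `lod` offset list are hypothetical and are not taken from the operator's actual code.

```python
# Illustrative sketch only: not the operator's real implementation.
# Assumes sum pooling and sequence boundaries given as offsets (lod).

def emb_seq_pool_with_temp(table, ids, lod):
    """Gather rows into a temporary buffer first, then sum-pool each sequence."""
    # Temporary "tensor": one copied embedding row per id.
    gathered = [list(table[i]) for i in ids]
    width = len(table[0])
    out = []
    for start, end in zip(lod[:-1], lod[1:]):
        pooled = [0.0] * width
        for row in gathered[start:end]:
            for k in range(width):
                pooled[k] += row[k]
        out.append(pooled)
    return out


def emb_seq_pool_fused(table, ids, lod):
    """Accumulate embedding rows straight into the output, with no temporary buffer."""
    width = len(table[0])
    out = [[0.0] * width for _ in range(len(lod) - 1)]
    for seq, (start, end) in enumerate(zip(lod[:-1], lod[1:])):
        dst = out[seq]
        for i in ids[start:end]:
            row = table[i]  # read the row in place; no copy is made
            for k in range(width):
                dst[k] += row[k]
    return out
```

Both functions return the same pooled result; the second one skips the intermediate allocation and the extra pass over the gathered rows, which is the kind of saving this change is after.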