Created by: cryoco
Optimize the ERNIE multihead matmul, following PR#22486.
TODO:
- use the cuBLAS API to compute the matmul;
- detect the precision in the TRT plugin so it can use fp16 mode;
- move the matmul functions in multihead_matmul_op.cu to operators/math for reuse.