Created by: GaoWei8
Optimize the kernel implementation of layernorm with openmp (#20895)
Add ernie c++ inference test (#21015)
fix cmake fails on inference_download_and_uncompress (#21185)
Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
Polish the codes of fc when needs padding (#21378)
Add ernie large c++ inference test (#21365)
Modify padding strategy: remove weight copy in fc padding (#21650)
optimize fc jit (#21878)
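
For readers unfamiliar with the fc padding idea in #20972, here is a minimal sketch, not the code from these commits. It assumes the padding is applied as a larger leading dimension (row stride) whenever both N and K are multiples of 128; a commonly cited reason is that such strides can map many rows onto the same cache sets. The names `kPad`, `LeadingDim`, `WithLeadingDim`, `NaiveGemm` and `FcForward` are hypothetical, the pad width of 4 elements is an assumption, and the naive loop stands in for the MKL GEMM call. For simplicity the sketch copies both the input and the weight, so it does not model the later change in #21650 that removes the weight copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pad width in elements (an assumption for this sketch).
constexpr std::size_t kPad = 4;

// Row stride for a matrix with `cols` columns. Padding is applied only
// when both N and K are multiples of 128, as the commit title describes.
inline std::size_t LeadingDim(std::size_t cols, std::size_t n, std::size_t k) {
  return (n % 128 == 0 && k % 128 == 0) ? cols + kPad : cols;
}

// Copy a dense rows x cols matrix into a buffer whose rows are `ld` apart.
std::vector<float> WithLeadingDim(const std::vector<float>& src,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t ld) {
  assert(ld >= cols && src.size() == rows * cols);
  std::vector<float> dst(rows * ld, 0.0f);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      dst[r * ld + c] = src[r * cols + c];
  return dst;
}

// Reference GEMM with explicit leading dimensions: Y(MxN) = X(MxK) * W(KxN).
// Stands in for the MKL call; only the stride handling matters here.
void NaiveGemm(std::size_t m, std::size_t n, std::size_t k,
               const float* x, std::size_t ldx,
               const float* w, std::size_t ldw,
               float* y, std::size_t ldy) {
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p)
        acc += x[i * ldx + p] * w[p * ldw + j];
      y[i * ldy + j] = acc;
    }
}

// fc forward without bias: pads the strides when the condition holds,
// runs the GEMM on the padded buffers, then strips the padding again.
std::vector<float> FcForward(std::size_t m, std::size_t n, std::size_t k,
                             const std::vector<float>& x,
                             const std::vector<float>& w) {
  const std::size_t ldx = LeadingDim(k, n, k);
  const std::size_t ldw = LeadingDim(n, n, k);
  const std::size_t ldy = ldw;
  const std::vector<float> xp = WithLeadingDim(x, m, k, ldx);
  const std::vector<float> wp = WithLeadingDim(w, k, n, ldw);
  std::vector<float> yp(m * ldy, 0.0f);
  NaiveGemm(m, n, k, xp.data(), ldx, wp.data(), ldw, yp.data(), ldy);
  std::vector<float> y(m * n);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      y[i * n + j] = yp[i * ldy + j];
  return y;
}
```

The padded and unpadded paths produce the same values; only the memory layout seen by the GEMM differs, which is why the change can be judged purely on performance.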