Created by: LeoZhao-Intel
Enable OpenMP for elementwise ops, which can speed up BERT training on CPU by spreading the work across multiple cores and threads.
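
For illustration only, here is a minimal sketch of the general idea, assuming a plain elementwise add over two equally sized buffers. `ElementwiseAdd` is a hypothetical helper written for this description and is not the kernel code touched by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Paddle's actual kernel): z[i] = x[i] + y[i].
// Assumes x and y have the same length.
std::vector<float> ElementwiseAdd(const std::vector<float>& x,
                                  const std::vector<float>& y) {
  assert(x.size() == y.size());
  const std::int64_t n = static_cast<std::int64_t>(x.size());
  std::vector<float> z(x.size());

  // Every iteration writes a distinct z[i] and reads only x[i] and y[i],
  // so the iterations are independent and can be divided among threads.
  // If the compiler is not run with OpenMP enabled, the pragma is ignored
  // and the loop simply runs serially.
#pragma omp parallel for
  for (std::int64_t i = 0; i < n; ++i) {
    z[i] = x[i] + y[i];
  }
  return z;
}
```

With GCC or Clang the pragma takes effect when building with `-fopenmp`, and the thread count can be set through the `OMP_NUM_THREADS` environment variable. Any speedup depends on tensor size, since very small inputs may not amortize the threading overhead.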
test=develop