    Optimize the layer_norm operator with AVX intrinsic functions (#14417) · f4c869d8
    Committed by Yihua Xu
    * Optimize layer_norm operator with AVX intrinsic functions
    
    * Revert the wrong modifications
    
    * Implement the jit kernel for layer_norm operator
    
    * Add math header file to fix the compile issue (test=develop)
    
    * Add math header file to fix the compile issue (test=develop)
    
    * Fixed the intrinsic header file issue (test=develop)
    
    * Fix the conflicts (test=develop)
    
    * Revert for CUDA compiler (test=develop)
    
    * Fixed the CUDA dependency (test=develop)
    
    * Fix the macro issues (test=develop)
jit_kernel.h 4.1 KB