    use universal intrinsic for FP16 · 903789f7
    Committed by Tomoaki Teshima
      * use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
      * define v_load_f16/v_store_f16, since v_load cannot be distinguished when a short pointer is passed
      * brush up the implementation for old compilers (guard correctly)
      * add test for v_load_f16 and round trip conversion of v_float16x4
      * fix conversion error
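
    The reason for the separate v_load_f16/v_store_f16 names can be sketched in plain C++. This is a minimal illustration, not the code from this commit: it assumes FP16 values are held as raw 16-bit patterns in short storage, and the types Int16x8 and Float16x4 and the functions load, load_f16 and store_f16 are hypothetical stand-ins, not OpenCV's definitions. Because overloads cannot differ only in return type, a short pointer can select just one vector type, so the FP16 path needs its own name.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Hypothetical stand-ins for illustration only; not OpenCV's definitions.
    struct Int16x8   { std::int16_t lanes[8]; };  // eight signed 16-bit integers
    struct Float16x4 { std::int16_t bits[4]; };   // four raw FP16 bit patterns

    // Generic load: the pointer's element type selects the vector type.
    inline Int16x8 load(const std::int16_t* p) {
        Int16x8 v;
        std::memcpy(v.lanes, p, sizeof v.lanes);
        return v;
    }

    // A second overload `Float16x4 load(const std::int16_t*)` would not compile:
    // overloads cannot differ only in return type. A distinct name avoids that.
    inline Float16x4 load_f16(const std::int16_t* p) {
        Float16x4 v;
        std::memcpy(v.bits, p, sizeof v.bits);
        return v;
    }

    inline void store_f16(std::int16_t* p, const Float16x4& v) {
        std::memcpy(p, v.bits, sizeof v.bits);
    }

    int main() {
        // IEEE 754 half-precision bit patterns for 1.0, 2.0, 3.0, 4.0.
        const std::int16_t src[4] = {0x3C00, 0x4000, 0x4200, 0x4400};
        std::int16_t dst[4] = {0, 0, 0, 0};

        // Load and store through the FP16-specific names, then check that
        // the bit patterns survive the round trip unchanged.
        store_f16(dst, load_f16(src));
        assert(std::memcmp(src, dst, sizeof src) == 0);

        std::cout << "round trip ok\n";
        return 0;
    }
    ```

    The round trip here only copies bit patterns; the conversion between FP16 and FP32 lanes that the commit tests is not modelled.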
test_intrin.cpp 26.9 KB