1. 17 Dec, 2018 7 commits
    • hardfloat: implement float32/64 square root · f131bae8
      Committed by Emilio G. Cota
      Performance results for fp-bench:
      
      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      sqrt-single: 42.30 MFlops
      sqrt-double: 22.97 MFlops
      - after:
      sqrt-single: 311.42 MFlops
      sqrt-double: 311.08 MFlops
      
      Here USE_FP makes a huge difference for f64's, with throughput
      going from ~200 MFlops to ~300 MFlops.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 fused multiply-add · ccf770ba
      Committed by Emilio G. Cota
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      fma-single: 74.73 MFlops
      fma-double: 74.54 MFlops
      - after:
      fma-single: 203.37 MFlops
      fma-double: 169.37 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      fma-single: 23.24 MFlops
      fma-double: 23.70 MFlops
      - after:
      fma-single: 66.14 MFlops
      fma-double: 63.10 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      fma-single: 37.26 MFlops
      fma-double: 37.29 MFlops
      - after:
      fma-single: 48.90 MFlops
      fma-double: 59.51 MFlops
      
      Here having HARDFLOAT_3F64_USE_FP set to 1 pays off for x86_64:
      [1] 170.15 vs [0] 153.12 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 division · 4a629561
      Committed by Emilio G. Cota
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      div-single: 34.84 MFlops
      div-double: 34.04 MFlops
      - after:
      div-single: 275.23 MFlops
      div-double: 216.38 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      div-single: 9.33 MFlops
      div-double: 9.30 MFlops
      - after:
      div-single: 51.55 MFlops
      div-double: 15.09 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      div-single: 25.65 MFlops
      div-double: 24.91 MFlops
      - after:
      div-single: 96.83 MFlops
      div-double: 31.01 MFlops
      
      Here setting HARDFLOAT_2F64_USE_FP to 1 pays off for x86_64:
      [1] 215.97 vs [0] 62.15 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 multiplication · 2dfabc86
      Committed by Emilio G. Cota
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      mul-single: 126.91 MFlops
      mul-double: 118.28 MFlops
      - after:
      mul-single: 258.02 MFlops
      mul-double: 197.96 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      mul-single: 37.42 MFlops
      mul-double: 38.77 MFlops
      - after:
      mul-single: 73.41 MFlops
      mul-double: 76.93 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      mul-single: 58.40 MFlops
      mul-double: 59.33 MFlops
      - after:
      mul-single: 60.25 MFlops
      mul-double: 94.79 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 addition and subtraction · 1b615d48
      Committed by Emilio G. Cota
      Performance results (single and double precision) for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      add-single: 135.07 MFlops
      add-double: 131.60 MFlops
      sub-single: 130.04 MFlops
      sub-double: 133.01 MFlops
      - after:
      add-single: 443.04 MFlops
      add-double: 301.95 MFlops
      sub-single: 411.36 MFlops
      sub-double: 293.15 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      add-single: 44.79 MFlops
      add-double: 49.20 MFlops
      sub-single: 44.55 MFlops
      sub-double: 49.06 MFlops
      - after:
      add-single: 93.28 MFlops
      add-double: 88.27 MFlops
      sub-single: 91.47 MFlops
      sub-double: 88.27 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      add-single: 72.59 MFlops
      add-double: 72.27 MFlops
      sub-single: 75.33 MFlops
      sub-double: 70.54 MFlops
      - after:
      add-single: 112.95 MFlops
      add-double: 201.11 MFlops
      sub-single: 116.80 MFlops
      sub-double: 188.72 MFlops
      
      Note that the IBM and ARM machines benefit from having
      HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
      can suffer significantly:
      - IBM Power8:
      add-single: [1] 54.94 vs [0] 116.37 MFlops
      add-double: [1] 58.92 vs [0] 201.44 MFlops
      - Aarch64 A57:
      add-single: [1] 80.72 vs [0] 93.24 MFlops
      add-double: [1] 82.10 vs [0] 88.18 MFlops
      
      On the Intel machine, having HARDFLOAT_2F64_USE_FP set to 1 pays off,
      but it doesn't for HARDFLOAT_2F32_USE_FP:
      - Intel i7-6700K:
      add-single: [1] 285.79 vs [0] 426.70 MFlops
      add-double: [1] 302.15 vs [0] 278.82 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
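
The `HARDFLOAT_2F{32,64}_USE_FP` trade-off reported above can be pictured as a choice between two ways of checking that an operand is zero or normal: asking the host's floating-point classification, or testing the IEEE 754 bit pattern with integer operations. The sketch below illustrates that idea only, under the assumption that this is what the setting selects; the function names `f32_zon_fp` and `f32_zon_int` are hypothetical and are not taken from QEMU.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed "USE_FP = 1" style: classify the operand with fpclassify(). */
static bool f32_zon_fp(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

/* Assumed "USE_FP = 0" style: inspect the binary32 bit pattern directly. */
static bool f32_zon_int(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));      /* well-defined type punning */

    uint32_t exp  = (bits >> 23) & 0xff;  /* 8-bit biased exponent */
    uint32_t frac = bits & 0x7fffff;      /* 23-bit fraction */

    if (exp == 0) {
        return frac == 0;                 /* zero passes, denormals do not */
    }
    return exp != 0xff;                   /* infinities and NaNs do not pass */
}
```

Both functions accept and reject the same inputs; which one is faster depends on the host, which is consistent with the per-machine differences measured above.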
    • fpu: introduce hardfloat · a94b7839
      Committed by Emilio G. Cota
      The appended patch paves the way for leveraging the host FPU for a subset
      of guest FP operations. For most guest workloads (e.g. FP flags
      aren't ever cleared, inexact occurs often and rounding is set to the
      default [to nearest]) this will yield sizable performance speedups.
      
      The approach followed here avoids checking the FP exception flags register.
      See the added comment for details.
      
      This assumes that QEMU is running on an IEEE754-compliant FPU and
      that the rounding is set to the default (to nearest). The
      implementation-dependent specifics of the FPU should not matter; things
      like tininess detection and sNaN representation are still dealt with in
      soft-fp. However, this approach will break on most hosts if we compile
      QEMU with flags that break IEEE compatibility. There is no way to detect
      all of these flags at compilation time, but at least we check for
      -ffast-math (which defines __FAST_MATH__) and disable hardfloat
      (plus emit a #warning) when it is set.
      
      This patch just adds common code. Some operations will be migrated
      to hardfloat in subsequent patches to ease bisection.
      
      Note: some architectures (at least PPC, there might be others) clear
      the status flags passed to softfloat before most FP operations. This
      precludes the use of hardfloat, so to avoid introducing a performance
      regression for those targets, we add a flag to disable hardfloat.
      In the long run though it would be good to fix the targets so that
      at least the inexact flag passed to softfloat is indeed sticky.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
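
The fast path described in the "fpu: introduce hardfloat" message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not QEMU's code: `fp_status_sketch`, `soft_add_sketch`, and `hard_add_sketch` are hypothetical names, the soft-float fallback is a placeholder, and falling back on infinite or tiny results is one assumed way to avoid reading the host's FP exception flags.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the guest FP status. */
typedef struct {
    bool inexact;           /* sticky inexact flag is already set */
    bool round_to_nearest;  /* guest uses the default rounding mode */
    unsigned soft_calls;    /* counts fallbacks, for illustration only */
} fp_status_sketch;

/* Placeholder for the soft-float path, which would also compute flags. */
static float soft_add_sketch(float a, float b, fp_status_sketch *s)
{
    s->soft_calls++;
    return a + b;
}

static bool zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

static float hard_add_sketch(float a, float b, fp_status_sketch *s)
{
#ifdef __FAST_MATH__
    /* -ffast-math breaks IEEE compatibility, so never use the host FPU. */
    return soft_add_sketch(a, b, s);
#else
    /* Only safe when inexact is already raised and rounding is default. */
    if (!s->inexact || !s->round_to_nearest) {
        return soft_add_sketch(a, b, s);
    }
    /* Denormals, infinities and NaNs are left to soft-float. */
    if (!zero_or_normal(a) || !zero_or_normal(b)) {
        return soft_add_sketch(a, b, s);
    }

    float r = a + b;  /* computed by the host FPU */

    /* Overflow or a possibly tiny result: let soft-float set the flags. */
    if (isinf(r) || (r <= FLT_MIN && r >= -FLT_MIN)) {
        return soft_add_sketch(a, b, s);
    }
    return r;
#endif
}
```

With `inexact` set and round-to-nearest selected, adding two normal numbers stays on the host FPU; clearing `inexact` forces the same call through the soft-float placeholder, which mirrors why targets that clear the status flags before each operation cannot benefit.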
    • softfloat: rename canonicalize to sf_canonicalize · f9943c7f
      Committed by Emilio G. Cota
      glibc >= 2.25 defines canonicalize in commit eaf5ad0
      (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
      
      Given that we'll be including <math.h> soon, prepare
      for this by prefixing our canonicalize() with sf_ to avoid
      clashing with the libc's canonicalize().
      Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
      Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
  2. 17 Oct, 2018 1 commit
  3. 06 Oct, 2018 3 commits
  4. 24 Aug, 2018 2 commits
  5. 16 Aug, 2018 1 commit
  6. 18 May, 2018 19 commits
  7. 15 May, 2018 2 commits
  8. 11 May, 2018 1 commit
  9. 17 Apr, 2018 2 commits
  10. 16 Apr, 2018 1 commit
  11. 13 Apr, 2018 1 commit