1. 21 Mar, 2012 4 commits
    • N
      nearbyint optimization (only clear inexact when necessary) · 91c28f61
      nsz committed
      old code saved/restored the fenv (the new code is only as slow
      as that when inexact is not set before the call, but some other
      flag is set and the rounding is inexact, which is rare)
      
      before:
      bench_nearbyint_exact              5000000 N        261 ns/op
      bench_nearbyint_inexact_set        5000000 N        262 ns/op
      bench_nearbyint_inexact_unset      5000000 N        261 ns/op
      
      after:
      bench_nearbyint_exact             10000000 N         94.99 ns/op
      bench_nearbyint_inexact_set       25000000 N         65.81 ns/op
      bench_nearbyint_inexact_unset     10000000 N         94.97 ns/op
      91c28f61
    • N
      remove a fixme comment · 8c6fc860
      nsz committed
      8c6fc860
    • N
      clean up pow.c and powf.c · f1347a3a
      nsz committed
      fix comments about special cases
      f1347a3a
    • N
      clean up powl.c · 615bbd36
      nsz committed
      fix special cases, use multiplication instead of scalbnl
      615bbd36
  2. 20 Mar, 2012 16 commits
    • N
      fix a cbrtl.c regression and remove x87 precision setting · 1e2fea63
      nsz committed
      1e2fea63
    • R
      optimize scalbn family · baa43bca
      Rich Felker committed
      the fscale instruction is slow everywhere, probably because it
      involves a costly and unnecessary integer truncation operation that
      ends up being a no-op in common usages. instead, construct a floating
      point scale value with integer arithmetic and simply multiply by it,
      when possible.
      
      for float and double, this is always possible by going to the
      next-larger type. we use some cheap but effective saturating
      arithmetic tricks to make sure even very large-magnitude exponents
      fit. for long double, if the scaling exponent is too large to fit in
      the exponent of a long double value, we simply fall back to the
      expensive fscale method.
      
      on atom cpu, these changes speed up scalbn by over 30%. (min rdtsc
      timing dropped from 110 cycles to 70 cycles.)
      baa43bca
    • R
      remquo asm: return quotient mod 8, as intended by the spec · 7513d3ec
      Rich Felker committed
      this is a lot more efficient and also what is generally wanted.
      perhaps the bit shuffling could be more efficient...
      7513d3ec
    • R
      804fbf0b
    • R
      fix exp asm · acb74492
      Rich Felker committed
      exponents (base 2) near 16383 were broken due to (1) wrong cutoff, and
      (2) inability to fit the necessary range of scalings into a long
      double value.
      
      as a solution, we fall back to using frndint/fscale for insanely large
      exponents, and also have to special-case infinities here to avoid
      inf-inf generating nan.
      
      thankfully the costly code never runs in normal usage cases.
      acb74492
    • N
      code cleanup of named constants · 0cbb6547
      nsz committed
      zero, one, two, half are replaced by const literals.
      The policy was to use the f suffix for float consts (1.0f),
      but no suffix for long double consts (these consts
      can be exactly represented as double).
      0cbb6547
    • N
      fix remainder*.c: remove useless long double cast · b03255af
      nsz committed
      b03255af
    • N
      don't try to create non-standard denormalization signal · 4caa17b2
      nsz committed
      Underflow exception is only raised when the result is
      inexact, but fmod is always exact. x87 has a denormalization
      exception, but that's nonstandard. And the superfluous *1.0
      will be optimized away by any compiler that does not honor
      signaling nans.
      4caa17b2
    • N
      new modff.c code, fix nan handling in modfl · 75483499
      nsz committed
      75483499
    • N
      use scalbn or *2.0 instead of ldexp, fix fmal · 2786c7d2
      nsz committed
      Some code assumed ldexp(x, 1) is faster than 2.0*x,
      but ldexp is a wrapper around scalbn which uses
      multiplications inside, so this optimization is
      wrong.
      
      This commit also fixes fmal which accidentally
      used ldexp instead of ldexpl, losing precision.
      
      There are various additional changes from the
      work-in-progress const cleanups.
      2786c7d2
    • N
      fix long double const workaround in cbrtl · 01fdfd49
      nsz committed
      01fdfd49
    • N
      don't inline __rem_pio2l so the code size is smaller · 2e8c8fbe
      nsz committed
      2e8c8fbe
    • N
      minor fix in __tanl (get sign properly) · c3587eff
      nsz committed
      c3587eff
    • R
      bug fix: wrong opcode for writing long long · d9c1d72c
      Rich Felker committed
      d9c1d72c
    • N
      remove long double const workarounds · eca1c35e
      nsz committed
      Some long double consts were stored in two doubles as a workaround
      for x86_64 and i386 with the following comment:
      /* Long double constants are slow on these arches, and broken on i386. */
      This is most likely an old gcc bug related to the default x87 fpu
      precision setting (it's double instead of double extended on BSD).
      eca1c35e
    • N
      fix erfl wrapper for long double==double case · 9a810cb6
      nsz committed
      9a810cb6
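
The approach in the scalbn commit (baa43bca), building a power-of-two scale factor with integer arithmetic and multiplying by it, can be illustrated with a portable sketch. This is not the committed x87 assembly, which widens to the next-larger type and uses saturating arithmetic. The helper name `scale_by_pow2` is hypothetical, the code assumes `double` is IEEE 754 binary64, and it simply defers to the library `scalbn` whenever 2^n is not a normal double.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: computes x * 2^n via a constructed scale factor.
 * Assumes double is IEEE 754 binary64. */
double scale_by_pow2(double x, int n)
{
    /* 2^n is only directly encodable as a normal double for
     * -1022 <= n <= 1023; otherwise take the slow path. */
    if (n < -1022 || n > 1023)
        return scalbn(x, n);

    /* Biased exponent in bits 52..62, zero mantissa and sign: exactly 2^n. */
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double scale;
    memcpy(&scale, &bits, sizeof scale);

    /* A single multiplication by a power of two rounds at most once. */
    return x * scale;
}
```

For example, `scale_by_pow2(3.0, -1)` yields 1.5 on the fast path, while `scale_by_pow2(1.0, -1074)` takes the fallback because 2^-1074 is subnormal.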
  3. 19 Mar, 2012 20 commits