1. 25 Mar, 2020 4 commits
  2. 07 Feb, 2020 5 commits
  3. 06 Aug, 2019 2 commits
    • R
      fix build regression in i386 asm for atan2, atan2f · 6818c31c
      Rich Felker committed
      commit f3ed8bfe inadvertently removed
      labels that were still needed.
    • R
      fix x87 stack imbalance in corner cases of i386 math asm · f3ed8bfe
      Rich Felker committed
      commit 31c5fb80 introduced underflow
      code paths for the i386 math asm, along with checks on the fpu status
      word to skip the underflow-generation instructions if the underflow
      flag was already raised. unfortunately, at least one such path, in
      log1p, returned with 2 items on the x87 stack rather than just 1 item
      for the return value. this is a violation of the ABI's calling
      convention, and could cause subsequent floating point code to produce
      NANs due to x87 stack overflow. if floating point results are used in
      flow control, this can lead to runaway wrong code execution.
      
      rather than reviewing each "underflow already raised" code path for
      correctness, remove them all. they're likely slower than just
      performing the underflow code unconditionally, and significantly more
      complex.
      
      all of this code should be ripped out and replaced by C source files
      with inline asm. doing so would preclude this kind of error by having
      the compiler perform all x87 stack register allocation and stack
      manipulation, and would produce comparable or better code. however
      such a change is a much larger project.
  4. 19 Apr, 2015 1 commit
    • R
      remove the last of possible-textrels from i386 asm · 8ed66ecb
      Rich Felker committed
      none of these are actual textrels because of ld-time binding performed
      by -Bsymbolic-functions, but I'm changing them with the goal of making
      ld-time binding purely an optimization rather than relying on it for
      semantic purposes.
      
      in the case of memmove's call to memcpy, making it explicit that the
      memmove asm is assuming the forward-copying behavior of the memcpy asm
      is desirable anyway; in case memcpy is ever changed, the semantic
      mismatch would be apparent while editing memcpy.s.
  5. 06 Nov, 2014 1 commit
  6. 09 Jan, 2014 1 commit
  7. 05 Sep, 2013 1 commit
    • S
      math: fix exp2l asm on x86 (raise underflow correctly) · 07039ed8
      Szabolcs Nagy committed
      there were two problems:
      * omitted underflow on subnormal results: exp2l(-16383.5) was calculated
      as sqrt(2)*2^-16384, the last bits of sqrt(2) are zero so the down scaling
      does not underflow even though the result is in the subnormal range
      * spurious underflow for subnormal inputs: exp2l(0x1p-16400) was evaluated
      as f2xm1(x)+1 and f2xm1 raised underflow (because of the inexact subnormal result)
      
      the first issue is fixed by raising underflow manually if x is in
      (-32768,-16382] and not an integer (x-0x1p63+0x1p63 != x)
      
      the second issue is fixed by treating x in (-0x1p64,0x1p64) specially
      
      for these fixes the special case handling was completely rewritten
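The non-integer test quoted above, (x-0x1p63+0x1p63 != x), relies on the 64-bit significand of the x87 long double. As a rough illustration only, the same trick can be sketched for plain double by using 0x1p52 instead. The function name `is_non_integer` is hypothetical, and the sketch assumes round-to-nearest mode and an input in (-0x1p52, 0]; it is not the code from the commit.

```c
#include <assert.h>

/* Double-precision analogue of the (x - 0x1p63 + 0x1p63 != x) test.
   Hypothetical helper: assumes round-to-nearest and -0x1p52 < x <= 0. */
static int is_non_integer(double x)
{
    assert(x <= 0.0 && x > -0x1p52);
    /* volatile forces every intermediate to be rounded to double */
    volatile double t = x - 0x1p52; /* spacing here is 1, so the fraction is rounded away */
    t = t + 0x1p52;                 /* exact: recovers the rounded integer */
    return t != x;                  /* differs from x only if x had a fraction */
}
```

In this sketch, -16383.5 is reported as non-integer while -16384.0 and -0.0 are not.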
  8. 15 Aug, 2013 3 commits
  9. 17 Dec, 2012 1 commit
  10. 15 Dec, 2012 1 commit
    • S
      math: fix i386/expl.s with more precise x*log2e · a8f73bb1
      Szabolcs Nagy committed
      with naive exp2l(x*log2e) the last 12 bits of the result were incorrect
      for x with large absolute value
      
      now hi + lo = x*log2e is calculated to 128 bits of precision and then
        expl(x) = exp2l(hi) + exp2l(hi) * f2xm1(lo)
      this gives <1.5ulp measured error everywhere in nearest rounding mode
  11. 12 Dec, 2012 1 commit
  12. 09 Aug, 2012 1 commit
  13. 08 May, 2012 1 commit
  14. 05 May, 2012 1 commit
    • N
      math: change the formula used for acos.s · f697d66b
      nsz committed
      old: 2*atan2(sqrt(1-x),sqrt(1+x))
      new: atan2(fabs(sqrt((1-x)*(1+x))),x)
      improvements:
      * all edge cases are fixed (sign of zero in downward rounding)
      * a bit faster (here a single call is about 131ns vs 162ns)
      * a bit more precise (at most 1ulp error on 1M uniform random
      samples in [0,1), the old formula gave some 2ulp errors as well)
  15. 04 Apr, 2012 1 commit
    • N
      math: fix x86 asin accuracy · 37eaec3a
      nsz committed
      use (1-x)*(1+x) instead of (1-x*x) in asin.s
      the latter can be inaccurate with upward rounding when x is close to 1
  16. 29 Mar, 2012 1 commit
    • N
      math: remove x86 modf asm · d79ac8c3
      nsz committed
      the int part was wrong when -1 < x <= -0 (+0.0 instead of -0.0)
      and the size and performance gain of the asm version was negligible
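The property that the removed asm violated can be checked with a few lines of C: the integral part stored by modf should carry the sign of its argument, so inputs in (-1, -0] must give -0.0. The helper name `modf_keeps_sign` is hypothetical.

```c
#include <math.h>

/* Returns nonzero if the integral part produced by modf has the
   same sign bit as x (the case the old asm got wrong for -1 < x <= -0). */
int modf_keeps_sign(double x)
{
    double ipart;
    modf(x, &ipart);
    return !signbit(ipart) == !signbit(x);
}
```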
  17. 28 Mar, 2012 1 commit
  18. 23 Mar, 2012 1 commit
    • R
      asm for hypot and hypotf · ad2d2b96
      Rich Felker committed
      special care is taken to avoid any inexact computations when either arg
      is zero (in which case the exact absolute value of the other arg
      should be returned) and to support the special condition that
      hypot(±inf,nan) yields inf.
      
      hypotl is not yet implemented since avoiding overflow is nontrivial.
  19. 22 Mar, 2012 1 commit
  20. 20 Mar, 2012 5 commits
    • R
      optimize scalbn family · baa43bca
      Rich Felker committed
      the fscale instruction is slow everywhere, probably because it
      involves a costly and unnecessary integer truncation operation that
      ends up being a no-op in common usages. instead, construct a floating
      point scale value with integer arithmetic and simply multiply by it,
      when possible.
      
      for float and double, this is always possible by going to the
      next-larger type. we use some cheap but effective saturating
      arithmetic tricks to make sure even very large-magnitude exponents
      fit. for long double, if the scaling exponent is too large to fit in
      the exponent of a long double value, we simply fall back to the
      expensive fscale method.
      
      on atom cpu, these changes speed up scalbn by over 30%. (min rdtsc
      timing dropped from 110 cycles to 70 cycles.)
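The approach described above, building the scale factor with integer arithmetic and multiplying in the next-larger type, can be sketched in C for the float case. The names `pow2` and `scalbnf_sketch` are hypothetical, the clamping here uses plain comparisons rather than whatever saturating tricks the asm uses, and the sketch assumes IEEE 754 binary32 float and binary64 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build 2^n as a double directly from its bit pattern (normal range only) */
static double pow2(int n)
{
    assert(n >= -1022 && n <= 1023);
    uint64_t bits = (uint64_t)(n + 1023) << 52;  /* biased exponent, zero mantissa */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Scale a float by 2^n through double, clamping n so it fits.
   Any float times 2^1023 overflows float, and any float times
   2^-1022 underflows to zero, so clamping does not change the result. */
float scalbnf_sketch(float x, int n)
{
    if (n > 1023) n = 1023;
    if (n < -1022) n = -1022;
    return (float)((double)x * pow2(n));
}
```

Multiplying by a power of two is exact in double as long as the product stays in the normal range, so the only rounding happens in the final conversion back to float.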
    • R
      remquo asm: return quotient mod 8, as intended by the spec · 7513d3ec
      Rich Felker committed
      this is a lot more efficient and also what is generally wanted.
      perhaps the bit shuffling could be more efficient...
    • R
      804fbf0b
    • R
      fix exp asm · acb74492
      Rich Felker committed
      exponents (base 2) near 16383 were broken due to (1) wrong cutoff, and
      (2) inability to fit the necessary range of scalings into a long
      double value.
      
      as a solution, we fall back to using frndint/fscale for insanely large
      exponents, and also have to special-case infinities here to avoid
      inf-inf generating nan.
      
      thankfully the costly code never runs in normal usage cases.
    • R
      bug fix: wrong opcode for writing long long · d9c1d72c
      Rich Felker committed
  21. 19 Mar, 2012 6 commits