1. 05 Sep, 2013 17 commits
    • S
      math: fix exp2l asm on x86 (raise underflow correctly) · 07039ed8
      Szabolcs Nagy committed
      there were two problems:
      * omitted underflow on subnormal results: exp2l(-16383.5) was calculated
      as sqrt(2)*2^-16384, and because the last bits of sqrt(2) are zero, the
      down scaling is exact and does not raise underflow even though the
      result is in the subnormal range
      * spurious underflow for subnormal inputs: exp2l(0x1p-16400) was evaluated
      as f2xm1(x)+1, and f2xm1 raised underflow (because its result is an
      inexact subnormal)
      
      the first issue is fixed by raising underflow manually if x is in
      (-32768,-16382] and not integer (x-0x1p63+0x1p63 != x)
      
      the second issue is fixed by treating x in (-0x1p64,0x1p64) specially
      
      for these fixes the special case handling was completely rewritten
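
The integer test mentioned in this message (x-0x1p63+0x1p63 != x) relies on rounding at a magnitude where the spacing between representable values is 1. A minimal sketch of the same idea follows, using double and 0x1p52 instead of the x87 long double and 0x1p63 so that it does not depend on the long double format. The function name `is_noninteger` and the restriction to non-positive x are assumptions made for the illustration, not part of the commit.

```c
/* Returns 1 if x is not an integer, 0 if it is.
 * Assumes IEEE binary64 double and -0x1p51 < x <= 0
 * (the commit applies the analogous test to negative x only). */
int is_noninteger(double x)
{
    /* Near magnitude 2^52 the spacing of doubles is 1, so this
     * difference is rounded to an integer. volatile forces the
     * intermediate to be stored with double precision. */
    volatile double shifted = x - 0x1p52;
    /* Adding 2^52 back is exact, leaving an integer neighbor of x. */
    double rounded = shifted + 0x1p52;
    return rounded != x;
}
```

For example, `is_noninteger(-16383.5)` reports a non-integer, which is the case where the fixed code raises underflow by hand.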
    • S
    • S
      math: remove *_WORD64 macros from libm.h · 63b9cc77
      Szabolcs Nagy committed
      only fma used these macros and the explicit union is clearer
    • S
      math: remove old longdbl.h · 94a3d13a
      Szabolcs Nagy committed
    • S
      math: long double fix (use ldshape union) · aa0c4a20
      Szabolcs Nagy committed
      * use new ldshape union consistently
      * add ld128 support to frexpl
      * simplify sqrtl comment (ld64 is not just arm)
    • S
      math: use float_t and double_t in scalbnf and scalbn · 2eaed464
      Szabolcs Nagy committed
      remove STRICT_ASSIGN (c99 semantics are assumed) and use the conventional
      union to prepare the scaling factor (so libm.h is no longer needed)
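
Preparing a scaling factor with a union can be sketched as below. This is an illustration assuming IEEE binary64 double and an exponent n in the normal range; the function name `pow2_from_bits` is hypothetical.

```c
#include <stdint.h>

/* Builds 2^n by writing the biased exponent straight into the bit
 * pattern of a double. Assumes IEEE binary64 and -1022 <= n <= 1023. */
double pow2_from_bits(int n)
{
    union { double f; uint64_t i; } u;
    /* sign = 0, exponent field = 0x3ff + n, mantissa field = 0 */
    u.i = (uint64_t)(0x3ff + n) << 52;
    return u.f;
}
```

Multiplying x by such a factor scales it by a power of two without calling into other libm code.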
    • S
      math: fix remaining old long double code (erfl, fmal, lgammal, scalbnl) · 34660d73
      Szabolcs Nagy committed
      in lgammal don't handle 1 and 2 specially, in fma use the new ldshape
      union instead of the ld80 one.
    • S
      math: cbrt cleanup and long double fix · 535104ab
      Szabolcs Nagy committed
      * use float_t and double_t
      * clean up subnormal handling
      * bithacks according to the new convention (ldshape for long double
      and explicit unions for float and double)
    • S
      math: fix underflow in exp*.c and long double handling in exp2l · 39c910fb
      Szabolcs Nagy committed
      * don't care about inexact flag
      * use double_t and float_t (faster, smaller, more precise on x86)
      * exp: underflow when result is zero or subnormal and not -inf
      * exp2: underflow when result is zero or subnormal and not exact
      * expm1: underflow when result is zero or subnormal
      * expl: don't underflow on -inf
      * exp2: fix incorrect comment
      * expm1: simplify special case handling and overflow properly
      * expm1: clean up final scaling and fix negative left shift ub (twopk)
    • S
      math: long double trigonometric cleanup (cosl, sinl, sincosl, tanl) · ea9bb95a
      Szabolcs Nagy committed
      ld128 support was added to internal kernel functions (__cosl, __sinl,
      __tanl, __rem_pio2l) from freebsd (not tested, but should be a good
      start for when ld128 arch arrives)
      
      __rem_pio2l had some code cleanup, the freebsd ld128 code seems to
      gather the results of a large reduction with precision loss (fixed
      the bug but a todo comment was added for later investigation)
      
      the old copyright was removed from the non-kernel wrapper functions
      (cosl, sinl, sincosl, tanl) since these are trivial and the interesting
      parts and comments had already been rewritten.
    • S
      math: long double inverse trigonometric cleanup (acosl, asinl, atanl, atan2l) · bcd797a5
      Szabolcs Nagy committed
      * added ld128 support from freebsd fdlibm (untested)
      * using new ldshape union instead of IEEEl2bits
      * inexact status flag is not supported
    • S
      math: rewrite hypot · c2a0dfea
      Szabolcs Nagy committed
      method: if there is a large difference between the scale of x and y
      then the larger magnitude dominates, otherwise reduce x,y so the
      argument of sqrt (x*x+y*y) does not overflow or underflow and calculate
      the argument precisely using exact multiplication. If the argument
      has less error than 1/sqrt(2) ~ 0.7 ulp, then the result has less error
      than 1 ulp in nearest rounding mode.
      
      the original fdlibm method was the same, except it used bit hacks
      instead of the dekker-veltkamp algorithm, which is problematic for long
      double where different representations are supported. the new hypot
      and hypotl code should be smaller and faster on 32bit cpu archs with
      a fast fpu. the new code behaves differently in non-nearest rounding,
      but the error should still be less than 2 ulps.
      
      ld80 and ld128 are supported
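
The "exact multiplication" step can be sketched with the Dekker-Veltkamp splitting named above. This is a minimal illustration for double only, assuming IEEE binary64 arithmetic evaluated in double precision, round-to-nearest, and no overflow or underflow in the intermediates; the function name `exact_square` is hypothetical and this is not presented as the committed code.

```c
/* Veltkamp splitting constant for a 53-bit mantissa: 2^27 + 1. */
#define SPLIT (0x1p27 + 1)

/* Computes hi and lo such that hi + lo equals x*x exactly, where hi is
 * the rounded product and lo is its rounding error. */
void exact_square(double x, double *hi, double *lo)
{
    double xc = x * SPLIT;
    double xh = x - xc + xc;  /* high part of x, short enough that xh*xh is exact */
    double xl = x - xh;       /* low part of x, so that x == xh + xl */
    *hi = x * x;
    /* Each partial product below is exact, so the sum recovers the
     * rounding error that was lost when computing *hi. */
    *lo = xh * xh - *hi + 2 * xh * xl + xl * xl;
}
```

Summing the hi and lo parts of x*x and y*y gives the argument of sqrt with a much smaller error than a plain x*x+y*y.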
    • S
      math: rewrite remainder functions (remainder, remquo, fmod, modf) · ee2ee92d
      Szabolcs Nagy committed
      * results are exact
      * modfl follows truncl (raises inexact flag spuriously now)
      * modf and modff only had cosmetic cleanup
      * remainder is just a wrapper around remquo now
      * using iterative shift+subtract for remquo and fmod
      * ld80 and ld128 are supported as well
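
The iterative shift+subtract approach can be sketched as follows. To stay short, this illustration only handles finite x >= 0 and a positive, normal, finite y in IEEE binary64; the name `fmod_sketch` and those restrictions are assumptions of the example, and a real fmod must also handle signs, NaN, infinities, zero and subnormal divisors.

```c
#include <stdint.h>

/* Exact remainder of x / y by long division on the mantissas.
 * Assumes IEEE binary64, finite x >= 0, and normal finite y > 0. */
double fmod_sketch(double x, double y)
{
    union { double f; uint64_t i; } ux = { x }, uy = { y };
    const uint64_t implicit = 1ULL << 52;   /* hidden leading mantissa bit */
    uint64_t mx, my;
    int ex, ey;

    if (x < y)
        return x;                           /* nothing to reduce */

    ex = (int)((ux.i >> 52) & 0x7ff);
    ey = (int)((uy.i >> 52) & 0x7ff);
    mx = (ux.i & (implicit - 1)) | implicit;
    my = (uy.i & (implicit - 1)) | implicit;

    /* One binary digit of the quotient per step: subtract the divisor
     * if it fits, then shift the remainder left. */
    for (; ex > ey; ex--) {
        if (mx >= my)
            mx -= my;
        mx <<= 1;
    }
    if (mx >= my)
        mx -= my;
    if (mx == 0)
        return 0.0;

    /* Normalize so that bit 52 is set again, then repack the bits. */
    for (; (mx >> 52) == 0; mx <<= 1)
        ex--;
    if (ex > 0)
        ux.i = (mx - implicit) | ((uint64_t)ex << 52);
    else
        ux.i = mx >> (1 - ex);              /* subnormal result */
    return ux.f;
}
```

Every step works on integers, so no rounding occurs, which matches the "results are exact" claim in the list above.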
    • S
      math: rewrite rounding functions (ceil, floor, trunc, round, rint) · d1a2ead8
      Szabolcs Nagy committed
      * faster, smaller, cleaner implementation than the bit hacks of fdlibm
      * use arithmetic like y=(double)(x+0x1p52)-0x1p52, which is an integer
      neighbor of x in all rounding modes (0<=x<0x1p52) and only use bithacks
      when that's faster and smaller (for float it usually is)
      * the code assumes standard excess precision handling for casts
      * long double code supports both ld80 and ld128
      * nearbyint is not changed (it is a wrapper around rint)
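
The arithmetic trick in the list above can be turned into a floor for a restricted range. This is a sketch assuming IEEE binary64 and 0 <= x < 0x1p52; the name `floor_sketch` is hypothetical, and the volatile store stands in for the cast that the commit relies on to drop excess precision.

```c
/* floor(x) for 0 <= x < 0x1p52, assuming IEEE binary64 double. */
double floor_sketch(double x)
{
    /* At magnitude 2^52 the spacing of doubles is 1, so the sum is
     * rounded to an integer; volatile forces double precision. */
    volatile double t = x + 0x1p52;
    double y = t - 0x1p52;      /* integer neighbor of x: floor or ceil */
    /* If rounding went up, step back down by one. */
    return y > x ? y - 1 : y;
}
```

Because y is always one of the two integers surrounding x, the final comparison gives the same answer in every rounding mode.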
    • S
      math: fix logb(-0.0) in downward rounding mode · 98be442e
      Szabolcs Nagy committed
      use -1/(x*x) instead of -1/(x+0) to return -inf, since -0+0 is -0 in
      downward rounding mode
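
A small sketch of why the new expression is safe: x*x is +0 for both +0 and -0 regardless of rounding mode, so the quotient is always -inf, whereas -1/(-0) would be +inf. The function name `logb_of_zero` is hypothetical and only covers the zero-argument branch.

```c
/* Value logb should return for x == +0 or x == -0.
 * x*x is always +0 here, so the division yields -inf and raises the
 * divide-by-zero exception as required for a pole error. */
double logb_of_zero(double x)
{
    return -1 / (x * x);
}
```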
    • S
      math: ilogb cleanup · 4cec31fc
      Szabolcs Nagy committed
      * consistent code style
      * explicit union instead of typedef for double and float bit access
      * turn FENV_ACCESS ON to make 0/0.0f raise invalid flag
      * (untested) ld128 version of ilogbl (used by logbl which has ld128 support)
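
Bit access through an explicit union, as named in the list above, might look like the sketch below. It assumes IEEE binary64 and a normal, finite, nonzero argument; the name `unbiased_exponent` is hypothetical, and the zero, subnormal, infinity and NaN cases that ilogb must handle are left out.

```c
#include <stdint.h>

/* Extracts the unbiased binary exponent of a normal, finite double. */
int unbiased_exponent(double x)
{
    union { double f; uint64_t i; } u = { x };
    /* Bits 52..62 hold the biased exponent; the mask drops the sign bit. */
    return (int)((u.i >> 52) & 0x7ff) - 0x3ff;
}
```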
    • S
      long double cleanup, initial commit · af5f6d95
      Szabolcs Nagy committed
      new ldshape union, ld128 support is kept, code that used the old
      ldshape union was rewritten (IEEEl2bits union of freebsd libm is
      not touched yet)
      
      ld80 __fpclassifyl no longer tries to handle invalid representation
  2. 04 Sep, 2013 1 commit
  3. 03 Sep, 2013 3 commits
  4. 02 Sep, 2013 1 commit
  5. 01 Sep, 2013 7 commits
  6. 31 Aug, 2013 6 commits
  7. 28 Aug, 2013 5 commits
    • R
      remove -Wcast-align from --enable-warnings · f7bc29ed
      Rich Felker committed
      I originally added this warning option based on a misunderstanding of
      how it works. it does not warn whenever the destination of the cast
      has stricter alignment; it only warns in cases where misaligned
      dereference could lead to a fault. thus, it's essentially a no-op for
      i386, which had me wrongly believing the code was clean for this
      warning level. on other archs, numerous diagnostic messages are
      produced, and all of them are false-positives, so it's better just not
      to use it.
    • R
      optimized C memcpy · 90edf1cc
      Rich Felker committed
      unlike the old C memcpy, this version handles word-at-a-time reads and
      writes even for misaligned copies. it does not require that the cpu
      support misaligned accesses; instead, it performs bit shifts to
      realign the bytes for the destination.
      
      essentially, this is the C version of the ARM assembly language
      memcpy. the ideas are all the same, and it should perform well on any
      arch with a decent number of general-purpose registers that has a
      barrel shift operation. since the barrel shifter is an optional cpu
      feature on microblaze, it may be desirable to provide an alternate asm
      implementation on microblaze, but otherwise the C code provides a
      competitive implementation for "generic risc-y" cpu archs that should
      alleviate the urgent need for arch-specific memcpy asm.
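
The shift-based realignment can be illustrated on whole words. The sketch below assumes 32-bit words and little-endian byte order, and it works on word arrays rather than raw byte pointers to keep the example free of alignment and aliasing concerns; the name `realign_words` and its parameters are hypothetical and this is not the committed memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/* Produces n output words holding the byte stream that starts k bytes
 * into the aligned source words (little-endian, k must be 1, 2 or 3).
 * Reads src[0] through src[n], so src must have n + 1 words. */
void realign_words(uint32_t *dst, const uint32_t *src, size_t n, unsigned k)
{
    unsigned rs = 8 * k;        /* drop the k low (earlier) bytes */
    unsigned ls = 32 - rs;      /* make room for bytes of the next word */
    uint32_t w = src[0];

    for (size_t i = 0; i < n; i++) {
        uint32_t x = src[i + 1];
        /* High bytes of the current word joined with low bytes of the next. */
        dst[i] = (w >> rs) | (x << ls);
        w = x;                  /* each source word is loaded only once */
    }
}
```

Every load and store is word-sized and aligned; the misalignment is absorbed entirely by the two shifts, which is why a fast barrel shifter matters for this approach.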
    • R
      stdbool.h should define __bool_true_false_are_defined even for C++ · 38e6acbf
      Rich Felker committed
      while the incorporation of this requirement from C99 into C++11 was
      likely an accident, some software expects it to be defined, and it
      doesn't hurt. if the requirement is removed, then presumably
      __bool_true_false_are_defined would just be in the implementation
      namespace and thus defining it would still be legal.
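
Software that depends on the macro typically checks it after including the header. A minimal sketch of such a check, written in C, where the function name `bool_macros_available` is hypothetical:

```c
#include <stdbool.h>

/* Fail the build early if the header did not announce its macros. */
#ifndef __bool_true_false_are_defined
#error "stdbool.h did not define __bool_true_false_are_defined"
#endif

/* Returns the value of the macro, which C99 specifies as 1. */
int bool_macros_available(void)
{
    return __bool_true_false_are_defined;
}
```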
    • R
      fix invalid instruction mnemonics in powerpc fenv asm · ebc87349
      Rich Felker committed
      there is no non-dot version of the andis instruction, but there's no
      harm in updating the flags anyway, so just use the dot version.
    • R
      optimized C memset · a543369e
      Rich Felker committed
      this version of memset is optimized both for small and large values of
      n, and makes no misaligned writes, so it is usable (and near-optimal)
      on all archs. it is capable of filling up to 52 or 56 bytes without
      entering a loop and with at most 7 branches, all of which can be fully
      predicted if memset is called multiple times with the same size.
      
      it also uses the attribute extension to inform the compiler that it is
      violating the aliasing rules, unlike the previous code which simply
      assumed it was safe to violate the aliasing rules since translation
      unit boundaries hide the violations from the compiler. for non-GNUC
      compilers, 100% portable fallback code in the form of a naive loop is
      provided. I intend to eventually apply this approach to all of the
      string/memory functions which are doing word-at-a-time accesses.
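
The combination described above, an aliasing attribute for GNU C compilers and a naive loop elsewhere, could be sketched as below. The example assumes the attribute in question is GNU C's `may_alias` type attribute; the name `fill_words`, its parameters, and the requirement that dest be 4-byte aligned are assumptions of this illustration rather than the committed memset.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __GNUC__
/* A word type that the compiler must treat as able to alias any object. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Fills nwords 32-bit words at dest with the byte c.
 * dest is assumed to be 4-byte aligned. */
void fill_words(void *dest, unsigned char c, size_t nwords)
{
    u32_alias *p = dest;
    /* 0xffffffff / 255 == 0x01010101, so this copies c into every byte. */
    uint32_t w = (uint32_t)-1 / 255 * c;
    for (size_t i = 0; i < nwords; i++)
        p[i] = w;
}
#else
/* Portable fallback: a plain byte loop with no aliasing concerns. */
void fill_words(void *dest, unsigned char c, size_t nwords)
{
    unsigned char *p = dest;
    for (size_t i = 0; i < 4 * nwords; i++)
        p[i] = c;
}
#endif
```

With the attribute in place, the word stores stay valid even if the compiler can see both the caller's object type and the body of the function.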