Commits · 6bbdbfdcdeac216c4a13edd04dee1f6d87bd33c8 · OpenHarmony / Third Party Musl

25 Mar, 2020 · 4 commits
- math: move i386 sqrt to C with inline asm · acfe6d03
  Committed by Alexander Monakov on Jan 07, 2020
  
  acfe6d03
- math: move i386 sqrtf to C with inline asm · 29adaeb2
  Committed by Alexander Monakov on Jan 06, 2020
  
  29adaeb2
- math: move trivial x86-family sqrt functions to C with inline asm · 41b290ba
  Committed by Alexander Monakov on Jan 06, 2020
  
  41b290ba
- math: move x87-family fabs functions to C with inline asm · c24a9923
  Committed by Alexander Monakov on Jan 06, 2020
  
  c24a9923
07 Feb, 2020 · 5 commits

remove i386 asm for single and double precision exp-family functions · a662220d

Committed by Rich Felker on Feb 06, 2020

these did not truncate excess precision in the return value. fixing
them looks like considerable work, and the current C code seems to
outperform them significantly anyway.

long double functions are left in place because they are not subject
to excess precision issues and probably better than the C code.

a662220d

rename i386 exp.s to exp_ld.s · 2f0c31c0
Committed by Rich Felker on Feb 06, 2020
```
this commit is for the sake of reviewable history.
```
2f0c31c0

fix excess precision in return value of i386 log-family functions · ab9e2090
Committed by Rich Felker on Feb 06, 2020

ab9e2090

fix excess precision in return value of i386 acos[f] and asin[f] · 141c8d4c
Committed by Rich Felker on Feb 06, 2020
```
analogous to commit 1c9afd69 for
atan[2][f].
```
141c8d4c

fix excess precision in return value of i386 atan[2][f] · 1c9afd69

Committed by Rich Felker on Feb 06, 2020

for functions implemented in C, this is a requirement of C11 (F.6);
strictly speaking that text does not apply to standard library
functions, but it seems to be intended to apply to them, and C2x is
expected to make it a requirement.

failure to drop excess precision is particularly bad for inverse trig
functions, where a value with excess precision can be outside the
range of the function (entire range, or range for a particular
subdomain), breaking reasonable invariants a caller may expect.

1c9afd69
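
The range problem described above can be illustrated without any i386 asm. The following is only a sketch under stated assumptions: the function names are hypothetical, and a `long double` constant stands in for a result left in an x87 register. For double precision asin, the largest value the function can return is the double nearest to pi/2, which lies slightly below the real pi/2, so a wider result that is closer to the real pi/2 falls outside that range until it is rounded to double.

```c
#include <float.h>

/* pi/2 written with more digits than common long double formats hold.
   It stands in for a result left at extended precision (an assumption
   made for illustration; no x87 code is involved here). */
#define HALF_PI_WIDE 1.57079632679489661923132169163975144L

/* Hypothetical helper: a "return value" that still carries excess precision. */
long double half_pi_extended(void)
{
    return HALF_PI_WIDE;
}

/* Hypothetical helper: drop excess precision by rounding to double.
   The volatile store forces a real double-sized value. */
double drop_excess_precision(long double wide)
{
    volatile double narrow = (double)wide;
    return narrow;
}
```

On targets where `long double` is wider than `double`, `half_pi_extended()` compares greater than the rounded double, which is the kind of out-of-range value the commit message warns about.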

06 Aug, 2019 · 2 commits

fix build regression in i386 asm for atan2, atan2f · 6818c31c
Committed by Rich Felker on Aug 05, 2019
```
commit f3ed8bfe inadvertently removed
labels that were still needed.
```
6818c31c

fix x87 stack imbalance in corner cases of i386 math asm · f3ed8bfe

Committed by Rich Felker on Aug 05, 2019

commit 31c5fb80 introduced underflow
code paths for the i386 math asm, along with checks on the fpu status
word to skip the underflow-generation instructions if the underflow
flag was already raised. unfortunately, at least one such path, in
log1p, returned with 2 items on the x87 stack rather than just 1 item
for the return value. this is a violation of the ABI's calling
convention, and could cause subsequent floating point code to produce
NANs due to x87 stack overflow. if floating point results are used in
flow control, this can lead to runaway wrong code execution.

rather than reviewing each "underflow already raised" code path for
correctness, remove them all. they're likely slower than just
performing the underflow code unconditionally, and significantly more
complex.

all of this code should be ripped out and replaced by C source files
with inline asm. doing so would preclude this kind of error by having
the compiler perform all x87 stack register allocation and stack
manipulation, and would produce comparable or better code. however
such a change is a much larger project.

f3ed8bfe
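
As a rough illustration of the "C source files with inline asm" approach mentioned above, the sketch below wraps a single x87 instruction in GNU-style inline asm. The function name `x87_fabsl` is hypothetical, and the fallback branch is an assumption added so the example also compiles on non-x86 targets. The `"+t"` constraint tells the compiler the operand lives in `st(0)` as both input and output, so the compiler, not hand-written code, keeps the x87 stack balanced.

```c
/* Hypothetical example: absolute value of a long double via the x87
   "fabs" instruction, with the compiler managing the register stack. */
long double x87_fabsl(long double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+t": x is read from and written back to st(0). */
    __asm__ ("fabs" : "+t"(x));
    return x;
#else
    /* Portable fallback for other targets (assumption for this sketch);
       it does not handle the sign of -0 or NaN like the instruction does. */
    return x < 0 ? -x : x;
#endif
}
```

Because the function returns through the normal C calling convention, exactly one value is left for the caller, which is the property the hand-written asm violated.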

19 Apr, 2015 · 1 commit

remove the last of possible-textrels from i386 asm · 8ed66ecb

Committed by Rich Felker on Apr 18, 2015

none of these are actual textrels because of ld-time binding performed
by -Bsymbolic-functions, but I'm changing them with the goal of making
ld-time binding purely an optimization rather than relying on it for
semantic purposes.

in the case of memmove's call to memcpy, making it explicit that the
memmove asm is assuming the forward-copying behavior of the memcpy asm
is desirable anyway; in case memcpy is ever changed, the semantic
mismatch would be apparent while editing memcpy.s.

8ed66ecb

06 Nov, 2014 · 1 commit

math: use fnstsw consistently instead of fstsw in x87 asm · ec431894

Committed by Szabolcs Nagy on Nov 05, 2014

fnstsw does not wait for pending unmasked x87 floating-point exceptions
and it is the same as fstsw when all exceptions are masked which is the
only environment libc supports.

ec431894

09 Jan, 2014 · 1 commit
- math: add drem and dremf weak aliases to i386 remainder asm · bcff807d
  Committed by Szabolcs Nagy on Jan 08, 2014
```
weak_alias was only in the c code, so drem was missing on platforms
where remainder is implemented in asm.
```
  bcff807d
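
For readers unfamiliar with weak aliases, here is a minimal C sketch of the mechanism. The macro is modeled on the GNU `weak` and `alias` attributes (it requires an ELF target and a GCC-compatible compiler); `demo_remainder` and `demo_drem` are hypothetical names chosen to avoid clashing with libc, and the function body is a placeholder, not a correct IEEE remainder.

```c
#include <assert.h>

/* Declare `new` as a weak symbol that resolves to the same code as `old`. */
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Placeholder body (assumption): rounds the quotient half away from zero,
   which differs from the IEEE remainder's round-half-to-even rule. */
double demo_remainder(double x, double y)
{
    assert(y != 0.0);
    double q = x / y;
    long long n = (long long)(q < 0 ? q - 0.5 : q + 0.5);
    return x - (double)n * y;
}

/* demo_drem becomes a second name for demo_remainder. */
weak_alias(demo_remainder, demo_drem);
```

When a function is written in a `.s` file instead, the C-level alias is never emitted, so the assembly source has to declare the extra symbol itself, which is what this commit adds.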
05 Sep, 2013 · 1 commit

math: fix exp2l asm on x86 (raise underflow correctly) · 07039ed8

Committed by Szabolcs Nagy on Sep 05, 2013

there were two problems:
* omitted underflow on subnormal results: exp2l(-16383.5) was calculated
as sqrt(2)*2^-16384, the last bits of sqrt(2) are zero so the down scaling
does not underflow even though the result is in subnormal range
* spurious underflow for subnormal inputs: exp2l(0x1p-16400) was evaluated
as f2xm1(x)+1 and f2xm1 raised underflow (because inexact subnormal result)

the first issue is fixed by raising underflow manually if x is in
(-32768,-16382] and not integer (x-0x1p63+0x1p63 != x)

the second issue is fixed by treating x in (-0x1p64,0x1p64) specially

for these fixes the special case handling was completely rewritten

07039ed8
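
The integer test `x-0x1p63+0x1p63 != x` quoted above can be sketched in C. This is an illustration only: the function name is hypothetical, and the constant is derived from `LDBL_EPSILON` so that it equals `0x1p63` when `long double` has a 64-bit significand (as on x87) and scales accordingly on other formats. Subtracting the constant from a negative x pushes the value into a range where the spacing between representable numbers is 1, so any fractional part is rounded away; adding the constant back is exact.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: returns 1 if x (assumed <= 0 and of moderate
   magnitude) has no fractional part, using the add-and-subtract
   rounding trick instead of a dedicated rounding instruction. */
int is_integral_negative(long double x)
{
    assert(x <= 0);
    /* 2^(p-1) for a p-bit significand; 0x1p63 on x87 long double. */
    const long double toint = 1 / LDBL_EPSILON;
    /* volatile keeps the compiler from simplifying the expression away. */
    volatile long double shifted = x - toint;
    return shifted + toint == x;
}
```

For example, -16383.5 is rounded to a neighboring integer by the subtraction, so the round trip no longer matches x, and the asm would raise underflow manually in that case.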

15 Aug, 2013 · 3 commits
- math: fix i386 atan2.s to raise underflow for subnormal results · 411efb3b
  Committed by Szabolcs Nagy on Aug 15, 2013
  
  411efb3b
- math: fix x86 asin, atan, exp, log1p to raise underflow · 31c5fb80
  Committed by Szabolcs Nagy on Aug 15, 2013
```
underflow is raised by an inexact subnormal float store,
since subnormal operations are slow, check the underflow
flag and skip the store if it's already raised
```
  31c5fb80
- math: fix x86 expl.s to raise underflow and clean up special case handling · 1b3973fb
  Committed by Szabolcs Nagy on Aug 15, 2013
  
  1b3973fb
17 Dec, 2012 · 1 commit
- math: x86_64 version of expl, fixed some comments in the i386 version · 58bba42d
  Committed by Szabolcs Nagy on Dec 16, 2012
  
  58bba42d
15 Dec, 2012 · 1 commit

math: fix i386/expl.s with more precise x*log2e · a8f73bb1

Committed by Szabolcs Nagy on Dec 14, 2012

with naive exp2l(x*log2e) the last 12 bits of the result were incorrect
for x with large absolute value

now hi + lo = x*log2e is calculated to 128 bits precision and then
  expl(x) = exp2l(hi) + exp2l(hi) * f2xm1(lo)
this gives <1.5ulp measured error everywhere in nearest rounding mode

a8f73bb1
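
The formula in this commit follows from splitting the exponent, as the short derivation below shows. It uses only the identities stated in the message plus the fact that the x87 `f2xm1` instruction computes 2^t - 1; treating `hi` and `lo` as exact real numbers is a simplifying assumption.

```latex
\begin{aligned}
e^{x} &= 2^{x \log_2 e} = 2^{hi + lo} = 2^{hi} \cdot 2^{lo} \\
      &= 2^{hi} + 2^{hi}\,\bigl(2^{lo} - 1\bigr) \\
      &= \operatorname{exp2l}(hi) + \operatorname{exp2l}(hi) \cdot \operatorname{f2xm1}(lo)
\end{aligned}
```

Since `lo` is tiny, the correction term is small relative to the leading term, so the low-order part of x*log2e still reaches the result instead of being rounded away before the exponential is taken.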

12 Dec, 2012 · 1 commit
- math: add empty __invtrigl.s to i386 and x86_64 · 1384ad5f
  Committed by Szabolcs Nagy on Dec 12, 2012
```
__invtrigl is not needed when acosl, asinl, atanl have asm
implementations
```
  1384ad5f
09 Aug, 2012 · 1 commit
- math: fix exp.s on i386 and x86_64 so the exception flags are correct · 1fb01691
  Committed by nsz on Aug 08, 2012
```
exp(inf), exp(-inf), exp(nan) used to raise wrong flags
```
  1fb01691
08 May, 2012 · 1 commit
- some assemblers don't like fistpq; use the alt. mnemonic fistpll · 0e195dfa
  Committed by Rich Felker on May 07, 2012
  
  0e195dfa
05 May, 2012 · 1 commit

math: change the formula used for acos.s · f697d66b

Committed by nsz on May 05, 2012

old: 2*atan2(sqrt(1-x),sqrt(1+x))
new: atan2(fabs(sqrt((1-x)*(1+x))),x)
improvements:
* all edge cases are fixed (sign of zero in downward rounding)
* a bit faster (here a single call is about 131ns vs 162ns)
* a bit more precise (at most 1ulp error on 1M uniform random
samples in [0,1), the old formula gave some 2ulp errors as well)

f697d66b

04 Apr, 2012 · 1 commit

math: fix x86 asin accuracy · 37eaec3a

Committed by nsz on Apr 04, 2012

use (1-x)*(1+x) instead of (1-x*x) in asin.s
the latter can be inaccurate with upward rounding when x is close to 1

37eaec3a
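
The difference between the two expressions can be seen in plain C. The sketch below is an illustration under assumptions: the function names are hypothetical, it works in double precision rather than in x87 registers, and it runs in the default round-to-nearest mode, whereas the commit is specifically about upward rounding. The point it shows is the same: `x*x` is rounded before the subtraction, and that rounding error dominates when the true result is small.

```c
/* Hypothetical helper: 1 - x*x with the product rounded to double first.
   The volatile store prevents fused multiply-add and excess precision
   from hiding the rounding of x*x. */
double one_minus_sq_naive(double x)
{
    volatile double xx = x * x;
    return 1.0 - xx;
}

/* Hypothetical helper: the factored form. For x close to 1, 1 - x is
   computed exactly, so only 1 + x and the product are rounded. */
double one_minus_sq_factored(double x)
{
    return (1.0 - x) * (1.0 + x);
}
```

With x = 1 - 2^-30 the exact value of 1 - x^2 is 2^-29 - 2^-60. The factored form returns exactly that, while the naive form returns 2^-29 because the 2^-60 term is lost when x*x is rounded.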

29 Mar, 2012 · 1 commit

math: remove x86 modf asm · d79ac8c3

Committed by nsz on Mar 29, 2012

the int part was wrong when -1 < x <= -0 (+0.0 instead of -0.0)
and the size and performance gain of the asm version was negligible

d79ac8c3

28 Mar, 2012 · 1 commit
- math: fix typo in i386 remquof and remquol asm · ad23771c
  Committed by nsz on Mar 27, 2012
```
(fldl instruction was used instead of flds and fldt)
```
  ad23771c
23 Mar, 2012 · 1 commit

asm for hypot and hypotf · ad2d2b96

Committed by Rich Felker on Mar 23, 2012

special care is taken to avoid any inexact computations when either arg
is zero (in which case the exact absolute value of the other arg
should be returned) and to support the special condition that
hypot(±inf,nan) yields inf.

hypotl is not yet implemented since avoiding overflow is nontrivial.

ad2d2b96

22 Mar, 2012 · 1 commit

acos.s fix: use the formula acos(x) = atan2(sqrt(1-x),sqrt(1+x)) · a4a0c912

Committed by nsz on Mar 22, 2012

the old formula atan2(1,sqrt((1+x)/(1-x))) was faster but
could give nan result at x=1 when the rounding mode is
FE_DOWNWARD (so 1-1 == -0 and 2/-0 == -inf), the new formula
gives -0 at x=+-1 with downward rounding.

a4a0c912

20 Mar, 2012 · 5 commits

optimize scalbn family · baa43bca

Committed by Rich Felker on Mar 20, 2012

the fscale instruction is slow everywhere, probably because it
involves a costly and unnecessary integer truncation operation that
ends up being a no-op in common usages. instead, construct a floating
point scale value with integer arithmetic and simply multiply by it,
when possible.

for float and double, this is always possible by going to the
next-larger type. we use some cheap but effective saturating
arithmetic tricks to make sure even very large-magnitude exponents
fit. for long double, if the scaling exponent is too large to fit in
the exponent of a long double value, we simply fall back to the
expensive fscale method.

on atom cpu, these changes speed up scalbn by over 30%. (min rdtsc
timing dropped from 110 cycles to 70 cycles.)

baa43bca
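
The idea of building the scale factor with integer arithmetic and multiplying in the next-larger type can be sketched in portable C. This is not the asm from the commit: the function names are hypothetical, IEEE 754 binary64 `double` is assumed, and a plain clamp stands in for the "saturating arithmetic tricks" the message mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build 2^n as a double directly from its bit
   pattern (biased exponent in bits 52..62, zero significand). */
double pow2_from_bits(int n)
{
    assert(n >= -1022 && n <= 1023);   /* normal double exponents only */
    uint64_t bits = (uint64_t)(0x3ff + n) << 52;
    double scale;
    memcpy(&scale, &bits, sizeof scale);
    return scale;
}

/* Hypothetical sketch of scalbnf: scale a float by 2^n using a single
   multiplication in double. Clamping n is safe because any exponent
   beyond the clamp already overflows or underflows the float range. */
float scalbnf_sketch(float x, int n)
{
    if (n > 1023) n = 1023;
    if (n < -1022) n = -1022;
    return (float)((double)x * pow2_from_bits(n));
}
```

The double product of a float and a power of two is exact whenever it could still round to a nonzero finite float, so the only rounding happens in the final conversion back to float.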

remquo asm: return quotient mod 8, as intended by the spec · 7513d3ec

Committed by Rich Felker on Mar 19, 2012

this is a lot more efficient and also what is generally wanted.
perhaps the bit shuffling could be more efficient...

7513d3ec

use alternate formula for acos asm to avoid loss of precision · 804fbf0b
Committed by Rich Felker on Mar 19, 2012

804fbf0b

fix exp asm · acb74492

Committed by Rich Felker on Mar 19, 2012

exponents (base 2) near 16383 were broken due to (1) wrong cutoff, and
(2) inability to fit the necessary range of scalings into a long
double value.

as a solution, we fall back to using frndint/fscale for insanely large
exponents, and also have to special-case infinities here to avoid
inf-inf generating nan.

thankfully the costly code never runs in normal usage cases.

acb74492

bug fix: wrong opcode for writing long long · d9c1d72c
Committed by Rich Felker on Mar 19, 2012

d9c1d72c

19 Mar, 2012 · 6 commits
- asm for log1p · b04b5887
  Committed by Rich Felker on Mar 19, 2012
  
  b04b5887
- asm for log2 · 9d82a15e
  Committed by Rich Felker on Mar 19, 2012
  
  9d82a15e
- asm for remquo · 27deb538
  Committed by Rich Felker on Mar 19, 2012
```
this could perhaps use some additional testing for corner cases, but
it seems to be correct.
```
  27deb538
- optimize exponential asm for i386 · 02db27d9
  Committed by Rich Felker on Mar 19, 2012
```
up to 30% faster exp2 by avoiding slow frndint and fscale functions.
expm1 also takes a much more direct path for small arguments (the
expected usage case).
```
  02db27d9
- fix broken modf family functions · be5b01f8
  Committed by Rich Felker on Mar 19, 2012
  
  be5b01f8
- asm for modf functions · 1bf4dad3
  Committed by Rich Felker on Mar 19, 2012
  
  1bf4dad3

OpenHarmony / Third Party Musl · synced successfully about 1 year ago
