Commits · d49cf07541bb54a5ac7aec1feec8514db33db8ea · OpenHarmony / Third Party Musl

18 Aug, 2020 1 commit

Committed by Rich Felker on Aug 17, 2020

this is a prerequisite for addition of other interfaces that use
kernel tids, including futex and SIGEV_THREAD_ID.

there is some ambiguity as to whether the semantic return type should
be int or pid_t. either way, futex API imposes a contract that the
values fit in int (excluding some upper reserved bits). glibc used
pid_t, so in the interest of not having gratuitous mismatch (the
underlying types are the same anyway), pid_t is used here as well.

while conceptually this is a syscall, the copy stored in the thread
structure is always valid in all contexts where it's valid to call
libc functions, so it's used to avoid the syscall.

d49cf075
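
The message above describes returning a cached thread id rather than entering the kernel on every call. A minimal sketch of that idea follows, assuming Linux. The names `sketch_gettid` and `cached_tid` are hypothetical, and the lazy thread-local cache only stands in for the copy kept in the thread structure; it is not musl's code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread cache; 0 means "not fetched yet". */
static _Thread_local pid_t cached_tid;

/* Return the kernel thread id, asking the kernel only on first use. */
static pid_t sketch_gettid(void)
{
    if (cached_tid == 0) {
        cached_tid = (pid_t)syscall(SYS_gettid);
        assert(cached_tid > 0); /* kernel tids are positive */
    }
    return cached_tid;
}
```

A lazy cache like this one would go stale in a fork child unless it is reset there, which is one reason a real libc keeps the value in state it already maintains across fork.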

13 Aug, 2020 2 commits

aarch64: fix setjmp return value · 22359b54

Committed by Szabolcs Nagy on Aug 12, 2020

longjmp should set the return value of setjmp, but 64bit
registers were used for the 0 check while the type is int.

use the code that gcc generates for return val ? val : 1;

22359b54

setjmp: optimize longjmp prologues · 4554f155

Committed by Alexander Monakov on Aug 12, 2020

Use a branchless sequence that is one byte shorter on 64-bit, same size
on 32-bit. Thanks to Pete Cawley for suggesting this variant.

4554f155

12 Aug, 2020 3 commits

setjmp: optimize x86 longjmp epilogues · 59b64ff6

Committed by Alexander Monakov on Aug 11, 2020

59b64ff6

setjmp: avoid useless REX-prefix on xor %eax, %eax · c6a6fe4c

Committed by Alexander Monakov on Aug 11, 2020

c6a6fe4c

setjmp: fix x86-64 longjmp argument adjustment · 21431a0e

Committed by Alexander Monakov on Aug 11, 2020

longjmp 'val' argument is an int, but the assembly is referencing 64-bit
registers as if the argument was a long, or the caller was responsible
for extending the argument. Though the psABI is not clear on this, the
interpretation in GCC is that high bits may be arbitrary and the callee
is responsible for sign/zero-extending the value as needed (likewise for
return values: callers must anticipate that high bits may be garbage).

Therefore testing %rax is a functional bug: setjmp would wrongly return
zero if longjmp was called with val==0, but high bits of %rsi happened
to be non-zero.

Rewrite the prologue to refer to 32-bit registers. In passing, change
'test' to use %rsi, as there's no advantage to using %rax and the new
form is cheaper on processors that do not perform move elimination.

21431a0e
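
The bug described above can be modelled in C by treating the argument register as a 64-bit value whose upper half is garbage. This is only a sketch of the reasoning, not the assembly from the commit; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: the zero test looks at all 64 bits of the register. */
static int ret_from_64bit_test(uint64_t reg)
{
    uint64_t r = reg ? reg : 1;   /* garbage high bits make this nonzero */
    return (int)(uint32_t)r;      /* yet only the low 32 bits are returned */
}

/* Fixed form: only the low 32 bits carry the int argument. */
static int ret_from_32bit_test(uint64_t reg)
{
    uint32_t val = (uint32_t)reg;
    return (int)(val ? val : 1);  /* same shape as "val ? val : 1" */
}
```

With a register value of `0xdeadbeef00000000`, the low half is 0, so the first form lets setjmp return 0 while the second returns 1 as required.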

09 Aug, 2020 1 commit

prefer new socket syscalls, fallback to SYS_socketcall only if needed · c2feda4e

Committed by Rich Felker on Aug 08, 2020

a number of users performing seccomp filtering have requested use of
the new individual syscall numbers for socket syscalls, rather than
the legacy multiplexed socketcall, since the latter has the arguments
all in memory where they can't participate in filter decisions.

previously, some archs used the multiplexed socketcall if it was
historically all that was available, while other archs used the
separate syscalls. the intent was that the latter set only include
archs that have "always" had separate socket syscalls, at least going
back to linux 2.6.0. however, at least powerpc, powerpc64, and sh were
wrongly included in this set, and thus socket operations completely
failed on old kernels for these archs.

with the changes made here, the separate syscalls are always
preferred, but fallback code is compiled for archs that also define
SYS_socketcall. two such archs, mips (plain o32) and microblaze,
define SYS_socketcall despite never having needed it, so it's now
undefined by their versions of syscall_arch.h to prevent inclusion of
useless fallback code.

some archs, where the separate syscalls were only added after the
addition of SYS_accept4, lack SYS_accept. because socket calls are
always made with zeros in the unused argument positions, it suffices
to just use SYS_accept4 to provide a definition of SYS_accept, and
this is done to satisfy the macro machinery that concatenates the
socket call name onto __SC_ and SYS_.

c2feda4e
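
The "prefer the separate syscall, compile the fallback only where SYS_socketcall exists" policy can be sketched as below, assuming Linux. `sketch_socket` is a hypothetical name and this goes through the public `syscall()` wrapper rather than musl's internal macros. The legacy call number 1 for "socket" is the kernel's SYS_SOCKET value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a socket, preferring the dedicated syscall. */
static int sketch_socket(int domain, int type, int protocol)
{
    long r = -1;
    errno = ENOSYS;
#ifdef SYS_socket
    r = syscall(SYS_socket, domain, type, protocol);
#endif
#ifdef SYS_socketcall
    if (r < 0 && errno == ENOSYS) {
        /* Legacy path: the arguments live in memory, where a seccomp
         * filter cannot inspect them. */
        unsigned long args[3];
        args[0] = (unsigned long)domain;
        args[1] = (unsigned long)type;
        args[2] = (unsigned long)protocol;
        r = syscall(SYS_socketcall, 1, args);
    }
#endif
    return (int)r;
}
```

On an arch whose headers do not define SYS_socketcall, the fallback branch is not compiled at all, which mirrors the intent stated in the message.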

06 Aug, 2020 5 commits

math: new software sqrtl · 933f8e72

Committed by Szabolcs Nagy on Jun 14, 2020

same approach as in sqrt.

sqrtl was broken on aarch64, riscv64 and s390x targets because
of missing quad precision support and on m68k-sf because of
missing ld80 sqrtl.

this implementation is written for quad precision and then
edited to make it work for both m68k and x86 style ld80 formats
too, but it is not expected to be optimal for them.

note: using fp instructions for the initial estimate when such
instructions are available (e.g. double prec sqrt or rsqrt) is
avoided because of fenv correctness.

933f8e72

math: add __math_invalidl · 4f893997

Committed by Szabolcs Nagy on Jun 29, 2020

for targets where long double is different from double.

4f893997

math: new software sqrtf · b1756ec8

Committed by Szabolcs Nagy on Jun 12, 2020

same method as in sqrt, this was tested on all inputs against
an sqrtf instruction. (the only difference found was that x86
sqrtf does not signal the x86 specific input-denormal exception
on negative subnormal inputs while the software sqrtf does,
this is fine as it was designed for ieee754 exceptions only.)

there is a known faster method:
"Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation"
that computes sqrtf directly via pipelined polynomial evaluation
which allows more parallelism, but the design does not generalize
easily to higher precisions.

b1756ec8

math: new software sqrt · 97e9b73d

Committed by Szabolcs Nagy on Jun 13, 2020

approximate 1/sqrt(x) and sqrt(x) with goldschmidt iterations.
this is known to be a fast method for computing sqrt, but it is
tricky to get right, so added detailed comments.

use a lookup table for the initial estimate, this adds 256bytes
rodata but it can be shared between sqrt, sqrtf and sqrtl.
this saves one iteration compared to a linear estimate.

this is for soft float targets, but it supports fenv by using a
floating-point operation to get the final result.  the result
is correctly rounded in all rounding modes.  if fenv support is
turned off then the nearest rounded result is computed and
inexact exception is not signaled.

assumes fast 32-bit integer arithmetic and a 32-bit to 64-bit multiply.

97e9b73d
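
For readers unfamiliar with Goldschmidt iterations, here is a minimal sketch of the recurrence in plain double arithmetic. It is not the integer implementation from the commit: the constant initial estimate replaces the lookup table, the input is restricted to [1, 4), and nothing is done about rounding modes or exceptions. The function name is hypothetical.

```c
#include <assert.h>

/* Goldschmidt square root for 1 <= x < 4. */
static double goldschmidt_sqrt(double x)
{
    assert(x >= 1.0 && x < 4.0);
    double r = 0.7;       /* crude constant estimate of 1/sqrt(x) */
    double s = x * r;     /* converges to sqrt(x) */
    double h = 0.5 * r;   /* converges to 1/(2*sqrt(x)) */
    for (int i = 0; i < 8; i++) {
        double e = 0.5 - s * h;  /* residual, shrinks quadratically */
        s += s * e;
        h += h * e;
    }
    return s;
}
```

Because the error roughly squares on each pass, a better initial estimate directly removes iterations, which is the trade the message describes for the 256-byte table.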

in hosts file lookups, honor first canonical name regardless of family · f1198ea3

Committed by Rich Felker on Aug 05, 2020

prior to this change, the canonical name came from the first hosts
file line matching the requested family, so the canonical name for a
given hostname could differ depending on whether it was requested with
AF_UNSPEC or a particular family (AF_INET or AF_INET6). now, the
canonical name is deterministically the first one to appear with the
requested name as an alias.

f1198ea3
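
The rule "the first line that lists the name wins, whatever its family" can be sketched as follows. The `host_line` structure is a hypothetical, heavily simplified stand-in for a parsed hosts-file line (one alias only), and `canon_for` is not a musl function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed hosts-file line (hypothetical, simplified). */
struct host_line {
    int family;          /* 4 or 6 in this sketch */
    const char *canon;   /* first name on the line */
    const char *alias;   /* another name on the same line */
};

/* Return the canonical name for `name`, or NULL if no line lists it. */
static const char *canon_for(const struct host_line *lines, size_t n,
                             const char *name)
{
    const char *canon = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(lines[i].alias, name) != 0 &&
            strcmp(lines[i].canon, name) != 0)
            continue;
        if (!canon)              /* later matches must not clobber it */
            canon = lines[i].canon;
    }
    return canon;
}
```

Note that the family field is never consulted when choosing the canonical name, so the answer is the same for AF_UNSPEC, AF_INET and AF_INET6 requests.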

05 Aug, 2020 1 commit

in hosts file lookups, use only first match for canonical name · 20c6d83f

Committed by Rich Felker on Aug 04, 2020

the existing code clobbered the canonical name already discovered
every time another matching line was found, which will necessarily be
the case when a hostname has both IPv4 and v6 definitions.

patch by Wolf.

20c6d83f

04 Aug, 2020 1 commit

release 1.2.1 · 73cc775b

Committed by Rich Felker on Aug 04, 2020

73cc775b

03 Aug, 2020 1 commit

add m68k sqrtl using native instruction · 845e4f66

Committed by Rich Felker on Aug 02, 2020

this is actually a functional fix at present, since the C sqrtl does
not support ld80 and just wraps double sqrt. once that's fixed it will
just be an optimization.

845e4f66

25 Jul, 2020 1 commit

getentropy: fix UB if len==0 · ddf1750e

Committed by Bartosz Brachaczek on Jul 17, 2020

if len==0, an uninitialized variable would be returned

ddf1750e
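
The class of bug fixed here is a result variable that is only assigned inside a loop which never runs for len==0. A minimal sketch of a getentropy-like helper with the initialization in place follows, assuming Linux and the getrandom syscall; `sketch_getentropy` is a hypothetical name and this is not the patched musl source.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill buffer with len random bytes; 0 on success, -1 on error. */
static int sketch_getentropy(void *buffer, size_t len)
{
    int ret = 0;            /* initialized: len == 0 must report success */
    char *pos = buffer;

    if (len > 256) {
        errno = EIO;
        return -1;
    }
    while (len) {
        long n = syscall(SYS_getrandom, pos, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted: retry */
            ret = -1;
            break;
        }
        pos += n;
        len -= (size_t)n;
    }
    return ret;
}
```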

07 Jul, 2020 2 commits

fix async-cancel-safety of pthread_cancel · 52ee0dd6

Committed by Rich Felker on Jul 06, 2020

the previous commit addressing async-signal-safety issues around
pthread_kill did not fully fix pthread_cancel, which is also required
(albeit rather irrationally) to be async-cancel-safe.

without blocking implementation-internal signals, it's possible that,
when async cancellation is enabled, a cancel signal sent by another
thread interrupts pthread_kill while the killlock for a targeted
thread is held. as a result, the calling thread will terminate due to
cancellation without ever unlocking the targeted thread's killlock,
and thus the targeted thread will be unable to exit.

52ee0dd6

make thread killlock async-signal-safe for pthread_kill · 7cc9496a

Committed by Rich Felker on Jul 06, 2020

pthread_kill is required to be AS-safe. that requirement can't be met
if the target thread's killlock can be taken in contexts where
application-installed signal handlers can run.

block signals around use of this lock in all pthread_* functions which
target a tid, and reorder blocking/unblocking of signals in
pthread_exit so that they're blocked whenever the killlock is held.

7cc9496a
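
The ordering described above (block signals, take the lock, release the lock, then restore the mask) can be sketched as below. Everything here is illustrative: `kill_lock` is a hypothetical spin flag rather than musl's killlock, and `sigprocmask` is used only to keep the sketch free of a threads-library dependency, where threaded code would use `pthread_sigmask`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_flag kill_lock = ATOMIC_FLAG_INIT; /* hypothetical lock */

/* Example callback used to exercise the helper below. */
static void count_call(void *p)
{
    ++*(int *)p;
}

/* Run fn(arg) with the lock held and every signal blocked, so that no
 * handler can run (and try to take the same lock) while it is held. */
static void with_lock_signals_blocked(void (*fn)(void *), void *arg)
{
    sigset_t all, old;
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);   /* block before locking */
    while (atomic_flag_test_and_set(&kill_lock))
        ;                                 /* spin; fine for a sketch */
    fn(arg);
    atomic_flag_clear(&kill_lock);
    sigprocmask(SIG_SETMASK, &old, 0);    /* unblock only after unlocking */
}
```

The point of the ordering is that there is no window in which the lock is held and a signal handler can run on the same thread.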

06 Jul, 2020 1 commit

fix C implementation of a_clz_32 · 0a005f49

Committed by Rich Felker on Jul 05, 2020

this broke mallocng size_to_class on archs without a native
implementation of a_clz_32. the incorrect logic seems to have been
something i derived from a related but distinct log2-type operation.
with the change made here, it passes an exhaustive test.

as this function is new and presently only used by mallocng, no other
functionality was affected.

0a005f49
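
For reference, a portable count-leading-zeros can be written as a binary search over the bit positions. This sketch is one common formulation, not the code from the commit, and `sketch_clz_32` is a hypothetical name; like most clz primitives it assumes a nonzero argument.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading zero bits in a nonzero 32-bit value. */
static int sketch_clz_32(uint32_t x)
{
    assert(x != 0);
    int n = 0;
    if (!(x >> 16)) { n += 16; x <<= 16; }  /* top half empty */
    if (!(x >> 24)) { n += 8;  x <<= 8;  }
    if (!(x >> 28)) { n += 4;  x <<= 4;  }
    if (!(x >> 30)) { n += 2;  x <<= 2;  }
    if (!(x >> 31)) { n += 1; }
    return n;
}
```

A function this small is cheap to check against every bit position, in the spirit of the exhaustive test mentioned in the message.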

02 Jul, 2020 1 commit

vfscanf: fix possible invalid free due to uninitialized variable use · a62df9c9

Committed by Julien Ramseier on Jul 01, 2020

vfscanf() may use the variable 'alloc' uninitialized when taking the
branch introduced by commit b287cd74.
Spotted by clang.

a62df9c9

01 Jul, 2020 2 commits

make mallocng the default malloc implementation · ea6d7847

Committed by Rich Felker on Jun 30, 2020

ea6d7847

add malloc implementation selection to configure · e71188fa

Committed by Rich Felker on Jun 30, 2020

the intent here is to keep oldmalloc as an option, at least for the
short term, in case any users are negatively impacted in some way by
mallocng and need to fall back until their issues are resolved.

e71188fa

30 Jun, 2020 2 commits

import mallocng · 503bd397

Committed by Rich Felker on Jun 30, 2020

the files added come from the mallocng development repo, commit
2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc
implementation, developed over the past 9 months, to replace the old
allocator (since dubbed "oldmalloc") with one that retains low code
size and minimal baseline memory overhead while avoiding fundamental
flaws in oldmalloc and making significant enhancements. these include
highly controlled fragmentation, fine-grained ability to return memory
to the system when freed, and strong hardening against dynamic memory
usage errors by the caller.

internally, mallocng derives most of these properties from tightly
structuring memory, creating space for allocations as uniform-sized
slots within individually mmapped (and individually freeable)
allocation groups. smaller-than-pagesize groups are created within
slots of larger ones. minimal group size is very small, and larger
sizes (in geometric progression) only come into play when usage is
high.

all data necessary for maintaining consistency of the allocator state
is tracked in out-of-band metadata, reachable via a validated path
from minimal in-band metadata. all pointers passed (to free, etc.) are
validated before any stores to memory take place. early reuse of freed
slots is avoided via approximate LRU order of freed slots. further
hardening against use-after-free and double-free, even in the case
where the freed slot has been reused, is made by cycling the offset
within the slot at which the allocation is placed; this is possible
whenever the slot size is larger than the requested allocation.

503bd397

add glue code for mallocng merge · 785752a5

Committed by Rich Felker on Jun 29, 2020

this includes both an implementation of reclaimed-gap donation from
ldso and a version of mallocng's glue.h with namespace-safe linkage to
underlying syscalls, integration with AT_RANDOM initialization, and
internal locking that's optimized out when the process is
single-threaded.

785752a5

27 Jun, 2020 1 commit

add optimized aarch64 memcpy and memset · fdf8b2ad

Committed by Rich Felker on Jun 26, 2020

these are based on the ARM optimized-routines repository v20.05
(ef907c7a799a), with macro dependencies flattened out and memmove code
removed from memcpy. this change is somewhat unfortunate since having
the branch for memmove support in the large n case of memcpy is the
performance-optimal and size-optimal way to do both, but it makes
memcpy alone (static-linked) about 40% larger and suggests a policy
that use of memcpy as memmove is supported.

tabs used for alignment have also been replaced with spaces.

fdf8b2ad

26 Jun, 2020 1 commit

add big-endian support to ARM assembler memcpy · 9dce93ac

Committed by Andre McCurdy on Jan 21, 2020

Allow the existing ARM assembler memcpy implementation to be used for
both big and little endian targets.

9dce93ac

21 Jun, 2020 1 commit

clear need_locks in child after fork · 8ed2bd8b

Committed by Rich Felker on Jun 21, 2020

the child is single-threaded, but may still need to synchronize with
last changes made to memory by another thread in the parent, so set
need_locks to -1 whereby the next lock-taker will drop to 0 and
prevent further barriers/locking.

8ed2bd8b
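
The tri-state flag described above can be sketched as follows. The variable and function names are hypothetical, and the compare-exchange loop merely stands in for whatever lock primitive the library uses; the point is only the -1 to 0 transition performed by the first lock-taker.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag: 1 = multi-threaded, 0 = locking can be skipped,
 * -1 = single-threaded now, but one more synchronizing lock is needed. */
static int need_locks = 1;
static atomic_int lock_word;

static void sketch_lock(void)
{
    int nl = need_locks;
    if (!nl)
        return;                        /* fast path: no locking at all */
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock_word, &expected, 1))
        expected = 0;                  /* the atomic op acts as a barrier */
    if (nl < 0)
        need_locks = 0;                /* first lock after fork: drop to 0 */
}

static void sketch_unlock(void)
{
    if (atomic_load(&lock_word))
        atomic_store(&lock_word, 0);
}
```

Setting the flag to -1 in the fork child thus costs exactly one real lock acquisition, after which later calls take the fast path.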

16 Jun, 2020 3 commits

only use memcpy realloc to shrink if an exact-sized free chunk exists · fca7428c

Committed by Rich Felker on Jun 16, 2020

otherwise, shrink in-place. as explained in the description of commit
3e16313f, the split here is valid
without holding split_merge_lock because all chunks involved are in
the in-use state.

fca7428c

fix memset overflow in oldmalloc race fix overhaul · cb5babdc

Committed by Rich Felker on Jun 16, 2020

commit 3e16313f introduced this bug by
making the copy case reachable with n (new size) smaller than n0
(original size). this was left as the only way of shrinking an
allocation because it reduces fragmentation if a free chunk of the
appropriate size is available. when that's not the case, another
approach may be better, but any such improvement would be independent
of fixing this bug.

cb5babdc

fix invalid use of access function in nftw · 4bd22b8f

Committed by Rich Felker on Jun 15, 2020

access always computes result with real ids not effective ones, so it
is not a valid means of determining whether the directory is readable.
instead, attempt to open it before reporting whether it's readable,
and then use fdopendir rather than opendir to open and read the
entries.

effort is made here to keep fd_limit behavior the same as before even
if it was not correct.

4bd22b8f
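
The "open first, then fdopendir" approach can be sketched as below. `open_dir_for_reading` is a hypothetical helper, not the nftw code itself; it shows only that readability is decided by an actual open, which is checked against the effective ids, rather than by `access`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open a directory for reading, or return NULL if that is not possible. */
static DIR *open_dir_for_reading(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return NULL;        /* not readable by this process */
    DIR *d = fdopendir(fd);
    if (!d)
        close(fd);          /* fdopendir failed, so the fd is still ours */
    return d;
}
```

On success the descriptor belongs to the returned stream and is released by `closedir`.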

11 Jun, 2020 6 commits

add fallback a_clz_32 implementation · ca36573e

Committed by Rich Felker on Jun 11, 2020

some archs already have a_clz_32, used to provide a_ctz_32, but it
hasn't been mandatory because it's not used anywhere yet. mallocng
will need it, however, so add it now. it should probably be optimized
better, but doesn't seem to make a difference at present.

ca36573e

only disable aligned_alloc if malloc was replaced but it wasn't · 1fc67fc1

Committed by Rich Felker on Jun 10, 2020

if both malloc and aligned_alloc have been replaced but the internal
aligned_alloc still gets called, the replacement is a wrapper of some
sort. it's not clear if this usage should be officially supported, but
it's at least a plausibly interesting debugging usage, and easy to do.
it should not be relied upon unless it's documented as supported at
some later time.

1fc67fc1

have ldso track replacement of aligned_alloc · e9f4fd11

Committed by Rich Felker on Jun 10, 2020

this is in preparation for improving behavior of malloc interposition.

e9f4fd11

reintroduce calloc elision of memset for direct-mmapped allocations · 25cef5c5

Committed by Rich Felker on Jun 10, 2020

a new weak predicate function replaceable by the malloc implementation,
__malloc_allzerop, is introduced. by default it's always false; the
default version will be used when static linking if the bump allocator
was used (in which case performance doesn't matter) or if malloc was
replaced by the application. only if the real internal malloc is
linked (always the case with dynamic linking) does the real version
get used.

if malloc was replaced dynamically, as indicated by __malloc_replaced,
the predicate function is ignored and conditional-memset is always
performed.

25cef5c5

move __malloc_replaced to a top-level malloc file · 501a9266

Committed by Rich Felker on Jun 10, 2020

it's not part of the malloc implementation but glue with musl dynamic
linker.

501a9266

switch to a common calloc implementation · 28f64fa6

Committed by Rich Felker on Jun 10, 2020

abstractly, calloc is completely malloc-implementation-independent;
it's malloc followed by memset, or as we do it, a "conditional memset"
that avoids touching fresh zero pages.

previously, calloc was kept separate for the bump allocator, which can
always skip memset, and the version of calloc provided with the full
malloc conditionally skipped the clearing for large direct-mmapped
allocations. the latter is a moderately attractive optimization, and
can be added back if needed. however, further consideration to make it
correct under malloc replacement would be needed.

commit b4b1e103 documented the
contract for malloc replacement as allowing omission of calloc, and
indeed that worked for dynamic linking, but for static linking it was
possible to get the non-clearing definition from the bump allocator;
if not for that, it would have been a link error trying to pull in
malloc.o.

the conditional-clearing code for the new common calloc is taken from
mal0_clear in oldmalloc, but drops the need to access actual page size
and just uses a fixed value of 4096. this avoids potentially needing
access to global data for the sake of an optimization that at best
marginally helps archs with offensively-large page sizes.

28f64fa6
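
The "conditional memset" idea can be sketched as a scan that writes only to chunks containing a nonzero byte, so untouched zero pages are read but never dirtied. This is a simplified illustration with a hypothetical name, not mal0_clear: chunks here are counted from the start of the buffer rather than aligned to real page boundaries, and the scan is byte-wise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK 4096 /* fixed value, as in the commit message */

/* Zero p[0..n) but write only to chunks that hold a nonzero byte.
 * Returns how many chunks needed a write. */
static size_t clear_dirty_chunks(unsigned char *p, size_t n)
{
    size_t written = 0;
    while (n) {
        size_t len = n < SKETCH_CHUNK ? n : SKETCH_CHUNK;
        size_t i = 0;
        while (i < len && p[i] == 0)
            i++;                        /* read-only scan */
        if (i < len) {
            memset(p + i, 0, len - i);  /* dirty: clear the rest */
            written++;
        }
        p += len;
        n -= len;
    }
    return written;
}
```

Using a fixed 4096 keeps the routine independent of global state; on a system with larger pages it still produces correct results and only loses some of the saving.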

04 Jun, 2020 4 commits

move oldmalloc to its own directory under src/malloc · 384c0131

Committed by Rich Felker on Jun 03, 2020

this sets the stage for replacement, and makes it practical to keep
oldmalloc around as a build option for a while if that ends up being
useful.

only the files which are actually part of the implementation are
moved. memalign and posix_memalign are entirely generic. in theory
calloc could be pulled out too, but it's useful to have it tied to the
implementation so as to optimize out unnecessary memset when
implementation details make it possible to know the memory is already
clear.

384c0131

move __expand_heap into malloc.c · eaa0f249

Committed by Rich Felker on Jun 03, 2020

this function is no longer used elsewhere, and moving it reduces the
number of source files specific to the malloc implementation.

eaa0f249

rename memalign source file back to its proper name · e07138b8

Committed by Rich Felker on Jun 03, 2020

e07138b8

rename aligned_alloc source file back to its proper name · fc18facf

Committed by Rich Felker on Jun 03, 2020

fc18facf

OpenHarmony / Third Party Musl synced successfully about 1 year ago
