Commits · ca36573ecfbbef7a1563aaa1a8486081f8c9fdda · OpenHarmony / Third Party Musl

11 Jun, 2020 6 commits

add fallback a_clz_32 implementation · ca36573e

Committed by Rich Felker on Jun 11, 2020

some archs already have a_clz_32, used to provide a_ctz_32, but it
hasn't been mandatory because it's not used anywhere yet. mallocng
will need it, however, so add it now. it should probably be optimized
better, but doesn't seem to make a difference at present.
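
A minimal sketch of what a portable count-leading-zeros fallback can look like, assuming a binary-search formulation. The name `sketch_clz_32` is hypothetical and this is not the code added by the commit; it only illustrates the kind of arch-independent routine the message describes. The result for an input of 0 is not meaningful here.

```c
#include <stdint.h>

/* Hypothetical portable fallback: count leading zero bits of a nonzero
 * 32-bit value by halving the search window at each step. */
static int sketch_clz_32(uint32_t x)
{
    int n = 0;
    if (!(x & 0xffff0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xff000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xf0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xc0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}
```

An allocator could use such a routine to map a request size to a size class, which is presumably why mallocng needs it.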

ca36573e

only disable aligned_alloc if malloc was replaced but it wasn't · 1fc67fc1

Committed by Rich Felker on Jun 10, 2020

if both malloc and aligned_alloc have been replaced but the internal
aligned_alloc still gets called, the replacement is a wrapper of some
sort. it's not clear if this usage should be officially supported, but
it's at least a plausibly interesting debugging usage, and easy to do.
it should not be relied upon unless it's documented as supported at
some later time.

1fc67fc1

have ldso track replacement of aligned_alloc · e9f4fd11

Committed by Rich Felker on Jun 10, 2020

this is in preparation for improving behavior of malloc interposition.

e9f4fd11

reintroduce calloc elision of memset for direct-mmapped allocations · 25cef5c5

Committed by Rich Felker on Jun 10, 2020

a new weak predicate function replaceable by the malloc implementation,
__malloc_allzerop, is introduced. by default it's always false; the
default version will be used when static linking if the bump allocator
was used (in which case performance doesn't matter) or if malloc was
replaced by the application. only if the real internal malloc is
linked (always the case with dynamic linking) does the real version
get used.

if malloc was replaced dynamically, as indicated by __malloc_replaced,
the predicate function is ignored and conditional-memset is always
performed.
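
The control flow described above can be sketched roughly as follows. All names (`sketch_calloc`, `sketch_allzerop`, `sketch_replaced`) are hypothetical stand-ins for the internal symbols mentioned in the message, the predicate is modeled as a plain function that always returns false, and a plain `memset` stands in for the conditional memset.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for __malloc_replaced: nonzero if malloc was interposed. */
static int sketch_replaced = 0;

/* Stand-in for the default __malloc_allzerop: never claims the memory
 * is already zero, so clearing always happens. */
static int sketch_allzerop(void *p)
{
    (void)p;
    return 0;
}

static void *sketch_calloc(size_t m, size_t n)
{
    /* Reject m*n overflow before multiplying. */
    if (n && m > SIZE_MAX / n) {
        errno = ENOMEM;
        return NULL;
    }
    n *= m;
    void *p = malloc(n);
    if (!p) return NULL;
    /* Trust the predicate only when the internal malloc is in use. */
    if (!sketch_replaced && sketch_allzerop(p)) return p;
    return memset(p, 0, n);
}
```

With the default predicate, `sketch_calloc(4, 8)` always returns 32 zeroed bytes; a real allocator would supply a predicate that returns true for fresh direct-mmapped memory.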

25cef5c5

move __malloc_replaced to a top-level malloc file · 501a9266

Committed by Rich Felker on Jun 10, 2020

it's not part of the malloc implementation but glue with musl dynamic
linker.

501a9266

switch to a common calloc implementation · 28f64fa6

Committed by Rich Felker on Jun 10, 2020

abstractly, calloc is completely malloc-implementation-independent;
it's malloc followed by memset, or as we do it, a "conditional memset"
that avoids touching fresh zero pages.

previously, calloc was kept separate for the bump allocator, which can
always skip memset, and the version of calloc provided with the full
malloc conditionally skipped the clearing for large direct-mmapped
allocations. the latter is a moderately attractive optimization, and
can be added back if needed. however, further consideration to make it
correct under malloc replacement would be needed.

commit b4b1e103 documented the
contract for malloc replacement as allowing omission of calloc, and
indeed that worked for dynamic linking, but for static linking it was
possible to get the non-clearing definition from the bump allocator;
if not for that, it would have been a link error trying to pull in
malloc.o.

the conditional-clearing code for the new common calloc is taken from
mal0_clear in oldmalloc, but drops the need to access actual page size
and just uses a fixed value of 4096. this avoids potentially needing
access to global data for the sake of an optimization that at best
marginally helps archs with offensively-large page sizes.
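
A simplified sketch of the "conditional memset" idea, assuming a fixed 4096-byte granule as the message describes. The names are hypothetical and this is not the mal0_clear code: it walks forward in fixed-size chunks without regard to page alignment, and writes zeros only to chunks that contain a nonzero byte.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 4096

/* Return 1 if the n bytes at p are all zero. */
static int sketch_chunk_is_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i]) return 0;
    return 1;
}

/* Zero the region, skipping chunks that already read as zero so that
 * untouched pages are never written. Returns the bytes written. */
static size_t sketch_conditional_clear(void *mem, size_t n)
{
    unsigned char *p = mem;
    size_t written = 0;
    while (n) {
        size_t step = n < SKETCH_PAGE ? n : SKETCH_PAGE;
        if (!sketch_chunk_is_zero(p, step)) {
            memset(p, 0, step);
            written += step;
        }
        p += step;
        n -= step;
    }
    return written;
}
```

For a three-chunk buffer with a single dirty byte in the second chunk, only that chunk is rewritten.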

28f64fa6

04 Jun, 2020 8 commits

move oldmalloc to its own directory under src/malloc · 384c0131

Committed by Rich Felker on Jun 03, 2020

this sets the stage for replacement, and makes it practical to keep
oldmalloc around as a build option for a while if that ends up being
useful.

only the files which are actually part of the implementation are
moved. memalign and posix_memalign are entirely generic. in theory
calloc could be pulled out too, but it's useful to have it tied to the
implementation so as to optimize out unnecessary memset when
implementation details make it possible to know the memory is already
clear.

384c0131

move __expand_heap into malloc.c · eaa0f249

Committed by Rich Felker on Jun 03, 2020

this function is no longer used elsewhere, and moving it reduces the
number of source files specific to the malloc implementation.

eaa0f249

rename memalign source file back to its proper name · e07138b8

Committed by Rich Felker on Jun 03, 2020

e07138b8

rename aligned_alloc source file back to its proper name · fc18facf

Committed by Rich Felker on Jun 03, 2020

fc18facf

reverse dependency order of memalign and aligned_alloc · d1e6fdd3

Committed by Rich Felker on Jun 03, 2020

this change eliminates the internal __memalign function and makes the
memalign and posix_memalign functions completely independent of the
malloc implementation, written portably in terms of aligned_alloc.
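
As an illustration of "written portably in terms of aligned_alloc", here is a minimal sketch of the two wrappers using only the standard C11 `aligned_alloc`. The `sketch_` names are hypothetical, and the code assumes `aligned_alloc` sets `errno` on failure, as it does on common libcs.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* memalign has the same contract as aligned_alloc here. */
static void *sketch_memalign(size_t align, size_t len)
{
    return aligned_alloc(align, len);
}

/* posix_memalign reports errors by return value, not errno, and
 * requires the alignment to be at least sizeof(void *). */
static int sketch_posix_memalign(void **res, size_t align, size_t len)
{
    if (align < sizeof(void *)) return EINVAL;
    void *mem = aligned_alloc(align, len);
    if (!mem) return errno;
    *res = mem;
    return 0;
}
```

Neither wrapper needs to know anything about chunk headers or other allocator internals, which is the point of the dependency reversal.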

d1e6fdd3

rename aligned_alloc source file · de798308

Committed by Rich Felker on Jun 03, 2020

this is the first step of swapping the name of the actual
implementation to aligned_alloc while preserving history follow.

de798308

remove stale document from malloc src directory · 96490a4a

Committed by Rich Felker on Jun 03, 2020

this was an unfinished draft document present since the initial
check-in, that was never intended to ship in its current form. remove
it as part of reorganizing for replacement of the allocator.

96490a4a

rewrite bump allocator to fix corner cases, decouple from expand_heap · c4694f40

Committed by Rich Felker on Jun 03, 2020

this affects the bump allocator used when static linking in programs
that don't need allocation metadata due to not using realloc, free,
etc.

commit e3bc22f1 refactored the bump
allocator to share code with __expand_heap, used by malloc, for the
purpose of fixing the case (mainly nommu) where brk doesn't work.
however, the geometric growth behavior of __expand_heap is not
actually well-suited to the bump allocator, and can produce
significant excessive memory usage. in particular, by repeatedly
requesting just over the remaining free space in the current
mmap-allocated area, the total mapped memory will be roughly double
the nominal usage. and since the main user of the no-brk mmap fallback
in the bump allocator is nommu, this excessive usage is not just
virtual address space but physical memory.

in addition, even on systems with brk, having a unified size request
to __expand_heap without knowing whether the brk or mmap backend would
get used made it so the brk could be expanded twice as far as needed.
for example, with malloc(n) and n-1 bytes available before the current
brk, the brk would be expanded by n bytes rounded up to page size,
when expansion by just one page would have sufficed.
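
The arithmetic in this example can be worked through with a small sketch. The helper names and the 4096-byte page size are assumptions made for illustration; the "old" function models only the behavior described in the message (growing by the full request rounded up), not the actual __expand_heap code.

```c
#include <stddef.h>

#define SKETCH_PAGE 4096u

static size_t sketch_round_up_page(size_t x)
{
    return (x + SKETCH_PAGE - 1) & ~(size_t)(SKETCH_PAGE - 1);
}

/* Old behavior as described: ignore what is already available and
 * grow the brk by the whole request, rounded up to a page. */
static size_t sketch_old_brk_growth(size_t n, size_t avail)
{
    (void)avail;
    return sketch_round_up_page(n);
}

/* Behavior the message asks for: grow only by the shortfall. */
static size_t sketch_new_brk_growth(size_t n, size_t avail)
{
    return n > avail ? sketch_round_up_page(n - avail) : 0;
}
```

With n = 10000 and 9999 bytes available, the old calculation grows the brk by 12288 bytes (three pages) while the shortfall-based one grows it by 4096 bytes (one page).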

the new implementation computes request size separately for the cases
where brk expansion is being attempted vs using mmap, and also
performs individual mmap of large allocations without moving to a new
bump area and throwing away the rest of the old one. this greatly
reduces the need for geometric area size growth and limits the extent
to which free space at the end of one bump area might be unusable for
future allocations.

as a bonus, the resulting code size is somewhat smaller than the
combined old version plus __expand_heap.

c4694f40

03 Jun, 2020 6 commits

move malloc_impl.h from src/internal to src/malloc · 135c94f0

Committed by Rich Felker on Jun 02, 2020

this reflects that it is no longer intended for consumption outside of
the malloc implementation.

135c94f0

move declaration of interfaces between malloc and ldso to dynlink.h · cee88b76

Committed by Rich Felker on Jun 02, 2020

this eliminates consumers of malloc_impl.h outside of the malloc
implementation.

cee88b76

reformat clock_adjtime with always-true condition removed · 28be6122

Committed by Rich Felker on Jun 02, 2020

28be6122

always use time64 syscall first for clock_adjtime · e0b17ef8

Committed by Rich Felker on Jun 02, 2020

clock_adjtime always returns the current clock setting in struct
timex, so it's always possible that the time64 version is needed.

e0b17ef8

fix broken time64 clock_adjtime · ef51b762

Committed by Rich Felker on Jun 02, 2020

the 64-bit time code path used the wrong (time32) syscall. fortunately
this code path is not yet taken unless attempting to set a post-Y2038
time.

ef51b762

fix unbounded heap expansion race in malloc · 3e16313f

Committed by Rich Felker on Jun 02, 2020

this has been a longstanding issue reported many times over the years,
with it becoming increasingly clear that it could be hit in practice.
under concurrent malloc and free from multiple threads, it's possible
to hit usage patterns where unbounded amounts of new memory are
obtained via brk/mmap despite the total nominal usage being small and
bounded.

the underlying cause is that, as a fundamental consequence of keeping
locking as fine-grained as possible, the state where free has unbinned
an already-free chunk to merge it with a newly-freed one, but has not
yet re-binned the combined chunk, is exposed to other threads. this is
bad even with small chunks, and leads to suboptimal use of memory, but
where it really blows up is where the already-freed chunk in question
is the large free region "at the top of the heap". in this situation,
other threads momentarily see a state of having almost no free memory,
and conclude that they need to obtain more.

as far as I can tell there is no fix for this that does not harm
performance. the fix made here forces all split/merge of free chunks
to take place under a single lock, which also takes the place of the
old free_lock, being held at least momentarily at the time of free to
determine whether there are neighboring free chunks that need merging.

as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no
longer make sense and are deleted. simplified merging now takes place
inline in free (__bin_chunk) and realloc.

as commented in the source, holding the split_merge_lock precludes any
chunk transition from in-use to free state. for the most part, it also
precludes change to chunk header sizes. however, __memalign may still
modify the sizes of an in-use chunk to split it into two in-use
chunks. arguably this should require holding the split_merge_lock, but
that would necessitate refactoring to expose it externally, which is a
mess. and it turns out not to be necessary, at least assuming the
existing sloppy memory model malloc has been using, because if free
(__bin_chunk) or realloc sees any unsynchronized change to the size,
it will also see the in-use bit being set, and thereby can't do
anything with the neighboring chunk that changed size.

3e16313f

02 Jun, 2020 1 commit

suppress unwanted warnings when configuring with clang · c40157d8

Committed by Rich Felker on Jun 01, 2020

coding style warnings enabled by default in clang have long been a
source of spurious questions/bug-reports. since clang provides a -w
that behaves differently from gcc's, and that lets us enable any
warnings we may actually want after turning them all off to start with
a clean slate, use it at configure time if clang is detected.

c40157d8

23 May, 2020 4 commits

restore lock-skipping for processes that return to single-threaded state · 8d81ba8c

Committed by Rich Felker on May 22, 2020

the design used here relies on the barrier provided by the first lock
operation after the process returns to single-threaded state to
synchronize with actions by the last thread that exited. by storing
the intent to change modes in the same object used to detect whether
locking is needed, it's possible to avoid an extra (possibly costly)
memory load after the lock is taken.

8d81ba8c

cut down size of some libc struct members · f12888e9

Committed by Rich Felker on May 22, 2020

these are all flags that can be single-byte values.

f12888e9

don't use libc.threads_minus_1 as relaxed atomic for skipping locks · e01b5939

Committed by Rich Felker on May 21, 2020

after all but the last thread exits, the next thread to observe
libc.threads_minus_1==0 and conclude that it can skip locking fails to
synchronize with any changes to memory that were made by the
last-exiting thread. this can produce data races.

on some archs, at least x86, memory synchronization is unlikely to be
a problem; however, with the inline locks in malloc, skipping the lock
also eliminated the compiler barrier, and caused code that needed to
re-check chunk in-use bits after obtaining the lock to reuse a stale
value, possibly from before the process became single-threaded. this
in turn produced corruption of the heap state.

some uses of libc.threads_minus_1 remain, especially for allocation of
new TLS in the dynamic linker; otherwise, it could be removed
entirely. it's made non-volatile to reflect that the remaining
accesses are only made under lock on the thread list.

instead of libc.threads_minus_1, libc.threaded is now used for
skipping locks. the difference is that libc.threaded is permanently
true once an additional thread has been created. this will produce
some performance regression in processes that are mostly
single-threaded but occasionally creating threads. in the future it
may be possible to bring back the full lock-skipping, but more care
needs to be taken to produce a safe design.
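
A rough sketch of lock skipping keyed on a flag that is only ever set, never cleared, written with C11 atomics. Everything here (`sketch_threaded`, `struct sketch_lock`, the spin loop) is a hypothetical illustration of the idea and not musl's lock implementation.

```c
#include <stdatomic.h>

/* Becomes 1 when a second thread is first created and stays 1, so no
 * thread can later conclude "single-threaded" without synchronizing. */
static atomic_int sketch_threaded;

struct sketch_lock { atomic_flag held; };

static void sketch_thread_created(void)
{
    atomic_store(&sketch_threaded, 1);
}

static void sketch_lock(struct sketch_lock *l)
{
    /* Never had more than one thread: skip the lock entirely. */
    if (!atomic_load(&sketch_threaded)) return;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the holder releases */
}

static void sketch_unlock(struct sketch_lock *l)
{
    if (!atomic_load(&sketch_threaded)) return;
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The cost named in the message is visible in the sketch: once `sketch_thread_created` has run, every later `sketch_lock` call pays for the atomic operation, even if the process is back to one thread.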

e01b5939

reorder thread list unlink in pthread_exit after all locks · 4d5aa20a

Committed by Rich Felker on May 22, 2020

since the backend for LOCK() skips locking if single-threaded, it's
unsafe to make the process appear single-threaded before the last use
of lock.

this fixes potential unsynchronized access to a linked list via
__dl_thread_cleanup.

4d5aa20a

22 May, 2020 2 commits

fix incorrect SIGSTKFLT on all mips archs · cabc3696

Committed by Rich Felker on May 21, 2020

signal 7 is SIGEMT on Linux mips* ABI according to the man pages and
kernel. it's not clear where the wrong name came from but it dates
back to original mips commit.

cabc3696

handle possibility that SIGEMT replaces SIGSTKFLT in strsignal · 09c54607

Committed by Rich Felker on May 21, 2020

presently all archs define SIGSTKFLT but this is not correct. change
strsignal as a prerequisite for fixing that.

09c54607

20 May, 2020 2 commits

fix return value of res_send, res_query on errors from nameserver · 1b4e84c5

Committed by Rich Felker on May 19, 2020

the internal __res_msend returns 0 on timeout without having obtained
any conclusive answer, but in this case has not filled in meaningful
anslen. res_send wrongly treated that as success, but returned a zero
answer length. any reasonable caller would eventually end up treating
that as an error when attempting to parse/validate it, but it should
just be reported as an error.

alternatively we could return the last-received inconclusive answer
(typically servfail), but doing so would require internal changes in
__res_msend. this may be considered later.

1b4e84c5

fix handling of errors resolving one of paired A+AAAA query · 5cf1ac24

Committed by Rich Felker on May 19, 2020

the old logic here likely dates back, at least in inspiration, to
before it was recognized that transient errors must not be allowed to
reflect the contents of successful results and must be reported to the
application.

here, the dns backend for getaddrinfo, when performing a paired query
for v4 and v6 addresses, accepted results for one address family even
if the other timed out. (the __res_msend backend does not propagate
error rcodes back to the caller, but continues to retry until timeout,
so other error conditions were not actually possible.)

this patch moves the checks to take place before answer parsing, and
performs them for each answer rather than only the answer to the first
query. if nxdomain is seen it's assumed to apply to both queries since
that's how dns semantics work.

5cf1ac24

19 May, 2020 1 commit

set AD bit in dns queries, suppress for internal use · fd7ec068

Committed by Rich Felker on May 18, 2020

the AD (authenticated data) bit in outgoing dns queries is defined by
rfc3655 to request that the nameserver report (via the same bit in the
response) whether the result is authenticated by DNSSEC. while all
results returned by a DNSSEC conforming nameserver will be either
authenticated or cryptographically proven to lack DNSSEC protection,
for some applications it's necessary to be able to distinguish these
two cases. in particular, conforming and compatible handling of DANE
(TLSA) records requires enforcing them only in signed zones.

when the AD bit was first defined for queries, there were reports of
compatibility problems with broken firewalls and nameservers dropping
queries with it set. these problems are probably a thing of the past,
and broken nameservers are already unsupported. however, since there
is no use in the AD bit with the netdb.h interfaces, explicitly clear
it in the queries they make. this ensures that, even with broken
setups, the standard functions will work, and at most the res_*
functions break.
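
For reference, the AD flag sits in the fourth byte of the DNS message header (bit value 0x20, next to CD and the RCODE field). A minimal sketch of setting and clearing it on a raw query buffer follows; the helper names are hypothetical and the buffer is assumed to start with a standard 12-byte DNS header.

```c
#include <stddef.h>

#define SKETCH_DNS_AD 0x20 /* AD bit in header byte 3 */

/* Request that the nameserver report DNSSEC authentication status. */
static void sketch_set_ad(unsigned char *q, size_t len)
{
    if (len >= 4) q[3] |= SKETCH_DNS_AD;
}

/* Clear AD for callers that have no way to consume the answer bit. */
static void sketch_clear_ad(unsigned char *q, size_t len)
{
    if (len >= 4) q[3] &= (unsigned char)~SKETCH_DNS_AD;
}
```

In terms of the message, a res_* style query builder would correspond to `sketch_set_ad`, and the netdb.h lookup path to `sketch_clear_ad`.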

fd7ec068

01 May, 2020 1 commit

fix undefined behavior from signed overflow in strstr and memmem · 593caa45

Committed by Rich Felker on Apr 30, 2020

unsigned char promotes to int, which can overflow when shifted left by
24 bits or more. this has been reported multiple times but then
forgotten. it's expected to be benign UB, but can trap when built with
explicit overflow catching (ubsan or similar). fix it now.

note that promotion to uint32_t is safe and portable even outside of
the assumptions usually made in musl, since either uint32_t has rank
at least unsigned int, so that no further default promotions happen,
or int is wide enough that the shift can't overflow. this is a
desirable property to have in case someone wants to reuse the code
elsewhere.
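
A small sketch of the pattern at issue, packing four bytes into a 32-bit word. The function name is hypothetical. Without the cast, `p[0] << 24` is evaluated in `int` and overflows when the top bit of `p[0]` is set; casting to `uint32_t` first keeps the shift in an unsigned type.

```c
#include <stdint.h>

/* Pack four bytes big-endian. Each operand is converted to uint32_t
 * before shifting, so no shift can overflow a signed int. */
static uint32_t sketch_pack4(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8 | p[3];
}
```

For the bytes `ff 01 02 03` this yields `0xff010203`, the case where the uncast version would have shifted a 1 into the sign bit.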

593caa45

27 Apr, 2020 1 commit

remove arm (32-bit) support for vdso clock_gettime · 4486c579

Committed by Rich Felker on Apr 26, 2020

it's been reported that the vdso clock_gettime64 function on (32-bit)
arm is broken, producing erratic results that grow at a rate far
greater than one reported second per actual elapsed second. the vdso
function seems to have been added sometime between linux 5.4 and 5.6,
so if there's ever been a working version, it was only present for a
very short window.

it's not clear what the eventual upstream kernel solution will be, but
something needs to be done on the libc side so as not to be producing
binaries that seem to work on older/existing/lts kernels (which lack
the function and thus lack the bug) but will break fantastically when
moving to newer kernels.

hopefully vdso support will be added back soon, but with a new symbol
name or version from the kernel to allow continued rejection of broken
ones.

4486c579

24 Apr, 2020 1 commit

fix undefined behavior in wcsto[ld] family functions · f3ecdc10

Committed by Rich Felker on Apr 24, 2020

analogous to commit b287cd74 but for
the custom FILE stream type the wcstol and wcstod family use. __toread
could be used here as well, but there's a simple direct fix to make
the buffer pointers initially valid for subtraction, so just do that
to avoid pulling in stdio exit code in programs that don't use stdio.

f3ecdc10

18 Apr, 2020 6 commits

fix sh fesetround failure to clear old mode · 043c6e31

Committed by Rich Felker on Apr 18, 2020

the sh version of fesetround or'd the new rounding mode onto the
control register without clearing the old rounding mode bits, making
changes sticky. this was the root cause of multiple test failures.
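
The bug class can be shown on a plain integer standing in for the control register. The two-bit mask and the function names are assumptions for illustration only; the real fix operates on the sh floating-point control register.

```c
/* Hypothetical layout: rounding mode in the low two bits. */
#define SKETCH_ROUND_MASK 3u

/* Buggy form: old mode bits are never cleared, so they stick. */
static unsigned sketch_set_round_buggy(unsigned ctl, unsigned mode)
{
    return ctl | mode;
}

/* Fixed form: clear the mode field first, then install the new mode. */
static unsigned sketch_set_round_fixed(unsigned ctl, unsigned mode)
{
    return (ctl & ~SKETCH_ROUND_MASK) | mode;
}
```

Starting from mode 1 and requesting mode 0, the buggy form leaves the register at 1, while the fixed form returns 0 and leaves all bits outside the mask unchanged.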

043c6e31

move __string_read into vsscanf source file · 2e0907ce

Committed by Rich Felker on Apr 17, 2020

apparently this function was intended at some point to be used by
strto* family as well, and thus was put in its own file; however, as
far as I can tell, it's only ever been used by vsscanf. move it to the
same file to reduce the number of source files and external symbols.

2e0907ce

remove spurious repeated semicolon in fmemopen · 2acf3bce

Committed by Rich Felker on Apr 17, 2020

2acf3bce

combine two calls to memset in fmemopen · 74fa4aac

Committed by Rich Felker on Apr 17, 2020

this idea came up when I thought we might need to zero the UNGET
portion of buf as well, but it seems like a useful improvement even
when that turned out not to be necessary.

74fa4aac

fix possible access to uninitialized memory in shgetc (via scanf) · 086542fb

Committed by Rich Felker on Apr 17, 2020

shgetc sets up to be able to perform an "unget" operation without the
caller having to remember and pass back the character value, and for
this purpose used a conditional store idiom:

    if (f->rpos[-1] != c) f->rpos[-1] = c

to make it safe to use with non-writable buffers (set up by the
sh_fromstring macro or __string_read with sscanf).

however, validity of this depends on the buffer space at rpos[-1]
being initialized, which is not the case under some conditions
(including at least unbuffered files and fmemopen ones).

whenever data was read "through the buffer", the desired character
value is already in place and does not need to be written. thus,
rather than testing for the absence of the value, we can test for
rpos<=buf, indicating that the last character read could not have come
from the buffer, and thereby that we have a "real" buffer (possibly of
zero length) with writable pushback (UNGET bytes) below it.
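
The replacement test can be sketched on a toy stream structure. `struct sketch_file` and `sketch_remember_for_unget` are hypothetical; only the `rpos <= buf` condition is taken from the message. The caller is assumed to provide writable pushback bytes immediately below `buf`.

```c
/* Toy stand-in for the stream: buf points just past the pushback
 * (UNGET) area, rpos is the current read position. */
struct sketch_file {
    unsigned char *buf;
    unsigned char *rpos;
};

/* Store c at rpos[-1] only when the character did not come through
 * the buffer. In that case rpos[-1] lies in the writable pushback
 * area; otherwise the byte is already in place and is not touched. */
static void sketch_remember_for_unget(struct sketch_file *f, int c)
{
    if (f->rpos <= f->buf)
        f->rpos[-1] = (unsigned char)c;
}
```

Unlike the compare-then-store idiom, this never reads `rpos[-1]`, so it does not depend on that byte having been initialized.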

086542fb

fix undefined behavior in scanf core · b287cd74

Committed by Rich Felker on Apr 17, 2020

as reported/analyzed by Pascal Cuoq, the shlim and shcnt
macros/functions are called by the scanf core (vfscanf) with f->rpos
potentially null (if the FILE is not yet activated for reading at the
time of the call). in this case, they compute differences between a
null pointer (f->rpos) and a non-null one (f->buf), resulting in
undefined behavior.

it's unlikely that any observably wrong behavior occurred in practice,
at least without LTO, due to limits on what's visible to the compiler
from translation unit boundaries, but this has not been checked.

fix is simply ensuring that the FILE is activated for read mode before
entering the main scanf loop, and erroring out early if it can't be.

b287cd74

25 Mar, 2020 1 commit

math: add x86_64 remquol · 19f870c3

Committed by Alexander Monakov on Jan 16, 2020

19f870c3

OpenHarmony / Third Party Musl synced successfully about 1 year ago