- 05 Aug 2020, 1 commit
Committed by Rich Felker
the existing code clobbered the canonical name already discovered every time another matching line was found, which will necessarily be the case when a hostname has both IPv4 and v6 definitions. patch by Wolf.
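A tiny sketch of the fix pattern (illustrative names only, not the actual musl lookup code): record the canonical name once, from the first matching hosts line, so a later matching line for another address family cannot clobber it.

```c
#include <stdio.h>

/* illustrative sketch: keep the canonical name discovered on the
   first matching hosts line; later matches (e.g. the IPv6 entry for
   the same hostname) must not overwrite it. */
static char canon[256];

static void saw_matching_line(const char *name_on_line)
{
	if (!canon[0])
		snprintf(canon, sizeof canon, "%s", name_on_line);
}
```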
- 03 Aug 2020, 1 commit
Committed by Rich Felker
this is actually a functional fix at present, since the C sqrtl does not support ld80 and just wraps double sqrt. once that's fixed it will just be an optimization.
- 25 Jul 2020, 1 commit
Committed by Bartosz Brachaczek
if len==0, an uninitialized variable would be returned
- 07 Jul 2020, 2 commits
Committed by Rich Felker
the previous commit addressing async-signal-safety issues around pthread_kill did not fully fix pthread_cancel, which is also required (albeit rather irrationally) to be async-cancel-safe. without blocking implementation-internal signals, it's possible that, when async cancellation is enabled, a cancel signal sent by another thread interrupts pthread_kill while the killlock for a targeted thread is held. as a result, the calling thread will terminate due to cancellation without ever unlocking the targeted thread's killlock, and thus the targeted thread will be unable to exit.
Committed by Rich Felker
pthread_kill is required to be AS-safe. that requirement can't be met if the target thread's killlock can be taken in contexts where application-installed signal handlers can run. block signals around use of this lock in all pthread_* functions which target a tid, and reorder blocking/unblocking of signals in pthread_exit so that they're blocked whenever the killlock is held.
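A rough sketch of the pattern described here, with plain pthread_sigmask standing in for musl's internal signal-masking helpers and the killlock itself indicated only by comments:

```c
#include <signal.h>
#include <pthread.h>

/* sketch: block all signals before taking the target thread's
   killlock, so neither application-installed signal handlers nor an
   asynchronous cancellation signal can run while the lock is held. */
static void targeted_tid_op(void)
{
	sigset_t all, old;
	sigfillset(&all);
	pthread_sigmask(SIG_BLOCK, &all, &old);
	/* ... LOCK(t->killlock); act on t->tid; UNLOCK(t->killlock); ... */
	pthread_sigmask(SIG_SETMASK, &old, 0);
}
```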
- 06 Jul 2020, 1 commit
Committed by Rich Felker
this broke mallocng size_to_class on archs without a native implementation of a_clz_32. the incorrect logic seems to have been something i derived from a related but distinct log2-type operation. with the change made here, it passes an exhaustive test. as this function is new and presently only used by mallocng, no other functionality was affected.
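For reference, a portable count-leading-zeros that can back a_clz_32 on archs without a native instruction might look like the following sketch (not necessarily the exact code that landed); it can be checked exhaustively against a reference such as __builtin_clz over all nonzero 32-bit inputs.

```c
#include <stdint.h>

/* portable clz sketch for nonzero x: binary-search the highest set
   bit. a_clz_32(1)==31, a_clz_32(0x80000000)==0. */
static inline int a_clz_32(uint32_t x)
{
	int n = 31;
	if (x >> 16) { n -= 16; x >>= 16; }
	if (x >> 8)  { n -= 8;  x >>= 8;  }
	if (x >> 4)  { n -= 4;  x >>= 4;  }
	if (x >> 2)  { n -= 2;  x >>= 2;  }
	return n - (int)(x >> 1);
}
```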
- 02 Jul 2020, 1 commit
Committed by Julien Ramseier
vfscanf() may use the variable 'alloc' uninitialized when taking the branch introduced by commit b287cd74. Spotted by clang.
- 30 Jun 2020, 2 commits
Committed by Rich Felker
the files added come from the mallocng development repo, commit 2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc implementation, developed over the past 9 months, to replace the old allocator (since dubbed "oldmalloc") with one that retains low code size and minimal baseline memory overhead while avoiding fundamental flaws in oldmalloc and making significant enhancements. these include highly controlled fragmentation, fine-grained ability to return memory to the system when freed, and strong hardening against dynamic memory usage errors by the caller.

internally, mallocng derives most of these properties from tightly structuring memory, creating space for allocations as uniform-sized slots within individually mmapped (and individually freeable) allocation groups. smaller-than-pagesize groups are created within slots of larger ones. minimal group size is very small, and larger sizes (in geometric progression) only come into play when usage is high.

all data necessary for maintaining consistency of the allocator state is tracked in out-of-band metadata, reachable via a validated path from minimal in-band metadata. all pointers passed (to free, etc.) are validated before any stores to memory take place. early reuse of freed slots is avoided via approximate LRU order of freed slots. further hardening against use-after-free and double-free, even in the case where the freed slot has been reused, is made by cycling the offset within the slot at which the allocation is placed; this is possible whenever the slot size is larger than the requested allocation.
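As a rough visual aid, the in-band/out-of-band split can be pictured like this. This is a simplified sketch loosely modeled on mallocng; the field names and sizes are not the exact definitions.

```c
#include <stdint.h>

/* simplified sketch, loosely modeled on mallocng: each group is an
   individually mmapped (or nested) region carved into uniform-sized
   slots; the only in-band metadata is a pointer back to out-of-band
   metadata, which is validated before any store to memory occurs. */
struct group;

struct meta {
	struct meta *prev, *next;  /* active-group list for the size class */
	struct group *mem;         /* the group this metadata describes */
	uint32_t avail_mask;       /* which slots are available */
	uint32_t freed_mask;       /* which slots are freed, pending reuse */
	uint8_t sizeclass;         /* geometric-progression size class */
};

struct group {
	struct meta *meta;         /* validated path to out-of-band state */
	unsigned char active_idx;
	unsigned char storage[];   /* uniform-sized slots; an allocation is
	                              placed at a cycling offset in a slot */
};
```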
Committed by Rich Felker
this includes both an implementation of reclaimed-gap donation from ldso and a version of mallocng's glue.h with namespace-safe linkage to underlying syscalls, integration with AT_RANDOM initialization, and internal locking that's optimized out when the process is single-threaded.
- 27 Jun 2020, 1 commit
Committed by Rich Felker
these are based on the ARM optimized-routines repository v20.05 (ef907c7a799a), with macro dependencies flattened out and memmove code removed from memcpy. this change is somewhat unfortunate since having the branch for memmove support in the large n case of memcpy is the performance-optimal and size-optimal way to do both, but it makes memcpy alone (static-linked) about 40% larger and suggests a policy that use of memcpy as memmove is supported. tabs used for alignment have also been replaced with spaces.
- 26 Jun 2020, 1 commit
Committed by Andre McCurdy
Allow the existing ARM assembler memcpy implementation to be used for both big and little endian targets.
- 21 Jun 2020, 1 commit
Committed by Rich Felker
the child is single-threaded, but may still need to synchronize with last changes made to memory by another thread in the parent, so set need_locks to -1 whereby the next lock-taker will drop to 0 and prevent further barriers/locking.
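A sketch of the lock fast path implied here (illustrative, not musl's actual __lock): the first lock-taker after fork performs the synchronizing barrier and then drops the flag to 0, so subsequent LOCK()s are skipped again.

```c
/* sketch: need_locks is 1 when locking is required, 0 when the
   process is known single-threaded, and -1 when single-threaded but
   one more synchronizing lock acquisition is still needed. */
static int need_locks;

static void lock(volatile int *l)
{
	int need = need_locks;
	if (!need) return;                      /* single-threaded: skip */
	while (__sync_lock_test_and_set(l, 1))  /* acquire barrier */
		;
	if (need < 0) need_locks = 0;           /* synchronized; stop locking */
}

static void unlock(volatile int *l)
{
	__sync_lock_release(l);                 /* release barrier */
}
```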
- 16 Jun 2020, 3 commits
Committed by Rich Felker
otherwise, shrink in-place. as explained in the description of commit 3e16313f, the split here is valid without holding split_merge_lock because all chunks involved are in the in-use state.
Committed by Rich Felker
commit 3e16313f introduced this bug by making the copy case reachable with n (new size) smaller than n0 (original size). this was left as the only way of shrinking an allocation because it reduces fragmentation if a free chunk of the appropriate size is available. when that's not the case, another approach may be better, but any such improvement would be independent of fixing this bug.
Committed by Rich Felker
access always computes result with real ids not effective ones, so it is not a valid means of determining whether the directory is readable. instead, attempt to open it before reporting whether it's readable, and then use fdopendir rather than opendir to open and read the entries. effort is made here to keep fd_limit behavior the same as before even if it was not correct.
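The approach reads roughly like this; a generic sketch of the described technique, not the patched function itself:

```c
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* generic sketch: open the directory first, so readability reflects
   effective ids (unlike access(), which uses real ids), then read
   entries via fdopendir on the same fd. */
static DIR *open_dir_reporting(const char *path, int *readable)
{
	int fd = open(path, O_RDONLY|O_DIRECTORY|O_CLOEXEC);
	if (fd < 0) {
		*readable = 0;
		return 0;
	}
	*readable = 1;
	DIR *d = fdopendir(fd);
	if (!d) close(fd); /* don't leak the fd if fdopendir fails */
	return d;
}
```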
- 11 Jun 2020, 6 commits
Committed by Rich Felker
some archs already have a_clz_32, used to provide a_ctz_32, but it hasn't been mandatory because it's not used anywhere yet. mallocng will need it, however, so add it now. it should probably be optimized better, but doesn't seem to make a difference at present.
Committed by Rich Felker
if both malloc and aligned_alloc have been replaced but the internal aligned_alloc still gets called, the replacement is a wrapper of some sort. it's not clear if this usage should be officially supported, but it's at least a plausibly interesting debugging usage, and easy to do. it should not be relied upon unless it's documented as supported at some later time.
Committed by Rich Felker
this is in preparation for improving behavior of malloc interposition.
Committed by Rich Felker
a new weak predicate function replacable by the malloc implementation, __malloc_allzerop, is introduced. by default it's always false; the default version will be used when static linking if the bump allocator was used (in which case performance doesn't matter) or if malloc was replaced by the application. only if the real internal malloc is linked (always the case with dynamic linking) does the real version get used. if malloc was replaced dynamically, as indicated by __malloc_replaced, the predicate function is ignored and conditional-memset is always performed.
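The shape of the mechanism, sketched; the predicate's signature is an assumption based on the commit text, and __malloc_replaced is declared locally here only to keep the sketch self-contained (in musl it is a real internal symbol).

```c
#include <stddef.h>
#include <string.h>

static int __malloc_replaced; /* set when malloc was interposed dynamically */

/* weak default: never claim memory is known-zero, so calloc clears.
   the real internal malloc provides a strong, accurate definition. */
__attribute__((__weak__))
int __malloc_allzerop(void *p)
{
	(void)p;
	return 0;
}

/* in calloc, after a successful malloc(n): the predicate is ignored
   when malloc was replaced dynamically, and clearing always happens. */
static void clear_for_calloc(void *p, size_t n)
{
	if (__malloc_replaced || !__malloc_allzerop(p))
		memset(p, 0, n);
}
```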
Committed by Rich Felker
it's not part of the malloc implementation, but rather glue with the musl dynamic linker.
Committed by Rich Felker
abstractly, calloc is completely malloc-implementation-independent; it's malloc followed by memset, or as we do it, a "conditional memset" that avoids touching fresh zero pages.

previously, calloc was kept separate for the bump allocator, which can always skip memset, and the version of calloc provided with the full malloc conditionally skipped the clearing for large direct-mmapped allocations. the latter is a moderately attractive optimization, and can be added back if needed. however, further consideration to make it correct under malloc replacement would be needed.

commit b4b1e103 documented the contract for malloc replacement as allowing omission of calloc, and indeed that worked for dynamic linking, but for static linking it was possible to get the non-clearing definition from the bump allocator; if not for that, it would have been a link error trying to pull in malloc.o.

the conditional-clearing code for the new common calloc is taken from mal0_clear in oldmalloc, but drops the need to access actual page size and just uses a fixed value of 4096. this avoids potentially needing access to global data for the sake of an optimization that at best marginally helps archs with offensively-large page sizes.
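A simplified version of the idea, using the fixed 4096-byte block size mentioned above (mal0_clear itself is more refined; this just shows the technique):

```c
#include <stddef.h>
#include <string.h>

/* conditional memset sketch: scan each 4096-byte block and write
   only to blocks containing a nonzero byte, so pages that came fresh
   (and therefore zero) from mmap are never dirtied by the clear. */
static void conditional_clear(unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i += 4096) {
		size_t len = n - i < 4096 ? n - i : 4096;
		size_t j;
		for (j = 0; j < len && !p[i+j]; j++);
		if (j < len) memset(p + i, 0, len);
	}
}
```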
- 04 Jun 2020, 8 commits
Committed by Rich Felker
this sets the stage for replacement, and makes it practical to keep oldmalloc around as a build option for a while if that ends up being useful. only the files which are actually part of the implementation are moved. memalign and posix_memalign are entirely generic. in theory calloc could be pulled out too, but it's useful to have it tied to the implementation so as to optimize out unnecessary memset when implementation details make it possible to know the memory is already clear.
Committed by Rich Felker
this function is no longer used elsewhere, and moving it reduces the number of source files specific to the malloc implementation.
Committed by Rich Felker
Committed by Rich Felker
Committed by Rich Felker
this change eliminates the internal __memalign function and makes the memalign and posix_memalign functions completely independent of the malloc implementation, written portably in terms of aligned_alloc.
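Written portably in terms of aligned_alloc, the two functions come out close to the following sketch:

```c
#include <errno.h>
#include <stdlib.h>

/* memalign is just an alias for aligned_alloc */
void *memalign(size_t align, size_t len)
{
	return aligned_alloc(align, len);
}

/* posix_memalign adds the extra alignment constraint and the
   error-return convention POSIX requires */
int posix_memalign(void **res, size_t align, size_t len)
{
	if (align < sizeof(void *)) return EINVAL;
	void *mem = aligned_alloc(align, len);
	if (!mem) return errno;
	*res = mem;
	return 0;
}
```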
Committed by Rich Felker
this is the first step of swapping the name of the actual implementation to aligned_alloc while preserving the ability of git history to follow the rename.
Committed by Rich Felker
this was an unfinished draft document, present since the initial check-in, and was never intended to ship in its current form. remove it as part of reorganizing for replacement of the allocator.
Committed by Rich Felker
this affects the bump allocator used when static linking in programs that don't need allocation metadata due to not using realloc, free, etc.

commit e3bc22f1 refactored the bump allocator to share code with __expand_heap, used by malloc, for the purpose of fixing the case (mainly nommu) where brk doesn't work. however, the geometric growth behavior of __expand_heap is not actually well-suited to the bump allocator, and can produce significant excessive memory usage. in particular, by repeatedly requesting just over the remaining free space in the current mmap-allocated area, the total mapped memory will be roughly double the nominal usage. and since the main user of the no-brk mmap fallback in the bump allocator is nommu, this excessive usage is not just virtual address space but physical memory.

in addition, even on systems with brk, having a unified size request to __expand_heap without knowing whether the brk or mmap backend would get used made it so the brk could be expanded twice as far as needed. for example, with malloc(n) and n-1 bytes available before the current brk, the brk would be expanded by n bytes rounded up to page size, when expansion by just one page would have sufficed.

the new implementation computes request size separately for the cases where brk expansion is being attempted vs using mmap, and also performs individual mmap of large allocations without moving to a new bump area and throwing away the rest of the old one. this greatly reduces the need for geometric area size growth and limits the extent to which free space at the end of one bump area might be unusable for future allocations. as a bonus, the resulting code size is somewhat smaller than the combined old version plus __expand_heap.
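A toy illustration of the sizing policy (mmap-only path, as on nommu; not the actual musl code): large requests get their own mapping while the current bump area is preserved, and only small requests that don't fit open a fresh area. AREA_SIZE and the large-request cutoff are arbitrary choices for the sketch.

```c
#include <stddef.h>
#include <sys/mman.h>

#define PAGE_SZ   4096
#define AREA_SIZE (PAGE_SZ*16)

static unsigned char *cur;
static size_t avail;

static void *toy_bump_alloc(size_t n)
{
	n = (n + 15) & ~(size_t)15;              /* keep 16-byte alignment */
	if (n >= AREA_SIZE/2) {                  /* large: individual mmap; */
		size_t len = (n + PAGE_SZ-1) & ~(size_t)(PAGE_SZ-1);
		void *p = mmap(0, len, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); /* bump area kept */
		return p == MAP_FAILED ? 0 : p;
	}
	if (n > avail) {                         /* open a fresh bump area */
		void *p = mmap(0, AREA_SIZE, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) return 0;
		cur = p;
		avail = AREA_SIZE;
	}
	void *ret = cur;
	cur += n;
	avail -= n;
	return ret;
}
```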
- 03 Jun 2020, 6 commits
Committed by Rich Felker
this reflects that it is no longer intended for consumption outside of the malloc implementation.
Committed by Rich Felker
this eliminates consumers of malloc_impl.h outside of the malloc implementation.
Committed by Rich Felker
Committed by Rich Felker
clock_adjtime always returns the current clock setting in struct timex, so it's always possible that the time64 version is needed.
Committed by Rich Felker
the 64-bit time code path used the wrong (time32) syscall. fortunately this code path is not yet taken unless attempting to set a post-Y2038 time.
Committed by Rich Felker
this has been a longstanding issue reported many times over the years, with it becoming increasingly clear that it could be hit in practice. under concurrent malloc and free from multiple threads, it's possible to hit usage patterns where unbounded amounts of new memory are obtained via brk/mmap despite the total nominal usage being small and bounded.

the underlying cause is that, as a fundamental consequence of keeping locking as fine-grained as possible, the state where free has unbinned an already-free chunk to merge it with a newly-freed one, but has not yet re-binned the combined chunk, is exposed to other threads. this is bad even with small chunks, and leads to suboptimal use of memory, but where it really blows up is where the already-freed chunk in question is the large free region "at the top of the heap". in this situation, other threads momentarily see a state of having almost no free memory, and conclude that they need to obtain more.

as far as I can tell there is no fix for this that does not harm performance. the fix made here forces all split/merge of free chunks to take place under a single lock, which also takes the place of the old free_lock, being held at least momentarily at the time of free to determine whether there are neighboring free chunks that need merging. as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no longer make sense and are deleted. simplified merging now takes place inline in free (__bin_chunk) and realloc.

as commented in the source, holding the split_merge_lock precludes any chunk transition from in-use to free state. for the most part, it also precludes change to chunk header sizes. however, __memalign may still modify the sizes of an in-use chunk to split it into two in-use chunks. arguably this should require holding the split_merge_lock, but that would necessitate refactoring to expose it externally, which is a mess. and it turns out not to be necessary, at least assuming the existing sloppy memory model malloc has been using, because if free (__bin_chunk) or realloc sees any unsynchronized change to the size, it will also see the in-use bit being set, and thereby can't do anything with the neighboring chunk that changed size.
- 23 May 2020, 4 commits
Committed by Rich Felker
the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
Committed by Rich Felker
these are all flags that can be single-byte values.
Committed by Rich Felker
after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races.

on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state.

some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list.

instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
Committed by Rich Felker
since the backend for LOCK() skips locking if single-threaded, it's unsafe to make the process appear single-threaded before the last use of lock. this fixes potential unsynchronized access to a linked list via __dl_thread_cleanup.
- 22 May 2020, 1 commit
Committed by Rich Felker
presently all archs define SIGSTKFLT but this is not correct. change strsignal as a prerequisite for fixing that.