1. 30 June 2020, 2 commits
    • import mallocng · 503bd397
      Rich Felker committed
      the files added come from the mallocng development repo, commit
      2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc
      implementation, developed over the past 9 months, to replace the old
      allocator (since dubbed "oldmalloc") with one that retains low code
      size and minimal baseline memory overhead while avoiding fundamental
      flaws in oldmalloc and making significant enhancements. these include
      highly controlled fragmentation, fine-grained ability to return memory
      to the system when freed, and strong hardening against dynamic memory
      usage errors by the caller.
      
      internally, mallocng derives most of these properties from tightly
      structuring memory, creating space for allocations as uniform-sized
      slots within individually mmapped (and individually freeable)
      allocation groups. smaller-than-pagesize groups are created within
      slots of larger ones. minimal group size is very small, and larger
      sizes (in geometric progression) only come into play when usage is
      high.
      
      all data necessary for maintaining consistency of the allocator state
      is tracked in out-of-band metadata, reachable via a validated path
      from minimal in-band metadata. all pointers passed (to free, etc.) are
      validated before any stores to memory take place. early reuse of freed
      slots is avoided via approximate LRU order of freed slots. further
      hardening against use-after-free and double-free, even in the case
      where the freed slot has been reused, is made by cycling the offset
      within the slot at which the allocation is placed; this is possible
      whenever the slot size is larger than the requested allocation.
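      a rough illustrative sketch of that layout follows; the names and
      fields here are hypothetical and heavily simplified, not the actual
      mallocng structures:

        #include <stddef.h>
        #include <stdint.h>

        struct group;

        struct group_meta {              /* out-of-band metadata */
            struct group *mem;           /* the group this record describes */
            uint32_t avail_mask;         /* slots never yet handed out */
            uint32_t freed_mask;         /* freed slots, reused in approximate LRU order */
            int slot_size, slot_count;   /* uniform slot size for this group */
        };

        struct group {                   /* individually mmapped, individually freeable */
            struct group_meta *meta;     /* minimal in-band metadata */
            unsigned char slots[];       /* slot_count slots of slot_size bytes */
        };

        /* validated path from in-band to out-of-band state: a bogus or
         * corrupted pointer fails here before any store takes place */
        static int group_valid(struct group *g)
        {
            return g->meta && g->meta->mem == g;
        }

        /* allocations sit at a cycling offset inside their slot, so a stale
         * pointer into a reused slot rarely lines up with the live allocation */
        static void *slot_ptr(struct group *g, int idx, int cycle)
        {
            return g->slots + (size_t)idx * g->meta->slot_size + 16 * cycle;
        }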
    • add glue code for mallocng merge · 785752a5
      Rich Felker committed
      this includes both an implementation of reclaimed-gap donation from
      ldso and a version of mallocng's glue.h with namespace-safe linkage to
      underlying syscalls, integration with AT_RANDOM initialization, and
      internal locking that's optimized out when the process is
      single-threaded.
  2. 16 June 2020, 2 commits
    • only use memcpy realloc to shrink if an exact-sized free chunk exists · fca7428c
      Rich Felker committed
      otherwise, shrink in-place. as explained in the description of commit
      3e16313f, the split here is valid
      without holding split_merge_lock because all chunks involved are in
      the in-use state.
    • fix memset overflow in oldmalloc race fix overhaul · cb5babdc
      Rich Felker committed
      commit 3e16313f introduced this bug by
      making the copy case reachable with n (new size) smaller than n0
      (original size). this was left as the only way of shrinking an
      allocation because it reduces fragmentation if a free chunk of the
      appropriate size is available. when that's not the case, another
      approach may be better, but any such improvement would be independent
      of fixing this bug.
  3. 11 June 2020, 5 commits
    • only disable aligned_alloc if malloc was replaced but it wasn't · 1fc67fc1
      Rich Felker committed
      if both malloc and aligned_alloc have been replaced but the internal
      aligned_alloc still gets called, the replacement is a wrapper of some
      sort. it's not clear if this usage should be officially supported, but
      it's at least a plausibly interesting debugging usage, and easy to do.
      it should not be relied upon unless it's documented as supported at
      some later time.
    • have ldso track replacement of aligned_alloc · e9f4fd11
      Rich Felker committed
      this is in preparation for improving behavior of malloc interposition.
    • reintroduce calloc elision of memset for direct-mmapped allocations · 25cef5c5
      Rich Felker committed
      a new weak predicate function replaceable by the malloc implementation,
      __malloc_allzerop, is introduced. by default it's always false; the
      default version will be used when static linking if the bump allocator
      was used (in which case performance doesn't matter) or if malloc was
      replaced by the application. only if the real internal malloc is
      linked (always the case with dynamic linking) does the real version
      get used.
      
      if malloc was replaced dynamically, as indicated by __malloc_replaced,
      the predicate function is ignored and conditional-memset is always
      performed.
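      as a rough sketch of the pattern (hypothetical calloc body, with the
      flag defined locally only so the fragment is self-contained):

        #include <stdlib.h>
        #include <string.h>

        static int __malloc_replaced;    /* in musl this is maintained by ldso */

        /* weak default: never claims the memory is already zero.  the real
         * internal malloc can provide a strong definition that returns true
         * for storage it knows came from fresh anonymous pages. */
        __attribute__((weak)) int __malloc_allzerop(void *p)
        {
            return 0;
        }

        void *calloc_sketch(size_t m, size_t n)      /* stand-in for calloc */
        {
            if (n && m > (size_t)-1 / n) return 0;   /* multiplication overflow */
            size_t len = m * n;
            void *p = malloc(len);
            if (!p) return 0;
            /* skip clearing only when our own malloc vouches for the memory;
             * a real version does the conditional clearing described above
             * rather than a plain memset */
            if (__malloc_replaced || !__malloc_allzerop(p))
                memset(p, 0, len);
            return p;
        }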
    • move __malloc_replaced to a top-level malloc file · 501a9266
      Rich Felker committed
      it's not part of the malloc implementation, but rather glue with the
      musl dynamic linker.
    • switch to a common calloc implementation · 28f64fa6
      Rich Felker committed
      abstractly, calloc is completely malloc-implementation-independent;
      it's malloc followed by memset, or as we do it, a "conditional memset"
      that avoids touching fresh zero pages.
      
      previously, calloc was kept separate for the bump allocator, which can
      always skip memset, and the version of calloc provided with the full
      malloc conditionally skipped the clearing for large direct-mmapped
      allocations. the latter is a moderately attractive optimization, and
      can be added back if needed. however, further consideration to make it
      correct under malloc replacement would be needed.
      
      commit b4b1e103 documented the
      contract for malloc replacement as allowing omission of calloc, and
      indeed that worked for dynamic linking, but for static linking it was
      possible to get the non-clearing definition from the bump allocator;
      if not for that, it would have been a link error trying to pull in
      malloc.o.
      
      the conditional-clearing code for the new common calloc is taken from
      mal0_clear in oldmalloc, but drops the need to access actual page size
      and just uses a fixed value of 4096. this avoids potentially needing
      access to global data for the sake of an optimization that at best
      marginally helps archs with offensively-large page sizes.
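      a minimal sketch of such a conditional clear, assuming the fixed
      4096-byte page size mentioned above (not the verbatim mal0_clear code):

        #include <stddef.h>
        #include <string.h>

        static void conditional_clear(unsigned char *p, size_t n)
        {
            const size_t pagesz = 4096;
            if (n < pagesz) {                       /* small: just clear it */
                memset(p, 0, n);
                return;
            }
            for (size_t i = 0; i < n; ) {
                size_t end = i + pagesz < n ? i + pagesz : n;
                size_t j = i;
                /* scan this 4096-byte window; on the first nonzero byte,
                 * clear the rest of the window, otherwise leave the (already
                 * zero, likely untouched) page alone */
                while (j < end && !p[j]) j++;
                if (j < end) memset(p + j, 0, end - j);
                i = end;
            }
        }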
  4. 04 June 2020, 8 commits
    • move oldmalloc to its own directory under src/malloc · 384c0131
      Rich Felker committed
      this sets the stage for replacement, and makes it practical to keep
      oldmalloc around as a build option for a while if that ends up being
      useful.
      
      only the files which are actually part of the implementation are
      moved. memalign and posix_memalign are entirely generic. in theory
      calloc could be pulled out too, but it's useful to have it tied to the
      implementation so as to optimize out unnecessary memset when
      implementation details make it possible to know the memory is already
      clear.
    • move __expand_heap into malloc.c · eaa0f249
      Rich Felker committed
      this function is no longer used elsewhere, and moving it reduces the
      number of source files specific to the malloc implementation.
    • rename memalign source file back to its proper name · e07138b8
      Rich Felker committed
    • fc18facf
    • reverse dependency order of memalign and aligned_alloc · d1e6fdd3
      Rich Felker committed
      this change eliminates the internal __memalign function and makes the
      memalign and posix_memalign functions completely independent of the
      malloc implementation, written portably in terms of aligned_alloc.
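      the resulting posix_memalign is essentially a thin wrapper; a sketch of
      that shape (not necessarily the exact musl source):

        #include <errno.h>
        #include <stdlib.h>

        int posix_memalign_sketch(void **res, size_t align, size_t len)
        {
            if (align < sizeof(void *)) return EINVAL;   /* POSIX minimum alignment */
            void *mem = aligned_alloc(align, len);       /* all real work happens here */
            if (!mem) return errno;
            *res = mem;
            return 0;
        }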
    • rename aligned_alloc source file · de798308
      Rich Felker committed
      this is the first step of swapping the name of the actual
      implementation to aligned_alloc while keeping its history easy to follow.
    • remove stale document from malloc src directory · 96490a4a
      Rich Felker committed
      this was an unfinished draft document present since the initial
      check-in, that was never intended to ship in its current form. remove
      it as part of reorganizing for replacement of the allocator.
    • rewrite bump allocator to fix corner cases, decouple from expand_heap · c4694f40
      Rich Felker committed
      this affects the bump allocator used when static linking in programs
      that don't need allocation metadata due to not using realloc, free,
      etc.
      
      commit e3bc22f1 refactored the bump
      allocator to share code with __expand_heap, used by malloc, for the
      purpose of fixing the case (mainly nommu) where brk doesn't work.
      however, the geometric growth behavior of __expand_heap is not
      actually well-suited to the bump allocator, and can produce
      significant excessive memory usage. in particular, by repeatedly
      requesting just over the remaining free space in the current
      mmap-allocated area, the total mapped memory will be roughly double
      the nominal usage. and since the main user of the no-brk mmap fallback
      in the bump allocator is nommu, this excessive usage is not just
      virtual address space but physical memory.
      
      in addition, even on systems with brk, having a unified size request
      to __expand_heap without knowing whether the brk or mmap backend would
      get used made it so the brk could be expanded twice as far as needed.
      for example, with malloc(n) and n-1 bytes available before the current
      brk, the brk would be expanded by n bytes rounded up to page size,
      when expansion by just one page would have sufficed.
      
      the new implementation computes request size separately for the cases
      where brk expansion is being attempted vs using mmap, and also
      performs individual mmap of large allocations without moving to a new
      bump area and throwing away the rest of the old one. this greatly
      reduces the need for geometric area size growth and limits the extent
      to which free space at the end of one bump area might be unusable for
      future allocations.
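      a simplified sketch of that policy (mmap-only, with hypothetical
      constants; the real code additionally tries brk first with its own,
      separately computed request size):

        #include <stddef.h>
        #include <sys/mman.h>

        #define PAGE 4096
        #define MMAP_THRESHOLD (32*PAGE)          /* assumed cutoff, not musl's value */

        static unsigned char *cur, *end;          /* current bump area */

        static void *bump_alloc(size_t n)
        {
            n = (n + 15) & -16;                   /* keep 16-byte alignment */
            if (n <= (size_t)(end - cur)) {       /* fits: bump within current area */
                void *p = cur; cur += n; return p;
            }
            if (n >= MMAP_THRESHOLD) {            /* large: its own mapping; the current
                                                     bump area is kept for later
                                                     small requests */
                void *p = mmap(0, (n + PAGE-1) & -PAGE, PROT_READ|PROT_WRITE,
                               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
                return p == MAP_FAILED ? 0 : p;
            }
            /* otherwise start a new area sized for the request (no geometric
             * growth); the real version asks brk for just enough pages before
             * falling back to mmap */
            size_t len = (n + PAGE-1) & -PAGE;
            unsigned char *p = mmap(0, len, PROT_READ|PROT_WRITE,
                                    MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) return 0;
            cur = p + n; end = p + len;
            return p;
        }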
      
      as a bonus, the resulting code size is somewhat smaller than the
      combined old version plus __expand_heap.
  5. 03 June 2020, 2 commits
    • move malloc_impl.h from src/internal to src/malloc · 135c94f0
      Rich Felker committed
      this reflects that it is no longer intended for consumption outside of
      the malloc implementation.
    • fix unbounded heap expansion race in malloc · 3e16313f
      Rich Felker committed
      this has been a longstanding issue reported many times over the years,
      with it becoming increasingly clear that it could be hit in practice.
      under concurrent malloc and free from multiple threads, it's possible
      to hit usage patterns where unbounded amounts of new memory are
      obtained via brk/mmap despite the total nominal usage being small and
      bounded.
      
      the underlying cause is that, as a fundamental consequence of keeping
      locking as fine-grained as possible, the state where free has unbinned
      an already-free chunk to merge it with a newly-freed one, but has not
      yet re-binned the combined chunk, is exposed to other threads. this is
      bad even with small chunks, and leads to suboptimal use of memory, but
      where it really blows up is where the already-freed chunk in question
      is the large free region "at the top of the heap". in this situation,
      other threads momentarily see a state of having almost no free memory,
      and conclude that they need to obtain more.
      
      as far as I can tell there is no fix for this that does not harm
      performance. the fix made here forces all split/merge of free chunks
      to take place under a single lock, which also takes the place of the
      old free_lock, being held at least momentarily at the time of free to
      determine whether there are neighboring free chunks that need merging.
      
      as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no
      longer make sense and are deleted. simplified merging now takes place
      inline in free (__bin_chunk) and realloc.
      
      as commented in the source, holding the split_merge_lock precludes any
      chunk transition from in-use to free state. for the most part, it also
      precludes change to chunk header sizes. however, __memalign may still
      modify the sizes of an in-use chunk to split it into two in-use
      chunks. arguably this should require holding the split_merge_lock, but
      that would necessitate refactoring to expose it externally, which is a
      mess. and it turns out not to be necessary, at least assuming the
      existing sloppy memory model malloc has been using, because if free
      (__bin_chunk) or realloc sees any unsynchronized change to the size,
      it will also see the in-use bit being set, and thereby can't do
      anything with the neighboring chunk that changed size.
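      the shape of the resulting free path, as a toy model (simplified chunk
      and bin types, not oldmalloc's real ones):

        #include <pthread.h>
        #include <stddef.h>

        struct chunk { size_t size; int free; struct chunk *prev, *next; };

        static pthread_mutex_t split_merge_lock = PTHREAD_MUTEX_INITIALIZER;
        static struct chunk bin = { .prev = &bin, .next = &bin };   /* circular bin head */

        static void unbin(struct chunk *c)
        {
            c->prev->next = c->next;
            c->next->prev = c->prev;
            c->free = 0;
        }

        static void bin_insert(struct chunk *c)
        {
            c->free = 1;
            c->next = bin.next; c->prev = &bin;
            bin.next->prev = c; bin.next = c;
        }

        /* unbinning, merging and re-binning are one critical section, so no
         * other thread ever sees a neighbor that has been unbinned but not
         * yet re-binned and concludes it must obtain more memory */
        void free_chunk(struct chunk *c, struct chunk *left, struct chunk *right)
        {
            pthread_mutex_lock(&split_merge_lock);
            if (left && left->free)   { unbin(left); left->size += c->size; c = left; }
            if (right && right->free) { unbin(right); c->size += right->size; }
            bin_insert(c);
            pthread_mutex_unlock(&split_merge_lock);
        }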
  6. 23 May 2020, 2 commits
    • restore lock-skipping for processes that return to single-threaded state · 8d81ba8c
      Rich Felker committed
      the design used here relies on the barrier provided by the first lock
      operation after the process returns to single-threaded state to
      synchronize with actions by the last thread that exited. by storing
      the intent to change modes in the same object used to detect whether
      locking is needed, it's possible to avoid an extra (possibly costly)
      memory load after the lock is taken.
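      a sketch of the idea with a hypothetical flag and a toy spinlock (the
      real variable, values and lock are musl internals and may differ):

        #include <stdatomic.h>

        static _Atomic int need_locks;   /* 0: single-threaded, 1: threaded,
                                            -1: stored by the last exiting
                                            thread to signal the mode change */

        static void lock_sketch(_Atomic int *lk)
        {
            if (!atomic_load_explicit(&need_locks, memory_order_relaxed))
                return;                              /* fast path: skip locking */
            while (atomic_exchange_explicit(lk, 1, memory_order_acquire))
                ;                                    /* toy spinlock acquire */
            /* the barrier from taking this first lock is what makes the
             * exiting thread's writes visible before we downgrade the flag
             * and start skipping locks again */
            if (atomic_load_explicit(&need_locks, memory_order_relaxed) < 0)
                atomic_store_explicit(&need_locks, 0, memory_order_relaxed);
        }

        static void unlock_sketch(_Atomic int *lk)
        {
            atomic_store_explicit(lk, 0, memory_order_release);
        }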
    • don't use libc.threads_minus_1 as relaxed atomic for skipping locks · e01b5939
      Rich Felker committed
      after all but the last thread exits, the next thread to observe
      libc.threads_minus_1==0 and conclude that it can skip locking fails to
      synchronize with any changes to memory that were made by the
      last-exiting thread. this can produce data races.
      
      on some archs, at least x86, memory synchronization is unlikely to be
      a problem; however, with the inline locks in malloc, skipping the lock
      also eliminated the compiler barrier, and caused code that needed to
      re-check chunk in-use bits after obtaining the lock to reuse a stale
      value, possibly from before the process became single-threaded. this
      in turn produced corruption of the heap state.
      
      some uses of libc.threads_minus_1 remain, especially for allocation of
      new TLS in the dynamic linker; otherwise, it could be removed
      entirely. it's made non-volatile to reflect that the remaining
      accesses are only made under lock on the thread list.
      
      instead of libc.threads_minus_1, libc.threaded is now used for
      skipping locks. the difference is that libc.threaded is permanently
      true once an additional thread has been created. this will produce
      some performance regression in processes that are mostly
      single-threaded but occasionally creating threads. in the future it
      may be possible to bring back the full lock-skipping, but more care
      needs to be taken to produce a safe design.
  7. 13 September 2018, 6 commits
    • split internal lock API out of libc.h, creating lock.h · 5f12ffe1
      Rich Felker committed
      this further reduces the number of source files which need to include
      libc.h and thereby be potentially exposed to libc global state and
      internals.
      
      this will also facilitate further improvements like adding an inline
      fast-path, if we want to do so later.
    • reduce spurious inclusion of libc.h · 5ce37379
      Rich Felker committed
      libc.h was intended to be a header for access to global libc state and
      related interfaces, but ended up included all over the place because
      it was the way to get the weak_alias macro. most of the inclusions
      removed here are places where weak_alias was needed. a few were
      recently introduced for hidden. some go all the way back to when
      libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
      cancellation points had to include it.
      
      remaining spurious users are mostly callers of the LOCK/UNLOCK macros
      and files that use the LFS64 macro to define the awful *64 aliases.
      
      in a few places, new inclusion of libc.h is added because several
      internal headers no longer implicitly include libc.h.
      
      declarations for __lockfile and __unlockfile are moved from libc.h to
      stdio_impl.h so that the latter does not need libc.h. putting them in
      libc.h made no sense at all, since the macros in stdio_impl.h are
      needed to use them correctly anyway.
    • 239c1556
    • rework malloc_usable_size to use malloc_impl.h · ef8d45d6
      Rich Felker committed
    • move __memalign declaration to malloc_impl.h · b07a5d66
      Rich Felker committed
      the malloc-implementation-private header is the only right place for
      this, because, being in the reserved namespace, __memalign is not
      interposable and thus not valid to use anywhere else. anything outside
      of the malloc implementation must call an appropriate-namespace public
      function (aligned_alloc or posix_memalign).
    • 55a1c9c8
  8. 20 April 2018, 5 commits
    • reintroduce hardening against partially-replaced allocator · b4b1e103
      Rich Felker committed
      commit 618b18c7 removed the previous
      detection and hardening since it was incorrect. commit
      72141795 already handled all that
      remained for hardening the static-linked case. in the dynamic-linked
      case, have the dynamic linker check whether malloc was replaced and
      make that information available.
      
      with these changes, the properties documented in commit
      c9f415d7 are restored: if calloc is
      not provided, it will behave as malloc+memset, and any of the
      memalign-family functions not provided will fail with ENOMEM.
    • return chunks split off by memalign using __bin_chunk instead of free · 72141795
      Rich Felker committed
      this change serves multiple purposes:
      
      1. it ensures that static linking of memalign-family functions will
      pull in the system malloc implementation, thereby causing link errors
      if an attempt is made to link the system memalign functions with a
      replacement malloc (incomplete allocator replacement).
      
      2. it eliminates calls to free that are unpaired with allocations,
      which are confusing when setting breakpoints or tracing execution.
      
      as a bonus, making __bin_chunk external may discourage aggressive and
      unnecessary inlining of it.
    • using malloc implementation types/macros/idioms for memalign · 3c2cbbe7
      Rich Felker committed
      the generated code should be mostly unchanged, except for explicit use
      of C_INUSE in place of copying the low bits from existing chunk
      headers/footers.
      
      these changes also remove mild UB due to dubious arithmetic on
      pointers into imaginary size_t[] arrays.
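      as a small illustration of the idiom change (simplified two-word header,
      not oldmalloc's actual layout):

        #include <stddef.h>

        struct chunk { size_t psize, csize; };

        /* before: indexing backwards through an imaginary size_t[] lying in
         * front of the allocation */
        static size_t header_via_array(void *mem) { return ((size_t *)mem)[-1]; }

        /* after: going through the implementation's own chunk type, as the
         * rest of malloc does */
        #define MEM_TO_CHUNK(mem) ((struct chunk *)((char *)(mem) - sizeof(struct chunk)))
        static size_t header_via_chunk(void *mem) { return MEM_TO_CHUNK(mem)->csize; }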
    • 23389b19
    • revert detection of partially-replaced allocator · 618b18c7
      Rich Felker committed
      commit c9f415d7 included checks to
      make calloc fallback to memset if used with a replaced malloc that
      didn't also replace calloc, and the memalign family fail if free has
      been replaced. however, the checks gave false positives for
      replacement whenever malloc or free resolved to a PLT entry in the
      main program.
      
      for now, disable the checks so as not to leave libc in a broken state.
      this means that the properties documented in the above commit are no
      longer satisfied; failure to replace calloc and the memalign family
      along with malloc is unsafe if they are ever called.
      
      the calloc checks were correct but useless for static linking. in both
      cases (simple or full malloc), calloc and malloc are in a source file
      together, so replacement of one but not the other would give linking
      errors. the memalign-family check was useful for static linking, but
      broken for dynamic as described above, and can be replaced with a
      better link-time check.
  9. 19 April 2018, 1 commit
    • allow interposition/replacement of allocator (malloc) · c9f415d7
      Rich Felker committed
      replacement is subject to conditions on the replacement functions.
      they may only call functions which are async-signal-safe, as specified
      either by POSIX or as an implementation-defined extension. if any
      allocator functions are replaced, at least malloc, realloc, and free
      must be provided. if calloc is not provided, it will behave as
      malloc+memset. any of the memalign-family functions not provided will
      fail with ENOMEM.
      
      in order to implement the above properties, calloc and __memalign
      check that they are using their own malloc or free, respectively.
      the choice to check malloc or free is based on considerations of
      supporting __simple_malloc. in order to make this work, calloc is
      split into separate versions for __simple_malloc and full malloc;
      commit ba819787 already did most of
      the split anyway, and completing it saves an extra call frame.
      
      previously, use of -Bsymbolic-functions made dynamic interposition
      impossible. now, we are using an explicit dynamic-list, so add
      allocator functions to the list. most are not referenced anyway, but
      all are added for completeness.
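      a deliberately naive example of a conforming replacement, for
      illustration only (no overflow checking, page-granular, and everything
      below is ordinary application code rather than musl internals): it
      defines malloc, realloc and free, sticks to simple mmap/munmap wrappers,
      and relies on the documented fallbacks for calloc and the
      memalign-family functions.

        #include <stddef.h>
        #include <string.h>
        #include <sys/mman.h>

        void *malloc(size_t n)
        {
            size_t len = (n + 16 + 4095) & ~(size_t)4095;   /* 16-byte header, whole pages */
            size_t *p = mmap(0, len, PROT_READ|PROT_WRITE,
                             MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) return 0;
            *p = len;                                       /* remember mapping length */
            return (char *)p + 16;
        }

        void free(void *q)
        {
            if (!q) return;
            size_t *p = (void *)((char *)q - 16);
            munmap(p, *p);
        }

        void *realloc(void *q, size_t n)
        {
            if (!q) return malloc(n);
            size_t *p = (void *)((char *)q - 16);
            size_t avail = *p - 16;
            void *r = malloc(n);
            if (!r) return 0;
            memcpy(r, q, n < avail ? n : avail);
            free(q);
            return r;
        }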
  10. 18 April 2018, 4 commits
  11. 12 April 2018, 1 commit
    • optimize malloc0 · 424eab22
      Alexander Monakov committed
      Implementation of __malloc0 in malloc.c takes care to preserve zero
      pages by overwriting only non-zero data. However, malloc must have
      already modified auxiliary heap data just before and beyond the
      allocated region, so we know that edge pages need not be preserved.
      
      For allocations smaller than one page, pass them immediately to memset.
      Otherwise, use memset to handle partial pages at the head and tail of
      the allocation, and scan complete pages in the interior. Optimize the
      scanning loop by processing 16 bytes per iteration and handling rest of
      page via memset as soon as a non-zero byte is found.
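      A sketch of that strategy (assuming 4096-byte pages; the scanning in the
      actual commit is hand-optimized rather than using memcmp):

        #include <stdint.h>
        #include <string.h>

        #define PAGE 4096

        static void *malloc0_clear(void *buf, size_t n)
        {
            unsigned char *p = buf;
            if (n < PAGE) return memset(p, 0, n);         /* small: plain memset */

            uintptr_t lo = ((uintptr_t)p + PAGE-1) & -(uintptr_t)PAGE;  /* first full page */
            uintptr_t hi = ((uintptr_t)p + n) & -(uintptr_t)PAGE;       /* end of last full page */
            memset(p, 0, lo - (uintptr_t)p);              /* partial head page */
            memset((void *)hi, 0, (uintptr_t)p + n - hi); /* partial tail page */

            for (unsigned char *q = (unsigned char *)lo; q < (unsigned char *)hi; q += PAGE) {
                for (size_t i = 0; i < PAGE; i += 16) {
                    static const unsigned char z[16];
                    /* 16 bytes per iteration; on the first nonzero chunk,
                     * hand the rest of the page to memset and move on */
                    if (memcmp(q + i, z, 16)) {
                        memset(q + i, 0, PAGE - i);
                        break;
                    }
                }
            }
            return buf;
        }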
  12. 10 January 2018, 1 commit
  13. 05 July 2017, 1 commit