  1. 05 Aug, 2020 1 commit
  2. 03 Aug, 2020 1 commit
    • add m68k sqrtl using native instruction · 845e4f66
      Committed by Rich Felker
      this is actually a functional fix at present, since the C sqrtl does
      not support ld80 and just wraps double sqrt. once that's fixed it will
      just be an optimization.
  3. 25 Jul, 2020 1 commit
  4. 07 Jul, 2020 2 commits
    • fix async-cancel-safety of pthread_cancel · 52ee0dd6
      Committed by Rich Felker
      the previous commit addressing async-signal-safety issues around
      pthread_kill did not fully fix pthread_cancel, which is also required
      (albeit rather irrationally) to be async-cancel-safe.
      
      without blocking implementation-internal signals, it's possible that,
      when async cancellation is enabled, a cancel signal sent by another
      thread interrupts pthread_kill while the killlock for a targeted
      thread is held. as a result, the calling thread will terminate due to
      cancellation without ever unlocking the targeted thread's killlock,
      and thus the targeted thread will be unable to exit.
    • make thread killlock async-signal-safe for pthread_kill · 7cc9496a
      Committed by Rich Felker
      pthread_kill is required to be AS-safe. that requirement can't be met
      if the target thread's killlock can be taken in contexts where
      application-installed signal handlers can run.
      
      block signals around use of this lock in all pthread_* functions which
      target a tid, and reorder blocking/unblocking of signals in
      pthread_exit so that they're blocked whenever the killlock is held.
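The pattern described above (block signals, take the lock, release it, restore the mask) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: `demo_lock`, `on_usr1` and `locked_section_demo` are hypothetical names, the lock is a plain flag, and the demo is single-threaded so it uses `sigprocmask` (a threaded program would use `pthread_sigmask`).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

static volatile sig_atomic_t handler_runs;     /* how often the handler ran */
static volatile sig_atomic_t seen_while_locked; /* handler runs observed inside the locked section */
static int demo_lock;                           /* hypothetical stand-in for a per-thread killlock */

static void on_usr1(int sig)
{
    (void)sig;
    handler_runs++;
}

/* Raises SIGUSR1 while the "lock" is held with all signals blocked.
 * The handler is deferred until the old mask is restored.
 * Returns 0 on success, -1 on failure. */
static int locked_section_demo(void)
{
    struct sigaction sa = {0};
    sigset_t all, old;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, 0)) return -1;

    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &old)) return -1;

    demo_lock = 1;                     /* lock taken with signals blocked */
    raise(SIGUSR1);                    /* stays pending, handler cannot run here */
    seen_while_locked = handler_runs;
    demo_lock = 0;                     /* lock released before unblocking */

    if (sigprocmask(SIG_SETMASK, &old, 0)) return -1; /* pending signal is delivered here */
    return 0;
}
```

Because no application handler can run while the lock is held, a handler that itself targets the same thread cannot find the lock already taken by the code it interrupted.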
  5. 06 Jul, 2020 1 commit
    • fix C implementation of a_clz_32 · 0a005f49
      Committed by Rich Felker
      this broke mallocng size_to_class on archs without a native
      implementation of a_clz_32. the incorrect logic seems to have been
      something i derived from a related but distinct log2-type operation.
      with the change made here, it passes an exhaustive test.
      
      as this function is new and presently only used by mallocng, no other
      functionality was affected.
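For illustration, a portable count-leading-zeros fallback can be written as a binary search over the bit width and checked against a bit-by-bit reference. This is a minimal sketch, not the code from the commit: the names `clz32_sketch`, `clz32_reference` and `clz32_selfcheck` are hypothetical, returning 32 for an input of 0 is an assumption made here, and the self-check covers only powers of two and their predecessors rather than all inputs.

```c
#include <stdint.h>

/* Count leading zero bits by halving the search range each step. */
static int clz32_sketch(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32; /* assumption: define the zero case as 32 */
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8; }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4; }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2; }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}

/* Slow reference: scan from the top bit downward. */
static int clz32_reference(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Returns 1 if both versions agree on every power of two and the value just below it. */
static int clz32_selfcheck(void)
{
    for (int i = 0; i < 32; i++) {
        uint32_t p = (uint32_t)1 << i;
        if (clz32_sketch(p) != clz32_reference(p)) return 0;
        if (clz32_sketch(p - 1) != clz32_reference(p - 1)) return 0;
    }
    return 1;
}
```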
  6. 02 Jul, 2020 1 commit
  7. 30 Jun, 2020 2 commits
    • import mallocng · 503bd397
      Committed by Rich Felker
      the files added come from the mallocng development repo, commit
      2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc
      implementation, developed over the past 9 months, to replace the old
      allocator (since dubbed "oldmalloc") with one that retains low code
      size and minimal baseline memory overhead while avoiding fundamental
      flaws in oldmalloc and making significant enhancements. these include
      highly controlled fragmentation, fine-grained ability to return memory
      to the system when freed, and strong hardening against dynamic memory
      usage errors by the caller.
      
      internally, mallocng derives most of these properties from tightly
      structuring memory, creating space for allocations as uniform-sized
      slots within individually mmapped (and individually freeable)
      allocation groups. smaller-than-pagesize groups are created within
      slots of larger ones. minimal group size is very small, and larger
      sizes (in geometric progression) only come into play when usage is
      high.
      
      all data necessary for maintaining consistency of the allocator state
      is tracked in out-of-band metadata, reachable via a validated path
      from minimal in-band metadata. all pointers passed (to free, etc.) are
      validated before any stores to memory take place. early reuse of freed
      slots is avoided via approximate LRU order of freed slots. further
      hardening against use-after-free and double-free, even in the case
      where the freed slot has been reused, is made by cycling the offset
      within the slot at which the allocation is placed; this is possible
      whenever the slot size is larger than the requested allocation.
    • add glue code for mallocng merge · 785752a5
      Committed by Rich Felker
      this includes both an implementation of reclaimed-gap donation from
      ldso and a version of mallocng's glue.h with namespace-safe linkage to
      underlying syscalls, integration with AT_RANDOM initialization, and
      internal locking that's optimized out when the process is
      single-threaded.
  8. 27 Jun, 2020 1 commit
    • add optimized aarch64 memcpy and memset · fdf8b2ad
      Committed by Rich Felker
      these are based on the ARM optimized-routines repository v20.05
      (ef907c7a799a), with macro dependencies flattened out and memmove code
      removed from memcpy. this change is somewhat unfortunate since having
      the branch for memmove support in the large n case of memcpy is the
      performance-optimal and size-optimal way to do both, but it makes
      memcpy alone (static-linked) about 40% larger and suggests a policy
      that use of memcpy as memmove is supported.
      
      tabs used for alignment have also been replaced with spaces.
  9. 26 Jun, 2020 1 commit
  10. 21 Jun, 2020 1 commit
    • clear need_locks in child after fork · 8ed2bd8b
      Committed by Rich Felker
      the child is single-threaded, but may still need to synchronize with
      last changes made to memory by another thread in the parent, so set
      need_locks to -1 whereby the next lock-taker will drop to 0 and
      prevent further barriers/locking.
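The three-state flag described above can be sketched roughly like this. It is an illustration under assumptions, not musl's actual lock code: `need_locks_sketch`, `demo_lock`, `lock_sketch`, `unlock_sketch` and `after_fork_in_child_sketch` are hypothetical names, and the lock is a simple spinning word built on C11 atomics.

```c
#include <stdatomic.h>

/* 1 = locking needed, 0 = locking skipped,
 * -1 = take the lock one more time (for its barrier), then switch to 0 */
static int need_locks_sketch = 1;
static atomic_int demo_lock;   /* 0 = unlocked, 1 = locked */
static int locks_taken;        /* counts real lock acquisitions */

static void lock_sketch(void)
{
    int need = need_locks_sketch;
    if (!need) return;          /* single-threaded and synchronized: skip */

    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&demo_lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;           /* retry until the lock word flips 0 -> 1 */
    locks_taken++;

    /* The acquire above synchronized with earlier writers, so later
     * callers may skip locking entirely. */
    if (need < 0) need_locks_sketch = 0;
}

static void unlock_sketch(void)
{
    /* Only release if the lock was actually taken. */
    if (atomic_load_explicit(&demo_lock, memory_order_relaxed))
        atomic_store_explicit(&demo_lock, 0, memory_order_release);
}

/* What the child would do after fork in this sketch. */
static void after_fork_in_child_sketch(void)
{
    need_locks_sketch = -1;
}
```

In this sketch the first lock taken after `after_fork_in_child_sketch()` still performs an acquire, and every later call returns immediately.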
  11. 16 Jun, 2020 3 commits
    • only use memcpy realloc to shrink if an exact-sized free chunk exists · fca7428c
      Committed by Rich Felker
      otherwise, shrink in-place. as explained in the description of commit
      3e16313f, the split here is valid without holding split_merge_lock
      because all chunks involved are in the in-use state.
    • fix memset overflow in oldmalloc race fix overhaul · cb5babdc
      Committed by Rich Felker
      commit 3e16313f introduced this bug by
      making the copy case reachable with n (new size) smaller than n0
      (original size). this was left as the only way of shrinking an
      allocation because it reduces fragmentation if a free chunk of the
      appropriate size is available. when that's not the case, another
      approach may be better, but any such improvement would be independent
      of fixing this bug.
    • fix invalid use of access function in nftw · 4bd22b8f
      Committed by Rich Felker
      access always computes result with real ids not effective ones, so it
      is not a valid means of determining whether the directory is readable.
      instead, attempt to open it before reporting whether it's readable,
      and then use fdopendir rather than opendir to open and read the
      entries.
      
      effort is made here to keep fd_limit behavior the same as before even
      if it was not correct.
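The approach described above, opening the directory to find out whether it is readable and then handing the descriptor to `fdopendir`, can be sketched as below. This is not the nftw code itself: `open_readable_dir_sketch` is a hypothetical helper, and the choice of open flags is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns a directory stream if path can actually be opened for reading
 * with the process's effective ids, or NULL otherwise. */
static DIR *open_readable_dir_sketch(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0) return NULL;   /* not readable, or not a directory */

    DIR *d = fdopendir(fd);    /* on success the stream owns fd */
    if (!d) close(fd);
    return d;
}
```

Unlike `access`, the `open` call is checked against the effective user and group ids, which are the ones that apply when the entries are later read.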
  12. 11 Jun, 2020 6 commits
    • add fallback a_clz_32 implementation · ca36573e
      Committed by Rich Felker
      some archs already have a_clz_32, used to provide a_ctz_32, but it
      hasn't been mandatory because it's not used anywhere yet. mallocng
      will need it, however, so add it now. it should probably be optimized
      better, but doesn't seem to make a difference at present.
    • only disable aligned_alloc if malloc was replaced but it wasn't · 1fc67fc1
      Committed by Rich Felker
      if both malloc and aligned_alloc have been replaced but the internal
      aligned_alloc still gets called, the replacement is a wrapper of some
      sort. it's not clear if this usage should be officially supported, but
      it's at least a plausibly interesting debugging usage, and easy to do.
      it should not be relied upon unless it's documented as supported at
      some later time.
    • have ldso track replacement of aligned_alloc · e9f4fd11
      Committed by Rich Felker
      this is in preparation for improving behavior of malloc interposition.
    • reintroduce calloc elision of memset for direct-mmapped allocations · 25cef5c5
      Committed by Rich Felker
      a new weak predicate function replaceable by the malloc implementation,
      __malloc_allzerop, is introduced. by default it's always false; the
      default version will be used when static linking if the bump allocator
      was used (in which case performance doesn't matter) or if malloc was
      replaced by the application. only if the real internal malloc is
      linked (always the case with dynamic linking) does the real version
      get used.
      
      if malloc was replaced dynamically, as indicated by __malloc_replaced,
      the predicate function is ignored and conditional-memset is always
      performed.
    • move __malloc_replaced to a top-level malloc file · 501a9266
      Committed by Rich Felker
      it's not part of the malloc implementation but glue with the musl
      dynamic linker.
    • switch to a common calloc implementation · 28f64fa6
      Committed by Rich Felker
      abstractly, calloc is completely malloc-implementation-independent;
      it's malloc followed by memset, or as we do it, a "conditional memset"
      that avoids touching fresh zero pages.
      
      previously, calloc was kept separate for the bump allocator, which can
      always skip memset, and the version of calloc provided with the full
      malloc conditionally skipped the clearing for large direct-mmapped
      allocations. the latter is a moderately attractive optimization, and
      can be added back if needed. however, further consideration to make it
      correct under malloc replacement would be needed.
      
      commit b4b1e103 documented the
      contract for malloc replacement as allowing omission of calloc, and
      indeed that worked for dynamic linking, but for static linking it was
      possible to get the non-clearing definition from the bump allocator;
      if not for that, it would have been a link error trying to pull in
      malloc.o.
      
      the conditional-clearing code for the new common calloc is taken from
      mal0_clear in oldmalloc, but drops the need to access actual page size
      and just uses a fixed value of 4096. this avoids potentially needing
      access to global data for the sake of an optimization that at best
      marginally helps archs with offensively-large page sizes.
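The "conditional memset" idea can be sketched as below: scan each fixed 4096-byte chunk and write only to chunks that contain a nonzero byte, so untouched zero pages stay untouched. This is a simplified illustration, not the mal0_clear-derived code. `conditional_clear_sketch` and `conditional_clear_demo` are hypothetical names, and chunks are counted from the start of the buffer rather than aligned to real page boundaries.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 4096 /* fixed value, as in the commit message */

/* Zero n bytes at p, storing only into chunks that hold a nonzero byte.
 * Returns the number of chunks that had to be written. */
static size_t conditional_clear_sketch(unsigned char *p, size_t n)
{
    size_t written = 0;
    while (n) {
        size_t len = n < SKETCH_PAGE ? n : SKETCH_PAGE;
        size_t i = 0;
        while (i < len && p[i] == 0) i++;   /* read-only scan */
        if (i < len) {                      /* found dirt: clear the rest */
            memset(p + i, 0, len - i);
            written++;
        }
        p += len;
        n -= len;
    }
    return written;
}

/* Demo: dirty one byte in the middle chunk of a zeroed three-chunk buffer.
 * Returns the number of chunks written, or -1 if the buffer is not all zero. */
static int conditional_clear_demo(void)
{
    static unsigned char buf[3 * SKETCH_PAGE]; /* static storage starts zeroed */
    buf[SKETCH_PAGE + 100] = 0xAA;
    size_t written = conditional_clear_sketch(buf, sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i]) return -1;
    return (int)written;
}
```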
  13. 04 Jun, 2020 8 commits
    • move oldmalloc to its own directory under src/malloc · 384c0131
      Committed by Rich Felker
      this sets the stage for replacement, and makes it practical to keep
      oldmalloc around as a build option for a while if that ends up being
      useful.
      
      only the files which are actually part of the implementation are
      moved. memalign and posix_memalign are entirely generic. in theory
      calloc could be pulled out too, but it's useful to have it tied to the
      implementation so as to optimize out unnecessary memset when
      implementation details make it possible to know the memory is already
      clear.
    • move __expand_heap into malloc.c · eaa0f249
      Committed by Rich Felker
      this function is no longer used elsewhere, and moving it reduces the
      number of source files specific to the malloc implementation.
    • rename memalign source file back to its proper name · e07138b8
      Committed by Rich Felker
    • fc18facf
    • reverse dependency order of memalign and aligned_alloc · d1e6fdd3
      Committed by Rich Felker
      this change eliminates the internal __memalign function and makes the
      memalign and posix_memalign functions completely independent of the
      malloc implementation, written portably in terms of aligned_alloc.
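Writing the two legacy interfaces purely in terms of `aligned_alloc` might look like the sketch below. The `_sketch` names are hypothetical so they do not clash with the real libc symbols, and the argument checks are assumptions based on the POSIX requirements for `posix_memalign`, not a copy of the commit.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* posix_memalign-style wrapper: returns an error number instead of setting errno. */
static int posix_memalign_sketch(void **res, size_t align, size_t len)
{
    /* POSIX requires a power-of-two multiple of sizeof(void *). */
    if (align < sizeof(void *) || (align & (align - 1))) return EINVAL;

    void *mem = aligned_alloc(align, len);
    if (!mem) return errno ? errno : ENOMEM;
    *res = mem;
    return 0;
}

/* memalign-style wrapper: a direct forward to aligned_alloc. */
static void *memalign_sketch(size_t align, size_t len)
{
    return aligned_alloc(align, len);
}

/* Helper for checking results; align must be a power of two. */
static int is_aligned_sketch(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Neither wrapper needs to know anything about the allocator's internals, which is the independence the commit message describes.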
    • rename aligned_alloc source file · de798308
      Committed by Rich Felker
      this is the first step of swapping the name of the actual
      implementation to aligned_alloc while keeping history followable
      across the rename.
    • remove stale document from malloc src directory · 96490a4a
      Committed by Rich Felker
      this was an unfinished draft document present since the initial
      check-in, that was never intended to ship in its current form. remove
      it as part of reorganizing for replacement of the allocator.
    • rewrite bump allocator to fix corner cases, decouple from expand_heap · c4694f40
      Committed by Rich Felker
      this affects the bump allocator used when static linking in programs
      that don't need allocation metadata due to not using realloc, free,
      etc.
      
      commit e3bc22f1 refactored the bump
      allocator to share code with __expand_heap, used by malloc, for the
      purpose of fixing the case (mainly nommu) where brk doesn't work.
      however, the geometric growth behavior of __expand_heap is not
      actually well-suited to the bump allocator, and can produce
      significant excessive memory usage. in particular, by repeatedly
      requesting just over the remaining free space in the current
      mmap-allocated area, the total mapped memory will be roughly double
      the nominal usage. and since the main user of the no-brk mmap fallback
      in the bump allocator is nommu, this excessive usage is not just
      virtual address space but physical memory.
      
      in addition, even on systems with brk, having a unified size request
      to __expand_heap without knowing whether the brk or mmap backend would
      get used made it so the brk could be expanded twice as far as needed.
      for example, with malloc(n) and n-1 bytes available before the current
      brk, the brk would be expanded by n bytes rounded up to page size,
      when expansion by just one page would have sufficed.
      
      the new implementation computes request size separately for the cases
      where brk expansion is being attempted vs using mmap, and also
      performs individual mmap of large allocations without moving to a new
      bump area and throwing away the rest of the old one. this greatly
      reduces the need for geometric area size growth and limits the extent
      to which free space at the end of one bump area might be unusable for
      future allocations.
      
      as a bonus, the resulting code size is somewhat smaller than the
      combined old version plus __expand_heap.
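The brk over-expansion example in the message (malloc(n) with n-1 bytes still free before the current brk) can be worked through with a little arithmetic. The functions below are a hypothetical model of the two sizing policies, assuming a 4096-byte page and a page-aligned brk. They are not the allocator's code.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

static size_t round_up_page(size_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/* Unified request: the whole request is rounded up and added to the brk,
 * ignoring the bytes already available below it. */
static size_t brk_growth_unified(size_t n, size_t avail)
{
    (void)avail;
    return round_up_page(n);
}

/* Separate computation for the brk case: only the shortfall is requested. */
static size_t brk_growth_separate(size_t n, size_t avail)
{
    return n > avail ? round_up_page(n - avail) : 0;
}
```

With n = 8192 and 8191 bytes available, the unified policy grows the brk by 8192 bytes (two pages), while the separate computation grows it by 4096 bytes (one page), matching the "twice as far as needed" case in the text.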
  14. 03 Jun, 2020 6 commits
    • move malloc_impl.h from src/internal to src/malloc · 135c94f0
      Committed by Rich Felker
      this reflects that it is no longer intended for consumption outside of
      the malloc implementation.
    • move declaration of interfaces between malloc and ldso to dynlink.h · cee88b76
      Committed by Rich Felker
      this eliminates consumers of malloc_impl.h outside of the malloc
      implementation.
    • 28be6122
    • always use time64 syscall first for clock_adjtime · e0b17ef8
      Committed by Rich Felker
      clock_adjtime always returns the current clock setting in struct
      timex, so it's always possible that the time64 version is needed.
    • fix broken time64 clock_adjtime · ef51b762
      Committed by Rich Felker
      the 64-bit time code path used the wrong (time32) syscall. fortunately
      this code path is not yet taken unless attempting to set a post-Y2038
      time.
    • fix unbounded heap expansion race in malloc · 3e16313f
      Committed by Rich Felker
      this has been a longstanding issue reported many times over the years,
      with it becoming increasingly clear that it could be hit in practice.
      under concurrent malloc and free from multiple threads, it's possible
      to hit usage patterns where unbounded amounts of new memory are
      obtained via brk/mmap despite the total nominal usage being small and
      bounded.
      
      the underlying cause is that, as a fundamental consequence of keeping
      locking as fine-grained as possible, the state where free has unbinned
      an already-free chunk to merge it with a newly-freed one, but has not
      yet re-binned the combined chunk, is exposed to other threads. this is
      bad even with small chunks, and leads to suboptimal use of memory, but
      where it really blows up is where the already-freed chunk in question
      is the large free region "at the top of the heap". in this situation,
      other threads momentarily see a state of having almost no free memory,
      and conclude that they need to obtain more.
      
      as far as I can tell there is no fix for this that does not harm
      performance. the fix made here forces all split/merge of free chunks
      to take place under a single lock, which also takes the place of the
      old free_lock, being held at least momentarily at the time of free to
      determine whether there are neighboring free chunks that need merging.
      
      as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no
      longer make sense and are deleted. simplified merging now takes place
      inline in free (__bin_chunk) and realloc.
      
      as commented in the source, holding the split_merge_lock precludes any
      chunk transition from in-use to free state. for the most part, it also
      precludes change to chunk header sizes. however, __memalign may still
      modify the sizes of an in-use chunk to split it into two in-use
      chunks. arguably this should require holding the split_merge_lock, but
      that would necessitate refactoring to expose it externally, which is a
      mess. and it turns out not to be necessary, at least assuming the
      existing sloppy memory model malloc has been using, because if free
      (__bin_chunk) or realloc sees any unsynchronized change to the size,
      it will also see the in-use bit being set, and thereby can't do
      anything with the neighboring chunk that changed size.
  15. 23 May, 2020 4 commits
    • restore lock-skipping for processes that return to single-threaded state · 8d81ba8c
      Committed by Rich Felker
      the design used here relies on the barrier provided by the first lock
      operation after the process returns to single-threaded state to
      synchronize with actions by the last thread that exited. by storing
      the intent to change modes in the same object used to detect whether
      locking is needed, it's possible to avoid an extra (possibly costly)
      memory load after the lock is taken.
    • cut down size of some libc struct members · f12888e9
      Committed by Rich Felker
      these are all flags that can be single-byte values.
    • don't use libc.threads_minus_1 as relaxed atomic for skipping locks · e01b5939
      Committed by Rich Felker
      after all but the last thread exits, the next thread to observe
      libc.threads_minus_1==0 and conclude that it can skip locking fails to
      synchronize with any changes to memory that were made by the
      last-exiting thread. this can produce data races.
      
      on some archs, at least x86, memory synchronization is unlikely to be
      a problem; however, with the inline locks in malloc, skipping the lock
      also eliminated the compiler barrier, and caused code that needed to
      re-check chunk in-use bits after obtaining the lock to reuse a stale
      value, possibly from before the process became single-threaded. this
      in turn produced corruption of the heap state.
      
      some uses of libc.threads_minus_1 remain, especially for allocation of
      new TLS in the dynamic linker; otherwise, it could be removed
      entirely. it's made non-volatile to reflect that the remaining
      accesses are only made under lock on the thread list.
      
      instead of libc.threads_minus_1, libc.threaded is now used for
      skipping locks. the difference is that libc.threaded is permanently
      true once an additional thread has been created. this will produce
      some performance regression in processes that are mostly
      single-threaded but occasionally creating threads. in the future it
      may be possible to bring back the full lock-skipping, but more care
      needs to be taken to produce a safe design.
    • reorder thread list unlink in pthread_exit after all locks · 4d5aa20a
      Committed by Rich Felker
      since the backend for LOCK() skips locking if single-threaded, it's
      unsafe to make the process appear single-threaded before the last use
      of lock.
      
      this fixes potential unsynchronized access to a linked list via
      __dl_thread_cleanup.
  16. 22 May, 2020 1 commit