1. 18 Aug, 2020 1 commit
    • add gettid function · d49cf075
      Rich Felker committed
      this is a prerequisite for addition of other interfaces that use
      kernel tids, including futex and SIGEV_THREAD_ID.
      
      there is some ambiguity as to whether the semantic return type should
      be int or pid_t. either way, futex API imposes a contract that the
      values fit in int (excluding some upper reserved bits). glibc used
      pid_t, so in the interest of not having gratuitous mismatch (the
      underlying types are the same anyway), pid_t is used here as well.
      
      while conceptually this is a syscall, the copy stored in the thread
      structure is always valid in all contexts where it's valid to call
      libc functions, so it's used to avoid the syscall.
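
As a rough illustration of the idea in this message (serving the tid from a per-thread copy instead of entering the kernel each time), here is a minimal sketch. `sketch_gettid` and `cached_tid` are hypothetical names, and the thread-local variable only stands in for the thread structure the message mentions, which this log does not show.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread cache; a real libc would keep this in its
   thread structure and refresh it in the child after fork. */
static _Thread_local pid_t cached_tid;

/* Return the kernel thread id, asking the kernel only on first use. */
pid_t sketch_gettid(void)
{
    if (!cached_tid)
        cached_tid = (pid_t)syscall(SYS_gettid);
    return cached_tid;
}
```

In the initial thread of a process the value equals `getpid()`. The sketch does not handle `fork`, where a cached value would go stale.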
  2. 13 Aug, 2020 2 commits
    • aarch64: fix setjmp return value · 22359b54
      Szabolcs Nagy committed
      longjmp should set the return value of setjmp, but 64bit
      registers were used for the 0 check while the type is int.
      
      use the code that gcc generates for return val ? val : 1;
    • setjmp: optimize longjmp prologues · 4554f155
      Alexander Monakov committed
      Use a branchless sequence that is one byte shorter on 64-bit, same size
      on 32-bit. Thanks to Pete Cawley for suggesting this variant.
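
The assembly itself is not reproduced in this log. As a C-level illustration only, one branchless way to express `val ? val : 1` is to add the result of `!val`, which is 1 exactly when `val` is zero. The function name `longjmp_retval` is hypothetical.

```c
/* Branchless form of `val ? val : 1`: adds 1 only when val == 0. */
int longjmp_retval(int val)
{
    return val + !val;
}
```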
  3. 12 Aug, 2020 3 commits
    • setjmp: optimize x86 longjmp epilogues · 59b64ff6
      Alexander Monakov committed
    • c6a6fe4c
    • setjmp: fix x86-64 longjmp argument adjustment · 21431a0e
      Alexander Monakov committed
      longjmp 'val' argument is an int, but the assembly is referencing 64-bit
      registers as if the argument was a long, or the caller was responsible
      for extending the argument. Though the psABI is not clear on this, the
      interpretation in GCC is that high bits may be arbitrary and the callee
      is responsible for sign/zero-extending the value as needed (likewise for
      return values: callers must anticipate that high bits may be garbage).
      
      Therefore testing %rax is a functional bug: setjmp would wrongly return
      zero if longjmp was called with val==0, but high bits of %rsi happened
      to be non-zero.
      
      Rewrite the prologue to refer to 32-bit registers. In passing, change
      'test' to use %rsi, as there's no advantage to using %rax and the new
      form is cheaper on processors that do not perform move elimination.
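
The failure mode described above can be modeled in C by treating the incoming register as a 64-bit value whose upper half is garbage. This is a sketch, not the assembly from the commit; both function names are hypothetical, and the narrowing conversions rely on the usual two's-complement behavior of GCC and Clang.

```c
#include <stdint.h>

/* Models the old code: the zero test looks at all 64 bits, but the
   caller of setjmp only sees the low 32 bits of the result. */
int setjmp_result_buggy(uint64_t reg)
{
    uint64_t ret = reg ? reg : 1;
    return (int)(uint32_t)ret;
}

/* Models the fix: narrow to the 32-bit int first, then test. */
int setjmp_result_fixed(uint64_t reg)
{
    int val = (int)(uint32_t)reg;
    return val ? val : 1;
}
```

With `reg = 0xdeadbeef00000000` (val == 0, high bits non-zero) the first function yields 0 and the second yields 1.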
  4. 09 Aug, 2020 1 commit
    • prefer new socket syscalls, fallback to SYS_socketcall only if needed · c2feda4e
      Rich Felker committed
      a number of users performing seccomp filtering have requested use of
      the new individual syscall numbers for socket syscalls, rather than
      the legacy multiplexed socketcall, since the latter has the arguments
      all in memory where they can't participate in filter decisions.
      
      previously, some archs used the multiplexed socketcall if it was
      historically all that was available, while other archs used the
      separate syscalls. the intent was that the latter set only include
      archs that have "always" had separate socket syscalls, at least going
      back to linux 2.6.0. however, at least powerpc, powerpc64, and sh were
      wrongly included in this set, and thus socket operations completely
      failed on old kernels for these archs.
      
      with the changes made here, the separate syscalls are always
      preferred, but fallback code is compiled for archs that also define
      SYS_socketcall. two such archs, mips (plain o32) and microblaze,
      define SYS_socketcall despite never having needed it, so it's now
      undefined by their versions of syscall_arch.h to prevent inclusion of
      useless fallback code.
      
      some archs, where the separate syscalls were only added after the
      addition of SYS_accept4, lack SYS_accept. because socket calls are
      always made with zeros in the unused argument positions, it suffices
      to just use SYS_accept4 to provide a definition of SYS_accept, and
      this is done to make happy the macro machinery that concatenates the
      socket call name onto __SC_ and SYS_.
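
The preference order described above might look roughly like the following for a single call. This is a sketch using the public `syscall()` wrapper rather than libc-internal macros; `socket_prefer_direct` is a hypothetical name, and the operation number 1 is the `SYS_SOCKET` code of the Linux socketcall ABI.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Try the dedicated socket syscall first; fall back to the multiplexed
   socketcall only if the kernel reports ENOSYS and the arch defines it. */
int socket_prefer_direct(int domain, int type, int protocol)
{
    long r = -1;
    errno = ENOSYS;
#ifdef SYS_socket
    r = syscall(SYS_socket, domain, type, protocol);
#endif
#ifdef SYS_socketcall
    if (r == -1 && errno == ENOSYS) {
        /* socketcall takes its arguments packed in memory, which is
           why seccomp filters cannot inspect them. */
        unsigned long args[3] = {
            (unsigned long)domain, (unsigned long)type, (unsigned long)protocol
        };
        r = syscall(SYS_socketcall, 1 /* SYS_SOCKET */, args);
    }
#endif
    return (int)r;
}
```

On archs whose headers do not define `SYS_socketcall`, the fallback block is compiled out entirely, mirroring the intent stated in the message.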
  5. 06 Aug, 2020 5 commits
    • math: new software sqrtl · 933f8e72
      Szabolcs Nagy committed
      same approach as in sqrt.
      
      sqrtl was broken on aarch64, riscv64 and s390x targets because
      of missing quad precision support and on m68k-sf because of
      missing ld80 sqrtl.
      
      this implementation is written for quad precision and then
      edited to make it work for both m68k and x86 style ld80 formats
      too, but it is not expected to be optimal for them.
      
      note: using fp instructions for the initial estimate when such
      instructions are available (e.g. double prec sqrt or rsqrt) is
      avoided because of fenv correctness.
    • math: add __math_invalidl · 4f893997
      Szabolcs Nagy committed
      for targets where long double is different from double.
    • math: new software sqrtf · b1756ec8
      Szabolcs Nagy committed
      same method as in sqrt, this was tested on all inputs against
      an sqrtf instruction. (the only difference found was that x86
      sqrtf does not signal the x86 specific input-denormal exception
      on negative subnormal inputs while the software sqrtf does,
      this is fine as it was designed for ieee754 exceptions only.)
      
      there is a known faster method:
      "Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation"
      that computes sqrtf directly via pipelined polynomial evaluation
      which allows more parallelism, but the design does not generalize
      easily to higher precisions.
    • math: new software sqrt · 97e9b73d
      Szabolcs Nagy committed
      approximate 1/sqrt(x) and sqrt(x) with goldschmidt iterations.
      this is known to be a fast method for computing sqrt, but it is
      tricky to get right, so added detailed comments.
      
      use a lookup table for the initial estimate, this adds 256bytes
      rodata but it can be shared between sqrt, sqrtf and sqrtl.
      this saves one iteration compared to a linear estimate.
      
      this is for soft float targets, but it supports fenv by using a
      floating-point operation to get the final result.  the result
      is correctly rounded in all rounding modes.  if fenv support is
      turned off then the nearest rounded result is computed and
      inexact exception is not signaled.
      
      assumes fast 32bit integer arithmetics and 32 to 64bit mul.
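
To make the Goldschmidt iteration concrete, here is a floating-point sketch of the coupled update for sqrt(x) and 1/sqrt(x). It is not the commit's code: the commit works in integer arithmetic with a lookup table and produces correctly rounded results, whereas this version uses doubles, a linear initial estimate, and makes no rounding guarantees. The name `goldschmidt_sqrt` and the constants 1.6 and 0.45 are choices made for this illustration.

```c
#include <float.h>

/* Sketch only: assumes a finite x > 0. Zero, negative, NaN and infinite
   inputs are returned unchanged rather than handled per IEEE 754. */
double goldschmidt_sqrt(double x)
{
    if (!(x > 0.0) || x > DBL_MAX)
        return x;

    /* Scale x = m * 4^k with m in [0.5, 2), so sqrt(x) = sqrt(m) * 2^k. */
    double m = x, scale = 1.0;
    while (m >= 2.0) { m *= 0.25; scale *= 2.0; }
    while (m < 0.5)  { m *= 4.0;  scale *= 0.5; }

    /* Crude linear estimate of 1/sqrt(m) on [0.5, 2). */
    double y = 1.6 - 0.45 * m;

    double g = m * y;    /* converges to sqrt(m)         */
    double h = 0.5 * y;  /* converges to 1/(2*sqrt(m))   */
    for (int i = 0; i < 6; i++) {
        double r = 0.5 - g * h;  /* residual, shrinks quadratically */
        g += g * r;
        h += h * r;
    }
    return g * scale;
}
```

A better initial estimate, such as the table lookup the message describes, reduces the number of iterations needed.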
    • in hosts file lookups, honor first canonical name regardless of family · f1198ea3
      Rich Felker committed
      prior to this change, the canonical name came from the first hosts
      file line matching the requested family, so the canonical name for a
      given hostname could differ depending on whether it was requested with
      AF_UNSPEC or a particular family (AF_INET or AF_INET6). now, the
      canonical name is deterministically the first one to appear with the
      requested name as an alias.
  6. 05 Aug, 2020 1 commit
  7. 04 Aug, 2020 1 commit
  8. 03 Aug, 2020 1 commit
    • add m68k sqrtl using native instruction · 845e4f66
      Rich Felker committed
      this is actually a functional fix at present, since the C sqrtl does
      not support ld80 and just wraps double sqrt. once that's fixed it will
      just be an optimization.
  9. 25 Jul, 2020 1 commit
  10. 07 Jul, 2020 2 commits
    • fix async-cancel-safety of pthread_cancel · 52ee0dd6
      Rich Felker committed
      the previous commit addressing async-signal-safety issues around
      pthread_kill did not fully fix pthread_cancel, which is also required
      (albeit rather irrationally) to be async-cancel-safe.
      
      without blocking implementation-internal signals, it's possible that,
      when async cancellation is enabled, a cancel signal sent by another
      thread interrupts pthread_kill while the killlock for a targeted
      thread is held. as a result, the calling thread will terminate due to
      cancellation without ever unlocking the targeted thread's killlock,
      and thus the targeted thread will be unable to exit.
    • make thread killlock async-signal-safe for pthread_kill · 7cc9496a
      Rich Felker committed
      pthread_kill is required to be AS-safe. that requirement can't be met
      if the target thread's killlock can be taken in contexts where
      application-installed signal handlers can run.
      
      block signals around use of this lock in all pthread_* functions which
      target a tid, and reorder blocking/unblocking of signals in
      pthread_exit so that they're blocked whenever the killlock is held.
  11. 06 Jul, 2020 1 commit
    • fix C implementation of a_clz_32 · 0a005f49
      Rich Felker committed
      this broke mallocng size_to_class on archs without a native
      implementation of a_clz_32. the incorrect logic seems to have been
      something i derived from a related but distinct log2-type operation.
      with the change made here, it passes an exhaustive test.
      
      as this function is new and presently only used by mallocng, no other
      functionality was affected.
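
For reference, a portable count-leading-zeros on 32 bits can be written as a binary search over the bit positions. This is a generic sketch, not the code from the commit, and `clz32` is a hypothetical name. Like most native instructions and builtins, it is only meaningful for non-zero input.

```c
#include <stdint.h>

/* Count leading zero bits of a non-zero 32-bit value by halving the
   search range: 16, 8, 4, 2, then 1 bit. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (!(x >> 16)) { n += 16; x <<= 16; }
    if (!(x >> 24)) { n += 8;  x <<= 8;  }
    if (!(x >> 28)) { n += 4;  x <<= 4;  }
    if (!(x >> 30)) { n += 2;  x <<= 2;  }
    if (!(x >> 31)) { n += 1; }
    return n;
}
```

Because the input space is only 2^32 values, a function like this can be checked exhaustively against a reference, as the message says was done for the fixed version.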
  12. 02 Jul, 2020 1 commit
  13. 01 Jul, 2020 2 commits
  14. 30 Jun, 2020 2 commits
    • import mallocng · 503bd397
      Rich Felker committed
      the files added come from the mallocng development repo, commit
      2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc
      implementation, developed over the past 9 months, to replace the old
      allocator (since dubbed "oldmalloc") with one that retains low code
      size and minimal baseline memory overhead while avoiding fundamental
      flaws in oldmalloc and making significant enhancements. these include
      highly controlled fragmentation, fine-grained ability to return memory
      to the system when freed, and strong hardening against dynamic memory
      usage errors by the caller.
      
      internally, mallocng derives most of these properties from tightly
      structuring memory, creating space for allocations as uniform-sized
      slots within individually mmapped (and individually freeable)
      allocation groups. smaller-than-pagesize groups are created within
      slots of larger ones. minimal group size is very small, and larger
      sizes (in geometric progression) only come into play when usage is
      high.
      
      all data necessary for maintaining consistency of the allocator state
      is tracked in out-of-band metadata, reachable via a validated path
      from minimal in-band metadata. all pointers passed (to free, etc.) are
      validated before any stores to memory take place. early reuse of freed
      slots is avoided via approximate LRU order of freed slots. further
      hardening against use-after-free and double-free, even in the case
      where the freed slot has been reused, is made by cycling the offset
      within the slot at which the allocation is placed; this is possible
      whenever the slot size is larger than the requested allocation.
    • add glue code for mallocng merge · 785752a5
      Rich Felker committed
      this includes both an implementation of reclaimed-gap donation from
      ldso and a version of mallocng's glue.h with namespace-safe linkage to
      underlying syscalls, integration with AT_RANDOM initialization, and
      internal locking that's optimized out when the process is
      single-threaded.
  15. 27 Jun, 2020 1 commit
    • add optimized aarch64 memcpy and memset · fdf8b2ad
      Rich Felker committed
      these are based on the ARM optimized-routines repository v20.05
      (ef907c7a799a), with macro dependencies flattened out and memmove code
      removed from memcpy. this change is somewhat unfortunate since having
      the branch for memmove support in the large n case of memcpy is the
      performance-optimal and size-optimal way to do both, but it makes
      memcpy alone (static-linked) about 40% larger and suggests a policy
      that use of memcpy as memmove is supported.
      
      tabs used for alignment have also been replaced with spaces.
  16. 26 Jun, 2020 1 commit
  17. 21 Jun, 2020 1 commit
    • clear need_locks in child after fork · 8ed2bd8b
      Rich Felker committed
      the child is single-threaded, but may still need to synchronize with
      last changes made to memory by another thread in the parent, so set
      need_locks to -1 whereby the next lock-taker will drop to 0 and
      prevent further barriers/locking.
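
The three-state flag described above can be sketched with C11 atomics as follows. This is an illustration modeled on the message, not musl's internal lock; `sketch_lock`, `sketch_unlock` and the file-scope `need_locks` variable are hypothetical, and a simple spin on compare-and-swap stands in for the real locking primitive.

```c
#include <stdatomic.h>

/* 1: process is multithreaded, always lock.
   0: single-threaded, skip locking entirely.
  -1: single-threaded child after fork; lock once more so the acquire
      acts as a barrier, then drop to 0. */
static int need_locks = 1;

static void sketch_lock(atomic_int *l)
{
    int nl = need_locks;
    if (!nl)
        return;
    int expected = 0;
    while (!atomic_compare_exchange_weak(l, &expected, 1))
        expected = 0;  /* spin until the lock word is free */
    if (nl < 0)
        need_locks = 0;  /* this was the one synchronizing acquire */
}

static void sketch_unlock(atomic_int *l)
{
    /* Release only if the lock was actually taken. */
    if (atomic_load(l))
        atomic_store(l, 0);
}
```

After the first `sketch_lock` call with `need_locks == -1`, later calls return immediately without touching the lock word.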
  18. 16 Jun, 2020 3 commits
    • only use memcpy realloc to shrink if an exact-sized free chunk exists · fca7428c
      Rich Felker committed
      otherwise, shrink in-place. as explained in the description of commit
      3e16313f, the split here is valid without holding split_merge_lock
      because all chunks involved are in the in-use state.
    • fix memset overflow in oldmalloc race fix overhaul · cb5babdc
      Rich Felker committed
      commit 3e16313f introduced this bug by
      making the copy case reachable with n (new size) smaller than n0
      (original size). this was left as the only way of shrinking an
      allocation because it reduces fragmentation if a free chunk of the
      appropriate size is available. when that's not the case, another
      approach may be better, but any such improvement would be independent
      of fixing this bug.
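
The general hazard when a copy-based resize becomes reachable with a new size smaller than the old one is that any length derived from the old size overruns the new buffer. A minimal sketch of the safe pattern follows; it is written against the public `malloc` API rather than oldmalloc's internals, and `copy_resize` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Resize by allocate-copy-free. The copy length must be the smaller of
   the two sizes, otherwise shrinking writes past the new allocation. */
void *copy_resize(void *p, size_t old_size, size_t new_size)
{
    void *q = malloc(new_size);
    if (!q)
        return NULL;  /* original block is left untouched */
    memcpy(q, p, old_size < new_size ? old_size : new_size);
    free(p);
    return q;
}
```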
    • fix invalid use of access function in nftw · 4bd22b8f
      Rich Felker committed
      access always computes result with real ids not effective ones, so it
      is not a valid means of determining whether the directory is readable.
      instead, attempt to open it before reporting whether it's readable,
      and then use fdopendir rather than opendir to open and read the
      entries.
      
      effort is made here to keep fd_limit behavior the same as before even
      if it was not correct.
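
The replacement approach (open first, then hand the descriptor to `fdopendir`) can be sketched as below. Because `open` is checked against effective ids, its success or failure is the readability answer. The helper name `open_dir_checked` is hypothetical and the sketch leaves out the fd_limit handling mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Return a directory stream if the directory can actually be opened for
   reading by this process, or NULL (with errno set) if it cannot. */
DIR *open_dir_checked(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return NULL;
    DIR *d = fdopendir(fd);
    if (!d)
        close(fd);  /* fdopendir did not take ownership on failure */
    return d;
}
```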
  19. 11 Jun, 2020 6 commits
    • add fallback a_clz_32 implementation · ca36573e
      Rich Felker committed
      some archs already have a_clz_32, used to provide a_ctz_32, but it
      hasn't been mandatory because it's not used anywhere yet. mallocng
      will need it, however, so add it now. it should probably be optimized
      better, but doesn't seem to make a difference at present.
    • only disable aligned_alloc if malloc was replaced but it wasn't · 1fc67fc1
      Rich Felker committed
      if both malloc and aligned_alloc have been replaced but the internal
      aligned_alloc still gets called, the replacement is a wrapper of some
      sort. it's not clear if this usage should be officially supported, but
      it's at least a plausibly interesting debugging usage, and easy to do.
      it should not be relied upon unless it's documented as supported at
      some later time.
    • have ldso track replacement of aligned_alloc · e9f4fd11
      Rich Felker committed
      this is in preparation for improving behavior of malloc interposition.
    • reintroduce calloc elision of memset for direct-mmapped allocations · 25cef5c5
      Rich Felker committed
      a new weak predicate function replaceable by the malloc implementation,
      __malloc_allzerop, is introduced. by default it's always false; the
      default version will be used when static linking if the bump allocator
      was used (in which case performance doesn't matter) or if malloc was
      replaced by the application. only if the real internal malloc is
      linked (always the case with dynamic linking) does the real version
      get used.
      
      if malloc was replaced dynamically, as indicated by __malloc_replaced,
      the predicate function is ignored and conditional-memset is always
      performed.
    • move __malloc_replaced to a top-level malloc file · 501a9266
      Rich Felker committed
      it's not part of the malloc implementation but glue with musl dynamic
      linker.
    • switch to a common calloc implementation · 28f64fa6
      Rich Felker committed
      abstractly, calloc is completely malloc-implementation-independent;
      it's malloc followed by memset, or as we do it, a "conditional memset"
      that avoids touching fresh zero pages.
      
      previously, calloc was kept separate for the bump allocator, which can
      always skip memset, and the version of calloc provided with the full
      malloc conditionally skipped the clearing for large direct-mmapped
      allocations. the latter is a moderately attractive optimization, and
      can be added back if needed. however, further consideration to make it
      correct under malloc replacement would be needed.
      
      commit b4b1e103 documented the
      contract for malloc replacement as allowing omission of calloc, and
      indeed that worked for dynamic linking, but for static linking it was
      possible to get the non-clearing definition from the bump allocator;
      if not for that, it would have been a link error trying to pull in
      malloc.o.
      
      the conditional-clearing code for the new common calloc is taken from
      mal0_clear in oldmalloc, but drops the need to access actual page size
      and just uses a fixed value of 4096. this avoids potentially needing
      access to global data for the sake of an optimization that at best
      marginally helps archs with offensively-large page sizes.
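
The "malloc followed by a conditional memset" shape described above can be sketched as follows. This is an illustration layered on the public `malloc`, not the code from the commit: `sketch_calloc`, `conditional_clear` and `all_zero` are hypothetical names, and the 4096-byte chunks here start at the allocation rather than being aligned to page boundaries.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the n bytes at p are already zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i]) return 0;
    return 1;
}

/* Clear memory in fixed 4096-byte chunks, writing only to chunks that
   contain non-zero data so untouched zero pages stay untouched. */
static void conditional_clear(unsigned char *p, size_t n)
{
    while (n) {
        size_t chunk = n < 4096 ? n : 4096;
        if (!all_zero(p, chunk))
            memset(p, 0, chunk);
        p += chunk;
        n -= chunk;
    }
}

/* Allocator-independent calloc: overflow check, malloc, conditional clear. */
void *sketch_calloc(size_t m, size_t n)
{
    if (n && m > SIZE_MAX / n) {
        errno = ENOMEM;
        return NULL;
    }
    size_t total = m * n;
    void *p = malloc(total);
    if (p)
        conditional_clear(p, total);
    return p;
}
```

Using a fixed chunk size keeps the function free of any global page-size lookup, which is the trade-off the message describes.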
  20. 04 Jun, 2020 4 commits