  1. 31 Aug 2020 · 1 commit
  2. 17 Aug 2020 · 1 commit
  3. 14 Sep 2019 · 1 commit
    • harden thread start with failed scheduling against broken __clone · f5eee489
      Rich Felker committed
      commit 8a544ee3 introduced a
      dependency of the failure path for explicit scheduling at thread
      creation on __clone's handling of the start function returning, which
      should result in SYS_exit.
      
      as noted in commit 05870abe, the arm
      version of __clone was broken in this case. in the past, the mips
      version was also broken; it was fixed in commit
      8b2b61e0.
      
      since this code path is pretty much entirely untested (previously only
      reachable in applications that call the public clone() and return from
      the start function) and consists of fragile per-arch asm, don't assume
      it works, at least not until it's been thoroughly tested. instead make
      the SYS_exit syscall from the start function's failure path.
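      a minimal sketch of the hardened shape, using the Linux clone() wrapper that glibc and musl both provide rather than musl's internal __clone: the start function never returns into the wrapper and issues SYS_exit itself on both paths. struct start_args, its simulate_sched_failure flag (standing in for a failed scheduling call), and start_and_reap are hypothetical names for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical start arguments, shared with the child via CLONE_VM. */
struct start_args {
    int simulate_sched_failure; /* stands in for a failed scheduling call */
};

static int child_start(void *p)
{
    struct start_args *args = p;
    /* Never return into the clone wrapper: exit the task explicitly on
     * both paths, so correctness does not hinge on untested per-arch asm. */
    if (args->simulate_sched_failure)
        syscall(SYS_exit, 1);
    syscall(SYS_exit, 0);
    return 0; /* not reached */
}

/* Runs child_start in a new task; returns its exit status, or -1. */
static int start_and_reap(int fail)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;
    struct start_args args = { .simulate_sched_failure = fail };
    /* The stack grows down on the usual Linux archs, so pass its top. */
    pid_t pid = clone(child_start, stack + stack_size, CLONE_VM | SIGCHLD, &args);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```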
  4. 12 Sep 2019 · 2 commits
    • fix arm __a_barrier_oldkuser when built as thumb · b0301f47
      Rich Felker committed
      as noted in commit 05870abe, mov lr,pc
      is not a valid method for saving the return address in code that might
      be built as thumb.
      
      this one is unlikely to matter, since any ISA level that has thumb2
      should also have native implementations of atomics that don't involve
      kuser_helper, and the affected code is only used on very old kernels
      to begin with.
    • fix code path where child function returns in arm __clone built as thumb · 05870abe
      Rich Felker committed
      mov lr,pc is not a valid way to save the return address in thumb mode
      since it omits the thumb bit. use a chain of bl and bx to emulate blx.
      this could be avoided by converting to a .S file with preprocessor
      conditions to use blx if available, but the time cost here is
      dominated by the syscall anyway.
      
      while making this change, also remove the remnants of support for
      pre-bx ISA levels. commit 9f290a49
      removed the hack from the parent code paths, but left the unnecessary
      code in the child. keeping it would require rewriting two code paths
      rather than one, and is useless for reasons described in that commit.
  5. 07 Sep 2019 · 3 commits
    • synchronously clean up pthread_create failure due to scheduling errors · 8a544ee3
      Rich Felker committed
      previously, when pthread_create failed due to inability to set
      explicit scheduling according to the requested attributes, the nascent
      thread was detached and made responsible for its own cleanup via the
      standard pthread_exit code path. this left it consuming resources
      potentially well after pthread_create returned, in a way that the
      application could not see or mitigate, and unnecessarily exposed its
      existence to the rest of the implementation via the global thread
      list.
      
      instead, attempt explicit scheduling early and reuse the failure path
      for __clone failure if it fails. the nascent thread's exit futex is
      not needed for unlocking the thread list, since the thread calling
      pthread_create holds the thread list lock the whole time, so it can be
      repurposed to ensure the thread has finished exiting. no pthread_exit
      is needed, and freeing the stack, if needed, can happen just as it
      would if __clone failed.
    • set explicit scheduling for new thread from calling thread, not self · 022f27d5
      Rich Felker committed
      if setting scheduling properties succeeds, the new thread may end up
      with lower priority than the caller, and may be unable to continue
      running due to another intermediate-priority thread. this produces a
      priority inversion situation for the thread calling pthread_create,
      since it cannot return until the new thread reports success.
      
      originally, the parent was responsible for setting the new thread's
      priority; commits b8742f32 and
      40bae2d3 changed it as part of
      trimming down the pthread structure. since then, commit
      04335d92 partly reversed the changes,
      but did not switch responsibilities back. do that now.
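      the resulting division of labor can be sketched with public APIs. this is an illustration under assumptions, not musl's internal code: a POSIX semaphore stands in for the futex handshake, and struct start_block, child_entry, and create_with_sched are hypothetical. the creator applies the scheduling parameters itself and only then lets the new thread proceed, so it never has to wait for a report from a thread that may now run at lower priority.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>

/* Hypothetical start block: the creator fills in sched_result, then posts. */
struct start_block {
    sem_t go;
    int sched_result;
    int ran_body;
};

static void *child_entry(void *p)
{
    struct start_block *b = p;
    while (sem_wait(&b->go) != 0)
        ; /* retry if interrupted by a signal */
    if (b->sched_result == 0)
        b->ran_body = 1; /* the thread's real work would go here */
    return NULL;         /* on failure, just exit */
}

/* Creates a thread, then sets its scheduling from the calling thread. */
static int create_with_sched(int policy, int priority, int *ran_body)
{
    struct start_block b = { .sched_result = -1, .ran_body = 0 };
    pthread_t t;
    if (sem_init(&b.go, 0, 0))
        return -1;
    int err = pthread_create(&t, NULL, child_entry, &b);
    if (err) {
        sem_destroy(&b.go);
        return err;
    }
    struct sched_param sp = { .sched_priority = priority };
    b.sched_result = pthread_setschedparam(t, policy, &sp);
    sem_post(&b.go); /* publish the verdict; the child only waits for it */
    pthread_join(t, NULL);
    sem_destroy(&b.go);
    *ran_body = b.ran_body;
    return b.sched_result;
}
```

      the final pthread_join exists only so the sketch can clean up and report; it is not part of the handshake.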
    • fix unsynchronized decrement of thread count on pthread_create error · dd0a23dd
      Rich Felker committed
      commit 8f11e612 wrongly documented
      that all changes to libc.threads_minus_1 were guarded by the thread
      list lock, but the decrement for failed SYS_clone took place after the
      thread list lock was released.
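      a minimal sketch of the corrected ordering, assuming a plain pthread mutex in place of the thread list lock; threads_minus_1 mirrors the counter named above, while fake_clone and create_thread_sketch are hypothetical. the point is that the failure-path decrement happens before the unlock.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t thread_list_lock = PTHREAD_MUTEX_INITIALIZER;
static int threads_minus_1; /* guarded by thread_list_lock */

/* Hypothetical stand-in for SYS_clone: fails when asked to. */
static int fake_clone(int should_fail)
{
    return should_fail ? -1 : 0;
}

static int create_thread_sketch(int should_fail)
{
    pthread_mutex_lock(&thread_list_lock);
    threads_minus_1++;
    int ret = fake_clone(should_fail);
    if (ret < 0)
        threads_minus_1--; /* undo while the lock is still held */
    assert(threads_minus_1 >= 0);
    pthread_mutex_unlock(&thread_list_lock);
    return ret;
}
```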
  6. 07 Aug 2019 · 1 commit
  7. 02 Aug 2019 · 1 commit
    • fix missing declarations for pthread_join extensions in source file · 3541925f
      Rich Felker committed
      per policy, define the feature test macro to get declarations for the
      pthread_tryjoin_np and pthread_timedjoin_np functions. in the past
      this has been only for checking; with 32-bit archs getting 64-bit
      time_t it will also be necessary for symbols to get redirected
      correctly.
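      application code that calls these extensions needs the same treatment: define the feature test macro before the first include. a minimal sketch, assuming _GNU_SOURCE as the macro (it exposes both functions on glibc and musl); worker and join_within are hypothetical names. the deadline passed to pthread_timedjoin_np is an absolute CLOCK_REALTIME time.

```c
#define _GNU_SOURCE /* declares pthread_tryjoin_np and pthread_timedjoin_np */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static void *worker(void *arg)
{
    return arg;
}

/* Joins a fresh thread, giving up `seconds` from now (CLOCK_REALTIME). */
static int join_within(int seconds, void **result)
{
    static int token = 7;
    pthread_t t;
    int err = pthread_create(&t, NULL, worker, &token);
    if (err)
        return err;
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += seconds;
    err = pthread_timedjoin_np(t, result, &deadline);
    assert(err == 0 || err == ETIMEDOUT);
    return err;
}
```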
  8. 29 Jul 2019 · 2 commits
    • remove x32 syscall timespec fixup hacks · 4c307bed
      Rich Felker committed
      the x32 syscall interfaces treat timespec's tv_nsec member as 64-bit
      despite the API type being long and long being 32-bit in the ABI. this
      is no problem for syscalls that store timespecs to userspace as
      results, but caused uninitialized padding to be misinterpreted as the
      high bits in syscalls that take timespecs as input.
      
      since the beginning of the port, we've dealt with this situation with
      hacks in syscall_arch.h, and injected between __syscall_cp_c and
      __syscall_cp_asm, to special-case the syscall numbers that involve
      timespecs as inputs and copy them to a form suitable to pass to the
      kernel.
      
      commit 40aa18d5 set the stage for
      removal of these hacks by letting us treat the "normal" x32 syscalls
      dealing with timespec as if they're x32's "time64" syscalls,
      effectively making x32 a "time64-only 32-bit arch" like riscv32 will
      be when it's added. since then, all users of syscalls that x32's
      syscall_arch.h had hacks for have been updated to use time64 syscalls,
      so the hacks can be removed.
      
      there are still at least a few other timespec-related syscalls broken
      on x32, which were overlooked when the x32 hacks were done or added
      later. these include at least recvmmsg, adjtimex/clock_adjtime, and
      timerfd_settime, and they will be fixed independently later on.
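      the kind of copy the removed hacks performed can be sketched as follows, assuming a kernel-side layout with two 64-bit members; struct kernel_timespec64 and to_kernel_timespec are hypothetical names. the member-by-member copy widens tv_nsec by value, so whatever happens to sit in the user struct's padding never reaches the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical name for the layout x32 syscalls read for timespec
 * inputs: both members are 64-bit. */
struct kernel_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* On x32, the user struct's tv_nsec is a 32-bit long followed by padding;
 * passing that struct directly would let the padding become the high
 * bits of tv_nsec. Copying by value avoids that. */
static struct kernel_timespec64 to_kernel_timespec(const struct timespec *ts)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    struct kernel_timespec64 k = {
        .tv_sec = ts->tv_sec,
        .tv_nsec = ts->tv_nsec, /* widened by value, not reinterpreted */
    };
    return k;
}
```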
    • futex wait operations: add time64 syscall support, decouple 32-bit time_t · 1492bdf5
      Rich Felker committed
      thanks to the original factorization using the __timedwait function,
      there are no FUTEX_WAIT calls anywhere else, giving us a single point
      of change to make nearly all the timed thread primitives time64-ready.
      the one exception is the FUTEX_LOCK_PI command for PI mutex timedlock.
      I haven't tried to make these two points share code, since they have
      different fallbacks (no non-private fallback needed for PI since PI
      was added later) and FUTEX_LOCK_PI isn't a cancellation point (thus
      allowing the whole code path to inline into pthread_mutex_timedlock).
      
      as for other changes in this series, the time64 syscall is used only
      if it's the only one defined for the arch, or if the requested timeout
      does not fit in 32 bits. on current 32-bit archs where time_t is a
      32-bit type, this makes it statically unreachable.
      
      on 64-bit archs, there are only superficial changes to the code after
      preprocessing. on current 32-bit archs, the time is passed via an
      intermediate copy to remove the assumption that time_t is a 32-bit
      type.
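      the selection rule can be sketched as a small function. the enum and choose_timeout_path are hypothetical, and arch_has_time32 stands in for a compile-time check of which syscall numbers the arch defines.

```c
#include <assert.h>
#include <stdint.h>

enum timeout_path { USE_TIME32_SYSCALL, USE_TIME64_SYSCALL };

/* Use the time64 syscall only if the arch has no other one, or if the
 * timeout's seconds do not fit in 32 bits. */
static enum timeout_path choose_timeout_path(int64_t tv_sec, int arch_has_time32)
{
    if (!arch_has_time32)
        return USE_TIME64_SYSCALL;
    if (tv_sec < INT32_MIN || tv_sec > INT32_MAX)
        return USE_TIME64_SYSCALL;
    return USE_TIME32_SYSCALL;
}
```

      if tv_sec were a 32-bit time_t, the range test could never succeed, so with arch_has_time32 fixed at 1 the time64 branches would be dead code, which is the static unreachability described above.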
  9. 27 Jul 2019 · 1 commit
    • refactor thrd_sleep and nanosleep in terms of clock_nanosleep · 331993e3
      Rich Felker committed
      for namespace-safety with thrd_sleep, this requires an alias, which is
      also added. this eliminates all but one direct call point for
      nanosleep syscalls, and arranges that 64-bit time_t conversion logic
      will only need to exist in one file rather than three.
      
      as a bonus, clock_nanosleep with CLOCK_REALTIME and empty flags is now
      implemented as SYS_nanosleep, thereby working on older kernels that
      may lack POSIX clocks functionality.
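      the layering can be sketched with the public clock_nanosleep, which, unlike nanosleep, returns an error number instead of setting errno. sketch_nanosleep and sketch_thrd_sleep are hypothetical stand-ins for the real functions (musl uses an internal alias instead), and the SYS_nanosleep special case for CLOCK_REALTIME is not shown. the thrd_sleep wrapper follows the C11 convention: 0 on success, -1 if interrupted by a signal, another negative value on other errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* nanosleep-style: translate the returned error number into errno. */
static int sketch_nanosleep(const struct timespec *req, struct timespec *rem)
{
    int err = clock_nanosleep(CLOCK_REALTIME, 0, req, rem);
    if (err) {
        errno = err;
        return -1;
    }
    return 0;
}

/* thrd_sleep-style: map the error number onto the C11 return values. */
static int sketch_thrd_sleep(const struct timespec *req, struct timespec *rem)
{
    switch (clock_nanosleep(CLOCK_REALTIME, 0, req, rem)) {
    case 0:
        return 0;
    case EINTR:
        return -1;
    default:
        return -2;
    }
}
```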
  10. 15 Jun 2019 · 1 commit
    • add riscv64 architecture support · 0a48860c
      Rich Felker committed
      Author: Alex Suykov <alex.suykov@gmail.com>
      Author: Aric Belsito <lluixhi@gmail.com>
      Author: Drew DeVault <sir@cmpwn.com>
      Author: Michael Clark <mjc@sifive.com>
      Author: Michael Forney <mforney@mforney.org>
      Author: Stefan O'Rear <sorear2@gmail.com>
      
      This port has involved the work of many people over several years. I
      have tried to ensure that everyone with substantial contributions has
      been credited above; if any omissions are found they will be noted
      later in an update to the authors/contributors list in the COPYRIGHT
      file.
      
      The version committed here comes from the riscv/riscv-musl repo's
      commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by
      me for issues found during final review:
      
      - a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc
        are not safe to use in separate inline asm fragments)
      
      - a_cas[_p] is fixed to be a memory barrier
      
      - the call from the _start assembly into the C part of crt1/ldso is
        changed to allow for the possibility that the linker does not place
        them nearby each other.
      
      - DTP_OFFSET is defined correctly so that local-dynamic TLS works
      
      - reloc.h LDSO_ARCH logic is simplified and made explicit.
      
      - unused, non-functional crti/n asm files are removed.
      
      - an empty .sdata section is added to crt1 so that the
        __global_pointer reference is resolvable.
      
      - indentation style errors in some asm files are fixed.
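      the intended semantics of a_cas can be sketched with GCC's __atomic builtins rather than riscv64 asm: the whole read-modify-write is a single builtin (so no lr/sc pair is split across asm fragments), and sequentially consistent ordering is requested for both outcomes. sketch_a_cas is a hypothetical name; like a_cas, it returns the value found in *p.

```c
#include <assert.h>

/* Returns the value *p held before the operation; stores s only if that
 * value equals t. SEQ_CST on success and failure keeps it a barrier. */
static inline int sketch_a_cas(int *p, int t, int s)
{
    /* On failure the builtin writes the observed value into t; on
     * success t already equals the old value. */
    __atomic_compare_exchange_n(p, &t, s, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return t;
}
```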
  11. 11 Apr 2019 · 2 commits
    • remove external __syscall function and last remaining users · 788d5e24
      Rich Felker committed
      the weak version of __syscall_cp_c was using a tail call to __syscall
      to avoid duplicating the 6-argument syscall code inline in small
      static-linked programs, but now that __syscall no longer exists, the
      inline expansion is no longer duplication.
      
      the syscall.h machinery supported up to 7 syscall arguments, only via
      an external __syscall function, but we presently have no syscall call
      points that actually make use of that many, and the kernel only
      defines 7-argument calling conventions for arm, powerpc (32-bit), and
      sh. if it turns out we need them in the future, they can easily be
      added.
    • overhaul i386 syscall mechanism not to depend on external asm source · 22e5bbd0
      Rich Felker committed
      this is the first part of a series of patches intended to make
      __syscall fully self-contained in the object file produced using
      syscall.h, which will make it possible for crt1 code to perform
      syscalls.
      
      the (confusingly named) i386 __vsyscall mechanism, which this commit
      removes, was introduced before the presence of a valid thread pointer
      was mandatory; back then the thread pointer was set up lazily only if
      threads were used. the intent was to be able to perform syscalls using
      the kernel's fast entry point in the VDSO, which can use the sysenter
      (Intel) or syscall (AMD) instruction instead of int $128, but without
      inlining an access to the __syscall global at the point of each
      syscall, which would incur a significant size cost from PIC setup
      everywhere. the mechanism also shuffled registers/calling convention
      around to avoid spills of call-saved registers, and to avoid
      allocating ebx or ebp via asm constraints, since there are plenty of
      broken-but-supported compiler versions which are incapable of
      allocating ebx with -fPIC or ebp with -fno-omit-frame-pointer.
      
      the new mechanism preserves the properties of avoiding spills and
      avoiding allocation of ebx/ebp in constraints, but does it inline,
      using some fairly simple register shuffling, and uses a field of the
      thread structure rather than global data for the vdso-provided syscall
      code address.
      
      for now, the external __syscall function is refactored not to use the
      old __vsyscall so it can be kept, but the intent is to remove it too.
  12. 02 Apr 2019 · 1 commit
  13. 01 Apr 2019 · 1 commit
    • implement priority inheritance mutexes · 54ca6779
      Rich Felker committed
      priority inheritance is a feature to mitigate priority inversion
      situations, where execution of a medium-priority thread can
      unboundedly block forward progress of a high-priority thread when a
      lock it needs is held by a low-priority thread.
      
      the natural way to do priority inheritance would be with a simple
      futex flag to donate the calling thread's priority to a target thread
      while it waits on the futex. unfortunately, linux does not offer such
      an interface, but instead insists on implementing the whole locking
      protocol in kernelspace with special futex commands that exist solely
      for the purpose of doing PI mutexes. this would require the entire
      "trylock" logic to be duplicated in the timedlock code path for PI
      mutexes, since, once the previous lock holder releases the lock and
      the futex call returns, the lock is already held by the caller.
      obviously such code duplication is undesirable.
      
      instead, I've made the PI timedlock success path set the mutex lock
      count to -1, which can be thought of as "not yet complete", since a
      lock count of 0 is "locked, with no recursive references". a simple
      branch in a non-hot path of pthread_mutex_trylock can then see and act
      on this state, skipping past the code that would check and take the
      lock to the same code path that runs after the lock is obtained for a
      non-PI mutex.
      
      because we're forced to let the kernel perform the actual lock and
      unlock operations whenever the mutex is contended, we have to patch
      things up when it does the wrong thing:
      
      1. the lock operation is not aware of whether the mutex is
         error-checking, so it will always fail with EDEADLK rather than
         deadlocking.
      
      2. the lock operation is not aware of whether the mutex is robust, so
         it will successfully obtain mutexes in the owner-died state even if
         they're non-robust, whereas this operation should deadlock.
      
      3. the unlock operation always sets the lock value to zero, whereas
         for robust mutexes, we want to set it to a special value indicating
         that the mutex obtained after its owner died was unlocked without
         marking it consistent, so that future operations all fail with
         ENOTRECOVERABLE.
      
      the first of these is easy to solve, just by performing a futex wait
      on a dummy futex address to simulate deadlock or ETIMEDOUT as
      appropriate. but problems 2 and 3 interact in a nasty way. to solve
      problem 2, we need to back out the spurious success. but if waiters
      are present -- which we can't just ignore, because even if we don't
      want to wake them, the calling thread is incorrectly inheriting their
      priorities -- this requires using the kernel's unlock operation, which
      will zero the lock value, thereby losing the "owner died with lock
      held" state.
      
      to solve these problems, we overload the mutex's waiters field, which
      is unused for PI mutexes since they don't call the normal futex wait
      functions, as an indicator that the PI mutex is permanently
      non-lockable. originally I wanted to use the count field, but there is
      one code path that needs to access this flag without synchronization:
      trylock's CAS failure path needs to be able to decide whether to fail
      with EBUSY or ENOTRECOVERABLE. the waiters field is already treated as
      a relaxed-order atomic in our memory model, so this works out nicely.
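      the "not yet complete" count and the overloaded waiters flag can be sketched in C11 atomics. this is a heavily simplified model under stated assumptions, not musl's code: struct pi_mutex, pi_trylock, and pi_after_kernel_lock are hypothetical, there is no real FUTEX_LOCK_PI call, and robust, recursive, and error-checking behavior is omitted. the count is read only by a thread that already sees its own tid in the lock word.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical, heavily simplified PI mutex. */
struct pi_mutex {
    atomic_int lock;    /* owner tid, or 0 when unlocked */
    atomic_int waiters; /* unused for PI waiting; nonzero = never lockable */
    int count;          /* owner-only; -1 = kernel-acquired, setup pending */
};

static int pi_trylock(struct pi_mutex *m, int tid)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong(&m->lock, &expected, tid)) {
        /* Non-hot path. count is examined only if we already own the lock. */
        int finishing = (expected == tid && m->count == -1);
        if (!finishing) {
            /* Decide the error with a relaxed read, no lock taken. */
            int dead = atomic_load_explicit(&m->waiters, memory_order_relaxed);
            return dead ? ENOTRECOVERABLE : EBUSY;
        }
    }
    /* Shared post-acquisition path for both ways of obtaining the lock. */
    m->count = 0; /* "locked, with no recursive references" */
    return 0;
}

/* Stand-in for the timedlock path after FUTEX_LOCK_PI has succeeded:
 * the kernel already stored our tid, so only bookkeeping remains. */
static int pi_after_kernel_lock(struct pi_mutex *m, int tid)
{
    atomic_store(&m->lock, tid);
    m->count = -1; /* "not yet complete" */
    return pi_trylock(m, tid);
}
```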
  14. 30 Mar 2019 · 1 commit
    • clean up access to mutex type in pthread_mutex_trylock · 2142cafd
      Rich Felker committed
      there was no point in masking off the pshared bit when first loading
      the type, since every subsequent access involves a mask anyway. not
      masking it may avoid a subsequent load to check the pshared flag, and
      it's just simpler.
  15. 22 Mar 2019 · 1 commit
    • fix data race choosing next key slot in pthread_key_create · 59f88d76
      Rich Felker committed
      commit 84d061d5 wrongly moved the
      access to the global next_key outside of the scope of the lock. the
      error manifested as spurious failure to find an available key slot
      under concurrent calls to pthread_key_create, since the stopping
      condition could be met after only a small number of slots were
      examined.
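      the corrected pattern keeps every access to next_key inside the write lock. a minimal sketch with a hypothetical slot count, using a pthread rwlock in place of musl's internal one; sketch_key_create and nodtor (a placeholder that marks a slot as in use) are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NKEYS 8 /* hypothetical; the real limit is PTHREAD_KEYS_MAX */

static pthread_rwlock_t key_lock = PTHREAD_RWLOCK_INITIALIZER;
static void (*keys[NKEYS])(void *);
static unsigned next_key; /* read and written only under key_lock */

static void nodtor(void *p)
{
    (void)p;
}

static int sketch_key_create(unsigned *k, void (*dtor)(void *))
{
    if (!dtor)
        dtor = nodtor; /* a non-null entry marks the slot as in use */
    pthread_rwlock_wrlock(&key_lock);
    unsigned j = next_key; /* the load the bug had moved outside the lock */
    do {
        if (!keys[j]) {
            keys[next_key = *k = j] = dtor;
            pthread_rwlock_unlock(&key_lock);
            return 0;
        }
    } while ((j = (j + 1) % NKEYS) != next_key);
    pthread_rwlock_unlock(&key_lock);
    return EAGAIN;
}
```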
  16. 14 Mar 2019 · 1 commit
  17. 22 Feb 2019 · 2 commits
    • add membarrier syscall wrapper, refactor dynamic tls install to use it · ba18c1ec
      Rich Felker committed
      the motivation for this change is twofold. first, it gets the fallback
      logic out of the dynamic linker, improving code readability and
      organization. second, it provides application code that wants to use
      the membarrier syscall, which depends on preregistration of intent
      before the process becomes multithreaded unless unbounded latency is
      acceptable, with a symbol that, when linked, ensures that this
      registration happens.
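      for reference, the early registration such code relies on can be sketched with the raw syscall and the Linux uapi constants from <linux/membarrier.h>. this does not use musl's membarrier() wrapper; sketch_membarrier and register_private_expedited are hypothetical names, and the sketch simply reports failure if the kernel lacks the expedited commands.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw syscall wrapper; the name is illustrative. */
static int sketch_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(SYS_membarrier, cmd, flags);
}

/* Registers intent to use private expedited barriers. Intended to run
 * early, before the process becomes multithreaded. Returns 0 on
 * success, -1 if unsupported or if registration fails. */
static int register_private_expedited(void)
{
    int cmds = sketch_membarrier(MEMBARRIER_CMD_QUERY, 0);
    if (cmds < 0 || !(cmds & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED))
        return -1;
    return sketch_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) ? -1 : 0;
}
```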
    • make thread list lock a recursive lock · 7865d569
      Rich Felker committed
      this is a prerequisite for factoring the membarrier fallback code into
      a function that can be called from a context with the thread list
      already locked or independently.
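      what recursion buys can be sketched with a recursive pthread mutex standing in for the thread list lock (an assumption; musl uses its own lock): a helper like the hypothetical fallback_barrier can take the lock whether or not its caller already holds it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t list_lock;
static int list_len; /* guarded by list_lock */

static void init_list_lock(void)
{
    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_settype(&a, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&list_lock, &a);
    pthread_mutexattr_destroy(&a);
}

/* Safe to call with or without list_lock already held. */
static int fallback_barrier(void)
{
    pthread_mutex_lock(&list_lock);
    int n = list_len; /* stand-in for walking the thread list */
    pthread_mutex_unlock(&list_lock);
    return n;
}

static int add_thread_then_barrier(void)
{
    pthread_mutex_lock(&list_lock);
    list_len++;
    int n = fallback_barrier(); /* nested lock: fine, it is recursive */
    pthread_mutex_unlock(&list_lock);
    return n;
}
```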
  18. 19 Feb 2019 · 1 commit
    • install dynamic tls synchronously at dlopen, streamline access · 9d44b646
      Rich Felker committed
      previously, dynamic loading of new libraries with thread-local storage
      allocated the storage needed for all existing threads at load-time,
      precluding late failure that can't be handled, but left installation
      in existing threads to take place lazily on first access. this imposed
      an additional memory access and branch on every dynamic tls access,
      and imposed a requirement, which was not actually met, that the
      dynamic tlsdesc asm functions preserve all call-clobbered registers
      before calling C code to install new dynamic tls on first access.
      the x86[_64] versions of this code wrongly omitted saving and
      restoring of fpu/vector registers, assuming the compiler would not
      generate anything using them in the called C code. the arm and aarch64
      versions saved known existing registers, but failed to be future-proof
      against expansion of the register file.
      
      now that we track live threads in a list, it's possible to install the
      new dynamic tls for each thread at dlopen time. for the most part,
      synchronization is not needed, because if a thread has not
      synchronized with completion of the dlopen, there is no way it can
      meaningfully request access to a slot past the end of the old dtv,
      which remains valid for accessing slots which already existed.
      however, it is necessary to ensure that, if a thread sees its new dtv
      pointer, it sees correct pointers in each of the slots that existed
      prior to the dlopen. my understanding is that, on most real-world
      coherency architectures including all the ones we presently support, a
      built-in consume order guarantees this; however, don't rely on that.
      instead, the SYS_membarrier syscall is used to ensure that all threads
      see the stores to the slots of their new dtv prior to the installation
      of the new dtv. if it is not supported, the same is implemented in
      userspace via signals, using the same mechanism as __synccall.
      
      the __tls_get_addr function, variants, and dynamic tlsdesc asm
      functions are all updated to remove the fallback paths for claiming
      new dynamic tls, and are now all branch-free.
  19. 18 Feb 2019 · 1 commit
    • fix data race between new pthread_key_delete and dtor execution · 80528892
      Rich Felker committed
      access to clear the entry in each thread's tsd array for the key being
      deleted was not synchronized with __pthread_tsd_run_dtors. I probably
      made this mistake from a mistaken belief that the thread list lock was
      held during the latter, which of course is not possible since it
      executes application code in a still-live-thread context.
      
      while we're at it, expand the interval during which signals are
      blocked to cover taking the write lock on key_lock, so that a signal
      at an inopportune time doesn't block forward progress of readers.
  20. 17 Feb 2019 · 1 commit
  21. 16 Feb 2019 · 4 commits
    • rewrite pthread_key_delete to use global thread list · ba74a42c
      Rich Felker committed
      with the availability of the thread list, there is no need to mark tsd
      key slots dirty and clean them up only when a free slot can't be
      found. instead, directly iterate threads and clear any value
      associated with the key being deleted.
      
      no synchronization is necessary for the clearing, since there is no
      way the slot can be accessed without having synchronized with the
      creation of a new key occupying the same slot, which is already
      sequenced after and synchronized with the deletion of the old key.
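      the clearing loop can be sketched as a walk over the thread list. struct sketch_thread and clear_key_everywhere are hypothetical; the list is assumed to be circular, and the caller is assumed to hold the thread list lock while walking it.

```c
#include <assert.h>
#include <stddef.h>

#define NKEYS 8 /* hypothetical */

/* Hypothetical thread descriptor on a circular list: following next
 * from any thread eventually returns to it. */
struct sketch_thread {
    struct sketch_thread *next;
    void *tsd[NKEYS];
};

/* Clears slot k in every live thread, starting from the caller. */
static void clear_key_everywhere(struct sketch_thread *self, unsigned k)
{
    assert(k < NKEYS);
    struct sketch_thread *td = self;
    do
        td->tsd[k] = NULL;
    while ((td = td->next) != self);
}
```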
    • rewrite __synccall in terms of global thread list · e4235d70
      Rich Felker committed
      the __synccall mechanism provides stop-the-world synchronous execution
      of a callback in all threads of the process. it is used to implement
      multi-threaded setuid/setgid operations, since Linux lacks them at the
      kernel level, and for some other less-critical purposes.
      
      this change eliminates dependency on /proc/self/task to determine the
      set of live threads, which in addition to being an unwanted dependency
      and a potential point of resource-exhaustion failure, turned out to be
      inaccurate. test cases provided by Alexey Izbyshev showed that it
      could fail to reflect newly created threads. due to how the
      presignaling phase worked, this usually yielded a deadlock if hit, but
      in the worst case it could also result in threads being silently
      missed (allowed to continue running without executing the callback).
    • track all live threads in an AS-safe, fully-consistent linked list · 8f11e612
      Rich Felker committed
      the hard problem here is unlinking threads from a list when they exit
      without creating a window of inconsistency where the kernel task for a
      thread still exists and is still executing instructions in userspace,
      but is not reflected in the list. the magic solution here is getting
      rid of per-thread exit futex addresses (set_tid_address), and instead
      using the exit futex to unlock the global thread list.
      
      since pthread_join can no longer see the thread enter a detach_state
      of EXITED (which depended on the exit futex address pointing to the
      detach_state), it must now observe the unlocking of the thread list
      lock before it can unmap the joined thread and return. it doesn't
      actually have to take the lock. for this, a __tl_sync primitive is
      offered, with a signature that will allow it to be enhanced for quick
      return even under contention on the lock, if needed. for now, the
      exiting thread always performs a futex wake on its detach_state. a
      future change could optimize this out except when there is already a
      joiner waiting.
      
      initial/dynamic variants of detached state no longer need to be
      tracked separately, since the futex address is always set to the
      global list lock, not a thread-local address that could become invalid
      on detached thread exit. all detached threads, however, must perform a
      second sigprocmask syscall to block implementation-internal signals,
      since locking the thread list with them already blocked is not
      permissible.
      
      the arch-independent C version of __unmapself no longer needs to take
      a lock or set up its own futex address to release the lock, since it
      must necessarily be called with the thread list lock already held,
      guaranteeing exclusive access to the temporary stack.
      
      changes to libc.threads_minus_1 no longer need to be atomic, since
      they are guarded by the thread list lock. it is largely vestigial at
      this point, and can be replaced with a cheaper boolean indicating
      whether the process is multithreaded at some point in the future.
    • always block signals for starting new threads, refactor start args · 04335d92
      Rich Felker committed
      whether signals need to be blocked at thread start, and whether
      unblocking is necessary in the entry point function, has historically
      depended on intricacies of the cancellation design and on whether
      there are scheduling operations to perform on the new thread before
      its successful creation can be committed. future changes to track an
      AS-safe list of live threads will require signals to be blocked
      whenever changes are made to the list, so ...
      
      prior to commits b8742f32 and
      40bae2d3, a signal mask for the entry
      function to restore was part of the pthread structure. it was removed
      to trim down the size of the structure, which both saved a small
      amount of stack space and improved code generation on archs where
      small immediate displacements are less costly than arbitrary ones, by
      limiting the range of offsets between the base of the thread
      structure, its members, and the thread pointer. these commits moved
      the saved mask to a special structure used only when special
      scheduling was needed, in which case the pthread_create caller and new
      thread had to synchronize with each other and could use this memory to
      pass a mask.
      
      this commit partially reverts the above two commits, but instead of
      putting the mask back in the pthread structure, it moves all "start
      argument" members out of the pthread structure, trimming it down
      further, and puts them in a separate structure passed on the new
      thread's stack. the code path for explicit scheduling of the new
      thread is also changed to synchronize with the calling thread in such
      a way to avoid spurious futex wakes.
  22. 13 Feb 2019 · 1 commit
    • redesign robust mutex states to eliminate data races on type field · 099b89d3
      Rich Felker committed
      in order to implement ENOTRECOVERABLE, the implementation has
      traditionally used a bit of the mutex type field to indicate that it's
      recovered after EOWNERDEAD and will go into ENOTRECOVERABLE state if
      pthread_mutex_consistent is not called before unlocking. while it's
      only the thread that holds the lock that needs access to this
      information (except possibly for the sake of pthread_mutex_consistent
      choosing between EINVAL and EPERM for erroneous calls), the change to
      the type field is formally a data race with all other threads that
      perform any operation on the mutex. no individual bits race, and no
      write races are possible, so things are "okay" in some sense, but it's
      still not good.
      
      this patch moves the recovery/consistency state to the mutex
      owner/lock field which is rightfully mutable. bit 30, the same bit the
      kernel uses with a zero owner to indicate that the previous owner died
      holding the lock, is now used with a nonzero owner to indicate that
      the mutex is held but has not yet been marked consistent. note that
      the kernel ABI also reserves bit 29 not to appear in any tid, so the
      sentinel value we use for ENOTRECOVERABLE, 0x7fffffff, does not clash
      with any tid plus bit 30.
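      a decoder for the resulting lock-word states can be sketched as follows. the enum, the macro names, and classify_lock_word are hypothetical; the bit values follow the description above, and bit 31 is masked off because the kernel's futex ABI uses it as the FUTEX_WAITERS flag.

```c
#include <assert.h>

/* Hypothetical names for the lock-word states described above. */
enum lock_state {
    LS_FREE,
    LS_HELD,
    LS_HELD_NOT_CONSISTENT, /* recovered after EOWNERDEAD, not yet consistent */
    LS_OWNER_DIED,          /* kernel: zero owner plus bit 30 */
    LS_NOTRECOVERABLE,
};

#define OWNER_DIED_BIT 0x40000000u /* bit 30 */
#define TID_BITS       0x3fffffffu
#define NOTRECOVERABLE 0x7fffffffu /* sentinel; no tid has bit 29 set */

static enum lock_state classify_lock_word(unsigned v)
{
    v &= 0x7fffffffu; /* drop bit 31, the FUTEX_WAITERS flag */
    if (v == NOTRECOVERABLE)
        return LS_NOTRECOVERABLE;
    unsigned owner = v & TID_BITS;
    int died = (v & OWNER_DIED_BIT) != 0;
    if (!owner)
        return died ? LS_OWNER_DIED : LS_FREE;
    return died ? LS_HELD_NOT_CONSISTENT : LS_HELD;
}
```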
  23. 17 Jan 2019 · 1 commit
    • fix unintended linking dependency of pthread_key_create on __synccall · 16a522ba
      Rich Felker committed
      commit 84d061d5 attempted to do this
      already, but omitted from pthread_key_create.c the weak definition of
      __pthread_key_delete_synccall, so that the definition provided by
      pthread_key_delete.c was always pulled in.
      
      based on patch by Markus Wichmann, but with a weak alias rather than
      weak reference for consistency/policy about dependence on tooling
      features.
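      the weak-alias pattern can be sketched with the GCC/Clang alias attribute on ELF targets, in the style of musl's weak_alias macro. the names are hypothetical: dummy_cleanup is the fallback, and a strong sketch_key_cleanup defined in another translation unit would replace it only when that unit is linked. a weak reference would instead declare the symbol weak with no definition at all and test its address for null before calling.

```c
#include <assert.h>

/* Fallback used when no strong definition is linked: if the delete path
 * was never linked in, there is nothing to clean up. */
static int dummy_cleanup(void)
{
    return 0;
}

/* Weak alias: binds to dummy_cleanup unless another translation unit
 * provides a strong sketch_key_cleanup. */
extern __typeof(dummy_cleanup) sketch_key_cleanup
    __attribute__((__weak__, __alias__("dummy_cleanup")));

/* Caller in the "create" unit: links without pulling in the unit that
 * defines the strong version. */
static int sketch_create_slow_path(void)
{
    return sketch_key_cleanup();
}
```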
  24. 20 Dec 2018 · 1 commit
    • make sem_wait and sem_timedwait interruptible by signals · 21a172dd
      Rich Felker committed
      this reverts commit c0ed5a20, which
      was based on a mistaken reading of POSIX due to inconsistency between
      the description (which requires return upon interruption by a signal)
      and the errors list (which wrongly lists EINTR as "may fail").
      
      since the previously-introduced behavior was a workaround for an old
      kernel bug to ensure safety of correct programs that were not hardened
      against the bug, an effort has been made to preserve it for programs
      which do not use interrupting signal handlers. the stage for this was
      set in commit a63c0104, which makes
      the futex __timedwait backend suppress EINTR if it's seen when no
      interrupting signal handlers have been installed.
      
      based loosely on a patch submitted by Orivej Desh, but with
      unnecessary additional changes removed.
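      since sem_wait can now fail with EINTR when an interrupting signal handler runs, callers that must obtain the semaphore should retry. a minimal sketch of the usual loop; sem_wait_retry is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Waits on s, retrying whenever a signal handler interrupts the wait. */
static int sem_wait_retry(sem_t *s)
{
    int r;
    while ((r = sem_wait(s)) == -1 && errno == EINTR)
        ;
    return r;
}
```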
  25. 19 Dec 2018 · 2 commits
    • don't fail pthread_sigmask/sigprocmask on invalid how when set is null · 1ec71c53
      Rich Felker committed
      the resolution of Austin Group issue #1132 changes the requirement to
      fail so that it only applies when the set argument (new mask) is
      non-null. this change was made for consistency with the description,
      which specified "if set is a null pointer, the value of the argument
      how is not significant".
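      the revised rule amounts to checking how only when a new mask is supplied. a minimal sketch; check_how is a hypothetical helper, not the libc's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Returns EINVAL for a bad `how` only when a new mask is given. */
static int check_how(int how, const sigset_t *set)
{
    if (set && how != SIG_BLOCK && how != SIG_UNBLOCK && how != SIG_SETMASK)
        return EINVAL;
    return 0;
}
```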
    • add __timedwait backend workaround for old kernels where futex EINTRs · a63c0104
      Rich Felker committed
      prior to linux 2.6.22, futex wait could fail with EINTR even for
      non-interrupting (SA_RESTART) signals. this was no problem provided
      the caller simply restarted the wait, but sem_[timed]wait is required
      by POSIX to return when interrupted by a signal. commit
      a113434c introduced this behavior, and
      commit c0ed5a20 reverted it based on a
      mistaken belief that it was not required. this belief stems from a bug
      in the specification: the description requires the function to return
      when interrupted, but the errors section marks EINTR as a "may fail"
      condition rather than a "shall fail" one.
      
      since there does seem to be significant value in the change made in
      commit c0ed5a20, making it so that
      programs that call sem_wait without checking for EINTR don't silently
      make forward progress without obtaining the semaphore or treat it as a
      fatal error and abort, add a behind-the-scenes mechanism in the
      __timedwait backend to suppress EINTR in programs that have never
      installed interrupting signal handlers, and have sigaction track and
      report this state. this way the semaphore code is not cluttered by
      workarounds and can be updated (to be done in next commit) to reflect
      the high-level logic for conforming behavior.
      
      these changes are based loosely on a patch by Markus Wichmann, with
      the main changes being atomic update to flag object and moving the
      workaround from sem_timedwait to the __timedwait futex backend.
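      the mechanism can be sketched in a few lines. eintr_valid_flag, note_handler_installed, and filter_wait_error are hypothetical names; the sketch treats any handler installed without SA_RESTART as interrupting, and does not distinguish SIG_DFL or SIG_IGN dispositions, which a real sigaction wrapper would have to exclude.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical flag: set once any handler without SA_RESTART is installed. */
static atomic_int eintr_valid_flag;

/* What a sigaction wrapper would do for each handler it installs. */
static void note_handler_installed(int sa_flags)
{
    if (!(sa_flags & SA_RESTART))
        atomic_store(&eintr_valid_flag, 1);
}

/* Filters a futex wait's error result: with no interrupting handler
 * ever installed, an EINTR can only come from the old-kernel bug. */
static int filter_wait_error(int err)
{
    if (err == EINTR && !atomic_load(&eintr_valid_flag))
        return 0;
    return err;
}
```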
  26. 12 Oct 2018 · 1 commit
    • combine arch ABI's DTP_OFFSET into DTV pointers · b6d701a4
      Rich Felker committed
      as explained in commit 6ba5517a, some
      archs use an offset (typically -0x8000) with their DTPOFF relocations,
      which __tls_get_addr needs to invert. on affected archs, which lack
      direct support for large immediates, this can cost multiple extra
      instructions in the hot path. instead, incorporate the DTP_OFFSET into
      the DTV entries. this means they are no longer valid pointers, so
      store them as an array of uintptr_t rather than void *; this also
      makes it easier to access slot 0 as a valid slot count.
      
      commit e75b16cf left behind cruft in
      two places, __reset_tls and __tls_get_new, from back when it was
      possible to have uninitialized gap slots indicated by a null pointer
      in the DTV. since the concept of null pointer is no longer meaningful
      with an offset applied, remove this cruft.
      
      presently there are no archs with both TLSDESC and nonzero DTP_OFFSET,
      but the dynamic TLSDESC relocation code is also updated to apply an
      inverted offset to its offset field, so that the offset DTV would not
      impose a runtime cost in TLSDESC resolver functions.
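      the arithmetic can be sketched as follows, assuming the typical 0x8000 offset; dtv_install and sketch_tls_get_addr are hypothetical. install time adds DTP_OFFSET once, and the hot path is a single addition, since a DTPOFF value is the real offset minus DTP_OFFSET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTP_OFFSET 0x8000 /* the typical value mentioned above */

/* Install time: fold the ABI offset into the stored entry once. The
 * result is no longer a valid pointer, hence uintptr_t. */
static void dtv_install(uintptr_t *dtv, size_t module, void *tls_block)
{
    assert(module >= 1 && module <= dtv[0]); /* slot 0 holds the slot count */
    dtv[module] = (uintptr_t)tls_block + DTP_OFFSET;
}

/* Hot path: entry + (real offset - DTP_OFFSET) == block + real offset. */
static void *sketch_tls_get_addr(const uintptr_t *dtv, size_t module, uintptr_t dtpoff)
{
    return (void *)(dtv[module] + dtpoff);
}
```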
  27. 19 Sep 2018 · 3 commits
  28. 18 Sep 2018 · 1 commit
    • fix deletion of pthread tsd keys that still have non-null values stored · 84d061d5
      Rich Felker committed
      per POSIX, deletion of a key for which some threads still have values
      stored is permitted, and newly created keys must initially hold the
      null value in all threads. these properties were not met by our
      implementation; if a key was deleted with values left and a new key
      was created in the same slot, the old values were still visible.
      
      moreover, due to lack of any synchronization in pthread_key_delete,
      there was a TOCTOU race whereby a concurrent pthread_exit could
      attempt to call a null destructor pointer for the newly orphaned
      value.
      
      this commit introduces a solution based on __synccall, stopping the
      world to zero out the values for deleted keys, but only does so lazily
      when all key slots have been exhausted. pthread_key_delete is split
      off into a separate translation unit so that static-linked programs
      which only create keys but never delete them will not pull in the
      __synccall machinery.
      
      a global rwlock is added to synchronize creation and deletion of keys
      with dtor execution. since the dtor execution loop now has to release
      and retake the lock around its call to each dtor, checks are made not
      to call the nodtor dummy function for keys which lack a dtor.
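      the locking shape of the destructor loop can be sketched as follows, assuming a pthread rwlock for key_lock and a hypothetical slot count; run_dtors_once performs a single pass (the real loop may repeat), and counting_dtor exists only for the usage check. dropping the read lock around each call means a destructor that deletes a key, and so wants the write lock, cannot deadlock against the loop; skipping nodtor avoids dropping and retaking the lock for slots with no destructor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NKEYS 8 /* hypothetical */

static pthread_rwlock_t key_lock = PTHREAD_RWLOCK_INITIALIZER;
static void (*keys[NKEYS])(void *); /* dtor per slot; null = slot free */

static void nodtor(void *p)
{
    (void)p;
}

/* One pass over the calling thread's tsd values. */
static void run_dtors_once(void **tsd)
{
    pthread_rwlock_rdlock(&key_lock);
    for (size_t i = 0; i < NKEYS; i++) {
        void *val = tsd[i];
        void (*dtor)(void *) = keys[i];
        tsd[i] = NULL;
        if (val && dtor && dtor != nodtor) {
            pthread_rwlock_unlock(&key_lock); /* dtor is application code */
            dtor(val);
            pthread_rwlock_rdlock(&key_lock);
        }
    }
    pthread_rwlock_unlock(&key_lock);
}

static int dtor_calls;
static void counting_dtor(void *p)
{
    (void)p;
    dtor_calls++;
}
```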