  1. 24 Apr 2015: 1 commit
    • make __init_tp function static when static linking · 5f51d529
      Committed by Rich Felker
      this slightly reduces the code size cost of TLS/thread-pointer for
      static linking since __init_tp can be inlined into its only caller and
      removed. this is analogous to the handling of __init_libc in
      __libc_start_main, where the function only has external linkage when
      it needs to be called from the dynamic linker.
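A minimal sketch of the linkage switch described above. This is an illustration, not musl's source. `SHARED` is assumed to be a macro defined only when building the dynamic-linked library. `init_tp_sketch`, `start_main_sketch` and `thread_pointer_sketch` are hypothetical names.

```c
/* Hypothetical: SHARED is assumed to be defined only for the shared-library build. */
#ifdef SHARED
#define LINKAGE_UNLESS_SHARED          /* external linkage: the dynamic linker must call it */
#else
#define LINKAGE_UNLESS_SHARED static   /* internal linkage: may be inlined and dropped */
#endif

/* Stand-in for the thread-pointer register. */
static void *thread_pointer_sketch;

LINKAGE_UNLESS_SHARED int init_tp_sketch(void *p)
{
    thread_pointer_sketch = p;   /* stand-in for installing the thread pointer */
    return 0;
}

/* The only caller in a static-linked program. */
int start_main_sketch(void *p)
{
    return init_tp_sketch(p);
}
```

Without `SHARED`, the function has internal linkage and a single call site. An optimizing compiler is therefore free to inline it and emit no separate copy.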
  2. 22 Apr 2015: 1 commit
  3. 14 Apr 2015: 1 commit
    • remove remnants of support for running in no-thread-pointer mode · 19a1fe67
      Committed by Rich Felker
      since 1.1.0, musl has nominally required a thread pointer to be set up.
      most of the remaining code that was checking for its availability was
      doing so for the sake of being usable by the dynamic linker. as of
      commit 71f099cb, this is no longer
      necessary; the thread pointer is now valid before any libc code
      (outside of dynamic linker bootstrap functions) runs.
      
      this commit essentially concludes "phase 3" of the "transition path
      for removing lazy init of thread pointer" project that began during
      the 1.1.0 release cycle.
  4. 10 Apr 2015: 1 commit
    • optimize out setting up robust list with kernel when not needed · 4e98cce1
      Committed by Rich Felker
      as a result of commit 12e1e324, kernel
      processing of the robust list is only needed for process-shared
      mutexes. previously the first attempt to lock any owner-tracked mutex
      resulted in robust list initialization and a set_robust_list syscall.
      this is no longer necessary, and since the kernel's record of the
      robust list must now be cleared at thread exit time for detached
      threads, optimizing it out is more worthwhile than before too.
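The optimization above amounts to registering the robust list lazily, and only for process-shared mutexes. The sketch below assumes hypothetical names throughout. `register_robust_list_sketch` stands in for the `set_robust_list` syscall and only counts calls.

```c
#include <stdbool.h>

/* Hypothetical per-thread state; a real implementation keeps this in the thread descriptor. */
struct thread_sketch {
    bool robust_list_registered;
};

static int set_robust_list_calls; /* counts stand-in "syscalls" */

/* Stand-in for the set_robust_list syscall. */
static void register_robust_list_sketch(struct thread_sketch *self)
{
    set_robust_list_calls++;
    self->robust_list_registered = true;
}

/* Called when locking an owner-tracked mutex. Only process-shared mutexes
   need the kernel to walk the robust list, so private ones skip the syscall. */
void on_owner_tracked_lock_sketch(struct thread_sketch *self, bool process_shared)
{
    if (process_shared && !self->robust_list_registered)
        register_robust_list_sketch(self);
}
```

A thread that never locks a process-shared mutex never registers a list. Its exit path then has nothing to clear with the kernel.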
  5. 12 Mar 2015: 1 commit
    • copy the dtv pointer to the end of the pthread struct for TLS_ABOVE_TP archs · 204a69d2
      Committed by Szabolcs Nagy
      There are two main abi variants for thread local storage layout:
      
       (1) TLS is above the thread pointer at a fixed offset and the pthread
       struct is below that. So the end of the struct is at a known offset.
      
       (2) the thread pointer points to the pthread struct and TLS starts
       below it. So the start of the struct is at a known (zero) offset.
      
      Assembly code for the dynamic TLSDESC callback needs to access the
      dynamic thread vector (dtv) pointer which is currently at the front
      of the pthread struct. So in case of (1) the asm code needs to hard
      code the offset from the end of the struct which can easily break if
      the struct changes.
      
      This commit adds a copy of the dtv at the end of the struct. New members
      must not be added after dtv_copy, only before it. The size of the struct
      is increased a bit, but there is opportunity for size optimizations.
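Why a trailing copy helps: if `dtv_copy` stays the last member, its distance from the end of the struct is fixed. Members added before it do not change that distance. The struct below is a hypothetical, heavily simplified stand-in for the real thread descriptor.

```c
#include <stddef.h>

/* Hypothetical, simplified thread descriptor. */
struct pthread_sketch {
    void **dtv;          /* front copy, at offset 0 from the start */
    int tid;
    int errno_val;
    void *other_fields[4];
    void **dtv_copy;     /* must remain the last member */
};

/* Distance of dtv_copy from the end of the struct; with pointer-aligned
   members and dtv_copy last, this equals sizeof(void **). */
#define DTV_COPY_OFFSET_FROM_END \
    (sizeof(struct pthread_sketch) - offsetof(struct pthread_sketch, dtv_copy))

/* Code that only knows where the struct ends (TLS-above-TP layout) can
   still find the dtv using that fixed distance. */
void **dtv_from_struct_end_sketch(const unsigned char *struct_end)
{
    void **const *slot = (void **const *)(struct_end - DTV_COPY_OFFSET_FROM_END);
    return *slot;
}
```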
  6. 07 Mar 2015: 1 commit
    • fix over-alignment of TLS, insufficient builtin TLS on 64-bit archs · bd67959f
      Committed by Rich Felker
      a conservative estimate of 4*sizeof(size_t) was used as the minimum
      alignment for thread-local storage, despite the only requirements
      being alignment suitable for struct pthread and void* (which struct
      pthread already contains). additional alignment required by the
      application or libraries is encoded in their headers and is already
      applied.
      
      over-alignment prevented the builtin_tls array from ever being used in
      dynamic-linked programs on 64-bit archs, thereby requiring allocation
      at startup even in programs with no TLS of their own.
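A worked illustration of how over-alignment can push the initial TLS area out of a fixed static buffer. This is a sketch under stated assumptions, not musl's layout:
- the startup code is assumed to reserve `align - 1` bytes of padding for realigning the block;
- the builtin buffer is assumed to carry a hypothetical 16 bytes of slack;
- the struct and all names are illustrative.

```c
#include <stddef.h>

/* Hypothetical thread descriptor; only its size and alignment matter here. */
struct pthread_sketch {
    void *self;
    void **dtv;
    int tid;
    int errno_val;
    void *misc[8];
};

#define BUILTIN_SLACK_SKETCH 16   /* assumed slack in the static buffer */

/* Hypothetical static buffer for the initial TLS area. */
static union {
    struct pthread_sketch pt;
    unsigned char bytes[sizeof(struct pthread_sketch) + BUILTIN_SLACK_SKETCH];
} builtin_tls_sketch;

/* Descriptor + application TLS + worst-case realignment padding. */
size_t tls_bytes_needed_sketch(size_t app_tls_size, size_t align)
{
    return sizeof(struct pthread_sketch) + app_tls_size + align - 1;
}

int fits_builtin_sketch(size_t app_tls_size, size_t align)
{
    return tls_bytes_needed_sketch(app_tls_size, align) <= sizeof builtin_tls_sketch;
}
```

With no application TLS, the minimal alignment (that of the descriptor, which contains `void *`) needs at most 7 extra bytes and fits. An alignment of `4*sizeof(size_t)` needs 31 extra bytes on a 64-bit target and no longer fits. On a 32-bit target it needs 15 and still fits.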
  7. 13 Aug 2014: 1 commit
  8. 06 Jul 2014: 1 commit
    • eliminate use of cached pid from thread structure · 83dc6eb0
      Committed by Rich Felker
      the main motivation for this change is to remove the assumption that
      the tid of the main thread is also the pid of the process. (the value
      returned by the set_tid_address syscall was used to fill both fields
      despite it semantically being the tid.) this is historically and
      presently true on linux and unlikely to change, but it conceivably
      could be false on other systems that otherwise reproduce the linux
      syscall api/abi.
      
      only a few parts of the code were actually still using the cached pid.
      in a couple places (aio and synccall) it was a minor optimization to
      avoid a syscall. caching could be reintroduced, but lazily as part of
      the public getpid function rather than at program startup, if it's
      deemed important for performance later. in other places (cancellation
      and pthread_kill) the pid was completely unnecessary; the tkill
      syscall can be used instead of tgkill. this is actually a rather
      subtle issue, since tgkill is supposedly a solution to race conditions
      that can affect use of tkill. however, as documented in the commit
      message for commit 7779dbd2, tgkill
      does not actually solve this race; it just limits it to happening
      within one process rather than between processes. we use a lock that
      avoids the race in pthread_kill, and the use in the cancellation
      signal handler is self-targeted and thus not subject to tid reuse
      races, so both are safe regardless of which syscall (tgkill or tkill)
      is used.
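A Linux-only sketch of the self-targeted case discussed above. It sends a signal to the calling thread with `tkill`, which takes only a tid. `tgkill` additionally takes the thread group id (the pid). The function name is hypothetical. The sketch uses the libc `syscall(2)` wrapper with `SYS_gettid` and `SYS_tkill` from `<sys/syscall.h>`.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Send sig to the calling thread. Because the target is the caller, its
   tid cannot exit and be reused by another thread during the call, so
   the pid argument that tgkill would need adds nothing. */
int signal_self_sketch(int sig)
{
    long tid = syscall(SYS_gettid);
    return (int)syscall(SYS_tkill, tid, sig);
}
```

Passing signal 0 performs only the existence and permission checks, which makes it a harmless way to exercise the call.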
  9. 03 Jul 2014: 1 commit
    • add locale framework · 0bc03091
      Committed by Rich Felker
      this commit adds non-stub implementations of setlocale, duplocale,
      newlocale, and uselocale, along with the data structures and minimal
      code needed for representing the active locale on a per-thread basis
      and optimizing the common case where thread-local locale settings are
      not in use.
      
      at this point, the data structures only contain what is necessary to
      represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in
      finding message translation files). representation for the other
      categories will be added later; the expectation is that a single
      pointer will suffice for each.
      
      for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any
      other string is accepted and treated as "C.UTF-8". for other
      categories, any string is accepted after being truncated to a maximum
      supported length (currently 15 bytes). for LC_MESSAGES, the name is
      kept regardless of whether libc itself can use such a message
      translation locale, since applications using catgets or gettext should
      be able to use message locales libc is not aware of. for other
      categories, names which are not successfully loaded as locales (which,
      at present, means all names) are treated as aliases for "C". setlocale
      never fails.
      
      locale settings are not yet used anywhere, so this commit should have
      no visible effects except for the contents of the string returned by
      setlocale.
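The name-handling rules above can be sketched as two small helpers. Both helpers and the constant name are hypothetical. The 15-byte limit is the one stated in the commit message.

```c
#include <string.h>

#define LOCALE_NAME_MAX_SKETCH 15   /* "currently 15 bytes" per the commit */

/* LC_CTYPE rule: "C" and "POSIX" select the C locale; any other
   string is accepted and treated as "C.UTF-8". */
const char *ctype_effective_name_sketch(const char *name)
{
    if (!strcmp(name, "C") || !strcmp(name, "POSIX"))
        return "C";
    return "C.UTF-8";
}

/* Other categories: keep the name, truncated to the supported maximum.
   out must have room for LOCALE_NAME_MAX_SKETCH + 1 bytes. */
void truncate_locale_name_sketch(char *out, const char *name)
{
    size_t n = 0;
    while (n < LOCALE_NAME_MAX_SKETCH && name[n])
        n++;
    memcpy(out, name, n);
    out[n] = '\0';
}
```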
  10. 19 Jun 2014: 1 commit
    • separate __tls_get_addr implementation from dynamic linker/init_tls · 5ba238e1
      Committed by Rich Felker
      such separation serves multiple purposes:
      
      - by having the common path for __tls_get_addr alone in its own
        function with a tail call to the slow case, code generation is
        greatly improved.
      
      - by having __tls_get_addr in its own file, it can be replaced on a
        per-arch basis as needed, for optimization or ABI-specific purposes.
      
      - by removing __tls_get_addr from __init_tls.c, a few bytes of code
        are shaved off of static binaries (which are unlikely to use this
        function unless the linker messed up).
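The fast-path/slow-path split described above might look like the following. The assumptions are:
- a TLS index holds a module id and an offset;
- `dtv[0]` stores the number of modules covered;
- all names are hypothetical.

The slow case here only counts calls. A real one would set up a block for a newly loaded module. `__attribute__((noinline))` is a GCC/Clang extension used to keep the slow path out of line.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { size_t module; size_t offset; } tls_index_sketch;

static void **current_dtv_sketch;   /* stand-in for the thread's dtv */
static int slow_path_calls_sketch;

/* Out-of-line slow case; this sketch just counts calls and returns NULL. */
__attribute__((noinline))
static void *tls_get_addr_slow_sketch(const tls_index_sketch *v)
{
    (void)v;
    slow_path_calls_sketch++;
    return NULL;
}

/* Common path alone in a small function: a bounds check, a load, and a
   tail call to the slow case when the module's block is not available. */
void *tls_get_addr_sketch(const tls_index_sketch *v)
{
    void **dtv = current_dtv_sketch;
    if (v->module <= (size_t)(uintptr_t)dtv[0] && dtv[v->module])
        return (char *)dtv[v->module] + v->offset;
    return tls_get_addr_slow_sketch(v);
}
```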
  11. 10 Jun 2014: 2 commits
    • simplify errno implementation · ac31bf27
      Committed by Rich Felker
      the motivation for the errno_ptr field in the thread structure, which
      this commit removes, was to allow the main thread's errno to keep its
      address when lazy thread pointer initialization was used. &errno was
      evaluated prior to setting up the thread pointer and stored in
      errno_ptr for the main thread; subsequently created threads would have
      errno_ptr pointing to their own errno_val in the thread structure.
      
      since lazy initialization was removed, there is no need for this extra
      level of indirection; __errno_location can simply return the address
      of the thread's errno_val directly. this does cause &errno to change,
      but the change happens before entry to application code, and thus is
      not observable.
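Once the thread pointer is always valid, the errno lookup reduces to taking an address in the current thread's descriptor. In this sketch a C11 `_Thread_local` object stands in for "the descriptor the thread pointer points to". The struct and function names are hypothetical.

```c
/* Hypothetical thread descriptor holding errno directly. */
struct thread_sketch {
    int errno_val;
    /* ... other per-thread fields ... */
};

/* Stand-in for the thread pointer: one descriptor per thread. */
static _Thread_local struct thread_sketch self_sketch;

/* No errno_ptr indirection: return the calling thread's own errno_val. */
int *errno_location_sketch(void)
{
    return &self_sketch.errno_val;
}
```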
    • add thread-pointer support for pre-2.6 kernels on i386 · 64e32287
      Committed by Rich Felker
      such kernels cannot support threads, but the thread pointer is also
      important for other purposes, most notably stack protector. without a
      valid thread pointer, all code compiled with stack protector will
      crash. the same applies to any use of thread-local storage by
      applications or libraries.
      
      the concept of this patch is to fall back to using the modify_ldt
      syscall, which has been around since linux 1.0, to set up the gs
      segment register. since the kernel does not have a way to
      automatically assign ldt entries, use of slot zero is hard-coded. if
      this fallback path is used, __set_thread_area returns a positive value
      (rather than the usual zero for success, or negative for error)
      indicating to the caller that the thread pointer was successfully set,
      but only for the main thread, and that thread creation will not work
      properly. the code in __init_tp has been changed accordingly to record
      this result for later use by pthread_create.
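A sketch of how a caller can interpret the three-way return convention described above:
- negative: error;
- zero: full success;
- positive: set for the main thread only.

The setter is passed in as a function pointer so the sketch can run with stubs. `record_tp_result_sketch`, the stubs, and the state struct are hypothetical.

```c
/* Hypothetical state later consulted by a pthread_create-like function. */
struct tp_state_sketch {
    int has_thread_pointer;
    int can_do_threads;
};

/* r < 0: no thread pointer; r == 0: full success;
   r > 0: fallback (e.g. modify_ldt) worked for the main thread only. */
int record_tp_result_sketch(struct tp_state_sketch *st,
                            int (*set_thread_area)(void *), void *p)
{
    int r = set_thread_area(p);
    if (r < 0)
        return -1;
    st->has_thread_pointer = 1;
    st->can_do_threads = (r == 0);
    return 0;
}

/* Stubs standing in for the arch-specific setter. */
static int set_ok_stub(void *p) { (void)p; return 0; }
static int set_ldt_fallback_stub(void *p) { (void)p; return 1; }
static int set_fail_stub(void *p) { (void)p; return -1; }
```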
  12. 07 Apr 2014: 1 commit
  13. 05 Apr 2014: 1 commit
  14. 25 Mar 2014: 1 commit
    • always initialize thread pointer at program start · dab441ae
      Committed by Rich Felker
      this is the first step in an overhaul aimed at greatly simplifying and
      optimizing everything dealing with thread-local state.
      
      previously, the thread pointer was initialized lazily on first access,
      or at program startup if stack protector was in use, or at certain
      random places where inconsistent state could be reached if it were not
      initialized early. while believed to be fully correct, the logic was
      fragile and non-obvious.
      
      in the first phase of the thread pointer overhaul, support is retained
      (and in some cases improved) for systems/situations where loading the
      thread pointer fails, e.g. old kernels.
      
      some notes on specific changes:
      
      - the confusing use of libc.main_thread as an indicator that the
        thread pointer is initialized is eliminated in favor of an explicit
        has_thread_pointer predicate.
      
      - sigaction no longer needs to ensure that the thread pointer is
        initialized before installing a signal handler (this was needed to
        prevent a situation where the signal handler caused the thread
        pointer to be initialized and the subsequent sigreturn cleared it
        again) but it still needs to ensure that implementation-internal
        thread-related signals are not blocked.
      
      - pthread tsd initialization for the main thread is deferred in a new
        manner to minimize bloat in the static-linked __init_tp code.
      
      - pthread_setcancelstate no longer needs special handling for the
        situation before the thread pointer is initialized. it simply fails
        on systems that cannot support a thread pointer, which are
        non-conforming anyway.
      
      - pthread_cleanup_push/pop now check for missing thread pointer and
        nop themselves out in this case, so stdio no longer needs to avoid
        the cancellable path when the thread pointer is not available.
      
      a number of cases remain where certain interfaces may crash if the
      system does not support a thread pointer. at this point, these should
      be limited to pthread interfaces, and the number of such cases should
      be fewer than before.
  15. 24 Mar 2014: 1 commit
    • reduce static linking overhead from TLS support by inlining mmap syscall · 98221c36
      Committed by Rich Felker
      the external mmap function is heavy because it has to handle error
      reporting that the kernel cannot do, and has to do some locking for
      arcane race-condition-avoidance purposes. for allocating initial TLS,
      we do not need any of that; the raw syscall suffices.
      
      on i386, this change shaves off 13% of the size of .text for the empty
      program.
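A Linux-only sketch of allocating the initial TLS area by invoking the mmap system call directly instead of calling `mmap(3)`. The sketch makes three choices:
- it prefers `SYS_mmap2` where defined, because on i386 the plain `SYS_mmap` number refers to the old struct-argument interface;
- the offset is 0 in either case;
- arguments are cast to `long` because the `syscall(2)` wrapper is variadic.

Unlike a truly raw syscall, this libc wrapper still reports errors as -1 with `errno`. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Zero-filled anonymous mapping for an initial TLS area, no mmap(3) wrapper. */
void *map_initial_tls_sketch(size_t len)
{
#ifdef SYS_mmap2
    long r = syscall(SYS_mmap2, (long)0, (long)len, (long)(PROT_READ | PROT_WRITE),
                     (long)(MAP_PRIVATE | MAP_ANONYMOUS), (long)-1, (long)0);
#else
    long r = syscall(SYS_mmap, (long)0, (long)len, (long)(PROT_READ | PROT_WRITE),
                     (long)(MAP_PRIVATE | MAP_ANONYMOUS), (long)-1, (long)0);
#endif
    return r == -1 ? NULL : (void *)r;
}
```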
  16. 04 Aug 2013: 1 commit
    • add system for resetting TLS to initial values · 7c6c2906
      Committed by Rich Felker
      this is needed for reused threads in the SIGEV_THREAD timer
      notification system, and could be reused elsewhere in the future if
      needed, though it should be refactored for such use.
      
      for static linking, __init_tls.c is simply modified to export the TLS
      info in a structure with external linkage, rather than using statics.
      this perhaps makes the code more clear, since the statics were poorly
      named for statics. the new __reset_tls.c is only linked if it is used.
      
      for dynamic linking, the code is in dynlink.c. sharing code with
      __copy_tls is not practical since __reset_tls must also re-zero
      thread-local bss.
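The reason `__reset_tls` cannot simply reuse a copy routine is that a reset must also re-zero the thread-local bss. The sketch below shows that two-step reset. The module descriptor and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical TLS image description: len initialized bytes (.tdata),
   followed by size - len zero-initialized bytes (.tbss). */
struct tls_module_sketch {
    const void *image;
    size_t len;
    size_t size;
};

/* Restore a block to its initial state. A reused thread may have written
   to its bss, so copying the image alone would leave stale data behind. */
void reset_tls_block_sketch(void *block, const struct tls_module_sketch *m)
{
    memcpy(block, m->image, m->len);
    memset((char *)block + m->len, 0, m->size - m->len);
}
```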
  17. 14 Jul 2013: 1 commit
    • fix omission of dtv setup in static linked programs on TLS variant I archs · f1292e3d
      Committed by Rich Felker
      apparently this was never noticed before because the linker normally
      optimizes dynamic TLS models to non-dynamic ones when static linking,
      thus eliminating the calls to __tls_get_addr which crash when the dtv
      is missing. however, some libsupc++ code on ARM was calling
      __tls_get_addr when static linked and crashing. the reason is unclear
      to me, but with this issue fixed it should work now anyway.
  18. 26 Dec 2012: 1 commit
    • fix reference to libc struct in static tls init code · e172c7b4
      Committed by Rich Felker
      libc is the macro, __libc is the internal symbol, but under some
      configurations on old/broken compilers, the symbol might not actually
      exist and the libc macro might instead use __libc_loc() to obtain
      access to the object.
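A sketch of the macro indirection described above. Code always writes `libc.field`, and the macro decides whether that names the object directly or goes through an accessor function. `LIBC_VIA_ACCESSOR` is a hypothetical configuration switch, not a real musl option. Here the object always exists, so both configurations compile.

```c
struct libc_sketch { int threaded; };

static struct libc_sketch libc_object_sketch;

/* Accessor used when the object cannot be referenced directly. */
struct libc_sketch *libc_loc_sketch(void) { return &libc_object_sketch; }

#ifdef LIBC_VIA_ACCESSOR
#define libc (*libc_loc_sketch())
#else
#define libc libc_object_sketch
#endif

/* Init code uses the macro, never the underlying symbol, so it works
   under either configuration. */
void mark_threaded_sketch(void) { libc.threaded = 1; }
```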
  19. 09 Nov 2012: 1 commit
    • clean up sloppy nested inclusion from pthread_impl.h · efd4d87a
      Committed by Rich Felker
      this mirrors the stdio_impl.h cleanup. one header which is not
      strictly needed, errno.h, is left in pthread_impl.h, because since
      pthread functions return their error codes rather than using errno,
      nearly every single pthread function needs the errno constants.
      
      in a few places, rather than bringing in string.h to use memset, the
      memset was replaced by direct assignment. this seems to generate much
      better code anyway, and makes many functions which were previously
      non-leaf functions into leaf functions (possibly eliminating a great
      deal of bloat on some platforms where non-leaf functions require ugly
      prologue and/or epilogue).
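One way to replace `memset` with direct assignment is to assign a zero compound literal, as sketched below. This is not necessarily the exact form used in the commit. The struct is a hypothetical stand-in for a small object being reset. Each member gets its zero value, including null pointers, and no `<string.h>` or library call is needed.

```c
/* Hypothetical small object that needs resetting. */
struct attr_sketch {
    int detached;
    unsigned long stacksize;
    void *stackaddr;
};

/* Instead of memset(a, 0, sizeof *a): plain assignment, which the
   compiler can expand inline, keeping this a leaf function. */
void attr_reset_sketch(struct attr_sketch *a)
{
    *a = (struct attr_sketch){ 0 };
}
```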
  20. 02 Nov 2012: 1 commit
  21. 19 Oct 2012: 1 commit
  22. 16 Oct 2012: 1 commit
    • add support for TLS variant I, presently needed for arm and mips · 9ec4283b
      Committed by Rich Felker
      despite documentation that makes it sound a lot different, the only
      ABI-constraint difference between TLS variants II and I seems to be
      that variant II stores the initial TLS segment immediately below the
      thread pointer (i.e. the thread pointer points to the end of it) and
      variant I stores the initial TLS segment above the thread pointer,
      requiring the thread descriptor to be stored below. the actual value
      stored in the thread pointer register also tends to have per-arch
      random offsets applied to it for silly micro-optimization purposes.
      
      with these changes applied, TLS should be basically working on all
      supported archs except microblaze. I'm still working on getting the
      necessary information and a working toolchain that can build TLS
      binaries for microblaze, but in theory, static-linked programs with
      TLS and dynamic-linked programs where only the main executable uses
      TLS should already work on microblaze.
      
      alignment constraints have not yet been heavily tested, so it's
      possible that this code does not always align TLS segments correctly
      on archs that need TLS variant I.
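The layout difference above, reduced to address arithmetic:
- Variant II: the initial TLS segment ends at the thread pointer.
- Variant I: it starts above the thread pointer.

`tp_gap` models the per-arch offset mentioned in the commit. Its value is an assumption that differs between architectures. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
static uintptr_t round_up_sketch(uintptr_t x, uintptr_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Variant II (e.g. i386, x86_64): TLS sits immediately below tp,
   and tp points at the thread descriptor. */
uintptr_t tls_start_variant2_sketch(uintptr_t tp, size_t tls_size, size_t tls_align)
{
    return tp - round_up_sketch(tls_size, tls_align);
}

/* Variant I (e.g. arm, mips): TLS sits above tp (after an arch-specific
   gap), and the thread descriptor is stored below tp. */
uintptr_t tls_start_variant1_sketch(uintptr_t tp, size_t tp_gap, size_t tls_align)
{
    return round_up_sketch(tp + tp_gap, tls_align);
}
```

For example, with `tp = 0x10000`, a 20-byte segment aligned to 16 starts at `0xffe0` under variant II. Under variant I with an 8-byte gap, an 8-aligned segment starts at `0x10008` and a 16-aligned one at `0x10010`.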
  23. 08 Oct 2012: 1 commit
    • clean up and refactor program initialization · 0a96a37f
      Committed by Rich Felker
      the code in __libc_start_main is now responsible for parsing auxv,
      rather than duplicating the parsing all over the place. this should
      shave off a few cycles and some code size. __init_libc is left as an
      external-linkage function despite the fact that it could be static, to
      prevent it from being inlined and permanently wasting stack space when
      main is called.
      
      a few other minor changes are included, like eliminating per-thread
      ssp canaries (they were likely broken when combined with certain
      dlopen usages, and completely unnecessary) and some other unnecessary
      checks. since this code gets linked into every program, it should be
      as small and simple as possible.
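Parsing auxv once in the startup code can be as simple as filling a table indexed by entry type, so later code reads `aux[AT_PAGESZ]` instead of rescanning. This is a Linux sketch using the `AT_*` constants from `<elf.h>`. The table size of 38 and the function name are assumptions.

```c
#include <elf.h>
#include <stddef.h>

#define AUX_CNT_SKETCH 38   /* assumed table size covering the common AT_* types */

/* Copy type/value pairs (terminated by AT_NULL) into aux[type];
   types beyond the table are ignored. */
void parse_auxv_sketch(const size_t *auxv, size_t aux[AUX_CNT_SKETCH])
{
    for (size_t i = 0; i < AUX_CNT_SKETCH; i++)
        aux[i] = 0;
    for (; auxv[0] != AT_NULL; auxv += 2)
        if (auxv[0] < AUX_CNT_SKETCH)
            aux[auxv[0]] = auxv[1];
}
```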
  24. 07 Oct 2012: 1 commit
  25. 05 Oct 2012: 3 commits
    • support for TLS in dynamic-loaded (dlopen) modules · dcd60371
      Committed by Rich Felker
      unlike other implementations, this one reserves memory for new TLS in
      all pre-existing threads at dlopen-time, and dlopen will fail with no
      resources consumed and no new libraries loaded if memory is not
      available. memory is not immediately distributed to running threads;
      that would be too complex and too costly. instead, assurances are made
      that threads needing the new TLS can obtain it in an async-signal-safe
      way from a buffer belonging to the dynamic linker/new module (via
      atomic fetch-and-add based allocator).
      
      I've re-appropriated the lock that was previously used for __synccall
      (synchronizing set*id() syscalls between threads) as a general
      pthread_create lock. it's a "backwards" rwlock where the "read"
      operation is safe atomic modification of the live thread count, which
      multiple threads can perform at the same time, and the "write"
      operation is making sure the count does not increase during an
      operation that depends on it remaining bounded (__synccall or dlopen).
      in static-linked programs that don't use __synccall, this lock is a
      no-op and has no cost.
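The allocation step described above can be sketched with C11 atomics. Memory is reserved at dlopen time for every thread that existed then. Each thread that later needs its block claims the next slice with a single fetch-and-add, which takes no lock and is async-signal-safe when the atomic is lock-free. The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reservation made at dlopen time. */
struct tls_reserve_sketch {
    unsigned char *buf;
    size_t block_size;      /* per-thread TLS size for the new module */
    size_t nblocks;         /* threads accounted for at dlopen time */
    atomic_size_t next;     /* index of the next unclaimed block */
};

/* Claim one block; concurrent callers always get distinct slices. */
void *claim_tls_block_sketch(struct tls_reserve_sketch *r)
{
    size_t i = atomic_fetch_add(&r->next, 1);
    if (i >= r->nblocks)
        return NULL;   /* should not happen if the reservation was sized correctly */
    return r->buf + i * r->block_size;
}
```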
    • partial TLS support for dynamic-linked programs · bc6a35fb
      Committed by Rich Felker
      only TLS in the main program is supported so far; TLS defined in
      shared libraries will not work yet.
    • TLS (GNU/C11 thread-local storage) support for static-linked programs · 8431d797
      Committed by Rich Felker
      the design for TLS in dynamic-linked programs is mostly complete too,
      but I have not yet implemented it. cost is nonzero but still low for
      programs which do not use TLS and/or do not use threads (a few hundred
      bytes of new code, plus dependency on memcpy). I believe it can be
      made smaller at some point by merging __init_tls and __init_security
      into __libc_start_main and avoiding duplicate auxv-parsing code.
      
      at the same time, I've also slightly changed the logic pthread_create
      uses to allocate guard pages to ensure that guard pages are not
      counted towards commit charge.