Commits · 5f51d529155429e607c5c51d5d461b0b98e6be52 · OpenHarmony / Third Party Musl

24 Apr, 2015 1 commit

make __init_tp function static when static linking · 5f51d529

Authored by Rich Felker on Apr 23, 2015

this slightly reduces the code size cost of TLS/thread-pointer for
static linking since __init_tp can be inlined into its only caller and
removed. this is analogous to the handling of __init_libc in
__libc_start_main, where the function only has external linkage when
it needs to be called from the dynamic linker.
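
The linkage switch described above can be sketched as follows. This is a minimal illustration, assuming a build-time `SHARED` macro that is defined only when building the shared library; `thread_sketch`, `init_tp_sketch`, and `init_tls_sketch` are hypothetical names, not musl's actual code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real thread structure. */
struct thread_sketch { struct thread_sketch *self; };

/* Assumption: SHARED is defined when building the shared library, where
 * the dynamic linker must be able to call this function. When it is
 * undefined (static linking), the function gets internal linkage. */
#ifndef SHARED
static
#endif
int init_tp_sketch(void *p)
{
	struct thread_sketch *td = p;
	td->self = td;	/* placeholder for the real thread-pointer setup */
	return 0;
}

/* The only caller in the static-linking case. With internal linkage the
 * compiler may inline init_tp_sketch here and drop the standalone copy. */
int init_tls_sketch(struct thread_sketch *td)
{
	return init_tp_sketch(td);
}
```

Whether the call is actually inlined depends on the compiler and optimization level; internal linkage only makes it possible.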

22 Apr, 2015 1 commit

remove useless visibility application from static-linking-only code · c267fb84

Authored by Rich Felker on Apr 22, 2015

part of the goal here is to eliminate use of the ATTR_LIBC_VISIBILITY
macro outside of libc.h, since it was never intended to be 'public'.

14 Apr, 2015 1 commit

remove remnants of support for running in no-thread-pointer mode · 19a1fe67

Authored by Rich Felker on Apr 13, 2015

since 1.1.0, musl has nominally required a thread pointer to be set up.
most of the remaining code that was checking for its availability was
doing so for the sake of being usable by the dynamic linker. as of
commit 71f099cb, this is no longer
necessary; the thread pointer is now valid before any libc code
(outside of dynamic linker bootstrap functions) runs.

this commit essentially concludes "phase 3" of the "transition path
for removing lazy init of thread pointer" project that began during
the 1.1.0 release cycle.

10 Apr, 2015 1 commit

optimize out setting up robust list with kernel when not needed · 4e98cce1

Authored by Rich Felker on Apr 10, 2015

as a result of commit 12e1e324, kernel
processing of the robust list is only needed for process-shared
mutexes. previously the first attempt to lock any owner-tracked mutex
resulted in robust list initialization and a set_robust_list syscall.
this is no longer necessary, and since the kernel's record of the
robust list must now be cleared at thread exit time for detached
threads, optimizing it out is more worthwhile than before too.

12 Mar, 2015 1 commit

copy the dtv pointer to the end of the pthread struct for TLS_ABOVE_TP archs · 204a69d2

Authored by Szabolcs Nagy on Mar 11, 2015

There are two main abi variants for thread local storage layout:

 (1) TLS is above the thread pointer at a fixed offset and the pthread
 struct is below that. So the end of the struct is at known offset.

 (2) the thread pointer points to the pthread struct and TLS starts
 below it. So the start of the struct is at known (zero) offset.

Assembly code for the dynamic TLSDESC callback needs to access the
dynamic thread vector (dtv) pointer which is currently at the front
of the pthread struct. So in case of (1) the asm code needs to hard
code the offset from the end of the struct which can easily break if
the struct changes.

This commit adds a copy of the dtv at the end of the struct. New members
must not be added after dtv_copy, only before it. The size of the struct
is increased a bit, but there is opportunity for size optimizations.
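
The layout rule in this commit message can be sketched in C as below. The struct is a hypothetical, much smaller stand-in for the real pthread struct; only the member names `dtv` and `dtv_copy` come from the commit message. The point is that the distance from the end of the struct back to `dtv_copy` stays constant as long as `dtv_copy` remains the last member.

```c
#include <stddef.h>

/* Hypothetical, reduced thread structure. */
struct thread_sketch {
	struct thread_sketch *self;
	void **dtv;		/* original location, at the front */
	int tid;
	int errno_val;
	/* new members would be added here, before dtv_copy */
	void **dtv_copy;	/* must stay the last member */
};

/* Distance from the end of the struct back to dtv_copy. Assembly code
 * that only knows where the struct ends can rely on this value. */
static size_t dtv_copy_offset_from_end(void)
{
	return sizeof(struct thread_sketch)
		- offsetof(struct thread_sketch, dtv_copy);
}

/* Keep both pointers in sync whenever the dtv is replaced. */
static void set_dtv_sketch(struct thread_sketch *td, void **dtv)
{
	td->dtv = td->dtv_copy = dtv;
}
```

In this sketch the offset equals `sizeof(void **)` because the struct's strictest alignment is that of a pointer, so no tail padding follows the last member.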

07 Mar, 2015 1 commit

fix over-alignment of TLS, insufficient builtin TLS on 64-bit archs · bd67959f

Authored by Rich Felker on Mar 06, 2015

a conservative estimate of 4*sizeof(size_t) was used as the minimum
alignment for thread-local storage, despite the only requirements
being alignment suitable for struct pthread and void* (which struct
pthread already contains). additional alignment required by the
application or libraries is encoded in their headers and is already
applied.

over-alignment prevented the builtin_tls array from ever being used in
dynamic-linked programs on 64-bit archs, thereby requiring allocation
at startup even in programs with no TLS of their own.
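
To see why a larger minimum alignment can push a request past a fixed-size buffer, here is a rough sketch of the arithmetic. The sizing formula and the thread-structure size of 200 bytes used in the note below are assumptions for illustration only; they are not musl's exact computation.

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
	return (n + align - 1) & -align;
}

/* Hypothetical worst-case size: the TLS image rounded to its alignment,
 * the thread structure, and up to align-1 bytes of slack needed to
 * realign the start of an arbitrarily placed buffer. */
static size_t tls_area_size_sketch(size_t tls_size, size_t align,
                                   size_t thread_struct_size)
{
	return align_up(tls_size, align) + thread_struct_size + align - 1;
}
```

With no application TLS and an assumed 200-byte thread structure on a 64-bit target, an alignment of `4*sizeof(size_t)` (32) gives 231 bytes under this formula, while `sizeof(void *)` (8) gives 207.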

13 Aug, 2014 1 commit

fix #ifdef inside a macro argument list in __init_tls.c · d86af2a0

Authored by Szabolcs Nagy on Aug 13, 2014

C99 6.10.3p11 disallows such constructs
so use an #ifdef outside of the argument list of __syscall
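
The fix pattern can be sketched as follows. `CALL2_SKETCH`, `USE_ALT_ARG_SKETCH`, and the other names are hypothetical stand-ins; the real change concerned the argument list of `__syscall`. C99 6.10.3p11 leaves the behavior undefined when a preprocessing directive appears inside a macro's argument list, so the conditional is resolved before the invocation.

```c
/* Stand-in for a function-like macro such as __syscall. */
#define CALL2_SKETCH(fn, a, b) fn((a), (b))

/* Undefined behavior per C99 6.10.3p11 (directive inside the arguments):
 *
 *   CALL2_SKETCH(add_sketch, 1,
 *   #ifdef USE_ALT_ARG_SKETCH
 *       10
 *   #else
 *       20
 *   #endif
 *   );
 */

/* Portable form: choose the argument outside the macro invocation. */
#ifdef USE_ALT_ARG_SKETCH
#define SECOND_ARG_SKETCH 10
#else
#define SECOND_ARG_SKETCH 20
#endif

static int add_sketch(int a, int b)
{
	return a + b;
}

int call_sketch(void)
{
	return CALL2_SKETCH(add_sketch, 1, SECOND_ARG_SKETCH);
}
```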

06 Jul, 2014 1 commit

eliminate use of cached pid from thread structure · 83dc6eb0

Authored by Rich Felker on Jul 05, 2014

the main motivation for this change is to remove the assumption that
the tid of the main thread is also the pid of the process. (the value
returned by the set_tid_address syscall was used to fill both fields
despite it semantically being the tid.) this is historically and
presently true on linux and unlikely to change, but it conceivably
could be false on other systems that otherwise reproduce the linux
syscall api/abi.

only a few parts of the code were actually still using the cached pid.
in a couple places (aio and synccall) it was a minor optimization to
avoid a syscall. caching could be reintroduced, but lazily as part of
the public getpid function rather than at program startup, if it's
deemed important for performance later. in other places (cancellation
and pthread_kill) the pid was completely unnecessary; the tkill
syscall can be used instead of tgkill. this is actually a rather
subtle issue, since tgkill is supposedly a solution to race conditions
that can affect use of tkill. however, as documented in the commit
message for commit 7779dbd2, tgkill
does not actually solve this race; it just limits it to happening
within one process rather than between processes. we use a lock that
avoids the race in pthread_kill, and the use in the cancellation
signal handler is self-targeted and thus not subject to tid reuse
races, so both are safe regardless of which syscall (tgkill or tkill)
is used.
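
The self-targeted case can be sketched with the raw `tkill` syscall, which takes only a tid and a signal number. This is a Linux-only illustration that goes through the generic `syscall()` wrapper; `signal_self_sketch` is a hypothetical name, and musl itself uses its internal syscall macros rather than this wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Linux only: send sig to the calling thread, identified by tid alone.
 * The caller's own tid cannot be reused while the caller is running,
 * so no pid is needed to make the target unambiguous. */
int signal_self_sketch(int sig)
{
	long tid = syscall(SYS_gettid);
	return (int)syscall(SYS_tkill, tid, sig);
}
```

Passing signal number 0 performs only the existence and permission checks without delivering anything, which makes it a harmless way to try the call.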

03 Jul, 2014 1 commit

add locale framework · 0bc03091

Authored by Rich Felker on Jul 02, 2014

this commit adds non-stub implementations of setlocale, duplocale,
newlocale, and uselocale, along with the data structures and minimal
code needed for representing the active locale on a per-thread basis
and optimizing the common case where thread-local locale settings are
not in use.

at this point, the data structures only contain what is necessary to
represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in
finding message translation files). representation for the other
categories will be added later; the expectation is that a single
pointer will suffice for each.

for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any
other string is accepted and treated as "C.UTF-8". for other
categories, any string is accepted after being truncated to a maximum
supported length (currently 15 bytes). for LC_MESSAGES, the name is
kept regardless of whether libc itself can use such a message
translation locale, since applications using catgets or gettext should
be able to use message locales libc is not aware of. for other
categories, names which are not successfully loaded as locales (which,
at present, means all names) are treated as aliases for "C". setlocale
never fails.

locale settings are not yet used anywhere, so this commit should have
no visible effects except for the contents of the string returned by
setlocale.
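
The naming rules in this commit message can be sketched as two small helpers. The function names and the `LOCALE_NAME_MAX_SKETCH` constant are hypothetical; only the rules themselves ("C" and "POSIX" are special for LC_CTYPE, other names are truncated to 15 bytes) come from the text above.

```c
#include <string.h>

#define LOCALE_NAME_MAX_SKETCH 15

/* LC_CTYPE: only "C" and "POSIX" select the plain C locale; any other
 * name is treated like "C.UTF-8". Returns nonzero for the UTF-8 case. */
int ctype_is_utf8_sketch(const char *name)
{
	return strcmp(name, "C") != 0 && strcmp(name, "POSIX") != 0;
}

/* Other categories: keep at most 15 bytes of the name.
 * dst must have room for LOCALE_NAME_MAX_SKETCH + 1 bytes. */
void store_locale_name_sketch(char *dst, const char *name)
{
	size_t n = strlen(name);
	if (n > LOCALE_NAME_MAX_SKETCH)
		n = LOCALE_NAME_MAX_SKETCH;
	memcpy(dst, name, n);
	dst[n] = 0;
}
```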

19 Jun, 2014 1 commit

separate __tls_get_addr implementation from dynamic linker/init_tls · 5ba238e1

Authored by Rich Felker on Jun 19, 2014

such separation serves multiple purposes:

- by having the common path for __tls_get_addr alone in its own
  function with a tail call to the slow case, code generation is
  greatly improved.

- by having __tls_get_addr in its own file, it can be replaced on a
  per-arch basis as needed, for optimization or ABI-specific purposes.

- by removing __tls_get_addr from __init_tls.c, a few bytes of code
  are shaved off of static binaries (which are unlikely to use this
  function unless the linker messed up).
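
The fast-path/slow-path split from the first bullet can be sketched like this. All names are hypothetical, a single global stands in for the per-thread dynamic thread vector, and the slow path simply hands out static storage; the real slow case is considerably more involved.

```c
#include <stddef.h>

#define MAX_MODULES_SKETCH 4

/* Hypothetical dynamic thread vector; one global replaces the
 * per-thread copy to keep the sketch short. */
static struct {
	size_t installed;			/* highest module id with a valid slot */
	char *block[MAX_MODULES_SKETCH + 1];	/* block[i]: TLS block of module i */
} dtv_sketch;

static char backing_sketch[MAX_MODULES_SKETCH + 1][32];

/* Slow case: install the missing blocks, then resolve the address.
 * Bounds checks on v[0] and v[1] are omitted for brevity. */
static void *tls_get_new_sketch(const size_t *v)
{
	for (size_t i = dtv_sketch.installed + 1; i <= v[0]; i++)
		dtv_sketch.block[i] = backing_sketch[i];
	dtv_sketch.installed = v[0];
	return dtv_sketch.block[v[0]] + v[1];
}

/* Common path: v[0] is the module id, v[1] the offset in its block.
 * One comparison and one indexed load, otherwise a tail call. */
void *tls_get_addr_sketch(const size_t *v)
{
	if (v[0] <= dtv_sketch.installed)
		return dtv_sketch.block[v[0]] + v[1];
	return tls_get_new_sketch(v);
}
```

Keeping the slow case in a separate function means the common path needs no stack frame of its own on most targets, which is the code-generation benefit the commit message refers to.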

10 Jun, 2014 2 commits

simplify errno implementation · ac31bf27

Authored by Rich Felker on Jun 10, 2014

the motivation for the errno_ptr field in the thread structure, which
this commit removes, was to allow the main thread's errno to keep its
address when lazy thread pointer initialization was used. &errno was
evaluated prior to setting up the thread pointer and stored in
errno_ptr for the main thread; subsequently created threads would have
errno_ptr pointing to their own errno_val in the thread structure.

since lazy initialization was removed, there is no need for this extra
level of indirection; __errno_location can simply return the address
of the thread's errno_val directly. this does cause &errno to change,
but the change happens before entry to application code, and thus is
not observable.
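
The simplified scheme can be sketched as below, assuming a C11 compiler. A `_Thread_local` struct stands in for the thread structure that musl reaches through the thread pointer; `thread_sketch`, `errno_location_sketch`, and `errno_sketch` are hypothetical names.

```c
/* Hypothetical, reduced thread structure holding the errno value. */
struct thread_sketch {
	int errno_val;
};

/* Stand-in for "the current thread's structure". */
static _Thread_local struct thread_sketch self_sketch;

/* No errno_ptr indirection: return the field's address directly. */
int *errno_location_sketch(void)
{
	return &self_sketch.errno_val;
}

/* errno-style lvalue macro built on the accessor. */
#define errno_sketch (*errno_location_sketch())
```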

add thread-pointer support for pre-2.6 kernels on i386 · 64e32287

Authored by Rich Felker on Jun 10, 2014

such kernels cannot support threads, but the thread pointer is also
important for other purposes, most notably stack protector. without a
valid thread pointer, all code compiled with stack protector will
crash. the same applies to any use of thread-local storage by
applications or libraries.

the concept of this patch is to fall back to using the modify_ldt
syscall, which has been around since linux 1.0, to set up the gs
segment register. since the kernel does not have a way to
automatically assign ldt entries, use of slot zero is hard-coded. if
this fallback path is used, __set_thread_area returns a positive value
(rather than the usual zero for success, or negative for error)
indicating to the caller that the thread pointer was successfully set,
but only for the main thread, and that thread creation will not work
properly. the code in __init_tp has been changed accordingly to record
this result for later use by pthread_create.
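
The three-way return convention can be sketched as follows. The function and variable names are hypothetical, and the value of `r` is passed in rather than obtained from a real `__set_thread_area` call.

```c
/* Nonzero once it is known that new threads can get a thread pointer. */
static int can_do_threads_sketch;

/* r is the result of setting the thread pointer:
 *   r == 0  success, thread creation will work
 *   r  > 0  success for the main thread only (modify_ldt fallback)
 *   r  < 0  failure */
int record_tp_result_sketch(int r)
{
	if (r < 0)
		return -1;
	can_do_threads_sketch = (r == 0);
	return 0;
}

/* What a thread-creation routine would consult later. */
int thread_create_allowed_sketch(void)
{
	return can_do_threads_sketch;
}
```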

07 Apr, 2014 1 commit

remove some cruft from libc/tls init code · 7e8b0761

Authored by Rich Felker on Apr 07, 2014

05 Apr, 2014 1 commit

remove cruft left behind when lazy thread pointer init was removed · 561e0a09

Authored by Rich Felker on Apr 04, 2014

the function itself was static, but the weak alias provided an
externally visible reference and thus prevented the dead code from
being omitted from the output. so this change actually reduces bloat
in mandatory static-linked code.

25 Mar, 2014 1 commit

always initialize thread pointer at program start · dab441ae

Authored by Rich Felker on Mar 24, 2014

this is the first step in an overhaul aimed at greatly simplifying and
optimizing everything dealing with thread-local state.

previously, the thread pointer was initialized lazily on first access,
or at program startup if stack protector was in use, or at certain
random places where inconsistent state could be reached if it were not
initialized early. while believed to be fully correct, the logic was
fragile and non-obvious.

in the first phase of the thread pointer overhaul, support is retained
(and in some cases improved) for systems/situations where loading the
thread pointer fails, e.g. old kernels.

some notes on specific changes:

- the confusing use of libc.main_thread as an indicator that the
  thread pointer is initialized is eliminated in favor of an explicit
  has_thread_pointer predicate.

- sigaction no longer needs to ensure that the thread pointer is
  initialized before installing a signal handler (this was needed to
  prevent a situation where the signal handler caused the thread
  pointer to be initialized and the subsequent sigreturn cleared it
  again) but it still needs to ensure that implementation-internal
  thread-related signals are not blocked.

- pthread tsd initialization for the main thread is deferred in a new
  manner to minimize bloat in the static-linked __init_tp code.

- pthread_setcancelstate no longer needs special handling for the
  situation before the thread pointer is initialized. it simply fails
  on systems that cannot support a thread pointer, which are
  non-conforming anyway.

- pthread_cleanup_push/pop now check for missing thread pointer and
  nop themselves out in this case, so stdio no longer needs to avoid
  the cancellable path when the thread pointer is not available.

a number of cases remain where certain interfaces may crash if the
system does not support a thread pointer. at this point, these should
be limited to pthread interfaces, and the number of such cases should
be fewer than before.

24 Mar, 2014 1 commit

reduce static linking overhead from TLS support by inlining mmap syscall · 98221c36

Authored by Rich Felker on Mar 23, 2014

the external mmap function is heavy because it has to handle error
reporting that the kernel cannot do, and has to do some locking for
arcane race-condition-avoidance purposes. for allocating initial TLS,
we do not need any of that; the raw syscall suffices.

on i386, this change shaves off 13% of the size of .text for the empty
program.

04 Aug, 2013 1 commit

add system for resetting TLS to initial values · 7c6c2906

Authored by Rich Felker on Aug 03, 2013

this is needed for reused threads in the SIGEV_THREAD timer
notification system, and could be reused elsewhere in the future if
needed, though it should be refactored for such use.

for static linking, __init_tls.c is simply modified to export the TLS
info in a structure with external linkage, rather than using statics.
this perhaps makes the code more clear, since the statics were poorly
named for statics. the new __reset_tls.c is only linked if it is used.

for dynamic linking, the code is in dynlink.c. sharing code with
__copy_tls is not practical since __reset_tls must also re-zero
thread-local bss.
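
Resetting a TLS block amounts to copying the initialized image back and clearing the zero-initialized tail. The sketch below assumes a hypothetical `tls_module_sketch` descriptor with an image pointer, the image length, and the total block size; it is not the structure musl exports.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one module's TLS. */
struct tls_module_sketch {
	const void *image;	/* initialized data (.tdata) */
	size_t len;		/* bytes of initialized data */
	size_t size;		/* total size, including zero-filled .tbss */
};

/* Restore a thread's TLS block to its initial values. */
void reset_tls_sketch(void *block, const struct tls_module_sketch *m)
{
	memcpy(block, m->image, m->len);
	/* Re-zero thread-local bss, which a plain image copy would miss. */
	memset((char *)block + m->len, 0, m->size - m->len);
}
```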

14 Jul, 2013 1 commit

fix omission of dtv setup in static linked programs on TLS variant I archs · f1292e3d

Authored by Rich Felker on Jul 13, 2013

apparently this was never noticed before because the linker normally
optimizes dynamic TLS models to non-dynamic ones when static linking,
thus eliminating the calls to __tls_get_addr which crash when the dtv
is missing. however, some libsupc++ code on ARM was calling
__tls_get_addr when static linked and crashing. the reason is unclear
to me, but with this issue fixed it should work now anyway.

26 Dec, 2012 1 commit

fix reference to libc struct in static tls init code · e172c7b4

Authored by Rich Felker on Dec 25, 2012

libc is the macro, __libc is the internal symbol, but under some
configurations on old/broken compilers, the symbol might not actually
exist and the libc macro might instead use __libc_loc() to obtain
access to the object.

09 Nov, 2012 1 commit

clean up sloppy nested inclusion from pthread_impl.h · efd4d87a

Authored by Rich Felker on Nov 08, 2012

this mirrors the stdio_impl.h cleanup. one header which is not
strictly needed, errno.h, is left in pthread_impl.h, because since
pthread functions return their error codes rather than using errno,
nearly every single pthread function needs the errno constants.

in a few places, rather than bringing in string.h to use memset, the
memset was replaced by direct assignment. this seems to generate much
better code anyway, and makes many functions which were previously
non-leaf functions into leaf functions (possibly eliminating a great
deal of bloat on some platforms where non-leaf functions require ugly
prologue and/or epilogue).
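
The memset-to-assignment change can be sketched as below. `attr_sketch` is a hypothetical small structure; the pattern is to assign a zeroed compound literal instead of calling `memset`, which removes the need for string.h and lets the function stay a leaf.

```c
/* Hypothetical attribute object. */
struct attr_sketch {
	int detached;
	unsigned long stack_size;
	void *stack_addr;
};

/* Zero the object by direct assignment rather than memset. */
int attr_init_sketch(struct attr_sketch *a)
{
	*a = (struct attr_sketch){0};
	return 0;
}
```

For large structures a compiler may still lower the assignment to a `memset` call, so the benefit is most reliable for small objects like this one.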

02 Nov, 2012 1 commit

fix unused variable warnings · 3a5aa8e4

Authored by Rich Felker on Nov 01, 2012

19 Oct, 2012 1 commit

fix crashes in static-linked multithreaded programs without TLS · ebee8c2b

Authored by Rich Felker on Oct 19, 2012

16 Oct, 2012 1 commit

add support for TLS variant I, presently needed for arm and mips · 9ec4283b

Authored by Rich Felker on Oct 15, 2012

despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.

with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.

alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
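
The two layouts described in this commit message can be sketched as offsets within a single allocation. This deliberately ignores alignment and the per-arch offsets applied to the thread pointer; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Offsets, from the start of one allocation, of each piece. */
struct tls_layout_sketch {
	size_t td_off;	/* thread descriptor */
	size_t tls_off;	/* initial TLS segment */
	size_t tp_off;	/* address the thread pointer refers to */
};

/* Variant II: [ TLS ][ descriptor ]. The thread pointer points to the
 * end of the TLS segment, where the descriptor begins. */
struct tls_layout_sketch layout_variant2_sketch(size_t tls_size, size_t td_size)
{
	struct tls_layout_sketch l = { tls_size, 0, tls_size };
	(void)td_size;
	return l;
}

/* Variant I: [ descriptor ][ TLS ]. The TLS segment sits above the
 * thread pointer, so the descriptor has to be stored below it. */
struct tls_layout_sketch layout_variant1_sketch(size_t tls_size, size_t td_size)
{
	struct tls_layout_sketch l = { 0, td_size, td_size };
	(void)tls_size;
	return l;
}
```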

08 Oct, 2012 1 commit

clean up and refactor program initialization · 0a96a37f

Authored by Rich Felker on Oct 07, 2012

the code in __libc_start_main is now responsible for parsing auxv,
rather than duplicating the parsing all over the place. this should
shave off a few cycles and some code size. __init_libc is left as an
external-linkage function despite the fact that it could be static, to
prevent it from being inlined and permanently wasting stack space when
main is called.

a few other minor changes are included, like eliminating per-thread
ssp canaries (they were likely broken when combined with certain
dlopen usages, and completely unnecessary) and some other unnecessary
checks. since this code gets linked into every program, it should be
as small and simple as possible.

07 Oct, 2012 1 commit

fix buggy TLS size/alignment computations in static-linked TLS · 6a2eaa3c

Authored by Rich Felker on Oct 06, 2012

05 Oct, 2012 3 commits

support for TLS in dynamic-loaded (dlopen) modules · dcd60371

Authored by Rich Felker on Oct 05, 2012

unlike other implementations, this one reserves memory for new TLS in
all pre-existing threads at dlopen-time, and dlopen will fail with no
resources consumed and no new libraries loaded if memory is not
available. memory is not immediately distributed to running threads;
that would be too complex and too costly. instead, assurances are made
that threads needing the new TLS can obtain it in an async-signal-safe
way from a buffer belonging to the dynamic linker/new module (via
atomic fetch-and-add based allocator).

I've re-appropriated the lock that was previously used for __synccall
(synchronizing set*id() syscalls between threads) as a general
pthread_create lock. it's a "backwards" rwlock where the "read"
operation is safe atomic modification of the live thread count, which
multiple threads can perform at the same time, and the "write"
operation is making sure the count does not increase during an
operation that depends on it remaining bounded (__synccall or dlopen).
in static-linked programs that don't use __synccall, this lock is a
no-op and has no cost.
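
An allocator built on atomic fetch-and-add, as mentioned in the first paragraph of this commit message, can be sketched with C11 atomics. The pool size and all names are assumptions for illustration; in the design described above the buffer is reserved at dlopen time, so running out of space is not expected there.

```c
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE_SKETCH 1024

/* Hypothetical pre-reserved buffer shared by all threads. */
static char pool_sketch[POOL_SIZE_SKETCH];
static atomic_size_t pool_used_sketch;

/* Claim n bytes with a single atomic add and no lock. Returns a null
 * pointer if the pool is exhausted; a failed request still advances
 * the counter, which is acceptable for this sketch. */
void *reserve_from_pool_sketch(size_t n)
{
	size_t off = atomic_fetch_add(&pool_used_sketch, n);
	if (off > POOL_SIZE_SKETCH || n > POOL_SIZE_SKETCH - off)
		return 0;
	return pool_sketch + off;
}
```

Because each caller receives a distinct range from one atomic operation, no lock is held at any point, which is what makes this style of allocation usable from contexts where taking a lock would be unsafe (assuming the atomic type is lock-free on the target).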

partial TLS support for dynamic-linked programs · bc6a35fb

Authored by Rich Felker on Oct 04, 2012

only TLS in the main program is supported so far; TLS defined in
shared libraries will not work yet.

TLS (GNU/C11 thread-local storage) support for static-linked programs · 8431d797

Authored by Rich Felker on Oct 04, 2012

the design for TLS in dynamic-linked programs is mostly complete too,
but I have not yet implemented it. cost is nonzero but still low for
programs which do not use TLS and/or do not use threads (a few hundred
bytes of new code, plus dependency on memcpy). i believe it can be
made smaller at some point by merging __init_tls and __init_security
into __libc_start_main and avoiding duplicate auxv-parsing code.

at the same time, I've also slightly changed the logic pthread_create
uses to allocate guard pages to ensure that guard pages are not
counted towards commit charge.

OpenHarmony / Third Party Musl · synced successfully 11 months ago