Commits · 68630b55c0c7219fe9df70dc28ffbf9efc8021d8 · OpenHarmony / Third Party Musl

16 May, 2015 1 commit

eliminate costly tricks to avoid TLS access for current locale state · 68630b55

Committed by Rich Felker on May 16, 2015

the code being removed used atomics to track whether any threads might
be using a locale other than the current global locale, and whether
any threads might have abstract 8-bit (non-UTF-8) LC_CTYPE active, a
feature which was never committed (still pending). the motivations
were to support early execution prior to setup of the thread pointer,
to partially support systems (ancient kernels) where thread pointer
setup is not possible, and to avoid high performance cost on archs
where accessing the thread pointer may be very slow.

since commit 19a1fe67, the thread
pointer is always available, so these hacks are no longer needed.
removing them greatly simplifies the affected code.
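
The simplified access pattern can be sketched roughly as follows. This is only an illustration that assumes C11 `_Thread_local` storage in place of musl's internal thread structure; `struct locale_sketch`, `global_locale`, `thread_locale`, `current_locale` and `use_locale` are hypothetical names, not musl identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a locale object. */
struct locale_sketch {
	int ctype_utf8;
};

/* Process-wide locale, as a setlocale-like call would modify it. */
static struct locale_sketch global_locale;

/* Per-thread selection; NULL means "follow the global locale". */
static _Thread_local struct locale_sketch *thread_locale;

/* Always read the thread-local slot. No atomic flags are kept to track
 * whether any thread has diverged from the global locale. */
struct locale_sketch *current_locale(void)
{
	return thread_locale ? thread_locale : &global_locale;
}

/* uselocale-like setter: installs l (if non-NULL) and returns the
 * locale that was in effect before the call. */
struct locale_sketch *use_locale(struct locale_sketch *l)
{
	struct locale_sketch *old = current_locale();
	if (l) thread_locale = (l == &global_locale) ? NULL : l;
	return old;
}
```

The point of the sketch is that every lookup goes straight to thread-local state, which is only reasonable once the thread pointer is guaranteed to be available.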

68630b55

22 April, 2015 1 commit

fix duplocale clobbering of new locale struct with memcpy of old · 873e0ec7

Committed by Rich Felker on April 21, 2015

when the non-stub duplocale code was added as part of the locale
framework in commit 0bc03091, the old
code to memcpy the old locale object to the new one was left behind.
the conditional for the memcpy no longer makes sense, because the
conditions are now always-true when it's reached, and the memcpy is
wrong because it clobbers the new->messages_name pointer setup just
above.

since the messages_name and ctype_utf8 members have already been
copied, all that remains is the cat[] array. these pointers are
volatile, so using memcpy to copy them is formally wrong; use a for
loop instead.
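
A minimal sketch of the element-wise copy described above, assuming a simplified locale structure. The struct layout, `NCAT`, and `copy_categories` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define NCAT 4  /* hypothetical number of locale categories */

/* Hypothetical, simplified locale object. */
struct locale_sketch {
	int ctype_utf8;
	const char *messages_name;
	const void *volatile cat[NCAT];  /* volatile pointers to category data */
};

/* Copy only the cat[] array, one element at a time. Each assignment is a
 * volatile read of src->cat[i] and a volatile write of dst->cat[i];
 * memcpy would access the objects as plain bytes instead. Members that
 * were already set up in dst (such as messages_name) are left alone. */
void copy_categories(struct locale_sketch *dst, const struct locale_sketch *src)
{
	size_t i;
	for (i = 0; i < NCAT; i++)
		dst->cat[i] = src->cat[i];
}
```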

873e0ec7

04 March, 2015 1 commit

make all objects used with atomic operations volatile · 56fbaa3b

Committed by Rich Felker on March 03, 2015

the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.

with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.

in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
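
The cas construct discussed above can be sketched as follows. This is an illustration only: it assumes a GCC or Clang compiler and uses the `__sync_val_compare_and_swap` builtin as a stand-in for musl's `a_cas`; `a_cas_sketch`, `f`, `atomic_update` and `load_through_volatile` are hypothetical names.

```c
/* Stand-in for a_cas: returns the value that was found in *p. */
static int a_cas_sketch(volatile int *p, int expected, int desired)
{
	return __sync_val_compare_and_swap(p, expected, desired);
}

/* Example update function applied to the old value. */
static int f(int x)
{
	return 2 * x + 1;
}

/* Atomically replace *p with f(*p) and return the old value.
 * Because *p is volatile-qualified, "tmp = *p" is a single load; the
 * compiler may not rewrite the call as a_cas(p, *p, f(*p)) with
 * several loads that could observe different values. */
int atomic_update(volatile int *p)
{
	int tmp;
	do {
		tmp = *p;
	} while (a_cas_sketch(p, tmp, f(tmp)) != tmp);
	return tmp;
}

/* For an object whose declared type is not volatile (the pthread_once_t
 * case), access goes through a volatile-qualified lvalue via a cast. */
int load_through_volatile(int *obj)
{
	return *(volatile int *)obj;
}
```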

56fbaa3b

06 September, 2014 1 commit

fix non-static dummy function that slipped in with locale implementation · 86876dbe

Committed by Rich Felker on September 06, 2014

86876dbe

13 August, 2014 1 commit

add inline isspace in ctype.h as an optimization · b04971d9

Committed by Szabolcs Nagy on August 13, 2014

isspace can be a bottleneck in a simple parser, inlining it
gives slightly smaller and faster code

src/locale/pleval.o already had this optimization, the size
change for other libc functions for i386 is

src/internal/intscan.o     2134    2118   -16
src/locale/dcngettext.o    1562    1552   -10
src/network/res_msend.o    1961    1940   -21
src/network/lookup_name.o  2627    2608   -19
src/network/getnameinfo.o  1814    1811    -3
src/network/lookup_serv.o   643     624   -19
src/stdio/vfscanf.o        2675    2663   -12
src/stdlib/atoll.o          117     107   -10
src/stdlib/atoi.o            95      91    -4
src/stdlib/atol.o            95      91    -4
src/time/strptime.o        1515    1503   -12
(TOTALS)                 432451  432321  -130
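
An inlinable whitespace test of the kind described above might look like the following sketch. The name `isspace_sketch` is hypothetical, and the check assumes the "C" locale set of whitespace characters (space plus the contiguous range from `'\t'` to `'\r'`).

```c
/* Hypothetical inline check for the six "C" locale whitespace characters:
 * ' ' and the five consecutive codes '\t', '\n', '\v', '\f', '\r'.
 * The unsigned subtraction folds the range test into one comparison;
 * values below '\t' (including negative ones such as EOF) wrap around
 * to large unsigned numbers and fail the test. */
static inline int isspace_sketch(int c)
{
	return c == ' ' || (unsigned)c - '\t' < 5;
}
```

Because the body is two comparisons, inlining it avoids a function call in tight parsing loops without growing the callers much.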

b04971d9

01 August, 2014 1 commit

harden locale name handling and prevent slashes in LC_MESSAGES · 5059deb1

Committed by Rich Felker on July 31, 2014

the code which loads locale files was already rejecting locale names
containing slashes. however, LC_MESSAGES records a locale name even if
libc does not have a matching locale file, so that gettext or
application code can use the recorded locale name for message
translations to languages that libc does not support. this recorded
name was not being checked for slashes, meaning that such code could
potentially be tricked into directory traversal.

in addition, since the value of a locale category is sometimes used as
a pathname component by callers, the improved code rejects any value
beginning with a dot. this prevents traversal to the parent directory
via "..", use of the top-level locale directory via ".", and also
avoids "hidden" directories as a side effect.

finally, overly long locale names are now rejected (treated as an
unrecognized name and thus as an alias for C.UTF-8) rather than being
truncated.
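
The three checks described above can be sketched as a single validation function. This is a hedged illustration: `locale_name_ok` and `LOCALE_NAME_MAX_SKETCH` are hypothetical, and the length limit of 15 is an assumption for the example rather than a value stated in this commit.

```c
#include <string.h>

#define LOCALE_NAME_MAX_SKETCH 15  /* assumed limit, for illustration only */

/* Return 1 if name may be recorded as a locale name, 0 if it should be
 * treated as unrecognized. */
int locale_name_ok(const char *name)
{
	/* A leading dot covers ".", ".." and hidden directories. */
	if (name[0] == '.') return 0;
	/* A slash anywhere would let the name span path components. */
	if (strchr(name, '/')) return 0;
	/* Overly long names are rejected rather than truncated. */
	if (strlen(name) > LOCALE_NAME_MAX_SKETCH) return 0;
	return 1;
}
```

A caller that gets 0 back would fall back to the default locale instead of storing the name.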

5059deb1

31 July, 2014 1 commit

plural rule evaluator rewrite for dcngettext · 6527b03d

Committed by Szabolcs Nagy on July 30, 2014

using an operator precedence parser the code size
became smaller and it is only slower by about 10%

size of old vs new pleval.o on different archs:
(with inlined isspace added to pleval.c for now)

old:
   text    data     bss     dec     hex filename
    828       0       0     828     33c pl.i386.o
   1152       0       0    1152     480 pl.arm.o
   1704       0       0    1704     6a8 pl.mips.o
   1328       0       0    1328     530 pl.ppc.o
    992       0       0     992     3e0 pl.x64.o
new:
   text    data     bss     dec     hex filename
    693       0       0     693     2b5 pl.i386.o
    972       0       0     972     3cc pl.arm.o
   1276       0       0    1276     4fc pl.mips.o
   1087       0       0    1087     43f pl.ppc.o
    846       0       0     846     34e pl.x64.o
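
To illustrate the general technique, here is a small operator precedence (precedence climbing) evaluator for a subset of plural expressions. It is not the pleval code from this commit: it assumes only binary operators, parentheses, decimal constants and the variable `n`, and it omits the `?:` and unary `!` operators that real plural rules use. All names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parser state: cursor, value bound to n, and an error flag. */
struct pl_state { const char *s; unsigned long n; int err; };

/* Binary operators with precedence; two-character tokens come before
 * their one-character prefixes so that "<=" is not read as "<". */
static const struct { const char *tok; int prec; } ops[] = {
	{"||",1}, {"&&",2}, {"==",3}, {"!=",3}, {"<=",4}, {">=",4},
	{"<",4}, {">",4}, {"+",5}, {"-",5}, {"*",6}, {"/",6}, {"%",6},
};
#define NOPS (sizeof ops / sizeof *ops)

static unsigned long parse_expr(struct pl_state *st, int minprec);

static void skip_ws(struct pl_state *st)
{
	while (isspace((unsigned char)*st->s)) st->s++;
}

/* primary := 'n' | decimal constant | '(' expr ')' */
static unsigned long parse_primary(struct pl_state *st)
{
	unsigned long v;
	char *end;
	skip_ws(st);
	if (*st->s == 'n') {
		st->s++;
		return st->n;
	}
	if (*st->s == '(') {
		st->s++;
		v = parse_expr(st, 0);
		skip_ws(st);
		if (*st->s == ')') st->s++;
		else st->err = 1;
		return v;
	}
	if (isdigit((unsigned char)*st->s)) {
		v = strtoul(st->s, &end, 10);
		st->s = end;
		return v;
	}
	st->err = 1;
	return 0;
}

/* Apply operator number op (an index into ops[]) to two values. */
static unsigned long apply(size_t op, unsigned long a, unsigned long b, int *err)
{
	switch (op) {
	case 0: return a || b;
	case 1: return a && b;
	case 2: return a == b;
	case 3: return a != b;
	case 4: return a <= b;
	case 5: return a >= b;
	case 6: return a < b;
	case 7: return a > b;
	case 8: return a + b;
	case 9: return a - b;
	case 10: return a * b;
	}
	/* Division and remainder: reject a zero divisor. */
	if (!b) { *err = 1; return 0; }
	return op == 11 ? a / b : a % b;
}

/* Parse operators whose precedence is at least minprec. */
static unsigned long parse_expr(struct pl_state *st, int minprec)
{
	unsigned long lhs = parse_primary(st), rhs;
	for (;;) {
		size_t i;
		skip_ws(st);
		for (i = 0; i < NOPS; i++)
			if (!strncmp(st->s, ops[i].tok, strlen(ops[i].tok))) break;
		if (i == NOPS || ops[i].prec < minprec) return lhs;
		st->s += strlen(ops[i].tok);
		/* Left-associative: the right operand only absorbs operators
		 * of strictly higher precedence. */
		rhs = parse_expr(st, ops[i].prec + 1);
		lhs = apply(i, lhs, rhs, &st->err);
	}
}

/* Evaluate expr for the given n. Returns 0 and stores the result in
 * *out on success, or -1 on a syntax error or division by zero. */
int pl_eval(const char *expr, unsigned long n, unsigned long *out)
{
	struct pl_state st = { expr, n, 0 };
	unsigned long v = parse_expr(&st, 0);
	skip_ws(&st);
	if (st.err || *st.s) return -1;
	*out = v;
	return 0;
}
```

For example, `pl_eval("n != 1", 5, &r)` stores 1 in `r`. Note that this sketch evaluates both operands of `&&` and `||` and does not bound recursion depth, both of which a hardened evaluator would need to address.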

6527b03d

30 July, 2014 2 commits

tweaks to plural rules evaluator · a126188f

Committed by Szabolcs Nagy on July 29, 2014

const parsing, depth accounting and failure handling were changed
a bit so the generated code is slightly smaller.

a126188f

harden dcngettext plural processing · e4dd0ab8

Committed by Rich Felker on July 29, 2014

while the __mo_lookup backend can verify that the translated message
ends with a null terminator, it has no way to know nplurals and thus
no way to verify that sufficiently many null terminators are present
in the string to satisfy all plural forms. the code in dcngettext was
already attempting to avoid reading past the end of the mo file
mapping, but failed to do so because the strlen call itself could
over-read. using strnlen instead allows us to avoid the problem.
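
The bounded scan can be sketched as follows. This is a minimal illustration, not the dcngettext code: `plural_form` is a hypothetical helper that assumes the plural forms are stored back to back, each with its own null terminator, in a buffer of known length.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for strnlen */
#endif
#include <string.h>

/* Select plural form number idx from buf, which holds len bytes of
 * NUL-separated forms. Returns NULL if fewer than idx+1 terminated
 * strings lie within the buffer. strnlen never reads past the
 * remaining byte count, unlike strlen. */
const char *plural_form(const char *buf, size_t len, unsigned long idx)
{
	const char *p = buf;
	size_t rem = len;
	for (;;) {
		size_t l = strnlen(p, rem);
		if (l == rem) return NULL;  /* no terminator within bounds */
		if (!idx) return p;
		idx--;
		p += l + 1;
		rem -= l + 1;
	}
}
```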

e4dd0ab8

29 July, 2014 2 commits

harden mo file processing for locale/translations · 6e892106

Committed by Rich Felker on July 29, 2014

rather than just checking that the start of the string lies within the
mapping, also check that the nominal length remains within the
mapping, and that the null terminator is present at the nominal
length. this ensures that the caller, using the result as a C string,
will not read past the end of the mapping.

the nominal length is never exposed to the caller, but it's useful
internally to find where the null terminator should be without having
to resort to linear search via strnlen/memchr.
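
The three checks described above amount to something like the following sketch. `checked_string` is a hypothetical helper; `off` and `len` stand for the string offset and nominal length read from the mo file tables.

```c
#include <stddef.h>

/* Return a pointer to the string of nominal length len at offset off in
 * a mapping of size bytes, or NULL if the string would extend past the
 * mapping or is not NUL-terminated exactly at off+len. */
const char *checked_string(const char *map, size_t size, size_t off, size_t len)
{
	if (off >= size) return NULL;          /* start lies outside the mapping */
	if (len >= size - off) return NULL;    /* terminator position lies outside */
	if (map[off + len] != 0) return NULL;  /* terminator is missing */
	return map + off;
}
```

Writing the second comparison as `len >= size - off` rather than `off + len >= size` avoids any possibility of overflow in the addition.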

6e892106

implement non-default plural rules for ngettext translations · 73d2a3bf

Committed by Rich Felker on July 28, 2014

the new code in dcngettext was written by me, and the expression
evaluator by Szabolcs Nagy (nsz).

73d2a3bf

27 July, 2014 1 commit

implement gettext message translation functions · 2068b4e8

Committed by Rich Felker on July 27, 2014

this commit replaces the stub implementations with working message
translation functions. translation units are factored so as to prevent
pulling in the legacy, non-library-safe functions which use a global
textdomain in modern code which is using the versions with an explicit
domain argument. bind_textdomain_codeset is also placed in its own
file since it should not be needed by most programs.

this implementation is still missing some features: the LANGUAGE
environment variable (for multiple fallback languages) is not honored,
and non-default plural-form rules are not supported. these issues will
be addressed in a later commit.

one notable difference from the GNU implementation is that there is no
default path for loading translation files. in principle one could be
added, but since the documented correct usage is to call the
bindtextdomain function, a default path is probably unnecessary.
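
Since there is no default path, a program is expected to bind its domain explicitly before looking up messages. The following is a usage sketch with the standard `bindtextdomain` and `dgettext` functions from `<libintl.h>`; the wrapper names, the domain name and the directory in the usage note are hypothetical.

```c
#include <libintl.h>

/* Bind a message domain to the directory holding its translation files.
 * Returns nonzero on success. */
int init_translations(const char *domain, const char *dir)
{
	return bindtextdomain(domain, dir) != 0;
}

/* Look up msgid in an explicit domain; this avoids the global
 * textdomain state used by plain gettext(). */
const char *tr(const char *domain, const char *msgid)
{
	return dgettext(domain, msgid);
}
```

A caller might use `init_translations("myapp", "/usr/share/locale")` once at startup and then `tr("myapp", "hello")`. When no translation is found, `dgettext` returns the original `msgid`.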

2068b4e8

26 July, 2014 4 commits

add support for LC_TIME and LC_MESSAGES translations · c5b8f193

Committed by Rich Felker on July 26, 2014

for LC_MESSAGES, translation of strerror and similar literal message
functions is supported. for messages in other places (particularly the
dynamic linker) that use format strings, translation is not yet
supported. in order to make it possible and safe, such messages will
need to be refactored to separate the textual content from the format.

for LC_TIME, the day and month names and strftime-style format strings
provided by nl_langinfo are supported for translation. however there
may be limitations, as some of the original C-locale nl_langinfo
strings are non-unique and thus perhaps non-suitable as keys.

overall, the locale support activated by this commit should not be
seen as complete and polished but as a basis for beginning to test
locale functionality and implement locales.

c5b8f193

add missing yes/no strings to nl_langinfo · 0206f596

Committed by Rich Felker on July 26, 2014

these were removed from the standard but still offered as an extension
in langinfo.h, so nl_langinfo should support them.

0206f596

fix nl_langinfo table for LC_TIME era-related items · a19cd2b6

Committed by Rich Felker on July 26, 2014

due to a skipped slot and missing null terminator, the last few
strings were off by one or two slots from their item codes.
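
To see why a skipped slot or a missing terminator shifts later entries, consider a table stored as strings packed back to back and indexed by counting terminators. The sketch below assumes that layout; `days` and `table_entry` are hypothetical and not taken from the langinfo source.

```c
#include <stddef.h>

/* Entries packed back to back. Adjacent string literals are concatenated,
 * and the implicit final NUL terminates the last entry. */
static const char days[] =
	"Sun\0" "Mon\0" "Tue\0" "Wed\0" "Thu\0" "Fri\0" "Sat";

/* Return entry number idx by skipping idx terminators. There is no
 * bounds check: the caller must pass an index that exists in the table. */
const char *table_entry(const char *tab, size_t idx)
{
	for (; idx; idx--) {
		while (*tab) tab++;  /* skip to the end of this entry */
		tab++;               /* step over its terminator */
	}
	return tab;
}
```

If one `"\0"` is left out, two neighbouring entries merge into one and every later entry is returned for an index one lower than intended, which is the kind of misalignment this commit corrects.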

a19cd2b6

implement mo file string lookup for translations · 41421d6b

Committed by Rich Felker on July 26, 2014

the core is based on a binary search; hash table is not used. both
native and reverse-endian mo files are supported. all offsets read
from the mapped mo file are checked against the mapping size to
prevent the possibility of reads outside the mapping.
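
The byte-order handling and bounds checking can be sketched as below. This is an illustration rather than the code from this commit: `mo_read32`, `mo_byte_order` and `swap32` are hypothetical names, and the only format detail relied on is the GNU mo magic number 0x950412de stored in the first 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0x950412deu

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
	return x >> 24 | (x >> 8 & 0xff00) | (x << 8 & 0xff0000) | x << 24;
}

/* Read the 32-bit word at byte offset off, byte-swapping it if swap is
 * nonzero. Returns -1 without reading if the word would extend past
 * the end of the mapping. */
int mo_read32(const void *map, size_t size, size_t off, int swap, uint32_t *out)
{
	uint32_t v;
	if (size < 4 || off > size - 4) return -1;
	memcpy(&v, (const char *)map + off, 4);
	*out = swap ? swap32(v) : v;
	return 0;
}

/* Inspect the magic number: 0 for native byte order, 1 for reversed,
 * -1 if the mapping does not look like a mo file. */
int mo_byte_order(const void *map, size_t size)
{
	uint32_t m;
	if (mo_read32(map, size, 0, 0, &m)) return -1;
	if (m == MO_MAGIC) return 0;
	if (m == swap32(MO_MAGIC)) return 1;
	return -1;
}
```

A binary search over the sorted original-string table would then call a checked reader like `mo_read32` for every offset and length it fetches.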

this commit has no observable effects since there are not yet any
callers to the message translation code.

41421d6b

24 July, 2014 2 commits

implement locale file loading and state for remaining locale categories · 6cb4f91d

Committed by Rich Felker on July 24, 2014

there is still no code which actually uses the loaded locale files, so
the main observable effect of this commit is that calls to setlocale
store and give back the names of the selected locales for the
remaining categories (LC_TIME, LC_COLLATE, LC_MONETARY) if a locale
file by the requested name could be loaded.

6cb4f91d

fix locale environment variable logic for empty strings · 674e28af

Committed by Rich Felker on July 24, 2014

per POSIX (XBD 8.2) LC_*/LANG environment variables set to the
empty string are supposed to be treated as if they were not set at
all.
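
A minimal sketch of the corrected lookup order, assuming the usual POSIX precedence of `LC_ALL`, then the category variable, then `LANG`. The function name `locale_from_env` and the final `"C"` fallback are assumptions made for this example.

```c
#include <stdlib.h>

/* Resolve the locale name for one category from the environment.
 * category_var is the variable for that category, e.g. "LC_TIME".
 * A variable that is set to the empty string is skipped exactly as if
 * it were not set at all. */
const char *locale_from_env(const char *category_var)
{
	const char *vars[3];
	size_t i;
	vars[0] = "LC_ALL";
	vars[1] = category_var;
	vars[2] = "LANG";
	for (i = 0; i < 3; i++) {
		const char *v = getenv(vars[i]);
		if (v && *v) return v;
	}
	return "C";  /* assumed default when nothing usable is set */
}
```

Checking `v && *v` instead of only `v` is the whole fix: with `LC_ALL=""` in the environment, the lookup falls through to the more specific variables.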

674e28af

03 July, 2014 4 commits

properly pass current locale to *_l functions when used internally · 4c48501e

Committed by Rich Felker on July 02, 2014

this change is presently non-functional since the callees do not yet
use their locale argument for anything.

4c48501e

consolidate str[n]casecmp_l into str[n]casecmp source files · 7424ac58

Committed by Rich Felker on July 02, 2014

this is mainly done for consistency with the ctype functions and to
declutter the src/locale directory.

7424ac58

consolidate *_l ctype/wctype functions into their non-_l source files · d89fdec5

Committed by Rich Felker on July 02, 2014

the main practical purposes of this commit are to remove a huge amount
of clutter from the src/locale directory, to cut down on the length of
the $(AR) and $(LD) command lines, and to reduce the amount of space
wasted by object file headers in the static libc.a. build time may
also be reduced, though this has not been measured.

as an additional justification, if there ever were a need for the
behavior of these functions to vary by locale, it would be necessary
for the non-_l versions to call the _l versions, so that linking the
former without the latter would not be possible anyway.

d89fdec5

add locale framework · 0bc03091

Committed by Rich Felker on July 02, 2014

this commit adds non-stub implementations of setlocale, duplocale,
newlocale, and uselocale, along with the data structures and minimal
code needed for representing the active locale on a per-thread basis
and optimizing the common case where thread-local locale settings are
not in use.

at this point, the data structures only contain what is necessary to
represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in
finding message translation files). representation for the other
categories will be added later; the expectation is that a single
pointer will suffice for each.

for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any
other string is accepted and treated as "C.UTF-8". for other
categories, any string is accepted after being truncated to a maximum
supported length (currently 15 bytes). for LC_MESSAGES, the name is
kept regardless of whether libc itself can use such a message
translation locale, since applications using catgets or gettext should
be able to use message locales libc is not aware of. for other
categories, names which are not successfully loaded as locales (which,
at present, means all names) are treated as aliases for "C". setlocale
never fails.
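
The name handling rules described above can be sketched as two small helpers. These are illustrations only; `ctype_is_utf8`, `store_locale_name` and `NAME_MAX_SKETCH` are hypothetical names, with the 15-byte limit taken from the text.

```c
#include <string.h>

#define NAME_MAX_SKETCH 15  /* maximum supported name length per the text */

/* LC_CTYPE: "C" and "POSIX" are special; any other name is treated as
 * "C.UTF-8". Returns 1 for UTF-8, 0 for the byte-based C locale. */
int ctype_is_utf8(const char *name)
{
	return strcmp(name, "C") != 0 && strcmp(name, "POSIX") != 0;
}

/* Other categories: keep the name, truncated to the supported length.
 * dst must have room for NAME_MAX_SKETCH + 1 bytes. */
void store_locale_name(char *dst, const char *name)
{
	size_t n = strlen(name);
	if (n > NAME_MAX_SKETCH) n = NAME_MAX_SKETCH;
	memcpy(dst, name, n);
	dst[n] = 0;
}
```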

locale settings are not yet used anywhere, so this commit should have
no visible effects except for the contents of the string returned by
setlocale.

0bc03091

10 June, 2014 1 commit

replace all remaining internal uses of pthread_self with __pthread_self · df15168c

Committed by Rich Felker on June 10, 2014

prior to version 1.1.0, the difference between pthread_self (the
public function) and __pthread_self (the internal macro or inline
function) was that the former would lazily initialize the thread
pointer if it was not already initialized, whereas the latter would
crash in this case. since lazy initialization is no longer supported,
use of pthread_self no longer makes sense; it simply generates larger,
slower code.

df15168c

14 May, 2014 1 commit

add cp437 and cp850 to available iconv conversions · 8a2d8719

Committed by Rich Felker on May 13, 2014

perhaps some additional legacy DOS-era codepages would also be useful
to have, but these are the ones for which there has been demand. the
size of the diff is due to the fact that legacychars.h is updated in
such a way that new characters are inserted into the table in unicode
codepoint order; thus other mappings in codepages.h have changed to
reflect the new table indices of their characters.

8a2d8719

23 January, 2014 1 commit

fix an overflow in wcsxfrm when n==0 · f1471d32

Committed by Szabolcs Nagy on January 23, 2014

posix allows zero length destination

f1471d32
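
A sketch of a wcsxfrm-like function that tolerates `n == 0`, assuming an identity transformation as in the "C" locale. `wcsxfrm_sketch` is a hypothetical name; the important detail is that `dest[n - 1]` is only written when `n` is nonzero, since with `n == 0` that index would wrap around.

```c
#include <wchar.h>

/* Copy at most n wide characters (including the terminator) of src to
 * dest and return the full length of src. With n == 0 nothing is
 * written, so dest may even be a null pointer. */
size_t wcsxfrm_sketch(wchar_t *dest, const wchar_t *src, size_t n)
{
	size_t l = wcslen(src);
	if (l < n) {
		wmemcpy(dest, src, l + 1);  /* whole string fits */
	} else if (n) {
		wmemcpy(dest, src, n - 1);  /* truncate, keep room for terminator */
		dest[n - 1] = 0;
	}
	return l;
}
```

Calling it with a zero-length destination is the usual way to ask how large a buffer is needed.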

12 December, 2013 1 commit

include cleanups: remove unused headers and add feature test macros · 57174444

Committed by Szabolcs Nagy on December 12, 2013

57174444

26 November, 2013 1 commit

remove duplicate includes from dynlink.c, strfmon.c and getaddrinfo.c · 2b1f2f14

Committed by Szabolcs Nagy on November 25, 2013

2b1f2f14

18 August, 2013 2 commits

remove spurious tmp file present since initial git check-in · 37c25065

Committed by Rich Felker on August 17, 2013

37c25065

add hkscs/big5-2003/eten extensions to iconv big5 · 109bd65a

Committed by Rich Felker on August 17, 2013

with these changes, the character set implemented as "big5" in musl is
a pure superset of cp950, the canonical "big5", and agrees with the
normative parts of Unicode. this means it has minor differences from
both hkscs and big5-2003:

- the range A2CC-A2CE maps to CJK ideographs rather than numerals,
  contrary to changes made in big5-2003.

- C6CD maps to a CJK ideograph rather than its corresponding Kangxi
  radical character, contrary to changes made in hkscs.

- F9FE maps to U+2593 rather than U+FFED.

of these differences, none but the last are visually distinct, and the
last is a character used purely for text-based graphics, not to convey
linguistic content.

should there be future demand for strict conformance to big5-2003 or
hkscs mappings, the present charset aliases can be replaced with
distinct variants.

reportedly there are other non-standard big5 extensions in common use
in Taiwan and perhaps elsewhere, which could also be added as layers
on top of the existing big5 support.

there may be additional characters which should be added to the hkscs
table: the whatwg standard for big5 defines what appears to be a
superset of hkscs.

109bd65a

08 August, 2013 1 commit

add Big5 charset support to iconv · 19b4a0a2

Committed by Rich Felker on August 07, 2013

at this point, it is just the common base charset equivalent to
Windows CP 950, with no further extensions. HKSCS and possibly other
supersets will be added later. other aliases may need to be added too.

19b4a0a2

06 August, 2013 1 commit

iconv support for legacy Korean encodings · 734062b2

Committed by Rich Felker on August 05, 2013

like for other character sets, stateful iso-2022 form is not supported
yet but everything else should work. all charset aliases are treated
the same, as Windows codepage 949, because reportedly the EUC-KR
charset name is in widespread (mis?)usage in email and on the web for
data which actually uses the extended characters outside the standard
93x94 grid. this could easily be changed if desired.

the principle of this converter for handling the giant bulk of rare
Hangul syllables outside of the standard KS X 1001 93x94 grid is the
same as the GB18030 converter's treatment of non-explicitly-coded
Unicode codepoints: sequences in the extension range are mapped to an
integer index N, and the converter explicitly computes the Nth Hangul
syllable not explicitly encoded in the character map. empirically,
this requires at most 7 passes over the grid. this approach reduces
the table size required for Korean legacy encodings from roughly 44k
to 17k and should have minimal performance impact on real-world text
conversions since the "slow" characters are rare. where it does have
impact, the cost is merely a large constant time factor.
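
The "compute the Nth value not explicitly encoded" idea can be illustrated generically. The sketch below is not the converter from this commit: it assumes a plain array of explicitly coded values instead of the real 93x94 grid, and `count_in_range` and `nth_uncoded` are hypothetical names. Each loop iteration is one pass over the table.

```c
#include <stddef.h>

/* Count table entries that fall in the inclusive range [lo, hi]. */
static size_t count_in_range(const unsigned *tab, size_t len,
                             unsigned lo, unsigned hi)
{
	size_t i, k = 0;
	for (i = 0; i < len; i++)
		if (tab[i] >= lo && tab[i] <= hi) k++;
	return k;
}

/* Return the n-th value (counting from 0) at or above base that does
 * not appear in tab. Start from the guess base+n; each explicitly coded
 * value at or below the guess pushes the answer up by one, so advance
 * by that count and re-check only the newly covered range, until a pass
 * finds no further coded values. */
unsigned nth_uncoded(const unsigned *tab, size_t len, unsigned base, unsigned n)
{
	unsigned lo = base, c = base + n;
	for (;;) {
		size_t k = count_in_range(tab, len, lo, c);
		if (!k) return c;
		lo = c + 1;
		c += (unsigned)k;
	}
}
```

With coded values {2, 3, 7} and base 0, the uncoded sequence is 0, 1, 4, 5, 6, 8 and so on, so index 2 yields 4 after three passes. The table itself only has to store the explicitly coded characters, which is where the size saving comes from.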

734062b2

28 July, 2013 1 commit

fix semantically incorrect use of LC_GLOBAL_LOCALE · 1ae4bc42

Committed by Rich Felker on July 28, 2013

LC_GLOBAL_LOCALE refers to the global locale, controlled by setlocale,
not the thread-local locale in effect which these functions should be
using. neither LC_GLOBAL_LOCALE nor 0 as an argument to the *_l
functions has behavior defined by the standard, but 0 is a more
logical choice for requesting the callee to lookup the current locale.
in the future I may move the current locale lookup to the caller (the
non-_l-suffixed wrapper).

at this point, all of the locale logic is dummied out, so no harm was
done, but it should at least avoid misleading usage.

1ae4bc42

25 July, 2013 5 commits

rework langinfo code for ABI compat and for use by time code · 87be54a1

Committed by Rich Felker on July 24, 2013

87be54a1

update strxfrm/wcsxfrm for future LC_COLLATE support and ABI compat · ad4a5367

Committed by Rich Felker on July 24, 2013

ad4a5367

add ABI compat aliases for a number of locale_t functions · 4350935c

Committed by Rich Felker on July 24, 2013

4350935c

prepare strcoll/wcscoll for LC_COLLATE support and add ABI symbols · 4b0306c8

Committed by Rich Felker on July 24, 2013

4b0306c8

move strftime_l into strftime.c and add __-prefixed version · 0a37d995

Committed by Rich Felker on July 24, 2013

the latter is both for ABI purposes, and to facilitate eventually
adding LC_TIME support. it's also nice to eliminate an extra source
file.

0a37d995

27 June, 2013 1 commit

fix iconv conversion to legacy 8bit codepages · 6a4cfbdb

Committed by Rich Felker on June 26, 2013

this seems to have been a simple copy-and-paste error from the code
for converting from legacy codepages.

6a4cfbdb

07 September, 2012 1 commit

use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c

Committed by Rich Felker on September 06, 2012

to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
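
The header technique can be sketched as follows. To avoid redefining a reserved identifier in an example, the sketch uses a hypothetical macro `SKETCH_RESTRICT` instead of `__restrict`; the conditions shown are an assumption about how such a definition might be selected, not a copy of the musl headers.

```c
#include <stddef.h>

/* Pick a spelling of the qualifier that the compiler accepts. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SKETCH_RESTRICT restrict    /* C99 or later */
#elif defined(__GNUC__)
#define SKETCH_RESTRICT __restrict  /* GNU extension in older modes */
#else
#define SKETCH_RESTRICT             /* unknown compiler: expand to nothing */
#endif

/* Prototype written in the pointer form (*restrict). The array form
 * "int dst[restrict]" is the one the text says to avoid. */
void copy_ints(int *SKETCH_RESTRICT dst, const int *SKETCH_RESTRICT src, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		dst[i] = src[i];
}
```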

400c5e5c

21 June, 2012 1 commit

duplocale: don't crash when called with LC_GLOBAL_LOCALE · b3d7d062

Committed by Rich Felker on June 20, 2012

posix has resolved to add this usage; for now, we just avoid writing
anything to the new locale object since it's not used anyway.
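
From the caller's side, the usage in question looks like the following sketch, which relies only on the POSIX `duplocale` and `freelocale` interfaces; the wrapper name `snapshot_global_locale` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for locale_t, duplocale, freelocale */
#endif
#include <locale.h>

/* Take a private copy of the global locale. The caller releases the
 * result with freelocale(). A zero return value indicates failure. */
locale_t snapshot_global_locale(void)
{
	return duplocale(LC_GLOBAL_LOCALE);
}
```

Such a copy can be passed to `uselocale` or to the `*_l` functions and stays unaffected by later `setlocale` calls.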

b3d7d062
