1. 16 May, 2015 1 commit
    • eliminate costly tricks to avoid TLS access for current locale state · 68630b55
      Rich Felker committed
      the code being removed used atomics to track whether any threads might
      be using a locale other than the current global locale, and whether
      any threads might have abstract 8-bit (non-UTF-8) LC_CTYPE active, a
      feature which was never committed (still pending). the motivations
      were to support early execution prior to setup of the thread pointer,
      to partially support systems (ancient kernels) where thread pointer
      setup is not possible, and to avoid high performance cost on archs
      where accessing the thread pointer may be very slow.
      
      since commit 19a1fe67, the thread
      pointer is always available, so these hacks are no longer needed.
      removing them greatly simplifies the affected code.
  2. 22 Apr, 2015 1 commit
    • fix duplocale clobbering of new locale struct with memcpy of old · 873e0ec7
      Rich Felker committed
      when the non-stub duplocale code was added as part of the locale
      framework in commit 0bc03091, the old
      code to memcpy the old locale object to the new one was left behind.
      the conditional for the memcpy no longer makes sense, because the
      conditions are now always-true when it's reached, and the memcpy is
      wrong because it clobbers the new->messages_name pointer setup just
      above.
      
      since the messages_name and ctype_utf8 members have already been
      copied, all that remains is the cat[] array. these pointers are
      volatile, so using memcpy to copy them is formally wrong; use a for
      loop instead.
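
A minimal sketch of the element-wise copy described above. The struct name, the array length, and the helper name are hypothetical and only stand in for the real locale object; the point is that each volatile pointer is copied by plain assignment rather than memcpy.

```c
#include <stddef.h>

#define N_CATS 4  /* hypothetical number of locale categories */

/* Hypothetical stand-in for a locale object: only the cat[] array is shown. */
struct locale_sketch {
    const void *volatile cat[N_CATS];
};

/* Copy each volatile pointer with an ordinary assignment, so every
 * element is read and written through a volatile-qualified lvalue. */
static void copy_cats(struct locale_sketch *dst, const struct locale_sketch *src)
{
    size_t i;
    for (i = 0; i < N_CATS; i++)
        dst->cat[i] = src->cat[i];
}
```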
  3. 04 Mar, 2015 1 commit
    • make all objects used with atomic operations volatile · 56fbaa3b
      Rich Felker committed
      the memory model we use internally for atomics permits plain loads of
      values which may be subject to concurrent modification without
      requiring that a special load function be used. since a compiler is
      free to make transformations that alter the number of loads or the way
      in which loads are performed, the compiler is theoretically free to
      break this usage. the most obvious concern is with atomic cas
      constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
      transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
      multiple loads of *p whose resulting values might fail to be equal;
      this would break the atomicity of the whole operation. but even more
      fundamental breakage is possible.
      
      with the changes being made now, objects that may be modified by
      atomics are modeled as volatile, and the atomic operations performed
      on them by other threads are modeled as asynchronous stores by
      hardware which happens to be acting on the request of another thread.
      such modeling of course does not itself address memory synchronization
      between cores/cpus, but that aspect was already handled. this all
      seems less than ideal, but it's the best we can do without mandating a
      C11 compiler and using the C11 model for atomics.
      
      in the case of pthread_once_t, the ABI type of the underlying object
      is not volatile-qualified. so we are assuming that accessing the
      object through a volatile-qualified lvalue via casts yields volatile
      access semantics. the language of the C standard is somewhat unclear
      on this matter, but this is an assumption the linux kernel also makes,
      and seems to be the correct interpretation of the standard.
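
The cas pattern from the message can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the a_cas here is a stand-in built on the GCC/Clang __sync_val_compare_and_swap builtin, and atomic_apply and add_one are hypothetical helpers. Because p points to a volatile object, the compiler has to perform exactly one load for tmp and cannot substitute fresh loads of *p into the cas call.

```c
/* Stand-in for an internal compare-and-swap primitive (assumption:
 * implemented here with a GCC/Clang builtin). Returns the old value. */
static int a_cas(volatile int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Hypothetical helper: atomically replace *p with f(*p) and
 * return the value that was replaced. */
static int atomic_apply(volatile int *p, int (*f)(int))
{
    int tmp;
    do {
        tmp = *p;  /* a single volatile load; both uses below see this value */
    } while (a_cas(p, tmp, f(tmp)) != tmp);
    return tmp;
}

/* Hypothetical update function used for demonstration. */
static int add_one(int x)
{
    return x + 1;
}
```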
  4. 06 Sep, 2014 1 commit
  5. 13 Aug, 2014 1 commit
    • add inline isspace in ctype.h as an optimization · b04971d9
      Szabolcs Nagy committed
      isspace can be a bottleneck in a simple parser; inlining it
      gives slightly smaller and faster code
      
      src/locale/pleval.o already had this optimization; the size
      change for other libc functions for i386 is
      
      object file                 old     new delta
      src/internal/intscan.o     2134    2118   -16
      src/locale/dcngettext.o    1562    1552   -10
      src/network/res_msend.o    1961    1940   -21
      src/network/lookup_name.o  2627    2608   -19
      src/network/getnameinfo.o  1814    1811    -3
      src/network/lookup_serv.o   643     624   -19
      src/stdio/vfscanf.o        2675    2663   -12
      src/stdlib/atoll.o          117     107   -10
      src/stdlib/atoi.o            95      91    -4
      src/stdlib/atol.o            95      91    -4
      src/time/strptime.o        1515    1503   -12
      (TOTALS)                 432451  432321  -130
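
For illustration, an inline whitespace test of the kind the message describes might look like the sketch below. The function name is hypothetical, and the sketch assumes an ASCII-compatible execution character set, where '\t', '\n', '\v', '\f' and '\r' occupy the contiguous codes 9 through 13.

```c
/* Sketch of an inline whitespace test for the C locale.
 * Matches ' ' plus the five control characters '\t' '\n' '\v' '\f' '\r'.
 * The unsigned subtraction makes values below '\t' (including negative
 * ones such as EOF) wrap to large numbers, so one comparison suffices. */
static inline int isspace_sketch(int c)
{
    return c == ' ' || (unsigned)c - '\t' < 5;
}
```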
  6. 01 Aug, 2014 1 commit
    • harden locale name handling and prevent slashes in LC_MESSAGES · 5059deb1
      Rich Felker committed
      the code which loads locale files was already rejecting locale names
      containing slashes. however, LC_MESSAGES records a locale name even if
      libc does not have a matching locale file, so that gettext or
      application code can use the recorded locale name for message
      translations to languages that libc does not support. this recorded
      name was not being checked for slashes, meaning that such code could
      potentially be tricked into directory traversal.
      
      in addition, since the value of a locale category is sometimes used as
      a pathname component by callers, the improved code rejects any value
      beginning with a dot. this prevents traversal to the parent directory
      via "..", use of the top-level locale directory via ".", and also
      avoids "hidden" directories as a side effect.
      
      finally, overly long locale names are now rejected (treated as an
      unrecognized name and thus as an alias for C.UTF-8) rather than being
      truncated.
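
The three rejection rules above (slashes, a leading dot, excessive length) can be sketched as a single check. The function name and the limit of 15 bytes are assumptions for illustration; the limit is borrowed from the "currently 15 bytes" figure mentioned in the locale framework commit further down this log.

```c
#include <stddef.h>

#define NAME_LIMIT 15  /* assumed maximum name length in bytes */

/* Hypothetical validator: returns 1 if the name is acceptable,
 * 0 if it should be treated as an unrecognized name. */
static int locale_name_ok(const char *name)
{
    size_t n;

    if (name[0] == '.')
        return 0;                      /* rejects ".", "..", hidden dirs */
    for (n = 0; n <= NAME_LIMIT && name[n]; n++)
        if (name[n] == '/')
            return 0;                  /* no pathname separators */
    return n <= NAME_LIMIT;            /* too long: reject, do not truncate */
}
```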
  7. 31 Jul, 2014 1 commit
    • plural rule evaluator rewrite for dcngettext · 6527b03d
      Szabolcs Nagy committed
      using an operator precedence parser, the code size
      became smaller and it is only about 10% slower
      
      size of old vs new pleval.o on different archs:
      (with inlined isspace added to pleval.c for now)
      
      old:
         text    data     bss     dec     hex filename
          828       0       0     828     33c pl.i386.o
         1152       0       0    1152     480 pl.arm.o
         1704       0       0    1704     6a8 pl.mips.o
         1328       0       0    1328     530 pl.ppc.o
          992       0       0     992     3e0 pl.x64.o
      new:
         text    data     bss     dec     hex filename
          693       0       0     693     2b5 pl.i386.o
          972       0       0     972     3cc pl.arm.o
         1276       0       0    1276     4fc pl.mips.o
         1087       0       0    1087     43f pl.ppc.o
          846       0       0     846     34e pl.x64.o
  8. 30 Jul, 2014 2 commits
    • tweaks to plural rules evaluator · a126188f
      Szabolcs Nagy committed
      const parsing, depth accounting and failure handling were changed
      a bit so the generated code is slightly smaller.
    • harden dcngettext plural processing · e4dd0ab8
      Rich Felker committed
      while the __mo_lookup backend can verify that the translated message
      ends with a null terminator, it has no way to know nplurals and thus
      no way to verify that sufficiently many null terminators are present
      in the string to satisfy all plural forms. the code in dcngettext was
      already attempting to avoid reading past the end of the mo file
      mapping, but failed to do so because the strlen call itself could
      over-read. using strnlen instead allows us to avoid the problem.
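
A sketch of the bounded scan, assuming the plural forms are stored back to back, each ending in a null byte, and that rem is the number of bytes from s to the end of the mapping. The function name and signature are hypothetical. strnlen never looks past rem bytes, so a missing terminator cannot cause an over-read.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the n-th null-terminated string stored
 * at s, or NULL if fewer than n+1 terminated strings fit in rem bytes. */
static const char *nth_plural(const char *s, size_t rem, unsigned long n)
{
    while (n--) {
        size_t l = strnlen(s, rem);    /* bounded: reads at most rem bytes */
        if (l + 1 >= rem)
            return NULL;               /* nothing left after this form */
        s += l + 1;
        rem -= l + 1;
    }
    /* the selected form itself must be terminated inside the mapping */
    return strnlen(s, rem) < rem ? s : NULL;
}
```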
  9. 29 Jul, 2014 2 commits
    • harden mo file processing for locale/translations · 6e892106
      Rich Felker committed
      rather than just checking that the start of the string lies within the
      mapping, also check that the nominal length remains within the
      mapping, and that the null terminator is present at the nominal
      length. this ensures that the caller, using the result as a C string,
      will not read past the end of the mapping.
      
      the nominal length is never exposed to the caller, but it's useful
      internally to find where the null terminator should be without having
      to resort to linear search via strnlen/memchr.
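
The three checks (start inside the mapping, nominal length inside the mapping, null terminator at the nominal length) can be sketched as below. The function name and parameter names are hypothetical; off and len stand for the offset and length fields read from the mo file's string table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the string at offset off
 * only if it is safe for the caller to treat it as a C string. */
static const char *checked_string(const char *map, size_t size,
                                  uint32_t off, uint32_t len)
{
    if (off >= size)                        /* start lies outside the mapping */
        return NULL;
    if (len >= size - off)                  /* no room for len bytes plus a null */
        return NULL;
    if (map[(size_t)off + len] != '\0')     /* terminator missing at nominal end */
        return NULL;
    return map + off;
}
```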
    • implement non-default plural rules for ngettext translations · 73d2a3bf
      Rich Felker committed
      the new code in dcngettext was written by me, and the expression
      evaluator by Szabolcs Nagy (nsz).
  10. 27 Jul, 2014 1 commit
    • implement gettext message translation functions · 2068b4e8
      Rich Felker committed
      this commit replaces the stub implementations with working message
      translation functions. translation units are factored so as to prevent
      pulling in the legacy, non-library-safe functions which use a global
      textdomain in modern code which is using the versions with an explicit
      domain argument. bind_textdomain_codeset is also placed in its own
      file since it should not be needed by most programs.
      
      this implementation is still missing some features: the LANGUAGE
      environment variable (for multiple fallback languages) is not honored,
      and non-default plural-form rules are not supported. these issues will
      be addressed in a later commit.
      
      one notable difference from the GNU implementation is that there is no
      default path for loading translation files. in principle one could be
      added, but since the documented correct usage is to call the
      bindtextdomain function, a default path is probably unnecessary.
  11. 26 Jul, 2014 4 commits
    • add support for LC_TIME and LC_MESSAGES translations · c5b8f193
      Rich Felker committed
      for LC_MESSAGES, translation of strerror and similar literal message
      functions is supported. for messages in other places (particularly the
      dynamic linker) that use format strings, translation is not yet
      supported. in order to make it possible and safe, such messages will
      need to be refactored to separate the textual content from the format.
      
      for LC_TIME, the day and month names and strftime-style format strings
      provided by nl_langinfo are supported for translation. however there
      may be limitations, as some of the original C-locale nl_langinfo
      strings are non-unique and thus perhaps non-suitable as keys.
      
      overall, the locale support activated by this commit should not be
      seen as complete and polished but as a basis for beginning to test
      locale functionality and implement locales.
    • add missing yes/no strings to nl_langinfo · 0206f596
      Rich Felker committed
      these were removed from the standard but still offered as an extension
      in langinfo.h, so nl_langinfo should support them.
    • fix nl_langinfo table for LC_TIME era-related items · a19cd2b6
      Rich Felker committed
      due to a skipped slot and missing null terminator, the last few
      strings were off by one or two slots from their item codes.
    • implement mo file string lookup for translations · 41421d6b
      Rich Felker committed
      the core is based on a binary search; hash table is not used. both
      native and reverse-endian mo files are supported. all offsets read
      from the mapped mo file are checked against the mapping size to
      prevent the possibility of reads outside the mapping.
      
      this commit has no observable effects since there are not yet any
      callers to the message translation code.
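
Support for both byte orders can be sketched as follows. GNU mo files begin with the 32-bit magic number 0x950412de, so a reader can compare the first word against the magic in native and in swapped form and then apply the same conditional swap to every later field. The helper names here are illustrative and the code is a sketch of the idea, not the commit's implementation.

```c
#include <stdint.h>

#define MO_MAGIC 0x950412deu  /* first 32-bit word of a GNU mo file */

/* Return x unchanged, or with its four bytes reversed if swap is nonzero. */
static uint32_t swapc(uint32_t x, int swap)
{
    return swap
        ? (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24)
        : x;
}

/* Hypothetical helper: decide from the first word whether the file is
 * native-endian (*swap = 0) or reverse-endian (*swap = 1).
 * Returns 0 if the word matches the magic in neither order. */
static int mo_detect_order(uint32_t first_word, int *swap)
{
    if (first_word == MO_MAGIC) {
        *swap = 0;
        return 1;
    }
    if (swapc(first_word, 1) == MO_MAGIC) {
        *swap = 1;
        return 1;
    }
    return 0;
}
```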
  12. 24 Jul, 2014 2 commits
  13. 03 Jul, 2014 4 commits
    • properly pass current locale to *_l functions when used internally · 4c48501e
      Rich Felker committed
      this change is presently non-functional since the callees do not yet
      use their locale argument for anything.
    • consolidate str[n]casecmp_l into str[n]casecmp source files · 7424ac58
      Rich Felker committed
      this is mainly done for consistency with the ctype functions and to
      declutter the src/locale directory.
    • consolidate *_l ctype/wctype functions into their non-_l source files · d89fdec5
      Rich Felker committed
      the main practical purposes of this commit are to remove a huge amount
      of clutter from the src/locale directory, to cut down on the length of
      the $(AR) and $(LD) command lines, and to reduce the amount of space
      wasted by object file headers in the static libc.a. build time may
      also be reduced, though this has not been measured.
      
      as an additional justification, if there ever were a need for the
      behavior of these functions to vary by locale, it would be necessary
      for the non-_l versions to call the _l versions, so that linking the
      former without the latter would not be possible anyway.
    • add locale framework · 0bc03091
      Rich Felker committed
      this commit adds non-stub implementations of setlocale, duplocale,
      newlocale, and uselocale, along with the data structures and minimal
      code needed for representing the active locale on a per-thread basis
      and optimizing the common case where thread-local locale settings are
      not in use.
      
      at this point, the data structures only contain what is necessary to
      represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in
      finding message translation files). representation for the other
      categories will be added later; the expectation is that a single
      pointer will suffice for each.
      
      for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any
      other string is accepted and treated as "C.UTF-8". for other
      categories, any string is accepted after being truncated to a maximum
      supported length (currently 15 bytes). for LC_MESSAGES, the name is
      kept regardless of whether libc itself can use such a message
      translation locale, since applications using catgets or gettext should
      be able to use message locales libc is not aware of. for other
      categories, names which are not successfully loaded as locales (which,
      at present, means all names) are treated as aliases for "C". setlocale
      never fails.
      
      locale settings are not yet used anywhere, so this commit should have
      no visible effects except for the contents of the string returned by
      setlocale.
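
The LC_CTYPE rule described above reduces to a single flag, which might be computed as in this sketch. The function name is hypothetical, and the sketch deliberately ignores how an empty name is resolved from the environment.

```c
#include <string.h>

/* Hypothetical helper: 0 for the special names "C" and "POSIX",
 * 1 for any other name, which is treated like "C.UTF-8". */
static int ctype_is_utf8(const char *name)
{
    return !(strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}
```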
  14. 10 Jun, 2014 1 commit
    • replace all remaining internal uses of pthread_self with __pthread_self · df15168c
      Rich Felker committed
      prior to version 1.1.0, the difference between pthread_self (the
      public function) and __pthread_self (the internal macro or inline
      function) was that the former would lazily initialize the thread
      pointer if it was not already initialized, whereas the latter would
      crash in this case. since lazy initialization is no longer supported,
      use of pthread_self no longer makes sense; it simply generates larger,
      slower code.
  15. 14 May, 2014 1 commit
    • add cp437 and cp850 to available iconv conversions · 8a2d8719
      Rich Felker committed
      perhaps some additional legacy DOS-era codepages would also be useful
      to have, but these are the ones for which there has been demand. the
      size of the diff is due to the fact that legacychars.h is updated in
      such a way that new characters are inserted into the table in unicode
      codepoint order; thus other mappings in codepages.h have changed to
      reflect the new table indices of their characters.
  16. 23 Jan, 2014 1 commit
  17. 12 Dec, 2013 1 commit
  18. 26 Nov, 2013 1 commit
  19. 18 Aug, 2013 2 commits
    • 37c25065
    • add hkscs/big5-2003/eten extensions to iconv big5 · 109bd65a
      Rich Felker committed
      with these changes, the character set implemented as "big5" in musl is
      a pure superset of cp950, the canonical "big5", and agrees with the
      normative parts of Unicode. this means it has minor differences from
      both hkscs and big5-2003:
      
      - the range A2CC-A2CE maps to CJK ideographs rather than numerals,
        contrary to changes made in big5-2003.
      
      - C6CD maps to a CJK ideograph rather than its corresponding Kangxi
        radical character, contrary to changes made in hkscs.
      
      - F9FE maps to U+2593 rather than U+FFED.
      
      of these differences, none but the last are visually distinct, and the
      last is a character used purely for text-based graphics, not to convey
      linguistic content.
      
      should there be future demand for strict conformance to big5-2003 or
      hkscs mappings, the present charset aliases can be replaced with
      distinct variants.
      
      reportedly there are other non-standard big5 extensions in common use
      in Taiwan and perhaps elsewhere, which could also be added as layers
      on top of the existing big5 support.
      
      there may be additional characters which should be added to the hkscs
      table: the whatwg standard for big5 defines what appears to be a
      superset of hkscs.
  20. 08 Aug, 2013 1 commit
    • add Big5 charset support to iconv · 19b4a0a2
      Rich Felker committed
      at this point, it is just the common base charset equivalent to
      Windows CP 950, with no further extensions. HKSCS and possibly other
      supersets will be added later. other aliases may need to be added too.
  21. 06 Aug, 2013 1 commit
    • iconv support for legacy Korean encodings · 734062b2
      Rich Felker committed
      as with other character sets, stateful iso-2022 form is not supported
      yet but everything else should work. all charset aliases are treated
      the same, as Windows codepage 949, because reportedly the EUC-KR
      charset name is in widespread (mis?)usage in email and on the web for
      data which actually uses the extended characters outside the standard
      93x94 grid. this could easily be changed if desired.
      
      the principle of this converter for handling the giant bulk of rare
      Hangul syllables outside of the standard KS X 1001 93x94 grid is the
      same as the GB18030 converter's treatment of non-explicitly-coded
      Unicode codepoints: sequences in the extension range are mapped to an
      integer index N, and the converter explicitly computes the Nth Hangul
      syllable not explicitly encoded in the character map. empirically,
      this requires at most 7 passes over the grid. this approach reduces
      the table size required for Korean legacy encodings from roughly 44k
      to 17k and should have minimal performance impact on real-world text
      conversions since the "slow" characters are rare. where it does have
      impact, the cost is merely a large constant time factor.
  22. 28 Jul, 2013 1 commit
    • fix semantically incorrect use of LC_GLOBAL_LOCALE · 1ae4bc42
      Rich Felker committed
      LC_GLOBAL_LOCALE refers to the global locale, controlled by setlocale,
      not the thread-local locale in effect which these functions should be
      using. neither LC_GLOBAL_LOCALE nor 0 as an argument to the *_l
      functions has behavior defined by the standard, but 0 is a more
      logical choice for requesting the callee to look up the current locale.
      in the future I may move the current locale lookup to the caller (the
      non-_l-suffixed wrapper).
      
      at this point, all of the locale logic is dummied out, so no harm was
      done, but it should at least avoid misleading usage.
  23. 25 Jul, 2013 5 commits
  24. 27 Jun, 2013 1 commit
  25. 07 Sep, 2012 1 commit
    • use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c
      Rich Felker committed
      to deal with the fact that the public headers may be used with pre-c99
      compilers, __restrict is used in place of restrict, and defined
      appropriately for any supported compiler. we also avoid the form
      [restrict] since older versions of gcc rejected it due to a bug in the
      original c99 standard, and instead use the form *restrict.
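
One way to picture the approach is the sketch below. It uses a hypothetical macro name, MY_RESTRICT, instead of redefining __restrict, and a hypothetical function, copy_bytes; the conditions shown are an assumption about how such a macro could be selected, not a copy of the header. The qualifier is written in the pointer form (char *MY_RESTRICT dst) rather than inside array brackets.

```c
#include <stddef.h>

/* Pick a spelling of the qualifier that the compiler understands. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict      /* C99 and later */
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict    /* GNU extension keyword in older modes */
#else
#define MY_RESTRICT               /* unknown pre-C99 compiler: drop it */
#endif

/* Hypothetical function: the qualifiers promise that dst and src
 * do not overlap, as for memcpy. */
static char *copy_bytes(char *MY_RESTRICT dst, const char *MY_RESTRICT src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return dst;
}
```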
  26. 21 Jun, 2012 1 commit