1. 16 Jun, 2015 1 commit
    • R
      fix btowc corner case · 38e2f727
      Rich Felker committed
      btowc is required to interpret its argument by conversion to unsigned
      char, unless the argument is equal to EOF. since converting EOF
      produces a non-character value anyway, we can just unconditionally
      convert, for now.
      38e2f727
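The rule described above can be sketched for a UTF-8-only libc, where only bytes below 0x80 are complete characters. The name `sketch_btowc` is hypothetical and this is not musl's actual code; it only illustrates why an unconditional conversion to unsigned char makes a separate EOF test unnecessary.

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wint_t, WEOF */

/* Hypothetical stand-in for btowc in a UTF-8-only libc (an assumption,
 * not musl's source). Converting c to unsigned char first turns EOF
 * (typically -1) into 0xff, which is not a single-byte character, so it
 * falls through to WEOF like any other high byte. */
wint_t sketch_btowc(int c)
{
    unsigned char b = (unsigned char)c;
    return b < 0x80 ? (wint_t)b : WEOF;
}
```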
  2. 22 Apr, 2015 2 commits
  3. 19 Dec, 2014 1 commit
  4. 16 Nov, 2014 1 commit
    • J
      implement a private state for the uchar.h functions · 941644e9
      Jens Gustedt committed
      The C standard is explicit about this:
      
        7.28.1 ... If ps is a null pointer, each function uses its own internal
        mbstate_t object instead, which is initialized at program startup to
        the initial conversion state;
      
      and these functions are also not supposed to implicitly use the state of
      the wchar.h functions:
      
        7.29.6.3 ... The implementation behaves as if no library function calls
        these functions with a null pointer for ps.
      
      Previously this resulted in two bugs.
      
       - The functions c16rtomb and mbrtoc16 would crash when called with ps
         set to null.
      
       - The function mbrtoc32 used the private state of mbrtowc, which it
         is not allowed to do.
      941644e9
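A minimal sketch of the "own internal mbstate_t" rule quoted above, assuming a C11 libc that provides `<uchar.h>`. The wrapper name `sketch_mbrtoc32` and the variable `internal_state` are hypothetical; the point is only that a null `ps` selects a state private to this function rather than the one belonging to mbrtowc.

```c
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical wrapper, not musl's implementation. */
size_t sketch_mbrtoc32(char32_t *pc32, const char *s, size_t n, mbstate_t *ps)
{
    /* Static storage is zero-initialized at program startup, and a
     * zero-valued mbstate_t describes the initial conversion state. */
    static mbstate_t internal_state;

    if (!ps)
        ps = &internal_state;   /* private state, never mbrtowc's */
    return mbrtoc32(pc32, s, n, ps);
}
```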
  5. 14 Oct, 2014 1 commit
  6. 02 Jul, 2014 1 commit
    • R
      fix aliasing violations in mbtowc and mbrtowc · e89cfe51
      Rich Felker committed
      these functions were setting wc to point to wchar_t aliasing itself as
      a "cheap" way to support null wc arguments. doing so was anything but
      cheap, since even without the aliasing violation, it would limit the
      compiler's ability to optimize.
      
      making wc point to a dummy object is equally easy and does not suffer
      from the above problems.
      e89cfe51
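The dummy-object approach might look roughly like the following. This is an ASCII-only illustration with a hypothetical name (`sketch_mbtowc_ascii`), not the real decoder; it only shows how a null `wc` is redirected to a throwaway local object.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical ASCII-only decoder, an assumption for illustration. */
int sketch_mbtowc_ascii(wchar_t *wc, const char *src, size_t n)
{
    wchar_t dummy;

    if (!src) return 0;        /* encoding has no shift states */
    if (!wc) wc = &dummy;      /* null wc: store into a dummy object */
    if (!n || (unsigned char)*src >= 0x80) return -1;

    *wc = (unsigned char)*src;
    return *wc != 0;           /* 0 for the null character, else 1 byte */
}
```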
  7. 03 Jun, 2014 1 commit
    • R
      fix incorrect end pointer in some cases when wcsrtombs stops early · 8fba4458
      Rich Felker committed
      when wcsrtombs stopped due to hitting zero remaining space in the
      output buffer, it was wrongly clearing the position pointer as if it
      had completed the conversion successfully.
      
      this commit rearranges the code somewhat to make a clear separation
      between the cases of ending due to running out of output buffer space,
      and ending due to reaching the end of input or an illegal sequence in
      the input. the new branches have been arranged with the hope of
      optimizing more common cases, too.
      8fba4458
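From the caller's side, the two ways of ending can be told apart by the position pointer. The helper below is a hypothetical usage sketch (the name `sketch_consumed` is not from the source) that relies only on the standard wcsrtombs contract: the pointer is null after a complete conversion and otherwise points at the next unconverted wide character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of wide characters consumed from src,
 * counting the terminator when it was converted, or (size_t)-1 on an
 * encoding error. */
size_t sketch_consumed(const wchar_t *src, char *dst, size_t dstlen)
{
    const wchar_t *p = src;
    mbstate_t st;

    memset(&st, 0, sizeof st);
    if (wcsrtombs(dst, &p, dstlen, &st) == (size_t)-1)
        return (size_t)-1;

    /* p == NULL: terminating null converted, conversion complete.
     * p != NULL: stopped early, p is where to resume. */
    return p ? (size_t)(p - src) : wcslen(src) + 1;
}
```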
  8. 12 Dec, 2013 1 commit
  9. 28 Sep, 2013 1 commit
    • R
      fix buffer overflow in mbsrtowcs · 211264e4
      Rich Felker committed
      issue reported by Michael Forney:
      
      "If wn becomes 0 after processing a chunk of 4, mbsrtowcs currently
      continues on, wrapping wn around to -1, causing the rest of the string
      to be processed.
      
      This resulted in buffer overruns if there was only space in ws for wn
      wide characters."
      
      the original patch submitted added an additional check for !wn after
      the loop; to avoid extra branching, I instead just changed the wn>=4
      check to wn>=5 to ensure that at least one slot remains after the
      word-at-a-time loop runs. this should not slow down the tail
      processing on real-world usage, since an extra slot that can't be
      processed in the word-at-a-time loop is needed for the null
      termination anyway.
      211264e4
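The effect of the wn>=5 guard can be sketched with a simplified ASCII-only loop. Everything here is an assumption for illustration (the names `sketch_ascii_to_wide` and `sketch_is_ascii4`, and checking bytes one by one instead of reading a whole word); the structure only mirrors the idea that the single-character step after the fast path relies on at least one free slot remaining.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: 1 if the next four bytes are non-null ASCII.
 * Short-circuits, so it never reads past a terminator. */
static int sketch_is_ascii4(const char *s)
{
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)s[i];
        if (b == 0 || b >= 0x80) return 0;
    }
    return 1;
}

/* Hypothetical converter: stores at most wn wide characters and returns
 * how many were stored, not counting a terminator. */
size_t sketch_ascii_to_wide(wchar_t *ws, const char *s, size_t wn)
{
    size_t n0 = wn;
    int i;

    while (wn) {
        /* Fast path, four at a time. Requiring wn >= 5 (not wn >= 4)
         * guarantees wn >= 1 when the loop exits. */
        while (wn >= 5 && sketch_is_ascii4(s)) {
            for (i = 0; i < 4; i++) *ws++ = (unsigned char)*s++;
            wn -= 4;
        }
        if (!*s || (unsigned char)*s >= 0x80) break;
        /* Single step: does not recheck wn, so with a wn >= 4 guard
         * it could run with wn == 0 and wrap the counter. */
        *ws++ = (unsigned char)*s++;
        wn--;
    }
    if (wn && !*s) *ws = 0;   /* terminate only if a slot is left */
    return n0 - wn;
}
```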
  10. 30 Jun, 2013 1 commit
  11. 09 Apr, 2013 4 commits
    • R
      mbrtowc: do not leave mbstate_t in permanent-fail state after EILSEQ · 23ab8c25
      Rich Felker committed
      the standard is clear that the old behavior is conforming: "In this
      case, [EILSEQ] shall be stored in errno and the conversion state is
      undefined."
      
      however, the specification of mbrtowc has one peculiarity when the
      source argument is a null pointer: in this case, it's required to
      behave as mbrtowc(NULL, "", 1, ps). no motivation is provided for this
      requirement, but the natural one that comes to mind is that the intent
      is to reset the mbstate_t object. for stateful encodings, such
      behavior is actually specified: "If the corresponding wide character
      is the null wide character, the resulting state described shall be the
      initial conversion state." but in the case of UTF-8 where the
      mbstate_t object contains a partially-decoded character rather than a
      shift state, a subsequent '\0' byte indicates that the previous
      partial character is incomplete and thus an illegal sequence.
      
      naturally, applications using their own mbstate_t object should clear
      it themselves after an error, but the standard presently provides no
      way to clear the builtin mbstate_t object used when the ps argument is
      a null pointer. I suspect this issue may be addressed in the future by
      specifying that a null source argument resets the state, as this seems
      to have been the intent all along.
      
      for what it's worth, this change also slightly reduces code size.
      23ab8c25
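For the application-owned case mentioned above, clearing the state after an error could look like this. The wrapper name `sketch_decode` is hypothetical; it relies on the standard guarantee that a zero-valued mbstate_t describes the initial conversion state.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper around mbrtowc for callers that own their state. */
size_t sketch_decode(wchar_t *wc, const char *s, size_t n, mbstate_t *ps)
{
    size_t r = mbrtowc(wc, s, n, ps);

    /* After EILSEQ the state is undefined per the standard, so reset it
     * to the initial state before decoding anything else. */
    if (r == (size_t)-1)
        memset(ps, 0, sizeof *ps);
    return r;
}
```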
    • R
      implement mbtowc directly, not as a wrapper for mbrtowc · ea34b1b9
      Rich Felker committed
      the interface contract for mbtowc admits a much faster implementation
      than mbrtowc can achieve; wrapping mbrtowc with an extra call frame
      only made the situation worse.
      
      since the regex implementation uses mbtowc already, this change should
      improve regex performance too. it may be possible to improve
      performance in other places internally by switching from mbrtowc to
      mbtowc.
      ea34b1b9
    • R
      optimize mbrtowc · a49e038b
      Rich Felker committed
      this simple change, in my measurements, makes about a 7% performance
      improvement. at first glance this change would seem like a
      compiler-specific hack, since the modified code is not even used.
      however, I suspect the reason is that I'm eliminating a second path
      into the main body of the code, allowing the compiler more flexibility
      to optimize the normal (hot) path into the main body. so even if it
      weren't for the measurable (and quite notable) difference in
      performance, I think the change makes sense.
      a49e038b
    • R
      fix out-of-bounds access in UTF-8 decoding · 8f06ab0e
      Rich Felker committed
      SA and SB are used as the lowest and highest valid starter bytes, but
      the value of SB was one-past the last valid starter. this caused
      access past the end of the state table when the illegal byte '\xf5'
      was encountered in a starter position. the error did not show up in
      full-character decoding tests, since the bogus state read from just
      past the table was unlikely to admit any continuation bytes as valid,
      but would have shown up had we tested feeding '\xf5' to the
      byte-at-a-time decoding in mbrtowc: it would cause the function to
      wrongly return -2 rather than -1.
      
      I may eventually go back and remove all references to SA and SB,
      replacing them with the values; this would make the code more
      transparent, I think. the original motivation for using macros was to
      allow misguided users of the code to redefine them for the purpose of
      enlarging the set of accepted sequences past the end of Unicode...
      8f06ab0e
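The off-by-one can be illustrated with a small lookup table indexed by starter byte. The table contents and all `sketch_` names are assumptions made for this example, not the real state table; what matters is that the upper bound is inclusive, so 0xf5 is rejected before the table is indexed.

```c
#include <stddef.h>

#define SKETCH_SA 0xc2u   /* lowest valid UTF-8 starter byte */
#define SKETCH_SB 0xf4u   /* highest valid starter byte, inclusive */

/* Hypothetical table: sequence length for each starter SA..SB. */
static const unsigned char sketch_len[SKETCH_SB - SKETCH_SA + 1] = {
    2,2,2,2,2,2,2,2,2,2,   /* c2..cb */
    2,2,2,2,2,2,2,2,2,2,   /* cc..d5 */
    2,2,2,2,2,2,2,2,2,2,   /* d6..df */
    3,3,3,3,3,3,3,3,       /* e0..e7 */
    3,3,3,3,3,3,3,3,       /* e8..ef */
    4,4,4,4,4              /* f0..f4 */
};

/* Returns the sequence length implied by a first byte, or -1 if the
 * byte cannot start a sequence. */
int sketch_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    /* Unsigned subtraction wraps for b < SA, so one comparison rejects
     * both ends; the bound is SB - SA, not SB - SA + 1. */
    if (b - SKETCH_SA > SKETCH_SB - SKETCH_SA) return -1;
    return sketch_len[b - SKETCH_SA];
}
```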
  12. 05 Apr, 2013 5 commits
    • R
      cleanup wcstombs · 771c6cea
      Rich Felker committed
      remove redundant headers and comments; this file is completely trivial
      now. also, avoid temp var.
      771c6cea
    • R
      cleanup mbstowcs wrapper · b5a527f9
      Rich Felker committed
      remove unneeded headers. this file is utterly trivial now and there's
      no sense in having a comment to state that it's in the public domain.
      b5a527f9
    • R
      minor optimization to mbstowcs · f62b12d0
      Rich Felker committed
      there is no need to zero-fill an mbstate_t object in the caller;
      mbsrtowcs will automatically treat a null pointer as the initial
      state.
      f62b12d0
    • R
      fix incorrect range checks in wcsrtombs · 40b2b5fa
      Rich Felker committed
      negative values of wchar_t need to be treated in the non-ASCII case so
      that they can properly generate EILSEQ rather than getting truncated
      to 8bit values and stored in the output.
      40b2b5fa
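One way to keep negative values out of the single-byte path is to compare as unsigned. The function below is a hypothetical sketch (`sketch_is_single_byte` is not a name from the source) and assumes nothing about how the real check is written.

```c
#include <wchar.h>

/* Hypothetical predicate: 1 if wc can be emitted as one ASCII byte. */
int sketch_is_single_byte(wchar_t wc)
{
    /* A signed test such as wc < 0x80 would also accept negative
     * values where wchar_t is signed; converting to unsigned first
     * maps them to large values that fail the comparison. */
    return (unsigned long)wc < 0x80u;
}
```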
    • R
      overhaul mbsrtowcs · 50d9661d
      Rich Felker committed
      these changes fix at least two bugs:
      - misaligned access to the input as uint32_t for vectorized ASCII test
      - incorrect src pointer after stopping on EILSEQ
      
      in addition, the text of the standard makes it unclear whether the
      mbstate_t object is to be modified when the destination pointer is
      null; previously it was cleared either way; now, it's only cleared
      when the destination is non-null. this change may need revisiting, but
      it should not affect most applications, since calling mbsrtowcs with
      non-zero state can only happen when the head of the string was already
      processed with mbrtowc.
      
      finally, these changes shave about 20% size off the function and seem
      to improve performance by 1-5%.
      50d9661d
  13. 07 Sep, 2012 1 commit
    • R
      use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c
      Rich Felker committed
      to deal with the fact that the public headers may be used with pre-c99
      compilers, __restrict is used in place of restrict, and defined
      appropriately for any supported compiler. we also avoid the form
      [restrict] since older versions of gcc rejected it due to a bug in the
      original c99 standard, and instead use the form *restrict.
      400c5e5c
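A portable qualifier macro in the spirit of the message above might be set up as follows. The macro name `SKETCH_RESTRICT` and the function `sketch_copy` are hypothetical; a user program should not define `__restrict` itself, since that identifier is reserved for the implementation.

```c
#include <stddef.h>

/* Hypothetical macro selection: plain restrict on C99 and later, the
 * GNU spelling on older GNU-compatible compilers, nothing otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SKETCH_RESTRICT restrict
#elif defined(__GNUC__)
#define SKETCH_RESTRICT __restrict
#else
#define SKETCH_RESTRICT
#endif

/* The pointer form (*restrict) is used, not the array form [restrict]. */
size_t sketch_copy(char *SKETCH_RESTRICT dst,
                   const char *SKETCH_RESTRICT src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) dst[i] = src[i];
    return n;
}
```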
  14. 27 May, 2012 1 commit
  15. 03 May, 2012 1 commit
    • R
      fix longstanding exit logic bugs in mbsnrtowcs and wcsnrtombs · 485fb14a
      Rich Felker committed
      these are POSIX 2008 (previously GNU extension) functions that are
      rarely used. apparently they had never been tested before, since the
      end-of-string logic was completely missing. mbsnrtowcs is used by
      modern versions of bash for its glob implementation, and this bug
      was causing tab completion to hang in an infinite loop.
      485fb14a
  16. 25 Feb, 2012 2 commits
  17. 24 Feb, 2012 1 commit
    • R
      cleanup and work around visibility bug in gcc 3 that affects x86_64 · bae2e52b
      Rich Felker committed
      in gcc 3, the visibility attribute must be placed on both the
      declaration and on the definition. if it's omitted from the
      definition, the compiler fails to emit the ".hidden" directive in the
      assembly, and the linker will either generate textrels (if supported,
      such as on i386) or refuse to link (on targets where certain types of
      textrels are forbidden or impossible without further assumptions about
      memory layout, such as on x86_64).
      
      this patch also unifies the decision about when to use visibility into
      libc.h and makes the visibility in the utf-8 state machine tables
      based on libc.h rather than a duplicate test.
      bae2e52b
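The workaround amounts to repeating the attribute on the definition as well as the declaration. The macro `SKETCH_HIDDEN` and the table `sketch_table` below are hypothetical stand-ins, assuming a GNU-compatible compiler for the attribute syntax.

```c
/* Hypothetical visibility macro; expands to nothing elsewhere. */
#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((__visibility__("hidden")))
#else
#define SKETCH_HIDDEN
#endif

/* Declaration, as it would appear in an internal header. */
extern SKETCH_HIDDEN const unsigned sketch_table[4];

/* Definition: the attribute is repeated here so that compilers which
 * do not carry it over from the declaration still mark it hidden. */
SKETCH_HIDDEN const unsigned sketch_table[4] = { 1, 2, 3, 4 };
```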
  18. 26 Mar, 2011 1 commit
    • R
      fix all implicit conversion between signed/unsigned pointers · 9ae8d5fc
      Rich Felker committed
      sadly the C language does not specify any such implicit conversion, so
      this is not a matter of just fixing warnings (as gcc treats it) but
      actual errors. i would like to revisit a number of these changes and
      possibly revise the types used to reduce the number of casts required.
      9ae8d5fc
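A typical instance of the kind of fix described above is an explicit cast when byte values of a `char` string must be examined as unsigned. The function name `sketch_count_high` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: counts bytes with the high bit set. */
size_t sketch_count_high(const char *s)
{
    /* C defines no implicit conversion from const char * to
     * const unsigned char *, so the cast is required, not cosmetic. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p; p++)
        if (*p >= 0x80) n++;
    return n;
}
```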
  19. 27 Feb, 2011 1 commit
    • R
      cleanup utf-8 multibyte code, use visibility if possible · 015d33c5
      Rich Felker committed
      this code was written independently of musl, with support for the
      backwards, nonstandard "31-bit unicode" some libraries/apps might
      want. unfortunately the extra code (inside #ifdef) makes the source
      harder to read and makes code that should be simple look complex, so
      i'm removing it. anyone who wants to use the old code can find it in
      the history or from elsewhere.
      
      also, change the visibility of the __fsmu8 state machine table to
      hidden, if supported. this should improve performance slightly in
      shared-library builds.
      015d33c5
  20. 22 Feb, 2011 1 commit
  21. 14 Feb, 2011 1 commit
  22. 12 Feb, 2011 1 commit