Commits · 38e2f727237230300fea6aff68802db04625fd23 · OpenHarmony / Third Party Musl

16 Jun, 2015 1 commit

Committed by Rich Felker on Jun 16, 2015

btowc is required to interpret its argument by conversion to unsigned
char, unless the argument is equal to EOF. since converting EOF
produces a non-character value anyway, we can just unconditionally
convert, for now.

38e2f727
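
A minimal sketch of the unconditional conversion described above. The name `sketch_btowc` is hypothetical, and the sketch assumes a UTF-8-only locale (a lone byte is a character only if it is ASCII) and the common value EOF == -1; it is not the library's actual code.

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wint_t, WEOF */

/* Hypothetical helper, not the library's btowc. Assumes a UTF-8-only
 * locale, where a single byte is a complete character only if ASCII. */
wint_t sketch_btowc(int c)
{
	/* Convert unconditionally. Assuming EOF is -1, it becomes 0xFF,
	 * which is not a valid single-byte character, so the range check
	 * below still yields WEOF without a separate EOF test. */
	unsigned char b = (unsigned char)c;
	return b < 0x80 ? (wint_t)b : WEOF;
}
```

A negative argument that is not EOF, such as `'A' - 256`, is reduced to the byte `'A'` by the conversion, which is the behavior the commit message says is required.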

22 Apr, 2015 2 commits

remove libc.h dependency from otherwise-independent multibyte code · 9d836f44

Committed by Rich Felker on Apr 22, 2015

9d836f44

remove cruft for libc struct accessor function and broken visibility · f9cccfc1

Committed by Rich Felker on Apr 22, 2015

these were hacks to work around toolchains that could not properly
optimize PIC accesses based on visibility and would generate GOT
lookups even for hidden data, which broke the old dynamic linker.
since commit f3ddd173 it no longer
matters; the dynamic linker does not assume accessibility of this data
until stage 3.

f9cccfc1

19 Dec, 2014 1 commit

fix return value computation in one code path of wcsnrtombs · 2e1ae3b6

Committed by Rich Felker on Dec 18, 2014

the affected code was wrongly counting characters instead of bytes.

2e1ae3b6

16 Nov, 2014 1 commit

implement a private state for the uchar.h functions · 941644e9

Committed by Jens Gustedt on Nov 09, 2014

The C standard is imperative on that:

  7.28.1 ... If ps is a null pointer, each function uses its own internal
  mbstate_t object instead, which is initialized at program startup to
  the initial conversion state;

and these functions are also not supposed to implicitly use the state of
the wchar.h functions:

  7.29.6.3 ... The implementation behaves as if no library function calls
  these functions with a null pointer for ps.

Previously this resulted in two bugs.

 - The functions c16rtomb and mbrtoc16 would crash when called with ps
   set to null.

 - The function mbrtoc32 used the private state of mbrtowc, which it
   is not allowed to do.

941644e9
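
The per-function internal state that the quoted standard text calls for can be sketched as follows. The type `sketch_state` and both function names are hypothetical stand-ins, not the uchar.h interfaces; the point is only that each function falls back to its own static object when `ps` is null.

```c
#include <stddef.h>

/* Hypothetical stand-in for mbstate_t: just counts pending bytes. */
typedef struct { unsigned pending; } sketch_state;

/* Each function owns a distinct internal state object, used only when
 * the caller passes a null ps. Static storage is zero-initialized,
 * which models the initial conversion state at program startup. */
unsigned sketch_conv_a(unsigned nbytes, sketch_state *ps)
{
	static sketch_state internal_a;
	if (!ps) ps = &internal_a;
	ps->pending += nbytes;
	return ps->pending;
}

unsigned sketch_conv_b(unsigned nbytes, sketch_state *ps)
{
	static sketch_state internal_b;
	if (!ps) ps = &internal_b;
	ps->pending += nbytes;
	return ps->pending;
}
```

Calling `sketch_conv_a` with a null `ps` never changes what `sketch_conv_b` sees, and a null `ps` never dereferences a null pointer, which corresponds to the two bugs listed in the commit message.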

14 Oct, 2014 1 commit

implement uchar.h (C11 UTF-16/32 conversion) interfaces · ab9672ae

Committed by Rich Felker on Oct 13, 2014

ab9672ae

02 Jul, 2014 1 commit

fix aliasing violations in mbtowc and mbrtowc · e89cfe51

Committed by Rich Felker on Jul 01, 2014

these functions were setting wc to point to wchar_t aliasing itself as
a "cheap" way to support null wc arguments. doing so was anything but
cheap, since even without the aliasing violation, it would limit the
compiler's ability to optimize.

making wc point to a dummy object is equally easy and does not suffer
from the above problems.

e89cfe51
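
The dummy-object approach can be illustrated with a small sketch. `sketch_decode_ascii` is a hypothetical ASCII-only decoder written for this illustration, not the mbtowc implementation.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical ASCII-only decoder illustrating the null-wc pattern. */
int sketch_decode_ascii(wchar_t *wc, const char *s, size_t n)
{
	wchar_t dummy;
	unsigned char c;

	if (!s || !n) return -1;
	/* Redirect a null output pointer to a local scratch object, so the
	 * store below needs no extra branch and aliases nothing else. */
	if (!wc) wc = &dummy;

	c = (unsigned char)*s;
	if (c >= 0x80) return -1;   /* non-ASCII: out of scope for this sketch */
	*wc = c;
	return c != 0;              /* 0 for the terminator, else 1 byte consumed */
}
```

Because `dummy` is an ordinary local `wchar_t`, the compiler can see that the store through `wc` does not overlap any other live object.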

03 Jun, 2014 1 commit

fix incorrect end pointer in some cases when wcsrtombs stops early · 8fba4458

Committed by Rich Felker on Jun 02, 2014

when wcsrtombs stopped due to hitting zero remaining space in the
output buffer, it was wrongly clearing the position pointer as if it
had completed the conversion successfully.

this commit rearranges the code somewhat to make a clear separation
between the cases of ending due to running out of output buffer space,
and ending due to reaching the end of input or an illegal sequence in
the input. the new branches have been arranged with the hope of
optimizing more common cases, too.

8fba4458
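
The separation between the exit cases can be sketched as below. `sketch_wcs_to_ascii` is a hypothetical ASCII-only converter that follows the wcsrtombs convention for the source pointer; it is an illustration under that assumption, not the code from this commit.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical ASCII-only converter. Returns the number of bytes
 * written (terminator not counted), or (size_t)-1 on an
 * unrepresentable character. */
size_t sketch_wcs_to_ascii(char *dst, const wchar_t **src, size_t n)
{
	const wchar_t *ws = *src;
	size_t written = 0;

	for (;;) {
		if (written == n) {
			/* Out of output space: record where to resume. */
			*src = ws;
			return written;
		}
		if (*ws == 0) {
			/* End of input: store the terminator and clear *src. */
			dst[written] = 0;
			*src = NULL;
			return written;
		}
		if ((unsigned long)*ws >= 0x80) {
			/* Not representable: leave *src at the offending character. */
			*src = ws;
			return (size_t)-1;
		}
		dst[written++] = (char)*ws++;
	}
}
```

Only the end-of-input branch clears `*src`; running out of space leaves it pointing at the next unconverted wide character, which is the distinction the commit restores.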

12 Dec, 2013 1 commit

include cleanups: remove unused headers and add feature test macros · 57174444

Committed by Szabolcs Nagy on Dec 12, 2013

57174444

28 Sep, 2013 1 commit

fix buffer overflow in mbsrtowcs · 211264e4

Committed by Rich Felker on Sep 27, 2013

issue reported by Michael Forney:

"If wn becomes 0 after processing a chunk of 4, mbsrtowcs currently
continues on, wrapping wn around to -1, causing the rest of the string
to be processed.

This resulted in buffer overruns if there was only space in ws for wn
wide characters."

the original patch submitted added an additional check for !wn after
the loop; to avoid extra branching, I instead just changed the wn>=4
check to wn>=5 to ensure that at least one slot remains after the
word-at-a-time loop runs. this should not slow down the tail
processing on real-world usage, since an extra slot that can't be
processed in the word-at-a-time loop is needed for the null
termination anyway.

211264e4
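
The effect of requiring `wn>=5` before a four-at-a-time step can be sketched with a simplified loop. `sketch_ascii_to_wcs` and `sketch_is_plain` are hypothetical names, the sketch handles only ASCII input, and it checks bytes individually rather than as a word; unlike the code described in the commit, its tail loop also tests `wn`, so it illustrates the invariant rather than reproducing the original bug.

```c
#include <stddef.h>
#include <wchar.h>

/* Nonzero ASCII byte (0x01..0x7f)? */
static int sketch_is_plain(unsigned char c)
{
	return c - 1u < 0x7f;
}

/* Hypothetical ASCII-only copy loop, not the library's mbsrtowcs.
 * Returns the number of wide characters stored, terminator excluded. */
size_t sketch_ascii_to_wcs(wchar_t *ws, const char *s, size_t wn)
{
	const unsigned char *u = (const unsigned char *)s;
	size_t wn0 = wn;

	/* Fast path: 4 characters per iteration. Requiring wn >= 5 means
	 * at least one slot is left whenever this loop has run, so the
	 * code after it never starts from wn == 0. */
	while (wn >= 5 && sketch_is_plain(u[0]) && sketch_is_plain(u[1])
	       && sketch_is_plain(u[2]) && sketch_is_plain(u[3])) {
		ws[0] = u[0]; ws[1] = u[1]; ws[2] = u[2]; ws[3] = u[3];
		ws += 4; u += 4; wn -= 4;
	}
	/* Tail: one character at a time. */
	while (wn && sketch_is_plain(*u)) {
		*ws++ = *u++;
		wn--;
	}
	if (wn && !*u) *ws = 0;   /* terminate only if a slot remains */
	return wn0 - wn;
}
```

With `wn` equal to 4 the fast path is skipped entirely and the tail fills exactly four slots, leaving the element after the buffer untouched.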

30 Jun, 2013 1 commit

fix failure of mbsrtowcs to record stop position when dest is full · 4ca44215

Committed by Rich Felker on Jun 29, 2013

4ca44215

09 Apr, 2013 4 commits

mbrtowc: do not leave mbstate_t in permanent-fail state after EILSEQ · 23ab8c25

Committed by Rich Felker on Apr 08, 2013

the standard is clear that the old behavior is conforming: "In this
case, [EILSEQ] shall be stored in errno and the conversion state is
undefined."

however, the specification of mbrtowc has one peculiarity when the
source argument is a null pointer: in this case, it's required to
behave as mbrtowc(NULL, "", 1, ps). no motivation is provided for this
requirement, but the natural one that comes to mind is that the intent
is to reset the mbstate_t object. for stateful encodings, such
behavior is actually specified: "If the corresponding wide character
is the null wide character, the resulting state described shall be the
initial conversion state." but in the case of UTF-8 where the
mbstate_t object contains a partially-decoded character rather than a
shift state, a subsequent '\0' byte indicates that the previous
partial character is incomplete and thus an illegal sequence.

naturally, applications using their own mbstate_t object should clear
it themselves after an error, but the standard presently provides no
way to clear the builtin mbstate_t object used when the ps argument is
a null pointer. I suspect this issue may be addressed in the future by
specifying that a null source argument resets the state, as this seems
to have been the intent all along.

for what it's worth, this change also slightly reduces code size.

23ab8c25
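
For the caller-owned case mentioned above, clearing the state after an error might look like the following sketch. `sketch_reset_state` is a hypothetical helper; it relies on the C standard's statement that a zero-valued mbstate_t describes an initial conversion state.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return a caller-owned conversion state to the
 * initial state, for example after mbrtowc has failed with EILSEQ. */
void sketch_reset_state(mbstate_t *ps)
{
	memset(ps, 0, sizeof *ps);
}
```

This option is not available for the builtin state selected by a null `ps`, which is the gap the commit message discusses.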

implement mbtowc directly, not as a wrapper for mbrtowc · ea34b1b9

Committed by Rich Felker on Apr 08, 2013

the interface contract for mbtowc admits a much faster implementation
than mbrtowc can achieve; wrapping mbrtowc with an extra call frame
only made the situation worse.

since the regex implementation uses mbtowc already, this change should
improve regex performance too. it may be possible to improve
performance in other places internally by switching from mbrtowc to
mbtowc.

ea34b1b9

optimize mbrtowc · a49e038b

Committed by Rich Felker on Apr 08, 2013

this simple change, in my measurements, makes about a 7% performance
improvement. at first glance this change would seem like a
compiler-specific hack, since the modified code is not even used.
however, I suspect the reason is that I'm eliminating a second path
into the main body of the code, allowing the compiler more flexibility
to optimize the normal (hot) path into the main body. so even if it
weren't for the measurable (and quite notable) difference in
performance, I think the change makes sense.

a49e038b

fix out-of-bounds access in UTF-8 decoding · 8f06ab0e

Committed by Rich Felker on Apr 08, 2013

SA and SB are used as the lowest and highest valid starter bytes, but
the value of SB was one-past the last valid starter. this caused
access past the end of the state table when the illegal byte '\xf5'
was encountered in a starter position. the error did not show up in
full-character decoding tests, since the bogus state read from just
past the table was unlikely to admit any continuation bytes as valid,
but would have shown up had we tested feeding '\xf5' to the
byte-at-a-time decoding in mbrtowc: it would cause the function to
wrongly return -2 rather than -1.

I may eventually go back and remove all references to SA and SB,
replacing them with the values; this would make the code more
transparent, I think. the original motivation for using macros was to
allow misguided users of the code to redefine them for the purpose of
enlarging the set of accepted sequences past the end of Unicode...

8f06ab0e
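
The inclusive-bounds check can be sketched as below. The macro and function names are hypothetical; the values 0xc2 and 0xf4 are the lowest and highest valid UTF-8 starter bytes, and the table size is derived from them so that 0xf5 falls outside it.

```c
/* Hypothetical names: lowest and highest valid UTF-8 starter bytes,
 * both inclusive. */
#define SKETCH_SA 0xc2u
#define SKETCH_SB 0xf4u

/* One state-table slot per starter byte. */
enum { SKETCH_TABLE_SIZE = SKETCH_SB - SKETCH_SA + 1 };

/* Map a byte to its table index, or -1 if it is not a valid starter. */
int sketch_table_index(unsigned char c)
{
	/* Unsigned subtraction: bytes below SA wrap to large values, so a
	 * single comparison rejects both ends of the range. */
	unsigned i = c - SKETCH_SA;
	return i < (unsigned)SKETCH_TABLE_SIZE ? (int)i : -1;
}
```

With a one-past-the-end upper bound, the byte 0xf5 would have produced an index equal to the table size, which is the out-of-bounds read the commit describes.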

05 Apr, 2013 5 commits

cleanup wcstombs · 771c6cea

Committed by Rich Felker on Apr 04, 2013

remove redundant headers and comments; this file is completely trivial
now. also, avoid temp var.

771c6cea

cleanup mbstowcs wrapper · b5a527f9

Committed by Rich Felker on Apr 04, 2013

remove unneeded headers. this file is utterly trivial now and there's
no sense in having a comment to state that it's in the public domain.

b5a527f9

minor optimization to mbstowcs · f62b12d0

Committed by Rich Felker on Apr 04, 2013

there is no need to zero-fill an mbstate_t object in the caller;
mbsrtowcs will automatically treat a null pointer as the initial
state.

f62b12d0
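
A wrapper in this style might look like the sketch below. `sketch_mbstowcs` is a hypothetical name, and the sketch relies on the behavior stated in the commit message, that a null state pointer is treated as the initial state; on other implementations a null `ps` selects a persistent internal state object instead.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wrapper; relies on mbsrtowcs accepting a null ps. */
size_t sketch_mbstowcs(wchar_t *ws, const char *s, size_t wn)
{
	/* No local mbstate_t to zero-fill: a null state pointer is passed,
	 * and the by-value copy of s serves as the source cursor. */
	return mbsrtowcs(ws, &s, wn, NULL);
}
```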

fix incorrect range checks in wcsrtombs · 40b2b5fa

Committed by Rich Felker on Apr 04, 2013

negative values of wchar_t need to be treated in the non-ASCII case so
that they can properly generate EILSEQ rather than getting truncated
to 8bit values and stored in the output.

40b2b5fa
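
The kind of range check involved can be sketched as follows. `sketch_fits_one_byte` is a hypothetical predicate, not the code from wcsrtombs; it shows how comparing as an unsigned value keeps negative `wchar_t` values out of the single-byte path.

```c
#include <wchar.h>

/* Hypothetical predicate: may this wide character be stored as one byte? */
int sketch_fits_one_byte(wchar_t wc)
{
	/* Comparing as unsigned routes negative values to the non-ASCII
	 * path, where they can be rejected with EILSEQ, instead of letting
	 * them be truncated to 8 bits. */
	return (unsigned long)wc < 0x80u;
}
```

A signed comparison such as `wc < 0x80` would accept `(wchar_t)-1` on platforms where `wchar_t` is signed.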

overhaul mbsrtowcs · 50d9661d

Committed by Rich Felker on Apr 04, 2013

these changes fix at least two bugs:
- misaligned access to the input as uint32_t for vectorized ASCII test
- incorrect src pointer after stopping on EILSEQ

in addition, the text of the standard makes it unclear whether the
mbstate_t object is to be modified when the destination pointer is
null; previously it was cleared either way; now, it's only cleared
when the destination is non-null. this change may need revisiting, but
it should not affect most applications, since calling mbsrtowcs with
non-zero state can only happen when the head of the string was already
processed with mbrtowc.

finally, these changes shave about 20% size off the function and seem
to improve performance by 1-5%.

50d9661d

07 Sep, 2012 1 commit

use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c

Committed by Rich Felker on Sep 06, 2012

to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.

400c5e5c
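
A rough sketch of the approach, using a hypothetical macro name `SKETCH_RESTRICT` rather than `__restrict` so that it does not redefine a reserved identifier. The conditions shown are an assumption about how such a macro could be defined, not a copy of the headers.

```c
#include <stddef.h>

/* Hypothetical macro name; the commit uses __restrict in public headers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SKETCH_RESTRICT restrict
#elif defined(__GNUC__)
#define SKETCH_RESTRICT __restrict
#else
#define SKETCH_RESTRICT
#endif

/* Pointer form (*restrict), not the array form ([restrict]). */
char *sketch_copy(char *SKETCH_RESTRICT d, const char *SKETCH_RESTRICT s, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++) d[i] = s[i];
	return d;
}
```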

27 May, 2012 1 commit

fix failure of mbsinit(0) (not UB; required to return nonzero) · 6436b371

Committed by Rich Felker on May 26, 2012

issue reported by Richard Pennington; slightly simpler fix applied

6436b371

03 May, 2012 1 commit

fix longstanding exit logic bugs in mbsnrtowcs and wcsnrtombs · 485fb14a

Committed by Rich Felker on May 02, 2012

these are POSIX 2008 (previously GNU extension) functions that are
rarely used. apparently they had never been tested before, since the
end-of-string logic was completely missing. mbsnrtowcs is used by
modern versions of bash for its glob implementation, and this bug
was causing tab completion to hang in an infinite loop.

485fb14a

25 Feb, 2012 2 commits

new attempt at working around the gcc 3 visibility bug · 78e79d9d

Committed by Rich Felker on Feb 24, 2012

since gcc is failing to generate the necessary ".hidden" directive in
the output asm, generate it explicitly with an __asm__ statement...

78e79d9d

remove useless attribute visibility from definitions · 7fa29920

Committed by Rich Felker on Feb 24, 2012

this was a failed attempt at working around the gcc 3 visibility bug
affecting x86_64. subsequent patch will address it with an ugly but
working hack.

7fa29920

24 Feb, 2012 1 commit

cleanup and work around visibility bug in gcc 3 that affects x86_64 · bae2e52b

Committed by Rich Felker on Feb 23, 2012

in gcc 3, the visibility attribute must be placed on both the
declaration and on the definition. if it's omitted from the
definition, the compiler fails to emit the ".hidden" directive in the
assembly, and the linker will either generate textrels (if supported,
such as on i386) or refuse to link (on targets where certain types of
textrels are forbidden or impossible without further assumptions about
memory layout, such as on x86_64).

this patch also unifies the decision about when to use visibility into
libc.h and makes the visibility in the utf-8 state machine tables
based on libc.h rather than a duplicate test.

bae2e52b
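
Placing the attribute on both the declaration and the definition might look like the sketch below. `SKETCH_HIDDEN` and `sketch_states` are hypothetical names, and the table contents are placeholders; this is an illustration of the pattern, not the code in libc.h.

```c
/* Hypothetical macro and table names. */
#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define SKETCH_HIDDEN
#endif

/* Declaration, as it would appear in a shared internal header. */
extern SKETCH_HIDDEN const unsigned sketch_states[4];

/* Definition: the attribute is repeated rather than relied on to be
 * inherited from the declaration. */
SKETCH_HIDDEN const unsigned sketch_states[4] = { 1, 2, 3, 4 };
```

Keeping the macro in one header means every table that wants hidden visibility uses the same test, which matches the unification the commit message describes.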

26 Mar, 2011 1 commit

fix all implicit conversion between signed/unsigned pointers · 9ae8d5fc

Committed by Rich Felker on Mar 25, 2011

sadly the C language does not specify any such implicit conversion, so
this is not a matter of just fixing warnings (as gcc treats it) but
actual errors. i would like to revisit a number of these changes and
possibly revise the types used to reduce the number of casts required.

9ae8d5fc

27 Feb, 2011 1 commit

cleanup utf-8 multibyte code, use visibility if possible · 015d33c5

Committed by Rich Felker on Feb 27, 2011

this code was written independently of musl, with support for the
backwards, nonstandard "31-bit unicode" some libraries/apps might
want. unfortunately the extra code (inside #ifdef) makes the source
harder to read and makes code that should be simple look complex, so
i'm removing it. anyone who wants to use the old code can find it in
the history or from elsewhere.

also, change the visibility of the __fsmu8 state machine table to
hidden, if supported. this should improve performance slightly in
shared-library builds.

015d33c5

22 Feb, 2011 1 commit

remove sample utf-8 code that's not part of the standard library · cfcbea1e

Committed by Rich Felker on Feb 21, 2011

cfcbea1e

14 Feb, 2011 1 commit

cleanup multibyte stuff to remove ugly casts, sanitize the ptr align casts · f9d880d2

Committed by Rich Felker on Feb 13, 2011

f9d880d2

12 Feb, 2011 1 commit

initial check-in, version 0.5.0 · 0b44a031

Committed by Rich Felker on Feb 12, 2011

0b44a031

OpenHarmony / Third Party Musl · synced successfully more than 1 year ago