Commits · 4260dfe1ecc43d92d1e6d30daa0f22bd746d1740 · OpenHarmony / Third Party Musl

24 Sep, 2015 · 1 commit

regcomp: propagate allocation failures · 4260dfe1

Committed by Szabolcs Nagy on Sep 23, 2015

The error code of an allocating function was not checked in tre_add_tag.

16 Jun, 2015 · 1 commit

byte-based C locale, phase 1: multibyte character handling functions · 1507ebf8

Committed by Rich Felker on Jun 16, 2015

this patch makes the functions which work directly on multibyte
characters treat the high bytes as individual abstract code units
rather than as multibyte sequences when MB_CUR_MAX is 1. since
MB_CUR_MAX is presently defined as a constant 4, all of the new code
added is dead code, and optimizing compilers' code generation should
not be affected at all. a future commit will activate the new code.

as abstract code units, bytes 0x80 to 0xff are represented by wchar_t
values 0xdf80 to 0xdfff, at the end of the surrogates range. this
ensures that they will never be misinterpreted as Unicode characters,
and that all wctype functions return false for these "characters"
without needing locale-specific logic. a high range outside of Unicode
such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's
char16_t also needs to be able to represent conversions of these
bytes, the surrogate range was the natural choice.
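The mapping described above can be restated as a pair of C helpers. The names byte_to_codeunit and codeunit_to_byte are hypothetical, and this is only a sketch of the arithmetic (0x80..0xff onto 0xdf80..0xdfff), not musl's own code, which may express it differently.

```c
#include <wchar.h>

/* Hypothetical helper: ASCII bytes map to themselves, and each high
   byte 0x80..0xff maps to an abstract code unit in 0xdf80..0xdfff,
   the tail of the surrogate range, which no Unicode scalar value
   occupies. */
static wchar_t byte_to_codeunit(unsigned char b)
{
    return b < 0x80 ? (wchar_t)b : (wchar_t)(0xdf80 + (b - 0x80));
}

/* Hypothetical inverse: returns the original byte, or -1 for values
   that are neither ASCII nor one of the 128 abstract code units.
   Widening to unsigned long keeps the test correct whether wchar_t
   is signed or unsigned. */
static int codeunit_to_byte(wchar_t wc)
{
    unsigned long u = (unsigned long)wc;

    if (u < 0x80)
        return (int)u;
    if (u >= 0xdf80 && u <= 0xdfff)
        return (int)(u - 0xdf80 + 0x80);
    return -1;
}
```

For example, the byte 0xe9 maps to 0xdfe9; because that value lies in the surrogate range, the wctype classification functions have no reason to report it as a letter.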

28 Mar, 2015 · 1 commit

regex: fix character class repetitions · c498efe1

Committed by Szabolcs Nagy on Mar 25, 2015

Internally regcomp needs to copy some iteration nodes before
translating the AST into TNFA representation.

Literal nodes were not copied correctly: the class type and list
of negated class types were not copied so classes were ignored
(in the non-negated case an ignored char class caused the literal
to match everything).

This affects iterations when the upper bound is finite and larger
than one, or when the lower bound is larger than one. So, e.g., the EREs

 [[:digit:]]{2}
 [^[:space:]ab]{1,4}

were treated as

 .{2}
 [^ab]{1,4}

The fix is done with minimal source modification to copy the
necessary fields, but the AST preparation and node handling
code of tre will need to be cleaned up for clarity.
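One way to check for this regression from application code, using only the POSIX regcomp/regexec interface, is sketched below; ere_full_match is a hypothetical helper name. On an affected build the class would be dropped, so ^[[:digit:]]{2}$ would wrongly accept "ab" and ^[^[:space:]ab]{1,4}$ would wrongly accept "x y".

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE re matches s, 0 if it does
   not, and -1 if regcomp rejects the pattern. Anchor re with ^...$ to
   test a whole-string match. */
static int ere_full_match(const char *re, const char *s)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, re, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, s, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```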

24 Mar, 2015 · 1 commit

do not treat \0 as a backref in BRE · 32dee9b9

Committed by Szabolcs Nagy on Mar 22, 2015

The valid BRE backref tokens are \1 .. \9, and 0 is not a special
character either so \0 is undefined by the standard.

Such undefined escaped characters are treated as literal characters
currently, following existing practice, so \0 is the same as 0.
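For contrast with the undefined \0, a valid BRE back-reference such as \1 must match the same text as the first subexpression. A minimal check using regcomp in basic (non-REG_EXTENDED) mode follows; bre_match is a hypothetical helper, and \0 is deliberately not tested because its meaning is left undefined and may differ between implementations.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the BRE re matches s, 0 if it does
   not, and -1 on a compile error. No REG_EXTENDED, so re is a BRE. */
static int bre_match(const char *re, const char *s)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, re, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, s, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```

In C source the BRE ^\(ab\)\1$ is written "^\\(ab\\)\\1$"; it matches "abab" but not "abba".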

21 Mar, 2015 · 2 commits

suppress backref processing in ERE regcomp · 7c8c86f6

Committed by Rich Felker on Mar 20, 2015

one of the features of ERE is that it's actually a regular language
and does not admit expressions which cannot be matched in linear time.
introduction of \n backref support into regcomp's ERE parsing was
unintentional.

fix memory-corruption in regcomp with backslash followed by high byte · 39dfd584

Committed by Rich Felker on Mar 20, 2015

the regex parser handles the (undefined) case of an unexpected byte
following a backslash as a literal. however, instead of correctly
decoding a character, it was treating the byte value itself as a
character. this was not only semantically unjustified, but turned out
to be dangerous on archs where plain char is signed: bytes in the
range 252-255 alias the internal codes -4 through -1 used for special
types of literal nodes in the AST.
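To see why the raw byte value was dangerous, consider how a pattern byte becomes an int. In this sketch SENTINEL_MIN and SENTINEL_MAX are hypothetical stand-ins for the internal codes -4..-1, and the helpers are illustrative, not TRE's code; the commit's actual fix was to decode the character properly rather than merely reinterpret the byte.

```c
/* Hypothetical stand-ins for the negative codes the parser reserves
   for special literal node types. */
enum { SENTINEL_MIN = -4, SENTINEL_MAX = -1 };

/* Problematic pattern: reading the byte through plain char. Where char
   is signed, bytes 252..255 come out as -4..-1 (out-of-range conversion
   to a signed type is implementation-defined; common compilers wrap). */
static int literal_from_plain_char(const char *p)
{
    return *p;
}

/* Reading through unsigned char always yields 0..255, so the value can
   never collide with a negative sentinel. */
static int literal_from_unsigned_byte(const char *p)
{
    return *(const unsigned char *)p;
}

static int collides_with_sentinel(int v)
{
    return v >= SENTINEL_MIN && v <= SENTINEL_MAX;
}
```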

18 Dec, 2014 · 1 commit

implement FNM_CASEFOLD extension to fnmatch function · efa9d396

Committed by Nagy Szabolcs on Oct 10, 2014

13 Sep, 2014 · 1 commit

rewrite the regex pattern parser in regcomp · ec1aed0a

Committed by Szabolcs Nagy on Aug 14, 2014

The new code is a bit simpler and the generated code is about 1KB
smaller (on i386). The basic design was kept including internal
interfaces, TNFA generation was not touched.

The old tre parser had various issues:

[^aa-z]
negated overlapping ranges in a bracket expression were handled
incorrectly (eg [^aa-z] was handled as [^a] instead of [^a-z])

a{,2}
missing lower bound in a counted repetition should be an error,
but it was accepted with broken semantics: a{,2} was treated as
a{0,3}, the new parser rejects it

a{999,}
large min count was not rejected (a{5000,} failed with REG_ESPACE
due to reaching a stack limit), the new parser enforces the
RE_DUP_MAX limit

\xff
regcomp used to accept a pattern with illegal sequences in it
(treating them as an empty expression, so p\xffq matched pq); the new
parser rejects such patterns with REG_BADPAT or REG_ERANGE

[^b-fD-H] with REG_ICASE
old parser turned this into [^b-fB-F] because of the negated
overlapping range issue (see above), the new parser treats it
as [^b-hB-H], POSIX seems to require [^d-fD-F], but practical
implementations do case-folding first and negate the character
set later instead of the other way around. (Supporting the posix
way efficiently would require significant changes so it was left
as is, it is unclear if any application actually expects the
posix behaviour, this issue is raised on the austingroup tracker:
http://austingroupbugs.net/view.php?id=872 ).

another case-insensitive matching issue is that unicode case
folding rules can group more than two characters together while
towupper and towlower can only work for a pair of upper and
lower case characters, this is a limitation of POSIX so it is
not fixed.

invalid bracket and brace expressions may return different error
codes now (REG_ERANGE instead of REG_EBRACK or REG_BADBR instead
of REG_EBRACE) otherwise the new parser should be compatible with
the old one.

regcomp should be able to handle arbitrary pattern input if the
pattern length is limited; the only exception is the use of large
repetition counts (e.g. (a{255}){255}), which require an exponential
amount of memory, and there is no easy workaround.
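A minimal probe for the bracket-expression fix, assuming only POSIX <regex.h>; check_ere is a hypothetical helper. The assertions cover only behavior POSIX pins down ([^aa-z] must exclude every letter from a to z). How a library treats a{,2} or very large counts is implementation-specific (the RE_DUP_MAX limit differs between C libraries), so such patterns are better probed by printing the regerror message than by asserting a particular outcome.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compiles re as an ERE (plus any extra cflags)
   and returns 1 if it matches s, 0 if not, -1 if the pattern is
   rejected. On rejection, the regerror text is printed to stderr so
   the error code chosen by the library is visible. */
static int check_ere(const char *re, const char *s, int extra)
{
    regex_t rx;
    char msg[128];
    int err = regcomp(&rx, re, REG_EXTENDED | REG_NOSUB | extra);

    if (err != 0) {
        regerror(err, &rx, msg, sizeof msg);
        fprintf(stderr, "regcomp(\"%s\"): %s\n", re, msg);
        return -1;
    }
    err = regexec(&rx, s, 0, NULL, 0);
    regfree(&rx);
    return err == 0;
}
```

For instance, check_ere("a{,2}", "", 0) prints a diagnostic on a library that rejects a missing lower bound.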

06 Sep, 2014 · 1 commit

fix memory leak in regexec when input contains illegal sequence · 546f6b32

Committed by Szabolcs Nagy on Sep 05, 2014

26 Jul, 2014 · 1 commit

add support for LC_TIME and LC_MESSAGES translations · c5b8f193

Committed by Rich Felker on Jul 26, 2014

for LC_MESSAGES, translation of strerror and similar literal message
functions is supported. for messages in other places (particularly the
dynamic linker) that use format strings, translation is not yet
supported. in order to make it possible and safe, such messages will
need to be refactored to separate the textual content from the format.

for LC_TIME, the day and month names and strftime-style format strings
provided by nl_langinfo are supported for translation. however there
may be limitations, as some of the original C-locale nl_langinfo
strings are non-unique and thus perhaps non-suitable as keys.

overall, the locale support activated by this commit should not be
seen as complete and polished but as a basis for beginning to test
locale functionality and implement locales.
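To see which LC_TIME strings nl_langinfo exposes as translation keys, they can be queried directly. A minimal sketch using only standard <langinfo.h> items (DAY_1, MON_1, D_FMT); the helper name langinfo_names_are is hypothetical. The values asserted are the POSIX C-locale ones; under a locale that supplies LC_TIME translations, the same calls would be expected to return translated strings.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current locale's names for the
   first day of the week (DAY_1) and the first month (MON_1) equal the
   given strings. In the C locale these are "Sunday" and "January". */
static int langinfo_names_are(const char *day1, const char *mon1)
{
    return strcmp(nl_langinfo(DAY_1), day1) == 0
        && strcmp(nl_langinfo(MON_1), mon1) == 0;
}
```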

18 Jul, 2014 · 1 commit

fix crash in regexec for nonzero nmatch argument with REG_NOSUB · 72ed3d47

Committed by Rich Felker on Jul 17, 2014

per POSIX, the nmatch and pmatch arguments are ignored when the regex
was compiled with REG_NOSUB.
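The situation this fix covers can be reproduced with a short sketch: compile with REG_NOSUB, then call regexec with a nonzero nmatch. The helper name nosub_matches is hypothetical. A real pmatch array is still passed so the call is valid on any implementation; per the rule quoted above, its contents after the call should not be relied on.

```c
#include <regex.h>

/* Hypothetical helper: returns 1 if the ERE re matches s, 0 if not,
   -1 on a compile error. regexec receives nmatch = 2 even though the
   regex was compiled with REG_NOSUB, which must simply be ignored. */
static int nosub_matches(const char *re, const char *s)
{
    regex_t rx;
    regmatch_t pm[2];
    int rc;

    if (regcomp(&rx, re, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, s, 2, pm, 0);
    regfree(&rx);
    return rc == 0;
}
```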

12 Dec, 2013 · 1 commit

include cleanups: remove unused headers and add feature test macros · 57174444

Committed by Szabolcs Nagy on Dec 12, 2013

02 Dec, 2013 · 3 commits

implement FNM_LEADING_DIR extension flag in fnmatch · a4e10e30

Committed by Rich Felker on Dec 02, 2013

previously this flag was defined and accepted as a no-op, possibly
breaking some software that uses it. given the choice to remove the
definition and possibly break applications that were already working,
or simply implement the feature, the latter turned out to be easy
enough to make the decision easy.

in the case where the FNM_PATHNAME flag is also set, this
implementation is clean and essentially optimal. otherwise, it's an
inefficient "brute force" implementation. at some point, when cleaning
up and refactoring this code, I may add a more direct code path for
handling FNM_LEADING_DIR in the non-FNM_PATHNAME case, but at this
point my main interest is avoiding introducing new bugs in the code
that implements the standard fnmatch features specified by POSIX.
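A sketch of the flag's effect, assuming a C library that provides the FNM_LEADING_DIR extension in <fnmatch.h> (some libraries only declare it when _GNU_SOURCE is defined, hence the guard). With the flag, the pattern only needs to match a leading run of complete /-separated components; fnm is a hypothetical wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* FNM_LEADING_DIR is an extension, not POSIX */
#endif
#include <fnmatch.h>

/* Hypothetical wrapper: 1 if pattern matches path under flags, else 0. */
static int fnm(const char *pattern, const char *path, int flags)
{
    return fnmatch(pattern, path, flags) == 0;
}
```

Note that "usr/li" does not match "usr/lib/libc.so" even with the flag, because the match must end at a component boundary.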

fix fnmatch corner cases related to escaping · 6ec82a3b

Committed by Rich Felker on Dec 01, 2013

the FNM_PATHNAME logic for advancing by /-delimited components was
incorrect when the / character was escaped (i.e. \/), and a final \ at
the end of pattern was not handled correctly.

fix the end of string matching in fnmatch with FNM_PATHNAME · da0fcdb8

Committed by Szabolcs Nagy on Dec 01, 2013

a '/' in the pattern could be incorrectly matched against the
terminating null byte in the string, causing an arbitrarily long
sequence of out-of-bounds accesses in fnmatch("/","",FNM_PATHNAME)
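The expected FNM_PATHNAME semantics, including the corner case fixed here, can be checked with standard fnmatch calls; path_matches is a hypothetical wrapper. A '/' in the string must be matched by a literal '/' in the pattern, wildcards never cross it, and a pattern '/' must never be matched against the end of the string.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: 1 if pattern matches path with FNM_PATHNAME. */
static int path_matches(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```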

07 Oct, 2013 · 1 commit

fix allocation sizes in regcomp · 1e81fa45

Committed by Szabolcs Nagy on Oct 07, 2013

sizeof had an incorrect argument in a few places; the size was always
large enough, so the issue was not critical.

01 Feb, 2013 · 1 commit

revert regex "cleanup" that seems unjustified and may break backtracking · ae4b0b96

Committed by Rich Felker on Feb 01, 2013

it's not clear to me at the moment whether the code that was removed
(and which is now being re-added) is needed, but it's far from being a
no-op, and i don't want to risk breaking regex in this release.

15 Jan, 2013 · 1 commit

remove unused "params" related code from regex · f05f59b8

Committed by Szabolcs Nagy on Jan 15, 2013

some structs and functions had reference to the params
feature of tre that is not used by the code anymore

14 Jan, 2013 · 1 commit

regex: remove an unused local variable from regexec · dd959163

Committed by Szabolcs Nagy on Jan 14, 2013

pos_start local variable is not used in tre_tnfa_run_backtrack

07 Sep, 2012 · 1 commit

use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c

Committed by Rich Felker on Sep 06, 2012

to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
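The header technique can be sketched as follows. MY_RESTRICT and copy_bytes are hypothetical names; musl's headers use __restrict as described, but user code should not define that macro itself, since identifiers beginning with a double underscore are reserved for the implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for the header macro: C99 'restrict' when the
   compiler supports it, the GNU '__restrict' spelling on older
   GCC-compatible compilers, and nothing otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT
#endif

/* The qualifier is written in the '*restrict' form rather than as an
   array parameter 'dst[restrict]'. The loop variable is declared up
   front so the function also compiles as C89. */
static void copy_bytes(unsigned char *MY_RESTRICT dst,
                       const unsigned char *MY_RESTRICT src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```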

25 May, 2012 · 1 commit

fix regex on arm · 8b4c232e

Committed by Rich Felker on May 25, 2012

TRE has a broken assumption that wchar_t is signed, which is a sane
expectation, but not required by the standard, and false on ARM's ABI.

i leave tre_char_t as wchar_t for now, since a pointer to it is
directly passed to functions that need pointer to wchar_t. it does not
seem to break anything. and since the maximum unicode scalar value is
0x10ffff, just use that explicitly rather than using the max value of
any particular C type.
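A sketch of the explicit-limit approach; UNICODE_MAX and in_unicode_range are hypothetical names, not TRE's. Converting to unsigned long before comparing makes the check independent of whether wchar_t is signed: a negative value becomes a large unsigned number and is rejected, instead of relying on a "< 0" test that can never be true for an unsigned wchar_t.

```c
#include <wchar.h>

/* Largest Unicode scalar value, used explicitly instead of WCHAR_MAX. */
#define UNICODE_MAX 0x10ffffUL

/* Hypothetical range check that behaves the same for signed and
   unsigned wchar_t. */
static int in_unicode_range(wchar_t c)
{
    return (unsigned long)c <= UNICODE_MAX;
}
```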

14 May, 2012 · 2 commits

remove some no-op end of string tests from regex parser · 13b2945a

Committed by Rich Felker on May 13, 2012

these are cruft from the original code which used an explicit string
length rather than null termination. i blindly converted all the
checks to null terminator checks, without noticing that in several
cases, the subsequent switch statement would automatically handle the
null byte correctly.

another BRE fix: in ^*, * is literal · e9cddc8e

Committed by Rich Felker on May 13, 2012

i don't understand why this has to be conditional on being in BRE
mode, but enabling this code unconditionally breaks a huge number of
ERE test cases.
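POSIX specifies that in a BRE, * is an ordinary character when it is the first character of the expression after an initial ^. A minimal check of that rule; caret_star_is_literal is a hypothetical helper, and only BRE mode is tested because the ERE case is not defined the same way.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if the BRE "^*a$" treats the '*' after
   the anchor as a literal, i.e. matches "*a" but neither "a" nor "aa";
   returns 0 otherwise and -1 if the pattern fails to compile. */
static int caret_star_is_literal(void)
{
    regex_t rx;
    int ok;

    if (regcomp(&rx, "^*a$", REG_NOSUB) != 0)
        return -1;
    ok = regexec(&rx, "*a", 0, NULL, 0) == 0
      && regexec(&rx, "a", 0, NULL, 0) != 0
      && regexec(&rx, "aa", 0, NULL, 0) != 0;
    regfree(&rx);
    return ok;
}
```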

08 May, 2012 · 4 commits

fix error checking for \ at end of regex (this was broken previously) · 952700e8

Committed by Rich Felker on May 07, 2012

fix copy and paste error in regex code causing mishandling of \) in BRE · 17361482

Committed by Rich Felker on May 07, 2012

fix regex breakage in last commit (failure to handle empty regex, etc.) · a5a47783

Committed by Rich Felker on May 07, 2012

fix ugly bugs in TRE regex parser · d7a90b35

Committed by Rich Felker on May 07, 2012

1. * in BRE is not special at the beginning of the regex or a
subexpression. this broke ncurses' build scripts.

2. \\( in BRE is a literal \ followed by a literal (, not a literal \
followed by a subexpression opener.

3. the ^ in \\(^ in BRE is a literal ^ only at the beginning of the
entire BRE. POSIX allows treating it as an anchor at the beginning of
a subexpression, but TRE's code for checking if it was at the
beginning of a subexpression was wrong, and fixing it for the sake of
supporting a non-portable usage was too much trouble when just
removing this non-portable behavior was much easier.

this patch also moved lots of the ugly logic for empty atom checking
out of the default/literal case and into new cases for the relevant
characters. this should make parsing faster and make the code smaller.
if nothing else it's a lot more readable/logical.

at some point i'd like to revisit and overhaul lots of this code...
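Points 1 and 2 above can be checked from application code with POSIX regcomp in BRE mode; bre_match is a hypothetical helper. Point 3 is left out because the text describes it as non-portable. In the tests, the C string "^\\(*b\\)$" is the BRE ^\(*b\)$, and "^a\\\\(b$" is the BRE ^a\\(b$, which should match the four characters a, \, (, b.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the BRE re matches s, 0 if it does
   not, and -1 if regcomp rejects the pattern. */
static int bre_match(const char *re, const char *s)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, re, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, s, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```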

29 Apr, 2012 · 1 commit

new fnmatch implementation · 45b38550

Committed by Rich Felker on Apr 28, 2012

unlike the old one, this one's algorithm does not suffer from
potential stack overflow issues or pathologically bad performance on
certain patterns. instead of backtracking, it uses a matching
algorithm which I have not seen before (unsure whether I invented or
re-invented it) that runs in O(1) space and O(nm) time. it may be
possible to improve the time to O(n), but not without significantly
greater complexity.
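The commit does not spell out its algorithm. For comparison, a classic non-recursive technique with the same bounds (O(1) extra space, O(nm) worst-case time) for patterns containing only * and ? keeps a single restart point at the most recent *. This is a separate, well-known method, not a transcription of musl's fnmatch, and unlike fnmatch it ignores bracket expressions, escapes and the FNM_* flags; wildmatch is a hypothetical name.

```c
/* Hypothetical matcher for '*' (any run) and '?' (any one character).
   No recursion: on a mismatch, retry from the last '*', letting it
   absorb one more character of s. Only the last '*' ever needs to be
   revisited, which is why a single restart point suffices. */
static int wildmatch(const char *p, const char *s)
{
    const char *star = 0; /* pattern position just after the last '*' */
    const char *mark = 0; /* string position that '*' currently resumes at */

    while (*s) {
        if (*p == '?' || (*p && *p != '*' && *p == *s)) {
            p++;
            s++;
        } else if (*p == '*') {
            star = ++p;
            mark = s;
        } else if (star) {
            p = star;
            s = ++mark;
        } else {
            return 0;
        }
    }
    while (*p == '*')
        p++;
    return *p == '\0';
}
```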

27 Apr, 2012 · 1 commit

update fnmatch to POSIX 2008 semantics · 2b87a5db

Committed by Rich Felker on Apr 26, 2012

an invalid bracket expression must be treated as if the opening
bracket were just a literal character. this is to fix a bug whereby
POSIX left the behavior of the "[" shell command undefined due to it
being an invalid bracket expression.
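The rule can be checked with plain fnmatch calls; fn_matches is a hypothetical wrapper. Under the POSIX 2008 behavior described above, an unterminated bracket expression such as "[" or "a[b" is matched as if its '[' were an ordinary character.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: 1 if pattern matches s with no flags, else 0. */
static int fn_matches(const char *pattern, const char *s)
{
    return fnmatch(pattern, s, 0) == 0;
}
```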

15 Apr, 2012 · 1 commit

fix signedness error handling invalid multibyte sequences in regexec · b9dd43db

Committed by Rich Felker on Apr 14, 2012

the "< 0" test was always false due to use of an unsigned type. this
resulted in infinite loops on 32-bit machines (adding -1U to a pointer
is the same as adding -1) and crashes on 64-bit machines (offsetting
the string pointer by 4gb-1b when an illegal sequence was hit).
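This class of bug appears whenever a size_t result is tested with "< 0". The sketch below shows the correct checks for mbrtowc, whose error returns are (size_t)-1 (illegal sequence) and (size_t)-2 (incomplete sequence); count_chars is a hypothetical helper, not regexec's code.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the characters in the first len bytes of
   s in the current locale, or returns -1 on an illegal or truncated
   multibyte sequence. The error values are compared explicitly, since
   "n < 0" on a size_t is always false. */
static long count_chars(const char *s, size_t len)
{
    mbstate_t st;
    long count = 0;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;
        if (n == 0)
            n = 1; /* the null character occupies one byte */
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```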

14 Apr, 2012 · 2 commits

remove invalid code from TRE · 386b34a0

Committed by Rich Felker on Apr 13, 2012

TRE wants to treat + and ? after a +, ?, or * as special; ? means
ungreedy and + is reserved for future use. however, this is
non-conformant. although redundant, these characters have
well-defined (no-op) meaning for POSIX ERE, and are actually _literal_
characters (which TRE is wrongly ignoring) in POSIX BRE mode.

the simplest fix is to simply remove the unneeded nonstandard
functionality. as a plus, this shaves off a small amount of bloat.
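The BRE half of this is straightforward to check: in a POSIX BRE, an unescaped ? or + is an ordinary character, so a*? must require a literal '?' after the run of a's. re_match is a hypothetical helper; only BRE mode is asserted here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if re (compiled with cflags) matches
   s, 0 if not, and -1 if regcomp rejects the pattern. */
static int re_match(const char *re, const char *s, int cflags)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, re, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, s, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```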

fix broken regerror (typo) and missing message · b6dbdc69

Committed by Rich Felker on Apr 13, 2012

21 Mar, 2012 · 1 commit

upgrade to latest upstream TRE regex code (0.8.0) · ad47d45e

Committed by Rich Felker on Mar 20, 2012

the main practical results of this change are
1. the regex code is no longer subject to LGPL; it's now 2-clause BSD
2. most (all?) popular nonstandard regex extensions are supported

I hesitate to call this a "sync" since both the old and new code are
heavily modified. in one sense, the old code was "more severely"
modified, in that it was actively hostile to non-strictly-conforming
expressions. on the other hand, the new code has eliminated the
useless translation of the entire regex string to wchar_t prior to
compiling, and now only converts multibyte character literals as
needed.

in the future i may use this modified TRE as a basis for writing the
long-planned new regex engine that will avoid multibyte-to-wide
character conversion entirely by compiling multibyte bracket
expressions specific to UTF-8.

24 Jan, 2012 · 1 commit

make glob mark symlinks-to-directories with the GLOB_MARK flag · d0678b58

Committed by Rich Felker on Jan 23, 2012

POSIX is unclear on whether it should, but all historical
implementations seem to behave this way, and it seems more useful to
applications.
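The behavior can be observed with a small experiment: create a directory and a symlink to it in a fresh temporary directory, then glob with GLOB_MARK. count_marked is a hypothetical helper, and using /tmp for mkdtemp is an assumption about the environment. With the behavior described above, both entries get a trailing '/', so the expected result is 2; a library that does not follow symlinks when marking would return 1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* mkdtemp, symlink */
#endif
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical experiment: makes dir/real (a directory) and dir/link
   (a symlink to it), globs every entry in dir with GLOB_MARK, and
   returns how many results end in '/'. Returns -1 on setup failure,
   in which case the temporary files are not cleaned up. */
static int count_marked(void)
{
    char dir[] = "/tmp/globmarkXXXXXX";
    char path[256];
    glob_t g;
    int marked = 0;
    size_t i;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof path, "%s/real", dir);
    if (mkdir(path, 0700) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/link", dir);
    if (symlink("real", path) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/*", dir);
    if (glob(path, GLOB_MARK, NULL, &g) != 0)
        return -1;
    for (i = 0; i < g.gl_pathc; i++) {
        size_t n = strlen(g.gl_pathv[i]);
        if (n > 0 && g.gl_pathv[i][n - 1] == '/')
            marked++;
    }
    globfree(&g);

    /* Remove the link, the directory, and the temporary directory. */
    snprintf(path, sizeof path, "%s/link", dir);
    unlink(path);
    snprintf(path, sizeof path, "%s/real", dir);
    rmdir(path);
    rmdir(dir);
    return marked;
}
```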

23 Jan, 2012 · 1 commit

support GLOB_PERIOD flag (GNU extension) to glob function · 787c2648

Committed by Rich Felker on Jan 22, 2012

patch by sh4rm4

17 Jun, 2011 · 1 commit

duplicate re_nsub in LSB/glibc ABI compatible location · 32aea208

Committed by Rich Felker on Jun 16, 2011

07 Jun, 2011 · 1 commit

fix handling of d_name in struct dirent · da88b16a

Committed by Rich Felker on Jun 06, 2011

basically there are 3 choices for how to implement this variable-size
string member:
1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
2. old way: length-1 string: generates array bounds warnings in caller.
3. new way: length-NAME_MAX string. no problems, simplifies all code.

of course the usable part in the pointer returned by readdir might be
shorter than NAME_MAX+1 bytes, but that is allowed by the standard and
doesn't hurt anything.

06 Jun, 2011 · 2 commits

safety fix for glob's vla usage: disallow patterns longer than PATH_MAX · 0dc99ac4

Committed by Rich Felker on Jun 05, 2011

this actually inadvertently disallows some valid patterns with
redundant / or * characters, but it's better than allowing unbounded
vla allocation.

eventually i'll write code to move the pattern to the stack and
eliminate redundancy to ensure that it fits in PATH_MAX at the
beginning of glob. this would also allow it to be modified in place
for passing to fnmatch rather than copied at each level of recursion.
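The guard can be sketched as follows; copy_pattern is a hypothetical helper, not glob's code, and the 4096 fallback for PATH_MAX is an assumption for systems where <limits.h> leaves it undefined. Copying into a fixed PATH_MAX-sized buffer after a length check, instead of a VLA sized from the input, bounds stack usage no matter how long the pattern is.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback */
#endif

/* Hypothetical guard: copies pat into buf (PATH_MAX bytes) and returns
   0, or returns -1 without copying if pat plus its terminator would
   not fit. */
static int copy_pattern(char buf[PATH_MAX], const char *pat)
{
    size_t len = strlen(pat);

    if (len >= PATH_MAX)
        return -1;
    memcpy(buf, pat, len + 1);
    return 0;
}
```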

eliminate (harmless in this case) vla usage in fnmatch.c · a6c399cf

Committed by Rich Felker on Jun 05, 2011

08 Apr, 2011 · 1 commit

fix bug in TRE found by clang (typo && instead of &) · 74f75541

Committed by Rich Felker on Apr 07, 2011

OpenHarmony / Third Party Musl · synced successfully over a year ago
