Commits · c4615c1b083b0a59ad1dd3b3bb0a092965e4ef1e · OpenHarmony / Third Party Musl

28 Sep, 2022 1 commit
- fixed 58362922 from https://gitee.com/i-wangliangliang/third_party_musl/pulls/550 · c4615c1b
  Committed by 王liangliang on Sep 26, 2022
```
Add regexec to the iccarm C library
Signed-off-by: i-wangliangliang <willfox@126.com>
Change-Id: Ied244f95bbc7d30866ad4cc51bcfd52599b0730f
```
11 Jun, 2021 1 commit

chore: isolate modifications into the porting folder · a7a91cd9

Committed by Caoruihong on Jun 10, 2021

isolate changes, keep original musl sources clean.
Signed-off-by: NCaoruihong <crh.cao@huawei.com>
Change-Id: Id7f3a5109771f93d397e30febba36e09ddaf4f36

11 Mar, 2021 1 commit
- update openharmony 1.0.1 · a6919e3f
  Committed by mamingshuai on Mar 11, 2021
  
09 Sep, 2020 1 commit
- add OpenHarmony 1.0 baseline · ea5b2688
  Committed by wenjun on Sep 09, 2020
  
22 Mar, 2017 1 commit

regex: fix newline matching with negated brackets · 9571c531

Committed by Julien Ramseier on Mar 21, 2017

With REG_NEWLINE, POSIX says:
"A <newline> in string shall not be matched by a period outside
a bracket expression or by any form of a non-matching list"
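
The quoted rule can be checked with a small probe. This is a minimal sketch, assuming a POSIX `<regex.h>`; `re_matches` is a hypothetical helper introduced here for illustration and is not part of musl.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The non-matching list [^b] may consume a newline only when
 * REG_NEWLINE is not set. */
int negated_bracket_matches_newline(int cflags)
{
    return re_matches("a[^b]c", cflags, "a\nc");
}
```

With a conforming implementation, `negated_bracket_matches_newline(0)` should return 1 and `negated_bracket_matches_newline(REG_NEWLINE)` should return 0.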

17 Dec, 2016 1 commit

handle ^ and $ in BRE subexpression start and end as anchors · 7a4c25d7

Committed by Szabolcs Nagy on Nov 24, 2016

In BRE, ^ is an anchor at the beginning of an expression, optionally
it may be an anchor at the beginning of a subexpression and must be
treated as a literal otherwise.

Previously musl treated ^ in subexpressions as literal, but at least
glibc and GNU sed treat it as an anchor and that's the more useful
behaviour: it can always be escaped to get back the literal meaning.

Same for $ at the end of a subexpression.

Portable BRE should not rely on this, but there are sed commands in
build scripts which do.

This changes the meaning of the BREs:

	\(^a\)
	\(a\|^b\)
	\(a$\)
	\(a$\|b\)
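
A short sketch of how the changed BREs behave once ^ and $ act as anchors inside subexpressions. `re_matches` is a hypothetical helper, not musl code, and the results noted below assume an implementation that follows the behaviour described in this commit (portable BREs should not rely on it).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE \(^a\): with ^ as an anchor, "a" must be at the start of the text. */
int caret_in_group(const char *text)
{
    return re_matches("\\(^a\\)", 0, text);
}

/* BRE \(a$\): with $ as an anchor, "a" must be at the end of the text. */
int dollar_in_group(const char *text)
{
    return re_matches("\\(a$\\)", 0, text);
}
```

Under that assumption, `caret_in_group("ab")` and `dollar_in_group("ba")` return 1, while `caret_in_group("ba")` and `dollar_in_group("ab")` return 0. Escaping the caret, as in `\(\^a\)`, restores the literal meaning.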

23 May, 2016 1 commit

fix the use of uninitialized value in regcomp · 51eeb6eb

Committed by Szabolcs Nagy on May 21, 2016

the num_submatches field of some ast nodes was not initialized in
tre_add_tag_{left,right}, but was accessed later.

this was a benign bug since the uninitialized values were never used
(these values are created during tre_add_tags and copied around during
tre_expand_ast where they are also used in computations, but nothing
in the final tnfa depends on them).

02 Mar, 2016 2 commits

fix ^* at the start of a complete BRE · 29b13575

Committed by Szabolcs Nagy on Feb 29, 2016

This is a workaround to treat * as literal * at the start of a BRE.

Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression. This is not required by the standard and hard to do
with the current code, but it's the existing practice. If it is
changed, * should be treated as literal after such anchor as well.

fix * at the start of a BRE subexpression · 39ea71fb

Committed by Szabolcs Nagy on Feb 29, 2016

commit 7eaa76fc made * invalid at
the start of a BRE subexpression, but it should be accepted as
literal * there according to the standard.

This patch does not fix subexpressions starting with ^*.

01 Feb, 2016 1 commit

regex: increase the stack tre uses for tnfa creation · 2810b30f

Committed by Szabolcs Nagy on Jan 31, 2016

the 10k element stack is increased to 1000k, otherwise tnfa creation fails
for reasonably sized patterns: a single literal char can add 7 elements
to this stack, so regcomp of a 1500 char long pattern (with only literal
chars) fails with REG_ESPACE. (the new limit allows just under 150k chars;
this arbitrary limit allows most command line regex usage.)

ideally there would be no upper bound: regcomp dynamically reallocates
this buffer, every reallocation checks for allocation failure and at
the end this stack is freed so there is no reason for a special bound.
however that may have an unwanted effect on regcomp and regexec runtime
so this is a conservative change.
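
The arithmetic behind the failure: at up to 7 stack elements per literal, 1500 literals can need 1500 × 7 = 10500 elements, which exceeds the old 10k limit. A minimal sketch for probing this from the outside, assuming a POSIX `<regex.h>`; `compile_long_literal` is a hypothetical name introduced here.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Compile a BRE consisting of `len` copies of the literal 'a'.
 * Returns regcomp's status, or REG_ESPACE if the pattern buffer
 * itself cannot be allocated. */
int compile_long_literal(size_t len)
{
    assert(len > 0);
    char *pattern = malloc(len + 1);
    if (pattern == NULL)
        return REG_ESPACE;
    memset(pattern, 'a', len);
    pattern[len] = '\0';

    regex_t re;
    int rc = regcomp(&re, pattern, REG_NOSUB);
    if (rc == 0)
        regfree(&re);
    free(pattern);
    return rc;
}
```

With the raised limit, `compile_long_literal(1500)` is expected to return 0; per the message above, it returned REG_ESPACE before the change.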

31 Jan, 2016 6 commits

regex: simplify the {,} repetition parsing logic · 831e9d9e
Committed by Szabolcs Nagy on Apr 18, 2015

regex: treat \+, \? as repetitions in BRE · 25160f1c
Committed by Szabolcs Nagy on Apr 18, 2015
```
These are undefined escape sequences by the standard, but often
used in sed scripts.
```
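
A small sketch of the extension in use, assuming a libc that treats `\+` and `\?` as repetitions in BRE (musl after this commit, and glibc as a GNU extension). `re_matches` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE ^ab\+c$ : one or more 'b' between 'a' and 'c'. */
int one_or_more_b(const char *text)
{
    return re_matches("^ab\\+c$", 0, text);
}

/* BRE ^ab\?c$ : zero or one 'b' between 'a' and 'c'. */
int optional_b(const char *text)
{
    return re_matches("^ab\\?c$", 0, text);
}
```

Under that assumption, `one_or_more_b("abbbc")` and `optional_b("ac")` return 1, while `one_or_more_b("ac")` and `optional_b("abbc")` return 0.
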

regex: rewrite the repetition parsing code · 03498ec2
Committed by Szabolcs Nagy on Apr 18, 2015
```
The goto logic was hard to follow and modify. This is
in preparation for the BRE \+ and \? support.
```

regex: treat \| in BRE as alternation · da4cc13b

Committed by Szabolcs Nagy on Apr 18, 2015

The standard does not define semantics for \| in BRE, but some code
depends on it meaning alternation. Empty alternative expression is
allowed to be consistent with ERE.

Based on a patch by Rob Landley.
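
A sketch of `\|` used as alternation in a BRE, including an empty alternative. It assumes an implementation with this extension; `re_matches` is a hypothetical helper introduced for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE ^\(cat\|dog\)$ : either word, anchored at both ends. */
int is_cat_or_dog(const char *text)
{
    return re_matches("^\\(cat\\|dog\\)$", 0, text);
}

/* BRE ^\(a\|\)b$ : the second alternative is empty, so the
 * leading 'a' is optional. */
int optional_a_then_b(const char *text)
{
    return re_matches("^\\(a\\|\\)b$", 0, text);
}
```

Here `is_cat_or_dog("dog")` returns 1 and `is_cat_or_dog("cow")` returns 0; `optional_a_then_b` accepts both "b" and "ab".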

regex: reject repetitions in some cases with REG_BADRPT · 7eaa76fc

Committed by Szabolcs Nagy on Apr 18, 2015

Previously repetitions were accepted after empty expressions like
in (*|?)|{2}, but in BRE the handling of * and \{\} were not
consistent: they were accepted as literals in some cases and
repetitions in others.

It is better to treat repetitions after an empty expression as an
error (this is allowed by the standard, and glibc mostly does the
same). This is hard to do consistently with the current logic so
the new rule is:

Reject repetitions after empty expressions, except after assertions
^*, $? and empty groups ()+ and never treat them as literals.

Empty alternation (|a) is undefined by the standard, but it can be
useful so that should be accepted.

regex: clean up position accounting for literal nodes · a8cc2253

Committed by Szabolcs Nagy on Apr 18, 2015

This should not change the meaning of the code, just make the intent
clearer: advancing position is tied to adding a new literal.

24 Sep, 2015 1 commit
- regcomp: propagate allocation failures · 4260dfe1
  Committed by Szabolcs Nagy on Sep 23, 2015
```
The error code of an allocating function was not checked in tre_add_tag.
```
28 Mar, 2015 1 commit

regex: fix character class repetitions · c498efe1

Committed by Szabolcs Nagy on Mar 25, 2015

Internally regcomp needs to copy some iteration nodes before
translating the AST into TNFA representation.

Literal nodes were not copied correctly: the class type and list
of negated class types were not copied so classes were ignored
(in the non-negated case an ignored char class caused the literal
to match everything).

This affects iterations where the upper bound is finite and larger
than one, or the lower bound is larger than one. So e.g. the EREs

 [[:digit:]]{2}
 [^[:space:]ab]{1,4}

were treated as

 .{2}
 [^ab]{1,4}

The fix is done with minimal source modification to copy the
necessary fields, but the AST preparation and node handling
code of tre will need to be cleaned up for clarity.
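
The two EREs from this message make a convenient regression check. A minimal sketch, assuming a POSIX `<regex.h>` and the default "C" locale; `re_matches` is a hypothetical helper, not musl code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* If the class were ignored, this would behave like ^.{2}$. */
int two_digits(const char *text)
{
    return re_matches("^[[:digit:]]{2}$", REG_EXTENDED, text);
}

/* If the negated class were ignored, this would behave like ^[^ab]{1,4}$. */
int no_space_a_or_b(const char *text)
{
    return re_matches("^[^[:space:]ab]{1,4}$", REG_EXTENDED, text);
}
```

With the fix, `two_digits("ab")` returns 0 (the buggy `.{2}` reading would accept it) and `no_space_a_or_b("x z")` returns 0 (the buggy `[^ab]{1,4}` reading would accept it).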

24 Mar, 2015 1 commit

do not treat \0 as a backref in BRE · 32dee9b9

Committed by Szabolcs Nagy on Mar 22, 2015

The valid BRE backref tokens are \1 .. \9, and 0 is not a special
character either so \0 is undefined by the standard.

Such undefined escaped characters are treated as literal characters
currently, following existing practice, so \0 is the same as 0.
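
A sketch contrasting a real backreference with `\0`. Because `\0` is undefined by the standard, the results noted below only hold for implementations that treat it as a literal 0, as described here. `re_matches` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE ^\(a\)\1$ : \1 is a backreference to the first group. */
int group_then_backref(const char *text)
{
    return re_matches("^\\(a\\)\\1$", 0, text);
}

/* BRE ^\(a\)\0$ : \0 is not a backreference; treated as the literal '0'. */
int group_then_escaped_zero(const char *text)
{
    return re_matches("^\\(a\\)\\0$", 0, text);
}
```

Under that assumption, `group_then_backref("aa")` and `group_then_escaped_zero("a0")` return 1, and `group_then_escaped_zero("aa")` returns 0.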

21 Mar, 2015 2 commits

suppress backref processing in ERE regcomp · 7c8c86f6

Committed by Rich Felker on Mar 20, 2015

one of the features of ERE is that it's actually a regular language
and does not admit expressions which cannot be matched in linear time.
introduction of \n backref support into regcomp's ERE parsing was
unintentional.

fix memory-corruption in regcomp with backslash followed by high byte · 39dfd584

Committed by Rich Felker on Mar 20, 2015

the regex parser handles the (undefined) case of an unexpected byte
following a backslash as a literal. however, instead of correctly
decoding a character, it was treating the byte value itself as a
character. this was not only semantically unjustified, but turned out
to be dangerous on archs where plain char is signed: bytes in the
range 252-255 alias the internal codes -4 through -1 used for special
types of literal nodes in the AST.
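
The aliasing can be illustrated in isolation. This is only a sketch of the arithmetic, not the actual fix (which, per the message, decodes a character instead of using the raw byte). The function names are hypothetical, and the explicit `signed char` cast stands in for a platform where plain char is signed; it assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stddef.h>

/* What a parser sees when a high byte is read through a signed char:
 * 252..255 come out as -4..-1, colliding with any negative internal codes. */
int byte_read_as_signed(unsigned char byte)
{
    return (signed char)byte;
}

/* Reading through unsigned char keeps the value in 0..255, so it can
 * never be confused with a negative sentinel. */
int byte_read_as_unsigned(const char *p)
{
    assert(p != NULL);
    return (unsigned char)*p;
}
```

For example, `byte_read_as_signed(0xFC)` yields -4 and `byte_read_as_signed(0xFF)` yields -1, whereas `byte_read_as_unsigned("\xff")` yields 255.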

13 Sep, 2014 1 commit

rewrite the regex pattern parser in regcomp · ec1aed0a

Committed by Szabolcs Nagy on Aug 14, 2014

The new code is a bit simpler and the generated code is about 1KB
smaller (on i386). The basic design was kept including internal
interfaces, TNFA generation was not touched.

The old tre parser had various issues:

[^aa-z]
negated overlapping ranges in a bracket expression were handled
incorrectly (eg [^aa-z] was handled as [^a] instead of [^a-z])

a{,2}
missing lower bound in a counted repetition should be an error,
but it was accepted with broken semantics: a{,2} was treated as
a{0,3}, the new parser rejects it

a{999,}
large min count was not rejected (a{5000,} failed with REG_ESPACE
due to reaching a stack limit), the new parser enforces the
RE_DUP_MAX limit

\xff
regcomp used to accept a pattern with illegal sequences in it
(treated them as empty expression so p\xffq matched pq) the new
parser rejects such patterns with REG_BADPAT or REG_ERANGE

[^b-fD-H] with REG_ICASE
old parser turned this into [^b-fB-F] because of the negated
overlapping range issue (see above), the new parser treats it
as [^b-hB-H], POSIX seems to require [^d-fD-F], but practical
implementations do case-folding first and negate the character
set later instead of the other way around. (Supporting the posix
way efficiently would require significant changes so it was left
as is, it is unclear if any application actually expects the
posix behaviour, this issue is raised on the austingroup tracker:
http://austingroupbugs.net/view.php?id=872 ).

another case-insensitive matching issue is that unicode case
folding rules can group more than two characters together while
towupper and towlower can only work for a pair of upper and
lower case characters, this is a limitation of POSIX so it is
not fixed.

invalid bracket and brace expressions may return different error
codes now (REG_ERANGE instead of REG_EBRACK or REG_BADBR instead
of REG_EBRACE) otherwise the new parser should be compatible with
the old one.

regcomp should be able to handle arbitrary pattern input if the
pattern length is limited, the only exception is the use of large
repetition counts (e.g. (a{255}){255}) which require an exponential
amount of memory and there is no easy workaround.
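
Two of the cases above can be probed portably: the negated overlapping range and an over-large repetition count. A minimal sketch, assuming the default "C" locale; `re_matches` is a hypothetical helper. The `a{,2}` case is left out because implementations disagree on it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* [^aa-z] should exclude all of a-z, not only 'a'. */
int outside_a_to_z(const char *text)
{
    return re_matches("^[^aa-z]$", REG_EXTENDED, text);
}

/* A repetition count far above RE_DUP_MAX should be rejected at
 * compile time, so the helper reports -1. */
int huge_repetition_count(void)
{
    return re_matches("a{70000}", REG_EXTENDED, "a");
}
```

With the rewritten parser, `outside_a_to_z("b")` returns 0 (the old `[^a]` reading would return 1), `outside_a_to_z("B")` returns 1, and `huge_repetition_count()` returns -1. The exact error code behind that -1 differs between implementations.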

12 Dec, 2013 1 commit
- include cleanups: remove unused headers and add feature test macros · 57174444
  Committed by Szabolcs Nagy on Dec 12, 2013
  
07 Oct, 2013 1 commit

fix allocation sizes in regcomp · 1e81fa45

Committed by Szabolcs Nagy on Oct 07, 2013

sizeof had incorrect argument in a few places, the size was always
large enough so the issue was not critical.

15 Jan, 2013 1 commit

remove unused "params" related code from regex · f05f59b8

Committed by Szabolcs Nagy on Jan 15, 2013

some structs and functions had reference to the params
feature of tre that is not used by the code anymore

07 Sep, 2012 1 commit

use restrict everywhere it's required by c99 and/or posix 2008 · 400c5e5c

Committed by Rich Felker on Sep 06, 2012

to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.

14 May, 2012 2 commits

remove some no-op end of string tests from regex parser · 13b2945a

Committed by Rich Felker on May 13, 2012

these are cruft from the original code which used an explicit string
length rather than null termination. i blindly converted all the
checks to null terminator checks, without noticing that in several
cases, the subsequent switch statement would automatically handle the
null byte correctly.

another BRE fix: in ^*, * is literal · e9cddc8e

Committed by Rich Felker on May 13, 2012

i don't understand why this has to be conditional on being in BRE
mode, but enabling this code unconditionally breaks a huge number of
ERE test cases.

08 May, 2012 4 commits

fix error checking for \ at end of regex (this was broken previously) · 952700e8
Committed by Rich Felker on May 07, 2012

fix copy and paste error in regex code causing mishandling of \) in BRE · 17361482
Committed by Rich Felker on May 07, 2012

fix regex breakage in last commit (failure to handle empty regex, etc.) · a5a47783
Committed by Rich Felker on May 07, 2012

fix ugly bugs in TRE regex parser · d7a90b35

Committed by Rich Felker on May 07, 2012

1. * in BRE is not special at the beginning of the regex or a
subexpression. this broke ncurses' build scripts.

2. \\( in BRE is a literal \ followed by a literal (, not a literal \
followed by a subexpression opener.

3. the ^ in \\(^ in BRE is a literal ^ only at the beginning of the
entire BRE. POSIX allows treating it as an anchor at the beginning of
a subexpression, but TRE's code for checking if it was at the
beginning of a subexpression was wrong, and fixing it for the sake of
supporting a non-portable usage was too much trouble when just
removing this non-portable behavior was much easier.

this patch also moved lots of the ugly logic for empty atom checking
out of the default/literal case and into new cases for the relevant
characters. this should make parsing faster and make the code smaller.
if nothing else it's a lot more readable/logical.
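
A sketch of the first two BRE rules listed above, plus the related `^*` case from the neighbouring commits. It assumes an implementation that follows these rules; `re_matches` is a hypothetical helper, not part of musl.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp failed. */
int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE *a : a leading * is a literal star, not a repetition. */
int leading_star(const char *text)
{
    return re_matches("*a", 0, text);
}

/* BRE \(*a\) : the same holds at the start of a subexpression. */
int star_after_group_open(const char *text)
{
    return re_matches("\\(*a\\)", 0, text);
}

/* BRE \\( : an escaped backslash followed by a literal '(';
 * no group is opened, so the pattern is complete as written. */
int backslash_then_paren(const char *text)
{
    return re_matches("\\\\(", 0, text);
}
```

Under these rules `leading_star("x*a")` and `star_after_group_open("x*a")` return 1, both return 0 for "aa", and `backslash_then_paren("\\(")` returns 1. The anchored form `^*a` likewise matches only text beginning with "*a".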

at some point i'd like to revisit and overhaul lots of this code...

14 Apr, 2012 1 commit

remove invalid code from TRE · 386b34a0

Committed by Rich Felker on Apr 13, 2012

TRE wants to treat + and ? after a +, ?, or * as special; ? means
ungreedy and + is reserved for future use. however, this is
non-conformant. although redundant, these characters have a
well-defined (no-op) meaning for POSIX ERE, and are actually _literal_
characters (which TRE is wrongly ignoring) in POSIX BRE mode.

the simplest fix is to simply remove the unneeded nonstandard
functionality. as a plus, this shaves off a small amount of bloat.

21 Mar, 2012 1 commit

upgrade to latest upstream TRE regex code (0.8.0) · ad47d45e

Committed by Rich Felker on Mar 20, 2012

the main practical results of this change are
1. the regex code is no longer subject to LGPL; it's now 2-clause BSD
2. most (all?) popular nonstandard regex extensions are supported

I hesitate to call this a "sync" since both the old and new code are
heavily modified. in one sense, the old code was "more severely"
modified, in that it was actively hostile to non-strictly-conforming
expressions. on the other hand, the new code has eliminated the
useless translation of the entire regex string to wchar_t prior to
compiling, and now only converts multibyte character literals as
needed.

in the future i may use this modified TRE as a basis for writing the
long-planned new regex engine that will avoid multibyte-to-wide
character conversion entirely by compiling multibyte bracket
expressions specific to UTF-8.

17 Jun, 2011 1 commit
- duplicate re_nsub in LSB/glibc ABI compatible location · 32aea208
  Committed by Rich Felker on Jun 16, 2011
  
12 Feb, 2011 1 commit
- initial check-in, version 0.5.0 · 0b44a031
  Committed by Rich Felker on Feb 12, 2011
  