regex: fix character class repetitions

Internally regcomp needs to copy some iteration nodes before translating the AST into TNFA representation. Literal nodes were not copied correctly: the class type and list of negated class types were not copied so classes were ignored (in the non-negated case an ignored char class caused the literal to match everything). This affects iterations when the upper bound is finite, larger than one or the lower bound is larger than one. So eg. the EREs [[:digit:]]{2} [^[:space:]ab]{1,4} were treated as .{2} [^ab]{1,4} The fix is done with minimal source modification to copy the necessary fields, but the AST preparation and node handling code of tre will need to be cleaned up for clarity.

regex: fix character class repetitions
Internally regcomp needs to copy some iteration nodes before translating the AST into TNFA representation. Literal nodes were not copied correctly: the class type and list of negated class types were not copied so classes were ignored (in the non-negated case an ignored char class caused the literal to match everything). This affects iterations when the upper bound is finite, larger than one or the lower bound is larger than one. So eg. the EREs [[:digit:]]{2} [^[:space:]ab]{1,4} were treated as .{2} [^ab]{1,4} The fix is done with minimal source modification to copy the necessary fields, but the AST preparation and node handling code of tre will need to be cleaned up for clarity.
c498efe1 · Szabolcs Nagy · Rich Felker · 32dee9b9 · c498efe1
隐藏空白更改
内联并排

Showing with 5 addition and 0 deletion

src/regex/regcomp.c src/regex/regcomp.c +5 -0

未找到文件。
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -1700,6 +1700,11 @@ tre_copy_ast(tre_mem_t mem, tre_stack_t *stack, tre_ast_node_t *ast,
 		*result = tre_ast_new_literal(mem, min, max, pos);
 		if (*result == NULL)
 		  status = REG_ESPACE;
+		else {
+		  tre_literal_t *p = (*result)->obj;
+		  p->class = lit->class;
+		  p->neg_classes = lit->neg_classes;
+		}

 		if (pos > *max_pos)
 		  *max_pos = pos;