1. 11 Jul, 2012 1 commit
    • Re-implement extraction of fixed prefixes from regular expressions. · 628cbb50
      Committed by Tom Lane
      To generate btree-indexable conditions from regex WHERE conditions (such as
      WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
      prefix that a regex might have; that is, find any string that must be a
      prefix of all strings satisfying the regex.  We used to do that with
      entirely ad-hoc code that looked at the source text of the regex.  It
      didn't know very much about regex syntax, which mostly meant that it would
      fail to identify some optimizable cases; but Viktor Rosenfeld reported that
      it would produce actively wrong answers for quantified parenthesized
      subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
      ad-hoc code to cover this, let's get rid of it altogether in favor of
      identifying prefixes by examining the compiled form of a regex.
      
      To do this, I've added a new entry point "pg_regprefix" to the regex library;
      hopefully it is defined in a sufficiently general fashion that it can remain
      in the library when/if that code gets split out as a standalone project.
      
      Since this bug has been there for a very long time, this fix needs to get
      back-patched.  However it depends on some other recent commits (particularly
      the addition of wchar-to-database-encoding conversion), so I'll commit this
      separately and then go to work on back-porting the necessary fixes.
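The failure described in this commit can be illustrated outside PostgreSQL. The following is a minimal sketch in Python, assuming Python's `re` module as a stand-in for the regex engine (it is not Spencer's engine and does not use `pg_regprefix`); the helper `is_valid_fixed_prefix` and the sample strings are hypothetical. It checks the defining property of a fixed prefix: every string satisfying the regex must start with it.

```python
import re

def is_valid_fixed_prefix(pattern, prefix, samples):
    """Return True if every sample matching `pattern` starts with `prefix`.

    Hypothetical helper for illustration only; it tests the property on a
    finite sample set rather than analysing the compiled regex.
    """
    rx = re.compile(pattern)
    return all(s.startswith(prefix) for s in samples if rx.search(s))

# Illustrative sample strings (assumed, not taken from the commit).
samples = ["foobar", "bar", "barfoo", "foo", "xfoobar"]

# For '^foo', every matching sample begins with 'foo'.
plain_ok = is_valid_fixed_prefix(r"^foo", "foo", samples)

# For '^(foo)?bar', the sample 'bar' matches but does not begin with 'foo',
# so 'foo' is not a fixed prefix of this regex.
quantified_ok = is_valid_fixed_prefix(r"^(foo)?bar", "foo", samples)

print(plain_ok, quantified_ok)
```

An index condition derived from the prefix 'foo' would therefore exclude rows such as 'bar' that the regex '^(foo)?bar' actually matches, which is why a wrongly identified prefix yields wrong query results rather than just a missed optimization.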
  2. 24 Feb, 2012 1 commit
    • Fix the general case of quantified regex back-references. · 173e29aa
      Committed by Tom Lane
      Cases where a back-reference is part of a larger subexpression that
      is quantified have never worked in Spencer's regex engine, because
      he used a compile-time transformation that neglected the need to
      check the back-reference match in iterations before the last one.
      (That was okay for capturing parens, and we still do it if the
      regex has *only* capturing parens ... but it's not okay for backrefs.)
      
      To make this work properly, we have to add an "iteration" node type
      to the regex engine's vocabulary of sub-regex nodes.  Since this is a
      moderately large change with a fair risk of introducing new bugs of its
      own, apply to HEAD only, even though it's a fix for a longstanding bug.
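The semantics this commit restores can be sketched as follows. The example assumes Python's `re` module rather than Spencer's engine, and the pattern and helper name `matches_fully` are chosen purely for illustration. A back-reference inside a quantified subexpression has to be checked in every iteration, not only the last one.

```python
import re

# One iteration matches any character followed by the same character again;
# the '+' repeats that whole subexpression, back-reference included.
pattern = re.compile(r"(?:(.)\1)+")

def matches_fully(s):
    """Return True if the entire string consists of doubled characters."""
    return pattern.fullmatch(s) is not None

# Every iteration satisfies its own back-reference.
all_pairs = matches_fully("aabb")

# The last iteration ('cc') is fine, but the first ('ab') is not a pair.
# An engine that verified the back-reference only in the final iteration
# could be misled by input of this shape.
bad_first_pair = matches_fully("abcc")

print(all_pairs, bad_first_pair)
```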
  3. 20 Feb, 2012 1 commit
    • Create the beginnings of internals documentation for the regex code. · 27af9143
      Committed by Tom Lane
      Create src/backend/regex/README to hold an implementation overview of
      the regex package, and fill it in with some preliminary notes about
      the code's DFA/NFA processing and colormap management.  Much more to
      do there of course.
      
      Also, improve some code comments around the colormap and cvec code.
      No functional changes except to add one missing assert.