    Re-implement extraction of fixed prefixes from regular expressions. · 628cbb50
    Committed by Tom Lane
    To generate btree-indexable conditions from regex WHERE conditions (such as
    WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
    prefix that a regex might have; that is, find any string that must be a
    prefix of all strings satisfying the regex.  We used to do that with
    entirely ad-hoc code that looked at the source text of the regex.  It
    didn't know very much about regex syntax, which mostly meant that it would
    fail to identify some optimizable cases; but Viktor Rosenfeld reported that
    it would produce actively wrong answers for quantified parenthesized
    subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
    ad-hoc code to cover this, let's get rid of it altogether in favor of
    identifying prefixes by examining the compiled form of a regex.
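
As a rough illustration of the two ideas above (this is not the PostgreSQL implementation), the Python sketch below shows why 'foo' is not a fixed prefix of '^(foo)?bar', and how a genuine fixed prefix can be turned into a range condition that a btree index can serve. The helper names `is_valid_prefix` and `prefix_range` are hypothetical, and the range helper assumes plain code-point ordering and a prefix whose last character is not the maximum code point; real collations need more care.

```python
import re

def is_valid_prefix(pattern, prefix, samples):
    """Check, over a sample set only, that every string matching
    the regex starts with the candidate prefix."""
    return all(s.startswith(prefix) for s in samples if re.search(pattern, s))

def prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper holds exactly
    for strings s that start with prefix, under code-point ordering.
    Assumption: the last character can be incremented by one code point."""
    if not prefix:
        return None  # an empty prefix gives no usable index range
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

samples = ["foobar", "bar", "foo", "barfoo"]

# '^foo' really does force every match to begin with 'foo'.
ok_simple = is_valid_prefix(r"^foo", "foo", samples)

# '^(foo)?bar' also matches 'bar', so 'foo' is NOT a fixed prefix.
ok_optional = is_valid_prefix(r"^(foo)?bar", "foo", samples)

# A valid prefix maps to: indexed_col >= lower AND indexed_col < upper.
lower, upper = prefix_range("foo")
```

Reading a prefix off the source text of '^(foo)?bar' is what produced the wrong answer described above; the sample-based check here only demonstrates the failure and is not a substitute for examining the compiled regex.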
    
    To do this, I've added a new entry point "pg_regprefix" to the regex library;
    hopefully it is defined in a sufficiently general fashion that it can remain
    in the library when/if that code gets split out as a standalone project.
    
    Since this bug has been there for a very long time, this fix needs to get
    back-patched.  However it depends on some other recent commits (particularly
    the addition of wchar-to-database-encoding conversion), so I'll commit this
    separately and then go to work on back-porting the necessary fixes.
regex.sql 1.5 KB