1. 11 Jul, 2012 · 1 commit
    • Re-implement extraction of fixed prefixes from regular expressions. · 628cbb50
      Committed by Tom Lane
      To generate btree-indexable conditions from regex WHERE conditions (such as
      WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
      prefix that a regex might have; that is, find any string that must be a
      prefix of all strings satisfying the regex.  We used to do that with
      entirely ad-hoc code that looked at the source text of the regex.  It
      didn't know very much about regex syntax, which mostly meant that it would
      fail to identify some optimizable cases; but Viktor Rosenfeld reported that
      it would produce actively wrong answers for quantified parenthesized
      subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
      ad-hoc code to cover this, let's get rid of it altogether in favor of
      identifying prefixes by examining the compiled form of a regex.
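
Once a fixed prefix such as `foo` is known, a condition like `col ~ '^foo'` can be supplemented with the btree range `col >= 'foo' AND col < 'fop'`. A minimal sketch of computing that upper bound follows. It assumes single-byte characters compared bytewise (C-locale ordering), and the name `prefix_upper_bound` is hypothetical. PostgreSQL's real handling of multibyte encodings and non-C collations is considerably more involved.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: given a fixed prefix, compute the smallest string
 * that sorts after every string beginning with that prefix, assuming plain
 * bytewise (C-locale) comparison of single-byte characters.
 * 'out' must hold strlen(prefix) + 1 bytes.  Returns false if no bound
 * exists (the prefix is empty or consists only of 0xFF bytes).
 */
static bool prefix_upper_bound(const char *prefix, char *out)
{
    size_t len = strlen(prefix);

    memcpy(out, prefix, len + 1);
    /* Work backwards: increment the last byte that can be incremented. */
    while (len > 0)
    {
        unsigned char c = (unsigned char) out[len - 1];

        if (c < 0xFF)
        {
            out[len - 1] = (char) (c + 1);
            out[len] = '\0';    /* drop any trailing bytes given up on */
            return true;
        }
        len--;                  /* 0xFF cannot be incremented; truncate */
    }
    return false;
}
```

With `prefix_upper_bound("foo", buf)` the bound is `fop`; the original regex must still be rechecked on every row the range scan returns, since the range only narrows the candidates.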
      
      To do this, I've added a new entry point "pg_regprefix" to the regex library;
      hopefully it is defined in a sufficiently general fashion that it can remain
      in the library when/if that code gets split out as a standalone project.
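
The idea of reading the prefix off the compiled automaton, rather than the regex source text, can be illustrated with a toy NFA. The structures below are hypothetical and far simpler than the regex library's compact NFA (for example, they have no character colors), and the pattern is assumed to be anchored with `^`. The walk follows states that have exactly one outgoing arc, collecting that arc's literal character, and stops at a final state or wherever the path branches.

```c
#include <stddef.h>

/* Hypothetical toy NFA: each arc consumes one literal character. */
struct toy_arc
{
    char ch;                    /* literal character consumed by the arc */
    int  to;                    /* index of the target state */
};

struct toy_state
{
    int                   narcs;
    const struct toy_arc *arcs;
    int                   is_final;
};

/*
 * Follow the automaton from 'start' for as long as each state offers
 * exactly one way forward.  The characters collected along the way must
 * begin every string the automaton accepts.  The prefix is NUL-terminated
 * in 'out' (capacity 'outsz', which must be > 0); returns its length.
 */
static size_t toy_fixed_prefix(const struct toy_state *states, int start,
                               char *out, size_t outsz)
{
    size_t len = 0;
    int    cur = start;

    while (len + 1 < outsz)
    {
        const struct toy_state *st = &states[cur];

        /* Stop at a final state or wherever the path branches. */
        if (st->is_final || st->narcs != 1)
            break;
        out[len++] = st->arcs[0].ch;
        cur = st->arcs[0].to;
    }
    out[len] = '\0';
    return len;
}
```

In a toy automaton for `^(foo)?bar`, the start state has two arcs (`f` and `b`), so the walk reports an empty prefix, which is the correct answer for that pattern; for `^foobar` every state up to the final one has a single arc, giving `foobar`.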
      
      Since this bug has been there for a very long time, this fix needs to get
      back-patched.  However it depends on some other recent commits (particularly
      the addition of wchar-to-database-encoding conversion), so I'll commit this
      separately and then go to work on back-porting the necessary fixes.
  2. 10 Jul, 2012 · 1 commit
    • Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns. · 00dac600
      Committed by Tom Lane
      Previously, pattern_fixed_prefix() was defined to return whatever fixed
      prefix it could extract from the pattern, plus the "rest" of the pattern.
      That definition was sensible for LIKE patterns, but not so much for
      regexes, where reconstituting a valid pattern minus the prefix could be
      quite tricky (certainly the existing code wasn't doing that correctly).
      Since the only thing that callers ever did with the "rest" of the pattern
      was to pass it to like_selectivity() or regex_selectivity(), let's cut out
      the middle-man and just have pattern_fixed_prefix's subroutines do this
      directly.  Then pattern_fixed_prefix can return a simple selectivity
      number, and the question of how to cope with partial patterns is removed
      from its API specification.
      
      While at it, adjust the API spec so that callers who don't actually care
      about the pattern's selectivity (which is a lot of them) can pass NULL for
      the selectivity pointer to skip doing the work of computing a selectivity
      estimate.
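
The revised contract (return the prefix plus a selectivity for the remainder, and let callers pass NULL when they do not need the estimate) can be sketched for LIKE patterns as follows. The function name `like_prefix_sketch` and the per-character factors are illustrative assumptions, not PostgreSQL's actual `like_fixed_prefix()` or its constants; backslash is treated as the escape character, as in LIKE's default.

```c
#include <stddef.h>

/* Illustrative selectivity factors (assumed for this sketch). */
#define SKETCH_FIXED_CHAR_SEL   0.20   /* a literal character */
#define SKETCH_ANY_CHAR_SEL     0.90   /* '_' matches any one character */
#define SKETCH_WILDCARD_SEL     5.00   /* '%' matches any substring */

/*
 * Hypothetical sketch: copy the fixed prefix of a LIKE pattern into
 * 'prefix' (capacity 'prefixsz', must be > 0; truncated if too long) and,
 * only when 'rest_selec' is non-NULL, estimate the selectivity of the rest
 * of the pattern right here, so no caller ever handles a partial pattern.
 * Returns the length of the stored prefix.
 */
static size_t like_prefix_sketch(const char *patt, char *prefix,
                                 size_t prefixsz, double *rest_selec)
{
    size_t pos = 0, len = 0;

    /* Collect literal characters up to the first unescaped wildcard. */
    while (patt[pos] != '\0' && patt[pos] != '%' && patt[pos] != '_')
    {
        if (patt[pos] == '\\' && patt[pos + 1] != '\0')
            pos++;                  /* escaped char is taken literally */
        if (len + 1 < prefixsz)
            prefix[len++] = patt[pos];
        pos++;
    }
    prefix[len] = '\0';

    /* Callers that pass NULL skip the estimation work entirely. */
    if (rest_selec != NULL)
    {
        double sel = 1.0;

        for (; patt[pos] != '\0'; pos++)
        {
            if (patt[pos] == '%')
                sel *= SKETCH_WILDCARD_SEL;
            else if (patt[pos] == '_')
                sel *= SKETCH_ANY_CHAR_SEL;
            else
            {
                if (patt[pos] == '\\' && patt[pos + 1] != '\0')
                    pos++;          /* skip the escape, count the char */
                sel *= SKETCH_FIXED_CHAR_SEL;
            }
        }
        *rest_selec = sel > 1.0 ? 1.0 : sel;
    }
    return len;
}
```

A caller that only needs the prefix, for example to build index bounds, passes NULL and never pays for the estimate; one that needs selectivity gets a single number back instead of a leftover pattern.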
      
      This patch is only an API refactoring that doesn't actually change any
      processing, other than allowing a little bit of useless work to be skipped.
      However, it's necessary infrastructure for my upcoming fix to regex prefix
      extraction, because after that change there won't be any simple way to
      identify the "rest" of the regex, not even to the low level of fidelity
      needed by regex_selectivity.  We can cope with that if regex_fixed_prefix
      and regex_selectivity communicate directly, but not if we have to work
      within the old API.  Hence, back-patch to all active branches.
  3. 09 Jul, 2012 · 1 commit
    • Fix planner to pass correct collation to operator selectivity estimators. · e7ef6d7e
      Committed by Tom Lane
      We can do this without creating an API break for estimation functions
      by passing the collation using the existing fmgr functionality for
      passing an input collation as a hidden parameter.
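
Passing the collation without changing estimator signatures works because the fmgr call-info record carries an extra field that a function may read or ignore (a PostgreSQL C function reads it with the `PG_GET_COLLATION()` macro). The self-contained model below mimics that pattern; `ToyCallInfo`, `TOY_GET_COLLATION` and the estimator functions are hypothetical stand-ins, not the real fmgr definitions.

```c
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid type */

/* Hypothetical, heavily simplified call-info record. */
typedef struct ToyCallInfo
{
    Oid    collation;           /* hidden input: filled in by the caller */
    int    nargs;
    double args[4];             /* the function's visible arguments */
} ToyCallInfo;

/* Modeled on the idea behind fmgr's PG_GET_COLLATION(). */
#define TOY_GET_COLLATION(ci) ((ci)->collation)

typedef double (*ToyEstimator)(ToyCallInfo *ci);

/* An older estimator: never looks at the collation and needs no change. */
static double legacy_estimator(ToyCallInfo *ci)
{
    return ci->args[0];
}

/* A newer estimator that consults the collation when it wants to. */
static double collation_aware_estimator(ToyCallInfo *ci)
{
    Oid coll = TOY_GET_COLLATION(ci);

    /* Purely illustrative: echo the collation to show that it arrived. */
    return (double) coll;
}

/* Caller side: supply the collation alongside the ordinary arguments. */
static double call_with_collation(ToyEstimator fn, Oid collation, double arg)
{
    ToyCallInfo ci = { .collation = collation, .nargs = 1, .args = { arg } };

    return fn(&ci);
}
```

Because the collation lives in the call record rather than in the argument list, existing estimators keep their signatures; only the caller has to fill in one more field.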
      
      The need for this was foreseen at the outset, but we didn't get around to
      making it happen in 9.1 because of the decision to sort all pg_statistic
      histograms according to the database's default collation.  That meant that
      selectivity estimators generally need to use the default collation too,
      even if they're estimating for an operator that will do something
      different.  The reason it's suddenly become more interesting is that
      regexp interpretation also uses a collation (for its LC_CTYPE, not LC_COLLATE,
      property), and we no longer want to use the wrong collation when examining
      regexps during planning.  It's not that the selectivity estimate is likely
      to change much from this; rather that we are thinking of caching compiled
      regexps during planner estimation, and we won't get the intended benefit
      if we cache them with a different collation than the executor will use.
      
      Back-patch to 9.1, both because the regexp change is likely to get
      back-patched and because we might as well get this right in all
      collation-supporting branches, in case any third-party code wants to
      rely on getting the collation.  The patch turns out to be minuscule
      now that I've done it ...
  4. 08 Jul, 2012 · 1 commit
    • Simplify and document regex library's compact-NFA representation. · c6aae304
      Committed by Tom Lane
      The previous coding abused the first element of a cNFA state's arcs list
      to hold a per-state flag bit, which was confusing, undocumented, and not
      even particularly efficient.  Get rid of that in favor of a separate
      "stflags" vector.  Since there's only one bit in use, I chose to allocate a
      char per state; we could possibly replace this with a bitmap at some point,
      but that would make accesses a little slower.  It's already about 8X
      smaller than before, so let's not get overly tense.
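
The layout change can be pictured with a simplified compact NFA: instead of reserving the first arc of each state's list to hold a flag, a parallel `stflags` array stores one char per state. Apart from `stflags`, the names below (`toy_cnfa`, `TOY_FLAG` and so on) are hypothetical simplifications. Assuming a 2-byte color and a 4-byte target state, an arc occupies 8 bytes on common ABIs, which would account for the roughly 8X saving per state.

```c
#include <stdlib.h>

typedef short color;            /* assumed 2-byte color number */

struct toy_carc                 /* one arc: color consumed, target state */
{
    color co;
    int   to;
};

#define TOY_FLAG 01             /* the single flag bit in use (name hypothetical) */

/* Hypothetical, simplified compact NFA with a separate flags vector. */
struct toy_cnfa
{
    int               nstates;
    char             *stflags;  /* one char of flag bits per state */
    struct toy_carc **states;   /* per-state arc lists, no flag slot */
};

static int toy_cnfa_init(struct toy_cnfa *cnfa, int nstates)
{
    cnfa->nstates = nstates;
    /* calloc: every state starts with no flags set */
    cnfa->stflags = calloc((size_t) nstates, sizeof(char));
    cnfa->states = calloc((size_t) nstates, sizeof(struct toy_carc *));
    if (cnfa->stflags == NULL || cnfa->states == NULL)
    {
        free(cnfa->stflags);
        free(cnfa->states);
        return -1;
    }
    return 0;
}

/* Testing a flag is a single byte load; no arc list is consulted. */
static int toy_state_flagged(const struct toy_cnfa *cnfa, int st)
{
    return (cnfa->stflags[st] & TOY_FLAG) != 0;
}

static void toy_cnfa_free(struct toy_cnfa *cnfa)
{
    free(cnfa->stflags);
    free(cnfa->states);
}
```

Packing the flags into a bitmap would cut memory further but needs a shift and mask per lookup, which matches the trade-off the commit message describes.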
      
      Also document the representation better than it was before, which is to say
      not at all.
      
      This patch is a byproduct of investigations towards extracting a "fixed
      prefix" string from the compact-NFA representation of regex patterns.
      Might need to back-patch it if we decide to back-patch that fix, but for
      now it's just code cleanup so I'll just put it in HEAD.
  5. 07 Jul, 2012 · 4 commits
  6. 06 Jul, 2012 · 10 commits
  7. 05 Jul, 2012 · 12 commits
  8. 04 Jul, 2012 · 10 commits