1. 27 9月, 2017 40 次提交
    • E
      Don't assume a subquery's output is unique if there's a SRF in its tlist · e7ff3ef1
      Ekta Khanna and Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 8 14:03:32 2014 -0400
      
          While the x output of "select x from t group by x" can be presumed unique,
          this does not hold for "select x, generate_series(1,10) from t group by x",
          because we may expand the set-returning function after the grouping step.
          (Perhaps that should be re-thought; but considering all the other oddities
          involved with SRFs in targetlists, it seems unlikely we'll change it.)
          Put a check in query_is_distinct_for() so it's not fooled by such cases.
      
          Back-patch to all supported branches.
      
          David Rowley
      
      (cherry picked from commit 2e7469dc8b3bac4fe0f9bd042aaf802132efde85)
      e7ff3ef1
    • E
      Misc test answer file changes · 8bd49b1b
      Ekta Khanna 提交于
      Signed-off-by: NJemish Patel <jpatel@pivotal.io>
      8bd49b1b
    • E
      Fix possible crash with nested SubLinks. · cb7e418d
      Ekta Khanna 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 10 16:10:36 2013 -0500
      
      An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...)
      could produce an invalid plan that results in a crash at execution time,
      if the planner attempts to flatten the outer IN into a semi-join.
      This happens because convert_testexpr() was not expecting any nested
      SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging
      to the inner SubLink.  (I think the comment denying that this case could
      happen was wrong when written; it's certainly been wrong for quite a long
      time, since very early versions of the semijoin flattening logic.)
      
      Per report from Teodor Sigaev.  Back-patch to all supported branches.
      
      (cherry picked from commit 884c6384a2db34f6a65573e6bfd4b71dfba0de90)
      cb7e418d
    • E
      Fix planner's handling of outer PlaceHolderVars within subqueries. · 45cbf64a
      Ekta Khanna 提交于
      commit 0a0ca1cb18a34e92ab549df171e174dcce7bf7a3
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sat Mar 24 16:22:00 2012 -0400
      
          Fix planner's handling of outer PlaceHolderVars within subqueries.
      
          For some reason, in the original coding of the PlaceHolderVar mechanism
          I had supposed that PlaceHolderVars couldn't propagate into subqueries.
          That is of course entirely possible.  When it happens, we need to treat
          an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
          SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
          a Param, and then when building the finished SubPlan we have to provide
          the PlaceHolderVar expression as an actual parameter for the SubPlan.
          The handling of the contained expression is a bit delicate but it can be
          treated exactly like an Aggref's expression.
      
          In addition to the missing logic in subselect.c, prepjointree.c was failing
          to search subqueries for PlaceHolderVars that need their relids adjusted
          during subquery pullup.  It looks like everyplace else that touches
          PlaceHolderVars got it right, though.
      
          Per report from Mark Murawski.  In 9.1 and HEAD, queries affected by this
          oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
          not expected".  But in 9.0 and 8.4, you'd silently get possibly-wrong
          answers, since the value transmitted into the subquery wouldn't go to null
          when it should.
      45cbf64a
    • S
      Remove is_simple_subquery() check in simplify_EXISTS_query() · 77f804f5
      Shreedhar Hardikar 提交于
      GPDB handles a lot of the cases that are restricted by
      is_simple_subquery; and the restrictions not handled, are checked for
      separately in convert_EXISTS_sublink_to_join().
      
      Resulting from cascading ICG failures, we also fixed the following:
      
      - initialize all the members of IncrementVarSublevelsUp_context
        properly.
      - remove incorrect assertions brought in from upstream. In GPDB, these
        cases are handled.
      - improve plans for NOT EXISTS sub-queries containing an aggregation
        without limits by creating a "false" one-time filter.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      77f804f5
    • S
      Remove dead code around JoinExpr::subqfromlist. · f16deabd
      Shreedhar Hardikar 提交于
      This was used to keep information about the subquery join tree for
      pulled-up sublinks for use later in deconstruct_recurse().  With the
      upstream subselect merge, a JoinExpr constructed at the pull-up time
      itself, so this is no longer needed since the subquery join tree
      information is available in the constructed JoinExpr.
      
      Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
      However, since GPDB differs from upstream by treating SEMI joins as
      INNER join for internal join planning, this commit also updates
      inner_join_rels correctly for SEMI joins (see regression test).
      
      Also remove unused function declaration for not_null_inner_vars().
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      f16deabd
    • S
      Do not commute inner/outer rels of JOIN_ANTI and JOIN_LASJ_NOTIN · 70be2285
      Shreedhar Hardikar 提交于
      This issue was discovered during the subselect merge. wherin planner
      incorrectly commutes anti joins.
      `cdb_add_subquery_join_paths()` creates join paths for (rel1, rel2) and
      (rel2, rel1) for all join types including JOIN_ANTI and JOIN_LASJ_NOTIN.
      This produces wrong results since these joins are order-sensitive w.r.t
      inner and outer relations (see new regression tests). So, do not add
      (rel2, rel1) for JOIN_ANTI and JOIN_LASJ_NOTIN.
      
      This commit also refactors cdb_add_subquery_join_paths() and
      make_join_rel() to make it easier to control the commuting.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      70be2285
    • S
      Handle pending merge FIXMEs from merging e549722a · cc208986
      Shreedhar Hardikar 提交于
      1. convert_IN_to_antijoin() should fail pull-up when left relids are not
         a subset of available_rels, otherwise we get wrong results. See
         regression tests in qp_correlated_query.sql.
      2. convert_EXPR_to_join() is a GPDB-only function that already handles
         this case via ProcessSubqueryToJoin().
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      cc208986
    • S
      Partial cherry-pick up of upstream commit 0dec322. · cdfc5616
      Shreedhar Hardikar 提交于
      commit 0dec3226ee905f94d0b9d6e2f274e13bbcaf5370
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Mon Jun 20 14:33:20 2011 -0400
      
          Fix thinko in previous patch for optimizing EXISTS-within-EXISTS.
      
          When recursing after an optimization in pull_up_sublinks_qual_recurse, the
          available_rels value passed down must include only the relations that are
          in the righthand side of the new SEMI or ANTI join; it's incorrect to pull
          up a sub-select that refers to other relations, as seen in the added test
          case.  Per report from BangarRaju Vadapalli.
      
      NOTE: The second part of the upstream commit is not pulled in because that
      produces inferior plans in GPDB by not pulling nested sublinks below NOT
      EXISTS. That part is reverted later upstream in 9.2 anyway.
      
      Also update regression tests.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      cdfc5616
    • T
      Fix pull_up_sublinks' failure to handle nested pull-up opportunities · 40082bd2
      Tom Lane 提交于
      commit f3f0f37068e06d01e88abbf3ed596664b139f7e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Mon May 2 15:56:47 2011 -0400
      
          Fix pull_up_sublinks' failure to handle nested pull-up opportunities.
      
          After finding an EXISTS or ANY sub-select that can be converted to a
          semi-join or anti-join, we should recurse into the body of the sub-select.
          This allows cases such as EXISTS-within-EXISTS to be optimized properly.
          The original coding would leave the lower sub-select as a SubLink, which
          is no better and often worse than what we can do with a join.  Per example
          from Wayne Conrad.
      
          Back-patch to 8.4.  There is a related issue in older versions' handling
          of pull_up_IN_clauses, but they're lame enough anyway about the whole area
          that it seems not worth the extra work to try to fix.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      40082bd2
    • T
      Fix mishandling of whole-row Vars referencing a view or sub-select · 385bb3cb
      Tom Lane 提交于
      commit c4ac2ff765d9b68a3ff2a3461804489721770d06
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Mon Jun 21 00:14:54 2010 +0000
      
          Fix mishandling of whole-row Vars referencing a view or sub-select.
          If such a Var appeared within a nested sub-select, we failed to translate it
          correctly during pullup of the view, because the recursive call to
          replace_rte_variables_mutator was looking for the wrong sublevels_up value.
          Bug was introduced during the addition of the PlaceHolderVar mechanism.
          Per bug #5514 from Marcos Castedo.
      385bb3cb
    • D
      Fix an oversight in convert_EXISTS_sublink_to_join · d3ff95a1
      Dhanashree Kashid 提交于
      commit dcd647d7cf98e3393f919135f6e113e896781f60
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Mon Jan 18 18:17:52 2010 +0000
      
          Fix an oversight in convert_EXISTS_sublink_to_join: we can't convert an
          EXISTS that contains a WITH clause.  This would usually lead to a
          "could not find CTE" error later in planning, because the WITH wouldn't
          get processed at all.  Noted while playing with an example from Ken Marshall.
      d3ff95a1
    • T
      Move exprType,exprTypmod,expression_tree_walker and related routines · e65f963b
      Tom Lane 提交于
        commit e5536e77
        Author: Tom Lane <tgl@sss.pgh.pa.us>
        Date:   Mon Aug 25 22:42:34 2008 +0000
      
            Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
            into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
            the backend.  There's probably more that should be done along this line,
            but this is a start anyway
      Signed-off-by: NShreedhar Hardikar <shardikar@pivotal.io>
      e65f963b
    • T
      Change EXPLAIN output so that subplans and initplans · acfd5f4d
      Tom Lane 提交于
      commit fbcce080
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 5 19:59:40 2009 +0000
      
          Change EXPLAIN output so that subplans and initplans (particularly CTEs)
          are individually labeled, rather than just grouped under an "InitPlan"
          or "SubPlan" heading.  This in turn makes it possible for decompilation of
          a subplan reference to usefully identify which subplan it's referencing.
          I also made InitPlans identify which parameter symbol(s) they compute,
          so that references to those parameters elsewhere in the plan tree can
          be connected to the initplan that will be executed.  Per a gripe from
          Robert Haas about EXPLAIN output of a WITH query being inadequate,
          plus some longstanding pet peeves of my own.
      acfd5f4d
    • T
      Don't try to optimize EXISTS subqueries with empty FROM-lists · 16d21117
      Tom Lane 提交于
      commit 173a6760
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Mon Dec 8 00:16:09 2008 +0000
      
          Don't try to optimize EXISTS subqueries with empty FROM-lists: we need to
          form a join and that case doesn't have anything to join to.  (We could
          probably make it work if we didn't pull up the subquery, but it seems to
          me that the case isn't worth extra code.)  Per report from Greg Stark.
      16d21117
    • D
      Rename all 8.4-9.0 merge FIXMEs as `GPDB_84_MERGE_FIXME` · 2228c939
      Dhanashree Kashid, Ekta Khanna and Omer Arap 提交于
      We had a bunch of fixmes that we added as part of the subselect merge;
      All of the fixmes are now marked as `GPDB_84_MERGE_FIXME` so that they can
      be grepped easily.
      2228c939
    • D
      Updated test answer files after merging dc9cc88 · 745bcd2b
      Dhanashree Kashid 提交于
      After the introduction of `placeholder` mechanism,
      we see some changes in the plan.
      These are all good changes in that we remove the
      redundant `NOTIN_subquery` subquery scan nodes:
      
      Before
      ```
      ...
      ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	->  Subquery Scan "NotIn_SUBQUERY"
      		->  Seq Scan on subselect_tbl
      			Filter: f3 IS NOT NULL
      ```
      After
      ```
      ...
      ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	 ->  Seq Scan on subselect_tbl
      	 	 Filter: f3 IS NOT NULL
      ```
      
      New plans have better cost since we will not be executing
      subquery scan for each tuple coming from its OUTER.
      
      We are now consistent with plans produced for same queries
      with IN instead of NOT IN.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      745bcd2b
    • D
      Fix PlaceHolderVar mechanism's interaction with outer joins. · e06bc957
      Dhanashree Kashid 提交于
      commit dc9cc887b74bfa0d40829c4df66dead509fdd8f6
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Sep 28 14:15:42 2010 -0400
      
          The point of a PlaceHolderVar is to allow a non-strict expression to be
          evaluated below an outer join, after which its value bubbles up like a Var
          and can be forced to NULL when the outer join's semantics require that.
          However, there was a serious design oversight in that, namely that we
          didn't ensure that there was actually a correct place in the plan tree
          to evaluate the placeholder :-(.  It may be necessary to delay evaluation
          of an outer join to ensure that a placeholder that should be evaluated
          below the join can be evaluated there.  Per recent bug report from Kirill
          Simonov.
      
          Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      e06bc957
    • D
      Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr · ac66ca53
      Dhanashree Kashid 提交于
      commit 2bdd765f79df947b46a8b5a22e3b993b58cd6d32
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Sep 2 17:52:33 2009 +0000
      
          Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr
          that's generated for a whole-row Var referencing the subquery, when the
          subquery is in the nullable side of an outer join.  The previous coding
          instead put PlaceHolderVars around the elements of the RowExpr.  The effect
          was that when the outer join made the subquery outputs go to null, the
          whole-row Var produced ROW(NULL,NULL,...) rather than just NULL.  There
          are arguments afoot about whether those things ought to be semantically
          indistinguishable, but for the moment they are not entirely so, and the
          planner needs to take care that its machinations preserve the difference.
          Per bug #5025.
      
          Making this feasible required refactoring ResolveNew() to allow more caller
          control over what is substituted for a Var.  I chose to make ResolveNew()
          a wrapper around a new general-purpose function replace_rte_variables().
          I also fixed the ancient bogosity that ResolveNew might fail to set
          a query's hasSubLinks field after inserting a SubLink in it.  Although
          all current callers make sure that happens anyway, we've had bugs of that
          sort before, and it seemed like a good time to install a proper solution.
      
          Back-patch to 8.4.  The problem can be demonstrated clear back to 8.0,
          but the fix would be too invasive in earlier branches; not to mention
          that people may be depending on the subtly-incorrect behavior.  The
          8.4 series is new enough that fixing this probably won't cause complaints,
          but it might in older branches.  Also, 8.4 shows the incorrect behavior
          in more cases than older branches do, because it is able to flatten
          subqueries in more cases.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      ac66ca53
    • E
      Improve pull_up_subqueries logic w.r.t PlaceHolderVar · da29e67a
      Ekta Khanna 提交于
      commit c59d8dd4
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Apr 28 21:31:16 2009 +0000
      
          Improve pull_up_subqueries logic so that it doesn't insert unnecessary
          PlaceHolderVar nodes in join quals appearing in or below the lowest
          outer join that could null the subquery being pulled up.  This improves
          the planner's ability to recognize constant join quals, and probably
          helps with detection of common sort keys (equivalence classes) as well.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      da29e67a
    • E
      Refrain from creating the planner's placeholder_list · 695c9fdf
      Ekta Khanna 提交于
      commit 31468d05
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Oct 22 20:17:52 2008 +0000
      
          Dept of better ideas: refrain from creating the planner's placeholder_list
          until vars are distributed to rels during query_planner() startup.  We don't
          really need it before that, and not building it early has some advantages.
          First, we don't need to put it through the various preprocessing steps, which
          saves some cycles and eliminates the need for a number of routines to support
          PlaceHolderInfo nodes at all.  Second, this means one less unused plan for any
          sub-SELECT appearing in a placeholder's expression, since we don't build
          placeholder_list until after sublink expansion is complete.
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      695c9fdf
    • B
      Add a concept of "placeholder" variables to the planner · 2b5c8201
      Bhuvnesh Chaudhary 提交于
      commit e6ae3b5d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Oct 21 20:42:53 2008 +0000
      
          Add a concept of "placeholder" variables to the planner.  These are variables
          that represent some expression that we desire to compute below the top level
          of the plan, and then let that value "bubble up" as though it were a plain
          Var (ie, a column value).
      
          The immediate application is to allow sub-selects to be flattened even when
          they are below an outer join and have non-nullable output expressions.
          Formerly we couldn't flatten because such an expression wouldn't properly
          go to NULL when evaluated above the outer join.  Now, we wrap it in a
          PlaceHolderVar and arrange for the actual evaluation to occur below the outer
          join.  When the resulting Var bubbles up through the join, it will be set to
          NULL if necessary, yielding the correct results.  This fixes a planner
          limitation that's existed since 7.1.
      
          In future we might want to use this mechanism to re-introduce some form of
          Hellerstein's "expensive functions" optimization, ie place the evaluation of
          an expensive function at the most suitable point in the plan tree.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      2b5c8201
    • B
      Arrange to convert EXISTS subqueries that are equivalent to hashable IN · 18a2f85d
      Bhuvnesh Chaudhary 提交于
      commit bd3dadda
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Fri Aug 22 00:16:04 2008 +0000
      
          Arrange to convert EXISTS subqueries that are equivalent to hashable IN
          subqueries into the same thing you'd have gotten from IN (except always with
          unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
          I believe this fixes the last case within CVS HEAD in which an EXISTS could
          give worse performance than an equivalent IN subquery.
      
          The tricky part of this is that if the upper query probes the EXISTS for only
          a few rows, the hashing implementation can actually be worse than the default,
          and therefore we need to make a cost-based decision about which way to use.
          But at the time when the planner generates plans for subqueries, it doesn't
          really know how many times the subquery will be executed.  The least invasive
          solution seems to be to generate both plans and postpone the choice until
          execution.  Therefore, in a query that has been optimized this way, EXPLAIN
          will show two subplans for the EXISTS, of which only one will actually get
          executed.
      
          There is a lot more that could be done based on this infrastructure: in
          particular it's interesting to consider switching to the hash plan if we start
          out using the non-hashed plan but find a lot more upper rows going by than we
          expected.  I have therefore left some minor inefficiencies in place, such as
          initializing both subplans even though we will currently only use one.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      18a2f85d
    • J
      Cherry-pick upstream commit: 8309d006 · fe791d96
      Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sat Nov 22 22:47:06 2008 +0000
      
          Switch the planner over to treating qualifications of a JOIN_SEMI join as
          though it is an inner rather than outer join type.  This essentially means
          that we don't bother to separate "pushed down" qual conditions from actual
          join quals at a semijoin plan node; which is okay because the restrictions of
          SQL syntax make it impossible to have a pushed-down qual that references the
          inner side of a semijoin.  This allows noticeably better optimization of
          IN/EXISTS cases than we had before, since the equivalence-class machinery can
          now use those quals.  Also fix a couple of other mistakes that had essentially
          disabled the ability to unique-ify the inner relation and then join it to just
          a subset of the left-hand relations.  An example case using the regression
          database is
      
          select * from tenk1 a, tenk1 b
          where (a.unique1,b.unique2) in (select unique1,unique2 from tenk1 c);
      
          which is planned reasonably well by 8.3 and earlier but had been forcing a
          cartesian join of a/b in CVS HEAD.
      
      Ref [#147647503]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      fe791d96
    • J
      Marginal improvement in sublink planning · 6e907566
      Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Aug 20 19:58:24 2008 +0000
      
          Marginal improvement in sublink planning: allow unknownEqFalse optimization
          to be used for SubLinks that are underneath a top-level OR clause.  Just as at
          the very top level of WHERE, it's not necessary to be accurate about whether
          the sublink returns FALSE or NULL, because either result has the same impact
          on whether the WHERE will succeed.
      
      (cherry picked from commit cc0dd438)
      
      Ref [#142356645]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      6e907566
    • J
      Cherry pick upstream commit: 390e59cd · 1ea5c119
      Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Aug 20 15:49:30 2008 +0000
      
          Fix obsolete comment.  It's no longer the case that Param nodes don't
          carry typmod.
      
      (cherry picked from commit 390e59cd)
      
      Ref [#142356645]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      1ea5c119
    • S
      Update subselect EXPLAIN tests · e0415acc
      Shreedhar Hardikar 提交于
      This commit updates test answer files after merging e006a24a, 19e34b62 and
      e549722a. As a result of these merges, we produce inferior plans for
      which we have added FIX_ME's appropriately. They will be addressed in
      later commits.
      e0415acc
    • D
      Implement CDB like pre-join deduplication · efb2777a
      Dhanashree Kashid, Ekta Khanna and Omer Arap 提交于
      For flattened IN or EXISTS sublinks, if we chose INNER JOIN path instead
      of SEMI JOIN then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. post-join dedup
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes
      if we need to got for post-join dedup
      
      2. pre-join dedup
      unique-ify the rows coming from the rel containing the subquery result,
      before that is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes if we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER
      and JOIN_UNIQUE_OUTER paths for this and deduplication is done create_unique_path().
      GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete
      for us. Hence we have kept the GPDB style deduplication mechanism as it in this merge.
      
      Post-join has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • S
      CDB Specific changes, other fix-ups after merging e549722a · e5f6e826
      Shreedhar Hardikar 提交于
      0. Fix up post join dedup logic after cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update gporca, rangefuncs, eagerfree answer fileis
      	1. gporca
      	Previously we were generating a Hash Inner Join with an
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join in which case we do not need to deduplicate the
      	inner.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan as master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results.
      	But this subqeury scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need helper function `hasSemiJoin()` to check whether
      this specialInfo list has any specialJoinInfos constructed for Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB specific
      nodes in `InClauseInfo` in which we recorded information for
      pre-join-dedup in case of simple uncorrelated IN sublinks.
      `try_join_unique`, `sub_targetlist` and `InOperators`
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
    • D
      Get rid of the rather fuzzily defined FlattenedSubLink node type · 330dd1b3
      Dhanashree Kashid 提交于
      commit e549722a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Feb 25 03:30:38 2009 +0000
      
          Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of
          making pull_up_sublinks() construct a full-blown JoinExpr tree representation
          of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
          This makes pull_up_sublinks() a shade more complex, but the gain in semantic
          clarity is worth it.  I still have more to do in this area to address the
          previously-discussed problems, but this commit in itself fixes at least one
          bug in HEAD, as shown by added regression test case.
      
      Ref [#142356521]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      330dd1b3
    • E
      Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e
      Ekta Khanna 提交于
      commit 19e34b62
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Aug 17 01:20:00 2008 +0000
      
          Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
          level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
          can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
          only to the nullable side of the outer join, so that it can effectively
          be pushed down into the nullable side.)  Per request from Kevin Grittner.
      
          In passing, fix a bug in the initial implementation of EXISTS pullup:
          it would Assert if the EXIST's WHERE clause used a join alias variable.
          Since we haven't yet flattened join aliases when this transformation
          happens, it's necessary to include join relids in the computed set of
          RHS relids.
      
      Ref [#142356521]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      1ddcb97e
    • E
      Update test output files and indentation · 102295b8
      Ekta Khanna and Omer Arap 提交于
      This commit updates test answer files after merging with e006a24a.
      
      0. Replace `EXISTS Join` to `Semi Join`
      0. Replace `Left Anti Semi Join` to `Anti Join`
      0. Updated plan for `table_functions` for IN queries as we do not pull
      up the sublink and convert it into a join when sublink testexpr does not
      contain Vars of parent query.
      0. Updated the error in `rangefuncs.out` since we now create a different
      plan for the following query :
      ```
      CREATE TABLE foorescan (fooid int, foosubid int, fooname text, primary key(fooid,foosubid));
      INSERT INTO foorescan values(5000,1,'abc.5000.1');
      INSERT INTO foorescan values(5001,1,'abc.5001.1');
      CREATE FUNCTION foorescan(int,int) RETURNS setof foorescan AS 'SELECT * FROM foorescan WHERE fooid >= $1 and fooid < $2 ;' LANGUAGE SQL;
      SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
      ```
      Plan before fix :
      ```
                                                    QUERY PLAN
      ------------------------------------------------------------------------------------------------------
       Sort  (cost=270.41..270.54 rows=50 width=50)
         Sort Key: f.fooid, f.foosubid
         ->  HashAggregate  (cost=268.50..269.00 rows=50 width=50)
               Group By: f.ctid::bigint, f.gp_segment_id
               ->  Hash Join  (cost=5.12..268.25 rows=50 width=29)
                     Hash Cond: foorescan.fooid = f.fooid
                     ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
                     ->  Hash  (cost=4.50..4.50 rows=17 width=29)
                           ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..4.50 rows=50 width=29)
                                 ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=29)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (12 rows)
      
      ```
      here the function scan is done on master and since the function is
      accessing a distributed relation,  `cdbdisp_dispatchToGang()`
      errors out.
      
      Plan after fix:
      ```
       explain SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
                                                    QUERY PLAN
      ------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=299.16..299.29 rows=50 width=19)
         Merge Key: f.fooid, f.foosubid
         ->  Sort  (cost=299.16..299.29 rows=17 width=19)
               Sort Key: f.fooid, f.foosubid
               ->  Hash Semi Join  (cost=292.50..297.75 rows=17 width=19)
                     Hash Cond: f.fooid = foorescan.fooid
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..4.50 rows=17 width=19)
                           Hash Key: f.fooid
                           ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=19)
                     ->  Hash  (cost=280.00..280.00 rows=334 width=4)
                           ->  Redistribute Motion 1:3  (slice2)  (cost=0.00..280.00 rows=1000 width=4)
                                 Hash Key: foorescan.fooid
                                 ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (15 rows)
      ```
      With this new plan, function scan is executed on segment in which case
      `init_sql_fcache()` first walks the query tree and checks if it is safe
      to be planned and executed on the segment using `querytree_safe_for_segment_walker()`.
      `querytree_safe_for_segment_walker()` errors out since the function is
      accessing distributed table. Both the new and old errors are testing the same scenario.
      But due to plan change, the place where we bail out is different.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      102295b8
    • E
      Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Ekta Khanna 提交于
      After merging with e006a24a, Anti Semi Join will
      be denoted by `JOIN_ANTI` instead of `JOIN_LASJ`
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      6e7b4722
    • E
      Remove InClauseInfo and OuterJoinInfo · 8b63aafb
      Ekta Khanna 提交于
      Since `InClauseInfo` and `OuterJoinInfo` are now combined into
      `SpecialJoinInfo` after merging with e006a24a; this commit remove them
      from the relevant places.
      
      Access `join_info_list` instead of `in_info_list` and `oj_info_list`
      
      Previously, `CdbRelDedupInfo` contained list of `InClauseInfo` s. While
      making join decisions and overall join processing, we traversed this list
      and invoked cdb specific functions: `cdb_make_rel_dedup_info()`, `cdbpath_dedup_fixup()`
      
      Since `InClauseInfo` is no longer available,  `CdbRelDedupInfo` will contain list of
      `SpecialJoinInfo` s. All the cdb specific routines which were previously called for
      `InClauseInfo` list will now be called if `CdbRelDedupInfo` has valid `SpecialJoinInfo`
      list and if join type in `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()`
      has been added which traverses `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      8b63aafb
    • E
      Replace JOIN_IN by JOIN_SEMI in ORCA translator · db853de9
      Ekta Khanna 提交于
      After merging with e006a24a, the jointype JOIN_IN has been renamed to
      JOIN_SEMI.
      This commit makes appropriate changes in ORCA for the same.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      db853de9
    • E
      Add pullup decisions in `convert_ANY_sublink_to_join` · 7be2c0ad
      Ekta Khanna 提交于
      Add pullup decisions specific to CDB from `convert_IN_to_join`
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      7be2c0ad
    • D
      Add pullup decisions in `convert_EXISTS_sublink_to_join` · d91f0efb
      Dhanashree Kashid 提交于
      After merging with e006a24a, this commit adds CDB specific restrictions
      as follows:
      
      0. Add pullup decisions specific to CDB from
      `convert_EXISTS_to_join` and `convert_NOT_EXISTS_to_join`
      
      0. Before this cherry-pick, we used to generate extra quals
      for the NOT EXISTS query. This was done by calling `cdbpullup_expr()`
      in `convert_NOT_EXISTS_to_join()`.
      However, for the exact same query with EXISTS, we never generated these
      extra quals.
      ```
      create table foo(t text, n numeric, i int, v varchar(10)) distributed by (t);
      explain select * from foo t0 where not exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash Left Anti Semi Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
                                 Filter: (i + 1) IS NOT NULL  -> extra filter
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (10 rows)
      
      explain select * from foo t0 where exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash EXISTS Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (9 rows)
      
      ```
      Currently with this commit, the combined pull-up code for EXISTS and NOT EXISTS
      does not generate extra filters. This will be a future TODO.
      
      0. Use `is_simple_subquery` in `simplify_EXISTS_query` to check if
      subquery can be pulled up or not.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      d91f0efb
    • D
      Remove old pullup functions after merging e006a24a · 2b5c1b9e
      Dhanashree Kashid 提交于
      With the new flow, we don't need the following functions:
      
       - pull_up_IN_clauses
       - convert_EXISTS_to_join
       - convert_NOT_EXISTS_to_antijoin
       - not_null_inner_vars
       - safe_to_convert_NOT_EXISTS
       - convert_sublink_to_join
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      2b5c1b9e
    • E
      CDBlize the cherry-pick e006a24a · 0feb1bd9
      Ekta Khanna 提交于
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now, since `InClauseInfo` is not present and we construct
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. And later
      in the flow, we construct `SpecialJoinInfo` from `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, adding `try_join_unique` as
      part of both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      0feb1bd9
    • E
      Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Ekta Khanna 提交于
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      fe2eb2c9