1. 27 9月, 2017 23 次提交
    • J
      Marginal improvement in sublink planning · 6e907566
      Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Aug 20 19:58:24 2008 +0000
      
          Marginal improvement in sublink planning: allow unknownEqFalse optimization
          to be used for SubLinks that are underneath a top-level OR clause.  Just as at
          the very top level of WHERE, it's not necessary to be accurate about whether
          the sublink returns FALSE or NULL, because either result has the same impact
          on whether the WHERE will succeed.
      
      (cherry picked from commit cc0dd438)
      
      Ref [#142356645]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      6e907566
    • J
      Cherry pick upstream commit: 390e59cd · 1ea5c119
      Jemish Patel 提交于
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Aug 20 15:49:30 2008 +0000
      
          Fix obsolete comment.  It's no longer the case that Param nodes don't
          carry typmod.
      
      (cherry picked from commit 390e59cd)
      
      Ref [#142356645]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      1ea5c119
    • S
      Update subselect EXPLAIN tests · e0415acc
      Shreedhar Hardikar 提交于
      This commit updates test answer files after merging e006a24a, 19e34b62 and
      e549722a. As a result of these merges, we produce inferior plans for
      which we have added FIX_ME's appropriately. They will be addressed in
      later commits.
      e0415acc
    • D
      Implement CDB like pre-join deduplication · efb2777a
      Dhanashree Kashid, Ekta Khanna and Omer Arap 提交于
      For flattened IN or EXISTS sublinks, if we chose INNER JOIN path instead
      of SEMI JOIN then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. post-join dedup
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes
      if we need to got for post-join dedup
      
      2. pre-join dedup
      unique-ify the rows coming from the rel containing the subquery result,
      before that is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes if we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER
      and JOIN_UNIQUE_OUTER paths for this and deduplication is done create_unique_path().
      GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete
      for us. Hence we have kept the GPDB style deduplication mechanism as it in this merge.
      
      Post-join has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • S
      CDB Specific changes, other fix-ups after merging e549722a · e5f6e826
      Shreedhar Hardikar 提交于
      0. Fix up post join dedup logic after cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update gporca, rangefuncs, eagerfree answer fileis
      	1. gporca
      	Previously we were generating a Hash Inner Join with an
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join in which case we do not need to deduplicate the
      	inner.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan as master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results.
      	But this subqeury scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need helper function `hasSemiJoin()` to check whether
      this specialInfo list has any specialJoinInfos constructed for Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB specific
      nodes in `InClauseInfo` in which we recorded information for
      pre-join-dedup in case of simple uncorrelated IN sublinks.
      `try_join_unique`, `sub_targetlist` and `InOperators`
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
    • D
      Get rid of the rather fuzzily defined FlattenedSubLink node type · 330dd1b3
      Dhanashree Kashid 提交于
      commit e549722a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Feb 25 03:30:38 2009 +0000
      
          Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of
          making pull_up_sublinks() construct a full-blown JoinExpr tree representation
          of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
          This makes pull_up_sublinks() a shade more complex, but the gain in semantic
          clarity is worth it.  I still have more to do in this area to address the
          previously-discussed problems, but this commit in itself fixes at least one
          bug in HEAD, as shown by added regression test case.
      
      Ref [#142356521]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      330dd1b3
    • E
      Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e
      Ekta Khanna 提交于
      commit 19e34b62
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Aug 17 01:20:00 2008 +0000
      
          Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
          level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
          can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
          only to the nullable side of the outer join, so that it can effectively
          be pushed down into the nullable side.)  Per request from Kevin Grittner.
      
          In passing, fix a bug in the initial implementation of EXISTS pullup:
          it would Assert if the EXIST's WHERE clause used a join alias variable.
          Since we haven't yet flattened join aliases when this transformation
          happens, it's necessary to include join relids in the computed set of
          RHS relids.
      
      Ref [#142356521]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      1ddcb97e
    • E
      Update test output files and indentation · 102295b8
      Ekta Khanna and Omer Arap 提交于
      This commit updates test answer files after merging with e006a24a.
      
      0. Replace `EXISTS Join` to `Semi Join`
      0. Replace `Left Anti Semi Join` to `Anti Join`
      0. Updated plan for `table_functions` for IN queries as we do not pull
      up the sublink and convert it into a join when sublink testexpr does not
      contain Vars of parent query.
      0. Updated the error in `rangefuncs.out` since we now create a different
      plan for the following query :
      ```
      CREATE TABLE foorescan (fooid int, foosubid int, fooname text, primary key(fooid,foosubid));
      INSERT INTO foorescan values(5000,1,'abc.5000.1');
      INSERT INTO foorescan values(5001,1,'abc.5001.1');
      CREATE FUNCTION foorescan(int,int) RETURNS setof foorescan AS 'SELECT * FROM foorescan WHERE fooid >= $1 and fooid < $2 ;' LANGUAGE SQL;
      SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
      ```
      Plan before fix :
      ```
                                                    QUERY PLAN
      ------------------------------------------------------------------------------------------------------
       Sort  (cost=270.41..270.54 rows=50 width=50)
         Sort Key: f.fooid, f.foosubid
         ->  HashAggregate  (cost=268.50..269.00 rows=50 width=50)
               Group By: f.ctid::bigint, f.gp_segment_id
               ->  Hash Join  (cost=5.12..268.25 rows=50 width=29)
                     Hash Cond: foorescan.fooid = f.fooid
                     ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
                     ->  Hash  (cost=4.50..4.50 rows=17 width=29)
                           ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..4.50 rows=50 width=29)
                                 ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=29)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (12 rows)
      
      ```
      here the function scan is done on master and since the function is
      accessing a distributed relation,  `cdbdisp_dispatchToGang()`
      errors out.
      
      Plan after fix:
      ```
       explain SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
                                                    QUERY PLAN
      ------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=299.16..299.29 rows=50 width=19)
         Merge Key: f.fooid, f.foosubid
         ->  Sort  (cost=299.16..299.29 rows=17 width=19)
               Sort Key: f.fooid, f.foosubid
               ->  Hash Semi Join  (cost=292.50..297.75 rows=17 width=19)
                     Hash Cond: f.fooid = foorescan.fooid
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..4.50 rows=17 width=19)
                           Hash Key: f.fooid
                           ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=19)
                     ->  Hash  (cost=280.00..280.00 rows=334 width=4)
                           ->  Redistribute Motion 1:3  (slice2)  (cost=0.00..280.00 rows=1000 width=4)
                                 Hash Key: foorescan.fooid
                                 ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (15 rows)
      ```
      With this new plan, function scan is executed on segment in which case
      `init_sql_fcache()` first walks the query tree and checks if it is safe
      to be planned and executed on the segment using `querytree_safe_for_segment_walker()`.
      `querytree_safe_for_segment_walker()` errors out since the function is
      accessing distributed table. Both the new and old errors are testing the same scenario.
      But due to plan change, the place where we bail out is different.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      102295b8
    • E
      Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Ekta Khanna 提交于
      After merging with e006a24a, Anti Semi Join will
      be denoted by `JOIN_ANTI` instead of `JOIN_LASJ`
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      6e7b4722
    • E
      Remove InClauseInfo and OuterJoinInfo · 8b63aafb
      Ekta Khanna 提交于
      Since `InClauseInfo` and `OuterJoinInfo` are now combined into
      `SpecialJoinInfo` after merging with e006a24a; this commit remove them
      from the relevant places.
      
      Access `join_info_list` instead of `in_info_list` and `oj_info_list`
      
      Previously, `CdbRelDedupInfo` contained list of `InClauseInfo` s. While
      making join decisions and overall join processing, we traversed this list
      and invoked cdb specific functions: `cdb_make_rel_dedup_info()`, `cdbpath_dedup_fixup()`
      
      Since `InClauseInfo` is no longer available,  `CdbRelDedupInfo` will contain list of
      `SpecialJoinInfo` s. All the cdb specific routines which were previously called for
      `InClauseInfo` list will now be called if `CdbRelDedupInfo` has valid `SpecialJoinInfo`
      list and if join type in `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()`
      has been added which traverses `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      8b63aafb
    • E
      Replace JOIN_IN by JOIN_SEMI in ORCA translator · db853de9
      Ekta Khanna 提交于
      After merging with e006a24a, the jointype JOIN_IN has been renamed to
      JOIN_SEMI.
      This commit makes appropriate changes in ORCA for the same.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      db853de9
    • E
      Add pullup decisions in `convert_ANY_sublink_to_join` · 7be2c0ad
      Ekta Khanna 提交于
      Add pullup decisions specific to CDB from `convert_IN_to_join`
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      7be2c0ad
    • D
      Add pullup decisions in `convert_EXISTS_sublink_to_join` · d91f0efb
      Dhanashree Kashid 提交于
      After merging with e006a24a, this commit adds CDB specific restrictions
      as follows:
      
      0. Add pullup decisions specific to CDB from
      `convert_EXISTS_to_join` and `convert_NOT_EXISTS_to_join`
      
      0. Before this cherry-pick, we used to generate extra quals
      for the NOT EXISTS query. This was done by calling `cdbpullup_expr()`
      in `convert_NOT_EXISTS_to_join()`.
      However, for the exact same query with EXISTS, we never generated these
      extra quals.
      ```
      create table foo(t text, n numeric, i int, v varchar(10)) distributed by (t);
      explain select * from foo t0 where not exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash Left Anti Semi Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
                                 Filter: (i + 1) IS NOT NULL  -> extra filter
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (10 rows)
      
      explain select * from foo t0 where exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash EXISTS Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (9 rows)
      
      ```
      Currently with this commit, the combined pull-up code for EXISTS and NOT EXISTS
      does not generate extra filters. This will be a future TODO.
      
      0. Use `is_simple_subquery` in `simplify_EXISTS_query` to check if
      subquery can be pulled up or not.
      
      Ref [#142355175]
      Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
      d91f0efb
    • D
      Remove old pullup functions after merging e006a24a · 2b5c1b9e
      Dhanashree Kashid 提交于
      With the new flow, we don't need the following functions:
      
       - pull_up_IN_clauses
       - convert_EXISTS_to_join
       - convert_NOT_EXISTS_to_antijoin
       - not_null_inner_vars
       - safe_to_convert_NOT_EXISTS
       - convert_sublink_to_join
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      2b5c1b9e
    • E
      CDBlize the cherry-pick e006a24a · 0feb1bd9
      Ekta Khanna 提交于
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now, since `InClauseInfo` is not present and we construct
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. And later
      in the flow, we construct `SpecialJoinInfo` from `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, adding `try_join_unique` as
      part of both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      0feb1bd9
    • E
      Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Ekta Khanna 提交于
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      fe2eb2c9
    • L
      b68bcd89
    • L
      docs - restructure admin guide top level topics (#3389) · 56cdb519
      Lisa Owen 提交于
      * admin guide working w/dbs section - pull some topics up a level
      
      * promote ddl, crud topics; move querying topic up
      56cdb519
    • N
      Force behave to use older version of its dependency · b4cfe392
      Nadeem Ghani 提交于
      Behave by default just grabs the latest parse_type package as part of
      the setup requirement. However, the newest parse_type package (0.4.2)
      uses a new convention on how it handles packaging which is not in the
      older python versions before 2.7.13. Since we're in python 2.7.12, we
      break.
      
      Force requirement to use an older version as a hack to bypass this
      issue.
      Signed-off-by: NMarbin Tan <mtan@pivotal.io>
      b4cfe392
    • M
      2bc5401c
    • L
      f1bfd7d9
    • L
    • L
      e397112d
  2. 26 9月, 2017 13 次提交
  3. 25 9月, 2017 4 次提交