提交 · e65f963b73d7ad68fd16058d7eef6ccb0d9f5623 · Greenplum / Gpdb

27 9月, 2017 35 次提交

Move exprType,exprTypmod,expression_tree_walker and related routines · e65f963b

由 Tom Lane 提交于 8月 25, 2008

  commit e5536e77
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Date:   Mon Aug 25 22:42:34 2008 +0000

      Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
      into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
      the backend.  There's probably more that should be done along this line,
      but this is a start anyway
Signed-off-by: NShreedhar Hardikar <shardikar@pivotal.io>

e65f963b

Change EXPLAIN output so that subplans and initplans · acfd5f4d

由 Tom Lane 提交于 4月 05, 2009

commit fbcce080
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Apr 5 19:59:40 2009 +0000

    Change EXPLAIN output so that subplans and initplans (particularly CTEs)
    are individually labeled, rather than just grouped under an "InitPlan"
    or "SubPlan" heading.  This in turn makes it possible for decompilation of
    a subplan reference to usefully identify which subplan it's referencing.
    I also made InitPlans identify which parameter symbol(s) they compute,
    so that references to those parameters elsewhere in the plan tree can
    be connected to the initplan that will be executed.  Per a gripe from
    Robert Haas about EXPLAIN output of a WITH query being inadequate,
    plus some longstanding pet peeves of my own.

acfd5f4d

Don't try to optimize EXISTS subqueries with empty FROM-lists · 16d21117

由 Tom Lane 提交于 12月 08, 2008

commit 173a6760
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Dec 8 00:16:09 2008 +0000

    Don't try to optimize EXISTS subqueries with empty FROM-lists: we need to
    form a join and that case doesn't have anything to join to.  (We could
    probably make it work if we didn't pull up the subquery, but it seems to
    me that the case isn't worth extra code.)  Per report from Greg Stark.

16d21117

Rename all 8.4-9.0 merge FIXMEs as `GPDB_84_MERGE_FIXME` · 2228c939

由 Dhanashree Kashid, Ekta Khanna and Omer Arap 提交于 8月 17, 2017

We had a bunch of fixmes that we added as part of the subselect merge;
All of the fixmes are now marked as `GPDB_84_MERGE_FIXME` so that they can
be grepped easily.

2228c939

Updated test answer files after merging dc9cc88 · 745bcd2b

由 Dhanashree Kashid 提交于 7月 25, 2017

After the introduction of `placeholder` mechanism,
we see some changes in the plan.
These are all good changes in that we remove the
redundant `NOTIN_subquery` subquery scan nodes:

Before
```
...
->  Broadcast Motion 3:3  (slice1; segments: 3)
	->  Subquery Scan "NotIn_SUBQUERY"
		->  Seq Scan on subselect_tbl
			Filter: f3 IS NOT NULL
```
After
```
...
->  Broadcast Motion 3:3  (slice1; segments: 3)
	 ->  Seq Scan on subselect_tbl
	 	 Filter: f3 IS NOT NULL
```

New plans have better cost since we will not be executing
subquery scan for each tuple coming from its OUTER.

We are now consistent with plans produced for same queries
with IN instead of NOT IN.
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

745bcd2b

Fix PlaceHolderVar mechanism's interaction with outer joins. · e06bc957

由 Dhanashree Kashid 提交于 7月 21, 2017

commit dc9cc887b74bfa0d40829c4df66dead509fdd8f6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue Sep 28 14:15:42 2010 -0400

The point of a PlaceHolderVar is to allow a non-strict expression to be
evaluated below an outer join, after which its value bubbles up like a Var
and can be forced to NULL when the outer join's semantics require that.
However, there was a serious design oversight in that, namely that we
didn't ensure that there was actually a correct place in the plan tree
to evaluate the placeholder :-(. It may be necessary to delay evaluation
of an outer join to ensure that a placeholder that should be evaluated
below the join can be evaluated there. Per recent bug report from Kirill
Simonov.

Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

e06bc957

Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr · ac66ca53

由 Dhanashree Kashid 提交于 7月 24, 2017

commit 2bdd765f79df947b46a8b5a22e3b993b58cd6d32
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Sep 2 17:52:33 2009 +0000

Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr
that's generated for a whole-row Var referencing the subquery, when the
subquery is in the nullable side of an outer join. The previous coding
instead put PlaceHolderVars around the elements of the RowExpr. The effect
was that when the outer join made the subquery outputs go to null, the
whole-row Var produced ROW(NULL,NULL,...) rather than just NULL. There
are arguments afoot about whether those things ought to be semantically
indistinguishable, but for the moment they are not entirely so, and the
planner needs to take care that its machinations preserve the difference.
Per bug #5025.

Making this feasible required refactoring ResolveNew() to allow more caller
control over what is substituted for a Var. I chose to make ResolveNew()
a wrapper around a new general-purpose function replace_rte_variables().
I also fixed the ancient bogosity that ResolveNew might fail to set
a query's hasSubLinks field after inserting a SubLink in it. Although
all current callers make sure that happens anyway, we've had bugs of that
sort before, and it seemed like a good time to install a proper solution.

Back-patch to 8.4. The problem can be demonstrated clear back to 8.0,
but the fix would be too invasive in earlier branches; not to mention
that people may be depending on the subtly-incorrect behavior. The
8.4 series is new enough that fixing this probably won't cause complaints,
but it might in older branches. Also, 8.4 shows the incorrect behavior
in more cases than older branches do, because it is able to flatten
subqueries in more cases.
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

ac66ca53

Improve pull_up_subqueries logic w.r.t PlaceHolderVar · da29e67a

由 Ekta Khanna 提交于 7月 24, 2017

commit c59d8dd4
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Apr 28 21:31:16 2009 +0000

    Improve pull_up_subqueries logic so that it doesn't insert unnecessary
    PlaceHolderVar nodes in join quals appearing in or below the lowest
    outer join that could null the subquery being pulled up.  This improves
    the planner's ability to recognize constant join quals, and probably
    helps with detection of common sort keys (equivalence classes) as well.
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

da29e67a

Refrain from creating the planner's placeholder_list · 695c9fdf

由 Ekta Khanna 提交于 7月 21, 2017

commit 31468d05
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Oct 22 20:17:52 2008 +0000

Dept of better ideas: refrain from creating the planner's placeholder_list
until vars are distributed to rels during query_planner() startup. We don't
really need it before that, and not building it early has some advantages.
First, we don't need to put it through the various preprocessing steps, which
saves some cycles and eliminates the need for a number of routines to support
PlaceHolderInfo nodes at all. Second, this means one less unused plan for any
sub-SELECT appearing in a placeholder's expression, since we don't build
placeholder_list until after sublink expansion is complete.
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

695c9fdf

Add a concept of "placeholder" variables to the planner · 2b5c8201

由 Bhuvnesh Chaudhary 提交于 7月 18, 2017

commit e6ae3b5d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue Oct 21 20:42:53 2008 +0000

Add a concept of "placeholder" variables to the planner. These are variables
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).

The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join. Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join. When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results. This fixes a planner
limitation that's existed since 7.1.

In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

2b5c8201

Arrange to convert EXISTS subqueries that are equivalent to hashable IN · 18a2f85d

由 Bhuvnesh Chaudhary 提交于 7月 19, 2017

commit bd3dadda
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Aug 22 00:16:04 2008 +0000

Arrange to convert EXISTS subqueries that are equivalent to hashable IN
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.

The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed. The least invasive
solution seems to be to generate both plans and postpone the choice until
execution. Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.

There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected. I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

18a2f85d

Cherry-pick upstream commit: · fe791d96

由 Jemish Patel 提交于 7月 10, 2017

Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Nov 22 22:47:06 2008 +0000

Switch the planner over to treating qualifications of a JOIN_SEMI join as
though it is an inner rather than outer join type. This essentially means
that we don't bother to separate "pushed down" qual conditions from actual
join quals at a semijoin plan node; which is okay because the restrictions of
SQL syntax make it impossible to have a pushed-down qual that references the
inner side of a semijoin. This allows noticeably better optimization of
IN/EXISTS cases than we had before, since the equivalence-class machinery can
now use those quals. Also fix a couple of other mistakes that had essentially
disabled the ability to unique-ify the inner relation and then join it to just
a subset of the left-hand relations. An example case using the regression
database is

select * from tenk1 a, tenk1 b
where (a.unique1,b.unique2) in (select unique1,unique2 from tenk1 c);

which is planned reasonably well by 8.3 and earlier but had been forcing a
cartesian join of a/b in CVS HEAD.

Ref [#147647503]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

fe791d96

Marginal improvement in sublink planning · 6e907566

由 Jemish Patel 提交于 7月 10, 2017

Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Aug 20 19:58:24 2008 +0000

    Marginal improvement in sublink planning: allow unknownEqFalse optimization
    to be used for SubLinks that are underneath a top-level OR clause.  Just as at
    the very top level of WHERE, it's not necessary to be accurate about whether
    the sublink returns FALSE or NULL, because either result has the same impact
    on whether the WHERE will succeed.

(cherry picked from commit cc0dd438)

Ref [#142356645]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

6e907566

Cherry pick upstream commit: · 1ea5c119

由 Jemish Patel 提交于 7月 10, 2017

Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Aug 20 15:49:30 2008 +0000

    Fix obsolete comment.  It's no longer the case that Param nodes don't
    carry typmod.

(cherry picked from commit 390e59cd)

Ref [#142356645]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

1ea5c119

Update subselect EXPLAIN tests · e0415acc

由 Shreedhar Hardikar 提交于 6月 30, 2017

This commit updates test answer files after merging e006a24a, 19e34b62 and
e549722a. As a result of these merges, we produce inferior plans for
which we have added FIX_ME's appropriately. They will be addressed in
later commits.

e0415acc

Implement CDB like pre-join deduplication · efb2777a

由 Dhanashree Kashid, Ekta Khanna and Omer Arap 提交于 6月 20, 2017

For flattened IN or EXISTS sublinks, if we chose INNER JOIN path instead
of SEMI JOIN then we need to apply duplicate suppression.

The deduplication can be done in two ways:
1. post-join dedup
unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes
if we need to got for post-join dedup

2. pre-join dedup
unique-ify the rows coming from the rel containing the subquery result,
before that is joined with any other rels. join_unique_ininfo in
CdbRelDedupInfo denotes if we need to go for pre-join dedup.
semi_operators and semi_rhs_exprs are used for this. We ported a
function from 9.5 to compute these in make_outerjoininfo().

Upstream has completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER
and JOIN_UNIQUE_OUTER paths for this and deduplication is done create_unique_path().
GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete
for us. Hence we have kept the GPDB style deduplication mechanism as it in this merge.

Post-join has been implemented in previous merge commits.

Ref [#146890743]

efb2777a

CDB Specific changes, other fix-ups after merging · e5f6e826

由 Shreedhar Hardikar 提交于 6月 08, 2017

0. Fix up post join dedup logic after cherry-pick
0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
0. Update gporca, rangefuncs, eagerfree answer fileis
	1. gporca
	Previously we were generating a Hash Inner Join with an
	HashAggregate for deduplication. Now we generate a Hash
	Semi Join in which case we do not need to deduplicate the
	inner.

	2. rangefuncs
	We updated this answer file during the cherry-pick of
	e006a24a since there was a change in plan.
	After these cherry-picks, we are back to the original
	plan as master. Hence we see the original error.

	3. eagerfree
	We are generating a not-very-useful subquery scan node
	with this change. This is not producing wrong results.
	But this subqeury scan needs to be removed.
	We will file a follow-up chore to investigate and fix this.

0. We no longer need helper function `hasSemiJoin()` to check whether
this specialInfo list has any specialJoinInfos constructed for Semi Join
(IN/EXISTS sublink). We have moved that check inside
`cdb_set_cheapest_dedup()`

0. We are not exercising the pre-join-deduplication code path after
this cherry-pick. Before this merge, we had three CDB specific
nodes in `InClauseInfo` in which we recorded information for
pre-join-dedup in case of simple uncorrelated IN sublinks.
`try_join_unique`, `sub_targetlist` and `InOperators`
Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
to devise a way to record this information in `SpecialJoinInfo`.
We have filed a follow-up story for this.

Ref [#142356521]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

e5f6e826

Get rid of the rather fuzzily defined FlattenedSubLink node type · 330dd1b3

由 Dhanashree Kashid 提交于 6月 01, 2017

commit e549722a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Feb 25 03:30:38 2009 +0000

    Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of
    making pull_up_sublinks() construct a full-blown JoinExpr tree representation
    of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
    This makes pull_up_sublinks() a shade more complex, but the gain in semantic
    clarity is worth it.  I still have more to do in this area to address the
    previously-discussed problems, but this commit in itself fixes at least one
    bug in HEAD, as shown by added regression test case.

Ref [#142356521]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

330dd1b3

Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e

由 Ekta Khanna 提交于 6月 01, 2017

commit 19e34b62
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Aug 17 01:20:00 2008 +0000

    Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
    level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
    can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
    only to the nullable side of the outer join, so that it can effectively
    be pushed down into the nullable side.)  Per request from Kevin Grittner.

    In passing, fix a bug in the initial implementation of EXISTS pullup:
    it would Assert if the EXIST's WHERE clause used a join alias variable.
    Since we haven't yet flattened join aliases when this transformation
    happens, it's necessary to include join relids in the computed set of
    RHS relids.

Ref [#142356521]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

1ddcb97e

Update test output files and indentation · 102295b8

由 Ekta Khanna and Omer Arap 提交于 5月 15, 2017

This commit updates test answer files after merging with e006a24a.

0. Replace `EXISTS Join` to `Semi Join`
0. Replace `Left Anti Semi Join` to `Anti Join`
0. Updated plan for `table_functions` for IN queries as we do not pull
up the sublink and convert it into a join when sublink testexpr does not
contain Vars of parent query.
0. Updated the error in `rangefuncs.out` since we now create a different
plan for the following query :
```
CREATE TABLE foorescan (fooid int, foosubid int, fooname text, primary key(fooid,foosubid));
INSERT INTO foorescan values(5000,1,'abc.5000.1');
INSERT INTO foorescan values(5001,1,'abc.5001.1');
CREATE FUNCTION foorescan(int,int) RETURNS setof foorescan AS 'SELECT * FROM foorescan WHERE fooid >= $1 and fooid < $2 ;' LANGUAGE SQL;
SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
```
Plan before fix :
```
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Sort  (cost=270.41..270.54 rows=50 width=50)
   Sort Key: f.fooid, f.foosubid
   ->  HashAggregate  (cost=268.50..269.00 rows=50 width=50)
         Group By: f.ctid::bigint, f.gp_segment_id
         ->  Hash Join  (cost=5.12..268.25 rows=50 width=29)
               Hash Cond: foorescan.fooid = f.fooid
               ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
               ->  Hash  (cost=4.50..4.50 rows=17 width=29)
                     ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..4.50 rows=50 width=29)
                           ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=29)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(12 rows)

```
here the function scan is done on master and since the function is
accessing a distributed relation,  `cdbdisp_dispatchToGang()`
errors out.

Plan after fix:
```
 explain SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=299.16..299.29 rows=50 width=19)
   Merge Key: f.fooid, f.foosubid
   ->  Sort  (cost=299.16..299.29 rows=17 width=19)
         Sort Key: f.fooid, f.foosubid
         ->  Hash Semi Join  (cost=292.50..297.75 rows=17 width=19)
               Hash Cond: f.fooid = foorescan.fooid
               ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..4.50 rows=17 width=19)
                     Hash Key: f.fooid
                     ->  Seq Scan on foorescan f  (cost=0.00..3.50 rows=17 width=19)
               ->  Hash  (cost=280.00..280.00 rows=334 width=4)
                     ->  Redistribute Motion 1:3  (slice2)  (cost=0.00..280.00 rows=1000 width=4)
                           Hash Key: foorescan.fooid
                           ->  Function Scan on foorescan  (cost=0.00..260.00 rows=1000 width=4)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(15 rows)
```
With this new plan, function scan is executed on segment in which case
`init_sql_fcache()` first walks the query tree and checks if it is safe
to be planned and executed on the segment using `querytree_safe_for_segment_walker()`.
`querytree_safe_for_segment_walker()` errors out since the function is
accessing distributed table. Both the new and old errors are testing the same scenario.
But due to plan change, the place where we bail out is different.

Ref [#142355175]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

102295b8

Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722

由 Ekta Khanna 提交于 5月 10, 2017

After merging with e006a24a, Anti Semi Join will
be denoted by `JOIN_ANTI` instead of `JOIN_LASJ`

Ref [#142355175]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

6e7b4722

Remove InClauseInfo and OuterJoinInfo · 8b63aafb

由 Ekta Khanna 提交于 5月 09, 2017

Since `InClauseInfo` and `OuterJoinInfo` are now combined into
`SpecialJoinInfo` after merging with e006a24a; this commit remove them
from the relevant places.

Access `join_info_list` instead of `in_info_list` and `oj_info_list`

Previously, `CdbRelDedupInfo` contained list of `InClauseInfo` s. While
making join decisions and overall join processing, we traversed this list
and invoked cdb specific functions: `cdb_make_rel_dedup_info()`, `cdbpath_dedup_fixup()`

Since `InClauseInfo` is no longer available,  `CdbRelDedupInfo` will contain list of
`SpecialJoinInfo` s. All the cdb specific routines which were previously called for
`InClauseInfo` list will now be called if `CdbRelDedupInfo` has valid `SpecialJoinInfo`
list and if join type in `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()`
has been added which traverses `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.

Ref [#142355175]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

8b63aafb

Replace JOIN_IN by JOIN_SEMI in ORCA translator · db853de9

由 Ekta Khanna 提交于 5月 09, 2017

After merging with e006a24a, the jointype JOIN_IN has been renamed to
JOIN_SEMI.
This commit makes appropriate changes in ORCA for the same.

Ref [#142355175]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

db853de9

Add pullup decisions in `convert_ANY_sublink_to_join` · 7be2c0ad

由 Ekta Khanna 提交于 5月 09, 2017

Add pullup decisions specific to CDB from `convert_IN_to_join`

Ref [#142355175]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

7be2c0ad

Add pullup decisions in `convert_EXISTS_sublink_to_join` · d91f0efb

由 Dhanashree Kashid 提交于 5月 09, 2017

After merging with e006a24a, this commit adds CDB specific restrictions
as follows:

0. Add pullup decisions specific to CDB from
`convert_EXISTS_to_join` and `convert_NOT_EXISTS_to_join`

0. Before this cherry-pick, we used to generate extra quals
for the NOT EXISTS query. This was done by calling `cdbpullup_expr()`
in `convert_NOT_EXISTS_to_join()`.
However, for the exact same query with EXISTS, we never generated these
extra quals.
```
create table foo(t text, n numeric, i int, v varchar(10)) distributed by (t);
explain select * from foo t0 where not exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
   ->  Hash Left Anti Semi Join  (cost=1.08..2.12 rows=2 width=19)
         Hash Cond: t0.i = (t1.i + 1)
         ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
         ->  Hash  (cost=1.04..1.04 rows=1 width=4)
               ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                     ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
                           Filter: (i + 1) IS NOT NULL  -> extra filter
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(10 rows)

explain select * from foo t0 where exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
   ->  Hash EXISTS Join  (cost=1.08..2.12 rows=2 width=19)
         Hash Cond: t0.i = (t1.i + 1)
         ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
         ->  Hash  (cost=1.04..1.04 rows=1 width=4)
               ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                     ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(9 rows)

```
Currently with this commit, the combined pull-up code for EXISTS and NOT EXISTS
does not generate extra filters. This will be a future TODO.

0. Use `is_simple_subquery` in `simplify_EXISTS_query` to check if
subquery can be pulled up or not.

Ref [#142355175]
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>

d91f0efb

Remove old pullup functions after merging · 2b5c1b9e

由 Dhanashree Kashid 提交于 5月 09, 2017

With the new flow, we don't need the following functions:

 - pull_up_IN_clauses
 - convert_EXISTS_to_join
 - convert_NOT_EXISTS_to_antijoin
 - not_null_inner_vars
 - safe_to_convert_NOT_EXISTS
 - convert_sublink_to_join

Ref [#142355175]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

2b5c1b9e

CDBlize the cherry-pick · 0feb1bd9

由 Ekta Khanna 提交于 5月 09, 2017

Original Flow:
cdb_flatten_sublinks
	+--> pull_up_IN_clauses
		+--> convert_sublink_to_join

New Flow:
cdb_flatten_sublinks
	+--> pull_up_sublinks

This commit contains relevant changes for the above flow.

Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
Now, since `InClauseInfo` is not present and we construct
`FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. And later
in the flow, we construct `SpecialJoinInfo` from `FlattenedSublink` in
`deconstruct_sublink_quals_to_rel()`. Hence, adding `try_join_unique` as
part of both `FlattenedSublink` and `SpecialJoinInfo`.

Ref [#142355175]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

0feb1bd9

Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9

由 Ekta Khanna 提交于 5月 09, 2017

commit e006a24a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Aug 14 18:48:00 2008 +0000

Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.

Ref [#142355175]
Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>

fe2eb2c9

L

docs - simplify the CREATE EXERNAL TABLE ref page (part 1) (#3379) · b68bcd89
由 Lisa Owen 提交于 9月 26, 2017

b68bcd89

docs - restructure admin guide top level topics (#3389) · 56cdb519

由 Lisa Owen 提交于 9月 26, 2017

* admin guide working w/dbs section - pull some topics up a level

* promote ddl, crud topics; move querying topic up

56cdb519

Force behave to use older version of its dependency · b4cfe392

由 Nadeem Ghani 提交于 9月 26, 2017

Behave by default just grabs the latest parse_type package as part of
the setup requirement. However, the newest parse_type package (0.4.2)
uses a new convention on how it handles packaging which is not in the
older python versions before 2.7.13. Since we're in python 2.7.12, we
break.

Force requirement to use an older version as a hack to bypass this
issue.
Signed-off-by: NMarbin Tan <mtan@pivotal.io>

b4cfe392

M

docs: COPY... ON SEGMENT w/ SELECT is not supported. (#3348) · 2bc5401c
由 Mel Kiyama 提交于 9月 26, 2017

2bc5401c
L
Revert "Running 'make clean' does not remove objects under bin/gpcheckcloud (#3377)" · f1bfd7d9
由 Lav Jain 提交于 9月 26, 2017
```
This reverts commit f35b369c.
```
f1bfd7d9
L

Running 'make clean' does not remove objects under bin/gpcheckcloud (#3377) · f35b369c
由 Lav Jain 提交于 9月 26, 2017

f35b369c
L

docs - resgroup concurrency can be 0 but not for admin_group (#3358) · e397112d
由 Lisa Owen 提交于 9月 26, 2017

e397112d

26 9月, 2017 5 次提交

Convert GPDB-specific GUCs to the new "enum" type · 59165cfa

由 Jacob Champion 提交于 9月 26, 2017

Several GUCs are simply enumerated strings that are parsed into integer
types behind the scenes. As of 8.4, the GUC system recognizes a new
type, enum, which will do this for us. Move as many as we can to the new
system.

As part of this,
- gp_idf_deduplicate was changed from a char* string to an int, and new
  IDF_DEDUPLICATE_* macros were added for each option
- password_hash_algorithm was changed to an int
- for codegen_optimization_level, "none" is the default now when codegen
  is not enabled during compilation (instead of the empty string).

A couple of GUCs that *could* be represented as enums
(optimizer_minidump, gp_workfile_compress_algorithm) have been
purposefully kept with the prior system because they require the GUC
variable to be something other than an integer anyway.
Signed-off-by: NJacob Champion <pchampion@pivotal.io>

59165cfa

Backport FILTER implementation, from PostgreSQL 9.4 (commit ). · 4c7f65be

由 Heikki Linnakangas 提交于 9月 26, 2017

We had a very simplistic implementation in parse-analysis already, which
converted the FILTER WHERE clause into a CASE-WHEN expression. That did
not work for non-strict aggregates, and didn't deparse back into a FILTER
expression nicely, to name a few problems with it. Replace it with the
PostgreSQL implementation.

TODO:

* ORCA support. It now falls back to the Postgres planner.
* I disabled the three-stage DQA plan types if there are any FILTERs

4c7f65be

L

Update bucket location for singlecluster · 33ea9a4e
由 Lav Jain 提交于 9月 25, 2017

33ea9a4e
X
pgindent the segadmin.c · cf1680c4
由 Xin Zhang 提交于 9月 25, 2017
```
Signed-off-by: NAshwin Agrawal <aagrawal@pivotal.io>
```
cf1680c4

Refactor the gpsegwalrep.py · 26d4fe10

由 Ashwin Agrawal 提交于 9月 21, 2017

Adding timestamp to better trace the command.

Adding ClusterConfiguration to improve the reuse, and remove
ColdStartMaster class.
Signed-off-by: NXin Zhang <xzhang@pivotal.io>

26d4fe10