- 14 March 2019, 2 commits
-
-
Committed by Daniel Gustafsson

As we merge with upstream and thereby keep refining the Postgres planner, "legacy planner" is no longer a suitable name. This changes all variations of the spelling (legacy planner, legacy optimizer, legacy query optimizer, etc.) to say "Postgres" rather than "legacy".

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: David Yozie <dyozie@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
Committed by Heikki Linnakangas

* Avoid using start/end_ignore blocks where not necessary.
* Remove unnecessary DROP commands.
* Use begin/commit when building test tables, to reduce 2PC overhead.
* Reuse test tables, rather than dropping and recreating them.

This shaves a few seconds off the total execution time.
-
- 01 February 2019, 1 commit
-
-
Committed by Heikki Linnakangas

Replace the use of the built-in hashing support for built-in datatypes, in cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time to do this, since we've already made the change to use jump consistent hashing in GPDB 6, so we'll need to deal with the upgrade problems associated with changing the hash functions anyway.

It is no longer enough to track which columns/expressions are used to distribute data; you also need to know the hash function used. For that, a new field is added to gp_distribution_policy, to record the hash operator class used for each distribution key column. In the planner, a new opfamily field is added to DistributionKey, to track that throughout planning.

Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the default hash operator class for the datatype is used. But this patch extends the syntax so that you can specify the operator class explicitly, like "... DISTRIBUTED BY (column opclass)". This is similar to how an operator class can be specified for each column in CREATE INDEX.

To support upgrade, the old hash functions have been converted to special (non-default) operator classes, named cdbhash_*_ops. For example, if you want to use the old hash function for an integer column, you could do "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist of operators that have "compatible" cdbhash functions has been replaced by putting the compatible hash opclasses in the same operator family. For example, all legacy integer operator classes, cdbhash_int2_ops, cdbhash_int4_ops and cdbhash_int8_ops, are part of the cdbhash_integer_ops operator family.

This removes the pg_database.hashmethod field. The hash method is now tracked on a per-table and per-column basis, using the opclasses, so it's not needed anymore.

To help with upgrade from GPDB 5, this introduces a new GUC called 'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash opclasses, instead of the default hash opclasses, if the opclass is not specified explicitly. pg_upgrade will set the new GUC, to force the use of legacy hashops, when restoring the schema dump. It will also set the GUC on all upgraded databases, as a per-database option, so any new tables created after upgrade will also use the legacy opclasses. It seems better to be consistent after upgrade, so that co-location between old and new tables works, for example. The idea is that some time after the upgrade, the admin can reorganize all tables to use the default opclasses instead. At that point, he should also clear the GUC on the converted databases. (Or rather, the automated tool that hasn't been written yet should do that.)

ORCA doesn't know about hash operator classes, or the possibility that we might need to use a different hash function for two columns with the same datatype. Therefore, it cannot produce correct plans for queries that mix different distribution hash opclasses for the same datatype in the same query. There are checks in the Query->DXL translation to detect that case and fall back to the planner. As long as you stick to the default opclasses in all tables, we let ORCA create the plan without any regard to them, and use the default opclasses when translating the DXL plan to a Plan tree. We also allow the case that all tables in the query use the "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the two, or the use of any non-default opclasses, forces ORCA to fall back.

One curiosity with this is the "int2vector" and "aclitem" datatypes. They have a hash opclass, but no b-tree operators. GPDB 4 used to allow them as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit 56e7c16b. Now they are allowed again, so you can specify an int2vector or aclitem column in DISTRIBUTED BY, but it's still pretty useless, because the planner still can't form EquivalenceClasses on them, will treat them as "strewn" distribution, and won't co-locate joins.

The abstime, reltime and tinterval datatypes don't have default hash opclasses. They are being removed completely in PostgreSQL v12, and users shouldn't be using them in the first place, so instead of adding hash opclasses for them now, we accept that they can't be used as distribution key columns anymore. Add a check to pg_upgrade, to refuse the upgrade if they are used as distribution keys in the old cluster. Do the same for the 'money' datatype as well, although that's not being removed upstream.

The legacy hashing code for anyarray in GPDB 5 was actually broken. It could produce a different hash value for two arrays that are considered equal according to the = operator, if there were differences in e.g. whether the null bitmap was stored or not. Add a check to pg_upgrade, to reject the upgrade if array types were used as distribution keys. The upstream hash opclass for anyarray works, though, so it is OK to use arrays as distribution keys in new tables; we just don't support binary-upgrading them from GPDB 5. (See github issue https://github.com/greenplum-db/gpdb/issues/5467.) The legacy hashing of 'anyrange' had the same problem, but that was new in GPDB 6, so we don't need a pg_upgrade check for it.

This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE INDEX, so that you can no longer create a situation where a non-hashable column becomes the distribution key. (Fixes github issue https://github.com/greenplum-db/gpdb/issues/6317)

Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ

Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
Co-authored-by: Chris Hajas <chajas@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Simon Gao <sgao@pivotal.io>
Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Yandong Yao <yyao@pivotal.io>
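The broken anyarray hashing described above illustrates the general invariant at stake: a hash function used for data distribution must hash equal values identically, regardless of their physical representation. A minimal Python sketch of the failure mode (this is a toy model, not GPDB code; the dict-based "array" representation and helper names are invented for illustration):

```python
# Toy model of an array value: element values plus an optional null bitmap.
# Two arrays are logically equal if their elements match, whether or not
# an (all-zero) null bitmap happens to be physically stored.

def array_equal(a, b):
    """Logical equality: compare element values only (the '=' operator)."""
    return a["values"] == b["values"]

def broken_hash(a):
    """Hashes the raw physical representation, bitmap included.
    Equal arrays can hash differently -- the GPDB 5 anyarray bug."""
    return hash((tuple(a["values"]), a["bitmap"]))

def consistent_hash(a):
    """Hashes only the logical content, so hashing agrees with equality."""
    return hash(tuple(a["values"]))

# The same logical array, stored once without and once with a null bitmap.
x = {"values": [1, 2, 3], "bitmap": None}
y = {"values": [1, 2, 3], "bitmap": (0, 0, 0)}

assert array_equal(x, y)
assert consistent_hash(x) == consistent_hash(y)  # safe as a distribution key
assert broken_hash(x) != broken_hash(y)          # equal rows land on different segments
```

With the broken hash, two rows that compare equal can be routed to different segments, so co-located joins and pg_upgrade'd data silently break; hence the pg_upgrade check rejecting arrays as distribution keys in old clusters.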
-
- 09 January 2019, 1 commit
-
-
Committed by Richard Guo

The following identity holds true:

```
(A antijoin B on (Pab)) innerjoin C on (Pac)
  = (A innerjoin C on (Pac)) antijoin B on (Pab)
```

So we should not enforce join ordering for ANTI joins. Instead we need to collapse ANTI join nodes so that they participate fully in the join order search. For example:

```
select * from a join b on a.i = b.i
where not exists (select i from c where a.i = c.i);
```

For this query, the original join order is "(a innerjoin b) antijoin c". If we enforce ANTI join ordering, this will be the final join order. But another join order, "(a antijoin c) innerjoin b", is also legal. We should take this order into consideration and pick the cheaper one. LASJ is handled the same way as ANTI joins.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
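The identity can be sanity-checked on small relations. An illustrative Python sketch (toy single-column relations; the join predicates are simple key matches, not GPDB code):

```python
from itertools import product

def inner_join(r, s, pred):
    """Inner join of two lists of tuples under a join predicate."""
    return [a + b for a, b in product(r, s) if pred(a, b)]

def anti_join(r, s, pred):
    """Anti join: rows of r that have no matching row in s."""
    return [a for a in r if not any(pred(a, b) for b in s)]

# Relations A(i), B(i), C(i); Pab is A.i = B.i, Pac is A.i = C.i.
A = [(1,), (2,), (3,), (4,)]
B = [(2,), (3,)]
C = [(3,), (4,), (4,)]

pab = lambda a, b: a[0] == b[0]
pac = lambda a, c: a[0] == c[0]

# (A antijoin B on Pab) innerjoin C on Pac
lhs = inner_join(anti_join(A, B, pab), C, pac)
# (A innerjoin C on Pac) antijoin B on Pab  (A.i is column 0 of the join row)
rhs = anti_join(inner_join(A, C, pac), B, lambda ac, b: ac[0] == b[0])

assert sorted(lhs) == sorted(rhs)   # same result either way
```

Because both orders yield the same rows, the planner is free to cost both "(a innerjoin b) antijoin c" and "(a antijoin c) innerjoin b" and pick the cheaper one.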
-
- 07 January 2019, 1 commit
-
-
Committed by Heikki Linnakangas

Merge the two concepts, squelching and eager-freeing. There is now only one function, ExecSquelchNode(), that you can call. It recurses to children, as before, but it now also performs eager-freeing of resources. Previously that was done as a separate call, but that was an unnecessary distinction, because all callers of ExecSquelchNode() also called ExecEagerFree().

The concept of eager-freeing still lives on, as ExecEagerFree*() functions specific to many node types. But it no longer recurses! The pattern is that ExecSquelchNode() always performs eager freeing of the node, and also recurses. In addition to that, some node types also call the node-specific EagerFree function of the same node, after reaching the end of tuples. This makes it clearer which function should be called when.

ExecEagerWalker() used to have special handling for the pattern: Result -> Material -> Broadcast Motion. I tried removing that, but then I started to get "illegal rescan of motion node" errors in the regression tests, from queries with a LIMIT node in a subquery. On closer look, I believe that was because the Limit node was calling ExecSquelchNode() on its input, even though the Limit node was marked as rescannable. To fix that, I added delayEagerFree logic to the Limit node, so that it does not call ExecSquelchNode() when the node might get rescanned later.

The planstate_walk_kids() code did not know how to recurse into the children of a MergeAppend node. We missed adding that logic when we merged the MergeAppend node type from upstream, in 9.1. We don't use that mechanism for recursing in ExecSquelchNode() anymore, but it probably should be fixed anyway, as a separate commit later.

Fixes https://github.com/greenplum-db/gpdb/issues/6602 and https://github.com/greenplum-db/gpdb/issues/6074.

Reviewed-by: Tang Pengzhou <ptang@pivotal.io>
-
- 29 October 2018, 1 commit
-
-
Committed by Tang Pengzhou

Simplify direct-dispatch related code. This commit includes three parts:

* Simplify the direct-dispatch dispatching code. Previously, cdbdisp_dispatchToGang needed a CdbDispatchDirectDesc info struct. Now a gang only contains the in-use segments, so the direct-dispatch info is no longer needed.

* Simplify the direct-dispatch DTM related code. Previously, we had to decide within dtmPreCommand whether DTM is available for direct dispatch. The logic was complex: you needed to know whether the main plan is direct-dispatched and whether the init plan contains direct dispatch. One example is "update foo set foo.c2 = 2 where foo.c1 = 1 and exists (select * from bar where bar.c1=4)": the main plan can be direct-dispatched to segment 1 and the init plan to segment 2. With the old logic, DTM commands like PREPARE had to be dispatched to all segments, so dtmPreCommand needed to dispatch a DTM command named 'DTX_PROTOCOL_COMMAND_STAY_AT_OR_BECOME_IMPLIED' to all segments, so that segments like segment 3, which didn't receive the plan, could be ready for two-phase commit. With the new gang API, we can simplify this process: we add a list in currentGxact to record which segments actually got involved in a two-phase commit, and then we can dispatch DTM commands to them directly. This is also very useful for queries on tables that are not fully expanded yet.

* Support direct dispatch to more than one segment.
-
- 10 October 2018, 1 commit
-
-
Committed by Richard Guo

This patch takes the LHS input into consideration when pulling up sublinks of type ALL_SUBLINK. As a result, more subplans can be converted into joins, such as in the query below:

```
select * from A, B
where exists (select * from C
              where C.j = A.j
                and B.i not in (select C.i from C where C.i != 10));
```

This is only possible after commit eb919e8f, in which a set of cartesian-product joins would be generated as a last-ditch effort if we failed to find any usable joins.

Authored-by: Richard Guo <guofenglinux@gmail.com>
Authored-by: Alexandra Wang <leiwangcheme@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
- 25 September 2018, 1 commit
-
-
Committed by Paul Guo

create_unique_path() can be used to convert a semi join to an inner join. Previously, during the semi-join refactoring in commit d4ce0921, creating a unique path was disabled for the case where duplicates might be on different QEs. In this patch we enable adding a motion to unique-ify the path, but only if the unique method is not UNIQUE_PATH_NOOP. We don't create a unique path for that case because, later on during plan creation, it is possible to create a motion above this unique path whose subpath is a motion; in that case, the unique path node will be ignored and we will get a motion plan node directly above another motion plan node, which is bad. We could improve on this further, but not in this patch.

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
-
- 22 September 2018, 1 commit
-
-
Committed by Heikki Linnakangas

We had changed this in GPDB, to print fewer parens. That's fine and dandy, but it hardly seems worth it to carry a diff vs. upstream for this. Which format is better is a matter of taste: the extra parens make some expressions clearer, but OTOH they are unnecessarily verbose for simple expressions. Let's follow the upstream on this.

These changes were made to GPDB back in 2006, as part of backporting EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any explanation for this particular change in output in that commit message.

It's nice to match upstream, to make merging easier. However, this won't make much difference in practice: almost all EXPLAIN plans in the regression tests differ from upstream anyway, because GPDB needs Motion nodes for most queries. But every little bit helps.
-
- 18 September 2018, 2 commits
-
-
Committed by Richard Guo
-
Committed by Richard Guo

When we deduce new quals from non-equivalence clauses, we should only use operators from the same opfamily, so that the operators have consistent semantics. In addition, the datatype and expression should also be checked.
-
- 11 September 2018, 1 commit
-
-
Committed by ZhangJackey

When planning an inner join, we test whether we can use a redistribute motion via the function cdbpath_partkeys_from_preds. But if a_partkey is NIL (it is NIL at the beginning of the function), we append nothing to it, so the function can only return false. As a result, the planner can only generate a broadcast motion for the inner relation. We fix this using the same logic as for an outer join. A WTS node is immovable, and this commit adds some code to handle that.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
-
- 07 September 2018, 1 commit
-
-
Committed by Richard Guo

Previously, for the query below, we disabled ANY_SUBLINK pull-up to work around an assertion failure caused by left_ec/right_ec not being set:

```
select * from A
where exists (select * from B
              where A.i in (select C.i from C where C.i = B.i));
```

This commit sets left_ec/right_ec properly in gen_implied_qual() and re-enables the above ANY_SUBLINK pull-up.

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
-
- 31 August 2018, 1 commit
-
-
Committed by Heikki Linnakangas

Among other things, this fixes the inaccuracy of the integer avg() and sum() functions (i.e. fixes https://github.com/greenplum-db/gpdb/issues/5525). The upstream versions are from PostgreSQL 9.6, using the 128-bit math from the following commit:

```
commit 959277a4
Author: Andres Freund <andres@anarazel.de>
Date:   Fri Mar 20 10:26:17 2015 +0100

    Use 128-bit math to accelerate some aggregation functions.

    On platforms where we support 128bit integers, use them to implement
    faster transition functions for sum(int8), avg(int8),
    var_*(int2/int4), stdev_*(int2/int4). Where not supported continue
    to use numeric as a transition type.

    In some synthetic benchmarks this has been shown to provide
    significant speedups.

    Bumps catversion.

    Discussion: 544BB5F1.50709@proxel.se
    Author: Andreas Karlsson
    Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
        Oskari Saarenmaa, David Rowley
```
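The point of the wider transition state can be illustrated outside the server: a 64-bit accumulator wraps around when summing values near the int8 maximum, while a 128-bit accumulator has ample headroom. A sketch in Python (which has arbitrary-precision ints, so the 64-bit wraparound is simulated by masking; this is an illustration, not PostgreSQL's actual transition code):

```python
INT64_MASK = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit pattern as a signed integer (two's complement)."""
    x &= INT64_MASK
    return x - (1 << 64) if x >= (1 << 63) else x

def sum_with_64bit_state(values):
    """Transition state kept in 64 bits: silently wraps on overflow."""
    acc = 0
    for v in values:
        acc = to_signed64(acc + v)
    return acc

def sum_with_128bit_state(values):
    """Transition state kept in 128 bits: enough headroom for summing
    int8 values over any realistic number of rows."""
    acc = 0
    for v in values:
        acc += v  # Python's unbounded int stands in for int128 here
        assert -(1 << 127) <= acc < (1 << 127)
    return acc

# Two rows near the int8 maximum: the true sum needs more than 64 bits.
rows = [2**63 - 1, 2**63 - 1]
assert sum_with_128bit_state(rows) == 2**64 - 2   # correct
assert sum_with_64bit_state(rows) == -2           # silently wrapped
```

The same headroom argument applies to avg() and the variance/stddev aggregates, whose transition states accumulate sums (and sums of squares) of the inputs.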
-
- 18 August 2018, 1 commit
-
-
Committed by Zhenghua Lyu

Previously, Greenplum used a very simple algorithm to reduce a hash value to a segment id: it just did a modulo computation. A simple modulo is not suitable for cluster expansion. In this commit, we change cdbhash to use the jump consistent hash algorithm. For more details, please refer to the proposal on the gpdb-dev mailing list: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/f5bqTzAZjAs

Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Jialun Du <jdu@pivotal.io>
Co-authored-by: David Kimura <dkimura@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
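Why jump consistent hash helps with expansion: it maps a key to one of num_buckets buckets so that growing from N to N+1 buckets moves only about 1/(N+1) of the keys, and those keys move only onto the new bucket, never between existing ones. A Python transcription of the published algorithm (Lamping and Veach's "A Fast, Minimal Memory, Consistent Hash Algorithm"; GPDB's actual implementation lives in cdbhash.c and may differ in detail):

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step on the key
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Expanding from 4 to 5 buckets: a key either keeps its bucket or moves
# to the newly added bucket 4 -- never between the old buckets.
for key in range(10_000):
    old = jump_consistent_hash(key, 4)
    new = jump_consistent_hash(key, 5)
    assert new == old or new == 4

# A plain modulo reshuffles keys among ALL buckets on expansion:
moved_mod = sum(1 for k in range(10_000) if k % 4 != k % 5)
moved_jch = sum(1 for k in range(10_000)
                if jump_consistent_hash(k, 4) != jump_consistent_hash(k, 5))
assert moved_jch < moved_mod  # far less data movement during expansion
```

With modulo distribution, going from 4 to 5 segments remaps roughly 80% of the rows; with jump consistent hash, only about 20% move, all of them to the new segment.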
-
- 03 August 2018, 1 commit
-
-
Committed by Karen Huddleston

This reverts commit 4750e1b6.
-
- 02 August 2018, 1 commit
-
-
Committed by Richard Guo

This is the final batch of commits from PostgreSQL 9.2 development, up to the point where the REL9_2_STABLE branch was created and 9.3 development started on the PostgreSQL master branch. Notable upstream changes:

* Index-only scans were included in this batch of upstream commits. They allow queries to retrieve data only from indexes, avoiding heap access.
* Group commit was changed to work effectively under heavy load. Previously, batching of commits became ineffective as the write workload increased, because of internal lock contention.
* A new fast-path lock mechanism was added to reduce the overhead of taking and releasing certain types of locks which are taken and released very frequently but rarely conflict.
* The new "parameterized path" mechanism was added. It allows inner index scans to use values from relations that are more than one join level up from the scan. This can greatly improve performance in situations where semantic restrictions (such as outer joins) limit the allowed join orderings.
* The SP-GiST (Space-Partitioned GiST) index access method was added to support unbalanced partitioned search structures. For suitable problems, SP-GiST can be faster than GiST in both index build time and search time.
* Checkpoints are now performed by a dedicated background process. Formerly the background writer did both dirty-page writing and checkpointing. Separating this into two processes allows each goal to be accomplished more predictably.
* Custom plans are now supported for specific parameter values, even when using prepared statements.
* The API for FDWs was improved to let them provide multiple access "paths" for their tables, allowing more flexibility in join planning.
* The security_barrier option was added for views, to prevent optimizations that might allow view-protected data to be exposed to users.
* Range data types were added, to store a lower and an upper bound belonging to a base data type.
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility statement. The SELECT query is planned during the execution of the utility. To conform to this change, GPDB executes the utility statement only on the QD and dispatches the plan of the SELECT query to the QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
-
- 09 July 2018, 1 commit
-
-
Committed by Heikki Linnakangas

Instead of completely disabling the generation of Paths with disabled plan types, add a high penalty to their cost estimates, like in the upstream. This reduces our diff vs. upstream, making future merges more straightforward.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/Az2cDcqf73g/_tY6Yv1kBgAJ

Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: Richard Guo <riguo@pivotal.io>
-
- 11 April 2018, 1 commit
-
-
Committed by Abhijit Subramanya

This FIXME is two-fold:

- Handling LIMIT 0. The LIMIT is already handled in the caller, convert_EXISTS_sublink_to_join(): when an existential sublink contains an aggregate without GROUP BY or HAVING, we can safely replace it with a one-time TRUE/FALSE filter based on the type of sublink, since the result of the aggregate is always going to be one row even if its input is zero rows. However, this assumption is incorrect when the sublink contains LIMIT/OFFSET, such as when the final limit count, after applying the offset, is 0.

- Rules for demoting HAVING to WHERE. Previously, simplify_EXISTS_query() only allowed demoting HAVING quals to WHERE if the query did not contain any aggregates. To determine that, it used query->hasAggs, which is incorrect, since hasAggs indicates that an aggregate is present either in the targetlist or in HAVING. This penalized queries wherein HAVING did not contain the agg but the targetlist did (as demonstrated in the newly added test). This check is now replaced by contain_aggs_of_level(). Also, do not demote if HAVING contains volatile functions, since they need to be evaluated once per group.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
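The LIMIT 0 hazard above comes down to a counting argument: an ungrouped aggregate always yields exactly one row, so EXISTS over it is trivially true, but a LIMIT/OFFSET applied on top can cut that single row down to zero rows, making EXISTS false. A toy Python model of that reasoning (the helper names here are invented for illustration, not planner code):

```python
def eval_exists(subquery_rows):
    """EXISTS is true iff the subquery returns at least one row."""
    return len(subquery_rows) > 0

def ungrouped_aggregate(rows):
    """An ungrouped aggregate (e.g. count(*)) returns exactly one row,
    even over empty input."""
    return [(len(rows),)]

def apply_limit_offset(rows, limit=None, offset=0):
    """Apply OFFSET, then LIMIT, to the subquery result."""
    rows = rows[offset:]
    return rows if limit is None else rows[:limit]

empty_input = []

# Without LIMIT: replacing EXISTS(agg) with a one-time TRUE filter is safe.
assert eval_exists(ungrouped_aggregate(empty_input)) is True

# With LIMIT 0 the sublink yields no rows, so EXISTS must be false --
# a constant TRUE filter would give the wrong answer here.
assert eval_exists(
    apply_limit_offset(ungrouped_aggregate(empty_input), limit=0)) is False
```

The same goes for any OFFSET/LIMIT combination whose effective row count is zero, which is why the caller must check the limit before applying the one-time-filter shortcut.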
-
- 22 March 2018, 1 commit
-
-
Committed by Dhanashree Kashid

For queries of the form "NOT (subselect)", the planner lost the NOT operator during the initial pull-up in pull_up_sublinks_qual_recurse(), which resulted in an incorrect filter and hence wrong results.

Inside pull_up_sublinks_qual_recurse(), when a qual contains NOT, we check whether the sublink type is one of EXISTS, ANY or ALL, and invoke the appropriate sublink pull-up routine. In the case of a qual of the form "NOT (SELECT <>)" the sublink type is EXPR; hence we recurse into the argument of NOT, at which point we lose the information about the NOT operator.

This commit fixes the issue by returning the node unmodified when the argument of NOT is an EXPR sublink. The EXPR sublink later gets pulled up by preprocess_qual_conditions(), wherein pull_up_sublinks() is invoked again to handle sublinks in an expression.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
- 09 February 2018, 1 commit
-
-
Committed by Heikki Linnakangas

This removes much of the GPDB machinery for handling "deduplication paths" within the planner. We will now use the upstream code to build JOIN_SEMI paths, as well as paths where the outer side of the join is first deduplicated (JOIN_UNIQUE_OUTER/INNER).

The old-style "join first and deduplicate later" plans can be better in some cases, however. To still be able to generate such plans, add a new JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed by a deduplication step after the join, during planning. This new way of constructing these plans is simpler, and allows removing a bunch of code and reverting some more code to the way it is in the upstream.

I'm not sure if this can generate the same plans that the old code could, in all cases. In particular, I think the old "late deduplication" mechanism could delay the deduplication further, all the way to the top of the join tree. I'm not sure when that would be useful, though, and the regression suite doesn't seem to contain any such cases (with EXPLAIN). Or maybe I misunderstood the old code. In any case, I think this is good enough.
-
- 12 January 2018, 1 commit
-
-
Committed by Shreedhar Hardikar

This commit brings in ORCA changes that ensure that a Materialize node is not added under a Filter when its child contains outer references. Otherwise, the subplan is not rescanned (because it is under a Material), producing wrong results. A rescan is necessary because it evaluates the subplan for each of the outer referenced values. For example:

```
SELECT * FROM A, B
WHERE EXISTS (SELECT * FROM E
              WHERE E.j = A.j
                AND B.i NOT IN (SELECT E.i FROM E WHERE E.i != 10));
```

For the above query, ORCA produces a plan with two nested subplans:

```
Result
  Filter: (SubPlan 2)
  -> Gather Motion 3:1
       -> Nested Loop
            Join Filter: true
            -> Broadcast Motion 3:3
                 -> Table Scan on a
            -> Table Scan on b
  SubPlan 2
    -> Result
         Filter: public.c.j = $0
         -> Materialize
              -> Result
                   Filter: (SubPlan 1)
                   -> Materialize
                        -> Gather Motion 3:1
                             -> Table Scan on c
         SubPlan 1
           -> Materialize
                -> Gather Motion 3:1
                     -> Table Scan on c
                          Filter: i <> 10
```

The Materialize node (on top of the Filter with SubPlan 1) has cdb_strict = true. The cdb_strict semantics dictate that when the Materialize is rescanned, instead of destroying its tuplestore, it resets the accessor pointer to the beginning, and the subtree is NOT rescanned. So the entries from the first scan are returned for all future calls; i.e. the results depend on the first row output by the cross join. This causes wrong and non-deterministic results.

Also, this commit reinstates this test in qp_correlated_query.sql. It also fixes another wrong result caused by the same issue.

Note that the changes in rangefuncs_optimizer.out are because ORCA now no longer falls back for those queries. Instead it produces a plan which is executed on the master (instead of on the segments, as was done by the planner), which changes the error messages.

Also bump ORCA version to 2.53.8.

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 11 November 2017, 1 commit
-
-
Committed by Dhanashree Kashid

This function had diverged a lot from upstream after the subselect merge. One of the main reasons is that upstream has a lot of restrictive checks which prevent pull-up of EXISTS/NOT EXISTS. GPDB handles these differently, producing a join, an initplan or a one-time filter. The cases that GPDB handles, and for which we have not ported the checks from upstream, are:

- AGG with limit count, with or without offset
- HAVING clause without AGG
- AGG without HAVING clause

For other conditions, we bail out as upstream does. Hence we have added checks for HAVING and aggs differently inside simplify_EXISTS_query. The rest of the code is similar to upstream.

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 27 September 2017, 8 commits
-
-
Committed by Tom Lane

```
commit 07b9936a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Feb 27 23:30:29 2009 +0000

    Temporarily (I hope) disable flattening of IN/EXISTS sublinks that are
    within the ON clause of an outer join. Doing so is semantically correct
    but results in de-optimizing queries that were structured to take
    advantage of the sublink style of execution, as seen in recent complaint
    from Kevin Grittner. Since the user can get the other behavior by
    reorganizing his query, having the flattening happen automatically is
    just a convenience, and that doesn't justify breaking existing
    applications. Eventually it would be nice to re-enable this, but that
    seems to require a significantly different approach to outer joins in
    the executor.
```

Added a relevant test case.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Ekta Khanna

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Shreedhar Hardikar

GPDB handles a lot of the cases that are restricted by is_simple_subquery; the restrictions that are not handled are checked for separately in convert_EXISTS_sublink_to_join(). While resolving cascading ICG failures, we also fixed the following:

- initialize all the members of IncrementVarSublevelsUp_context properly.
- remove incorrect assertions brought in from upstream. In GPDB, these cases are handled.
- improve plans for NOT EXISTS sub-queries containing an aggregation without limits, by creating a "false" one-time filter.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Shreedhar Hardikar

This was used to keep information about the subquery join tree of pulled-up sublinks, for use later in deconstruct_recurse(). With the upstream subselect merge, a JoinExpr is constructed at pull-up time itself, so this is no longer needed, since the subquery join tree information is available in the constructed JoinExpr.

Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs. However, since GPDB differs from upstream by treating SEMI joins as INNER joins for internal join planning, this commit also updates inner_join_rels correctly for SEMI joins (see regression test).

Also remove the unused function declaration for not_null_inner_vars().

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Shreedhar Hardikar

This issue was discovered during the subselect merge, wherein the planner incorrectly commuted anti joins. cdb_add_subquery_join_paths() creates join paths for (rel1, rel2) and (rel2, rel1) for all join types, including JOIN_ANTI and JOIN_LASJ_NOTIN. This produces wrong results, since these joins are order-sensitive w.r.t. the inner and outer relations (see new regression tests). So, do not add (rel2, rel1) for JOIN_ANTI and JOIN_LASJ_NOTIN.

This commit also refactors cdb_add_subquery_join_paths() and make_join_rel() to make it easier to control the commuting.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Tom Lane

```
commit fbcce080
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Apr 5 19:59:40 2009 +0000

    Change EXPLAIN output so that subplans and initplans (particularly CTEs)
    are individually labeled, rather than just grouped under an "InitPlan"
    or "SubPlan" heading. This in turn makes it possible for decompilation
    of a subplan reference to usefully identify which subplan it's
    referencing. I also made InitPlans identify which parameter symbol(s)
    they compute, so that references to those parameters elsewhere in the
    plan tree can be connected to the initplan that will be executed.
    Per a gripe from Robert Haas about EXPLAIN output of a WITH query being
    inadequate, plus some longstanding pet peeves of my own.
```
-
Committed by Dhanashree Kashid

After the introduction of the `placeholder` mechanism, we see some changes in the plans. These are all good changes, in that we remove the redundant `NOTIN_subquery` subquery scan nodes.

Before:
```
...
-> Broadcast Motion 3:3 (slice1; segments: 3)
   -> Subquery Scan "NotIn_SUBQUERY"
      -> Seq Scan on subselect_tbl
         Filter: f3 IS NOT NULL
```

After:
```
...
-> Broadcast Motion 3:3 (slice1; segments: 3)
   -> Seq Scan on subselect_tbl
      Filter: f3 IS NOT NULL
```

The new plans have better cost, since we will no longer be executing a subquery scan for each tuple coming from its OUTER. We are now consistent with the plans produced for the same queries with IN instead of NOT IN.

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Shreedhar Hardikar

This commit updates the test answer files after merging e006a24a, 19e34b62 and e549722a. As a result of these merges, we produce inferior plans, for which we have added FIXMEs appropriately. They will be addressed in later commits.
-
- 12 September 2017, 1 commit
-
-
Committed by Shreedhar Hardikar

During NOT EXISTS sublink pull-up, we create a one-time false filter when the sublink contains aggregates, without checking the limit count. However, in situations where the sublink contains an aggregate with LIMIT 0, we should not generate such a filter, as it produces incorrect results. Added a regress test.

Also, initialize all the members of IncrementVarSublevelsUp_context properly.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
- 30 June 2017, 1 commit
-
-
Committed by Shreedhar Hardikar

These tests compare generated plans to catch significant plan changes during subselect pull-up for IN, EXISTS, NOT IN and NOT EXISTS related queries.
-
- 14 February 2017, 1 commit
-
-
Committed by Heikki Linnakangas

Most of these queries were already in the qp_correlated_query test. A few were not; those were moved there instead of being outright removed. This removed the last queries from schema_topology whose output was different with ORCA than without, so remove the ORCA-specific expected output file.
-
- 15 December 2016, 3 commits
-
-
Committed by Xin Zhang

- Remove RESET ALL so that it can actually test with optimizer=on
- Remove duplicated population of tables A, B, C
- Rename tables touched by DML to avoid duplicated setup
- Fix subquery to verify the correct result rather than verifying the error message
- Remove a query with a non-deterministic result

Now, with GPORCA, all SQL statements pass without ERROR. The planner still has a few failures due to multi-level correlated subqueries, which is expected.
-
由 Xin Zhang 提交于
- Remove RESET ALL so that it can actually test with optimizer=on
- Remove duplicated population of tables A, B, C
- Rename tables touched by DML to avoid duplicated setup
- Fix subquery to verify the correct result rather than just the error message

Now, with GPORCA, all SQL statements pass without ERROR. The planner still has a few failures due to multi-level correlated subqueries, which is expected.
-
- 06 12月, 2016 1 次提交
-
-
由 Xin Zhang 提交于
When we have a correlated subquery with EXISTS and a CTE, the planner produces a wrong plan:
```
pivotal=# explain with t as (select 1) select * from foo where exists (select * from bar where foo.a = 'a' and foo.b = 'b');
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.01..1522.03 rows=5055210000 width=16)
   ->  Nested Loop  (cost=0.01..1522.03 rows=1685070000 width=16)
         ->  Broadcast Motion 1:3  (slice2; segments: 1)  (cost=0.01..0.03 rows=1 width=0)
               ->  Limit  (cost=0.01..0.01 rows=1 width=0)
                     ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.01..0.03 rows=1 width=0)
                           ->  Limit  (cost=0.01..0.01 rows=1 width=0)
                                 ->  Result  (cost=0.01..811.00 rows=23700 width=0)
                                       One-Time Filter: $0 = 'a'::bpchar AND $1 = 'b'::bpchar
                                       ->  Seq Scan on bar  (cost=0.01..811.00 rows=23700 width=0)
         ->  Seq Scan on foo  (cost=0.00..811.00 rows=23700 width=16)
 Optimizer status: legacy query optimizer
(11 rows)
```
This fails during execution because $0 is referenced across slices.

Root cause: the planner produces a plan with `$0` (a `param`) but without a `subplan`. The `param` is created by `replace_outer_var()` when the planner detects a query referring to relations from its outer/parent query. Such a `var` is created when the `sublink` is removed in the `convert_EXISTS_to_join()` function. In that function, when handling the `EXISTS` query, we convert the `EXISTS` sublink to a subquery RTE (and expect it to get pulled up later by `pull_up_subquery()`). However, the subquery cannot be pulled up by `pull_up_subquery()`, since it is not a simple subquery (`is_simple_subquery()` returns false because of the CTE in this case). At that point the `sublink` has already been removed, so the planner can no longer produce a `subplan` (which would have been a valid option). The `var` is left behind as an outer reference and then converted to a `param`, which blows up during query execution.
There is a mismatch between the conditions used by `convert_EXISTS_to_join()` and `pull_up_subquery()` to decide whether a subquery can be pulled up. The fix is to reuse the `is_simple_subquery()` check in `convert_EXISTS_to_join()`, so that it is consistent with `pull_up_subquery()` on whether the subquery can be pulled up or not. The correct plan after the fix is:
```
pivotal=# explain with t as (select 1) select * from foo where exists (select * from bar where foo.a = 'a' and foo.b = 'b');
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..1977.50 rows=35550 width=16)
   ->  Seq Scan on foo  (cost=0.00..1977.50 rows=11850 width=16)
         Filter: (subplan)
         SubPlan 1
           ->  Result  (cost=0.01..811.00 rows=23700 width=16)
                 One-Time Filter: $0 = 'a'::bpchar AND $1 = 'b'::bpchar
                 ->  Result  (cost=882.11..1593.11 rows=23700 width=16)
                       ->  Materialize  (cost=882.11..1593.11 rows=23700 width=16)
                             ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.01..811.00 rows=23700 width=16)
                                   ->  Seq Scan on bar  (cost=0.01..811.00 rows=23700 width=16)
 Optimizer status: legacy query optimizer
(11 rows)
```
Signed-off-by: NDhanashree Kashid <dkashid@pivotal.io>
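The interesting shape here is a correlated EXISTS whose inner predicate references only outer columns, which is why the plan can evaluate it as a one-time filter per outer row: the result depends on the outer row's values and on whether `bar` is non-empty at all. A sketch of those semantics, using SQLite via Python's `sqlite3` module (the CTE that blocked the pullup in Greenplum is omitted here, since this only illustrates the EXISTS behavior; table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE foo (a TEXT, b TEXT)")
cur.execute("CREATE TABLE bar (c INTEGER)")
cur.executemany("INSERT INTO foo VALUES (?, ?)", [("a", "b"), ("x", "y")])

# With bar empty, the inner query returns no rows regardless of the
# outer-only predicate, so EXISTS is false for every outer row:
rows = cur.execute(
    "SELECT * FROM foo WHERE EXISTS "
    "(SELECT * FROM bar WHERE foo.a = 'a' AND foo.b = 'b')"
).fetchall()
print(rows)  # []

# With bar non-empty, EXISTS reduces to the predicate on the outer
# row -- the behavior the one-time filter in the corrected plan encodes:
cur.execute("INSERT INTO bar VALUES (1)")
rows = cur.execute(
    "SELECT * FROM foo WHERE EXISTS "
    "(SELECT * FROM bar WHERE foo.a = 'a' AND foo.b = 'b')"
).fetchall()
print(rows)  # [('a', 'b')]
```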
-
- 18 11月, 2016 2 次提交
-
-
由 Heikki Linnakangas 提交于
We have similar tests there, and have the same test tables set up already.
-
由 Heikki Linnakangas 提交于
Attach a suitable error code to many errors that were previously reported as "internal errors". GPDB's elog.c prints a source file name and line number for any internal error, which is a bit ugly for errors that are not in fact unexpected internal errors, but user-facing errors that happen as a result of e.g. an invalid query. To make sure we don't accumulate more of these, adjust the regression tests to not ignore the source file and line number in error messages. There are a few exceptions, which are listed explicitly.
-