- 14 March 2019, 1 commit

Committed by Daniel Gustafsson
As we merge with upstream and thereby keep refining the Postgres planner, "legacy planner" is no longer a suitable name. This changes all variations of the spelling (legacy planner, legacy optimizer, legacy query optimizer etc.) to say "Postgres" rather than "legacy".

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: David Yozie <dyozie@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>

- 27 February 2019, 1 commit

Committed by Jialun
- Retire GP_POLICY_ALL_NUMSEGMENTS and GP_POLICY_ENTRY_NUMSEGMENTS, unify them to getgpsegmentCount
- Retire GP_POLICY_MINIMAL_NUMSEGMENTS and GP_POLICY_RANDOM_NUMSEGMENTS
- Change the NUMSEGMENTS-related macros from variable macros to function macros
- Change the default return value of getgpsegmentCount to 1, which represents a singleton PostgreSQL in utility mode
- Change __GP_POLICY_INVALID_NUMSEGMENTS to GP_POLICY_INVALID_NUMSEGMENTS

- 15 February 2019, 1 commit

Committed by Paul Guo
Also refactor subquery_motionHazard_walker() to make it simpler.

- 14 February 2019, 1 commit

Committed by Paul Guo
After we have parameterized paths (since PG 9.2) and LATERAL (since PG 9.3, although we do not support the full functionality), merge join and hash join paths need to consider that. Besides, for nestloop paths, the previous code was wrong:

1. It did not allow motion for paths that include an index (path_contains_inner_index()). That is wrong. Here are two examples of index paths which allow motion:

   -> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.17..24735.67 rows=86100 width=0)
      -> Index Only Scan using t2i on t2 (cost=0.17..21291.67 rows=28700 width=0)

   -> Broadcast Motion 1:3 (slice1; segments: 1) (cost=0.17..6205.12 rows=259 width=8)
      -> Index Scan using t2i on t2 (cost=0.17..6201.67 rows=29 width=8)
         Index Cond: (4 = a)

2. The inner path and outer path might require upper nodes for parameterized paths, so the current check bms_overlap(inner_req_outer, outer_path->parent->relids) is definitely not sufficient; besides, the outer path could have parameterized paths also.

For nestloop join, case 1 is covered by the test case added in join_gp. For case 2, the test case in join.sql (although ignored) in this patch partially tests it.

Note the change in this patch is conservative. In theory, we could refer to the subplan code to allow broadcast for a base rel if needed (for that solution no motion is needed), but that needs much effort and does not seem warranted, given we will probably refactor the related code for lateral support in the near future.

- 12 February 2019, 1 commit

Committed by Heikki Linnakangas
In plans with a Nested Loop join on the inner side of another Nested Loop join, the planner could produce a plan where a Motion node was rescanned. That produced an error at execution time:

  ERROR: illegal rescan of motion node: invalid plan (nodeMotion.c:1604) (seg0 slice4 127.0.0.1:40000 pid=27206) (nodeMotion.c:1604)
  HINT: Likely caused by bad NL-join, try setting enable_nestloop to off

Make sure we add a Materialize node to shield the Motion from rescanning in such cases.

While we're at it, add an explicit flag to MaterialPaths and plans, to indicate that the Material node was added to shield the child node from rescanning. There was a weaker test in ExecInitMaterial itself, which just checked if the immediate child was a Motion node, but that feels sketchy; what if there's a Result node in between, for example? However, I kept the direct check for a Motion node, too, because I'm not sure if there are other places where we add Material nodes on top of Motions, aside from the call in create_nestloop_path() that this fixes. ORCA probably does that, at least.

Fixes https://github.com/greenplum-db/gpdb/issues/6769

Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
Reviewed-by: Paul Guo <pguo@pivotal.io>
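A rough, hypothetical sketch of the kind of query shape involved — the tables t1, t2, t3 and the join conditions are invented, not taken from the commit or the linked issue. The point is that the inner Nested Loop join (including any Motion it carries) lands on the inner side of the outer Nested Loop and would otherwise be rescanned:

```sql
-- inequality join conditions tend to force Nested Loop joins; the inner
-- subquery join may carry a Redistribute/Broadcast Motion that must now
-- be protected by a Materialize node instead of being rescanned directly
explain select *
from t1
join (select t2.a, t3.b
        from t2 join t3 on t2.a > t3.a) sub
  on t1.a > sub.a;
```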

- 01 February 2019, 1 commit

Committed by Heikki Linnakangas
Replace the use of the built-in hashing support for built-in datatypes, in cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time to do this, since we've already made the change to use jump consistent hashing in GPDB 6, so we'll need to deal with the upgrade problems associated with changing the hash functions anyway.

It is no longer enough to track which columns/expressions are used to distribute data. You also need to know the hash function used. For that, a new field is added to gp_distribution_policy, to record the hash operator class used for each distribution key column. In the planner, a new opfamily field is added to DistributionKey, to track that throughout the planning.

Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the default hash operator class for the datatype is used. But this patch extends the syntax so that you can specify the operator class explicitly, like "... DISTRIBUTED BY (column opclass)". This is similar to how an operator class can be specified for each column in CREATE INDEX.

To support upgrade, the old hash functions have been converted to special (non-default) operator classes, named cdbhash_*_ops. For example, if you want to use the old hash function for an integer column, you could do "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist of operators that have "compatible" cdbhash functions has been replaced by putting the compatible hash opclasses in the same operator family. For example, all the legacy integer operator classes, cdbhash_int2_ops, cdbhash_int4_ops and cdbhash_int8_ops, are part of the cdbhash_integer_ops operator family.

This removes the pg_database.hashmethod field. The hash method is now tracked on a per-table and per-column basis, using the opclasses, so it's not needed anymore.

To help with upgrade from GPDB 5, this introduces a new GUC called 'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash opclasses, instead of the default hash opclasses, if the opclass is not specified explicitly. pg_upgrade will set the new GUC, to force the use of legacy hashops, when restoring the schema dump. It will also set the GUC on all upgraded databases, as a per-database option, so any new tables created after upgrade will also use the legacy opclasses. It seems better to be consistent after upgrade, so that collocation between old and new tables works, for example. The idea is that some time after the upgrade, the admin can reorganize all tables to use the default opclasses instead. At that point, he should also clear the GUC on the converted databases. (Or rather, the automated tool that hasn't been written yet should do that.)

ORCA doesn't know about hash operator classes, or the possibility that we might need to use a different hash function for two columns with the same datatype. Therefore, it cannot produce correct plans for queries that mix different distribution hash opclasses for the same datatype, in the same query. There are checks in the Query->DXL translation, to detect that case, and fall back to planner. As long as you stick to the default opclasses in all tables, we let ORCA create the plan without any regard to them, and use the default opclasses when translating the DXL plan to a Plan tree. We also allow the case that all tables in the query use the "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the two, or using any non-default opclasses, forces ORCA to fall back.

One curiosity with this is the "int2vector" and "aclitem" datatypes. They have a hash opclass, but no b-tree operators. GPDB 4 used to allow them as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit 56e7c16b. Now they are allowed again, so you can specify an int2vector or aclitem column in DISTRIBUTED BY, but it's still pretty useless, because the planner still can't form EquivalenceClasses on it, and will treat it as "strewn" distribution, and won't co-locate joins.

The abstime, reltime and tinterval datatypes don't have default hash opclasses. They are being removed completely in PostgreSQL v12, and users shouldn't be using them in the first place, so instead of adding hash opclasses for them now, we accept that they can't be used as distribution key columns anymore. Add a check to pg_upgrade, to refuse upgrade if they are used as distribution keys in the old cluster. Do the same for the 'money' datatype as well, although that's not being removed in upstream.

The legacy hashing code for anyarray in GPDB 5 was actually broken. It could produce a different hash value for two arrays that are considered equal, according to the = operator, if there were differences in e.g. whether the null bitmap was stored or not. Add a check to pg_upgrade, to reject the upgrade if array types were used as distribution keys. The upstream hash opclass for anyarray works, though, so it is OK to use arrays as distribution keys in new tables. We just don't support binary upgrading them from GPDB 5. (See github issue https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of 'anyrange' had the same problem, but that was new in GPDB 6, so we don't need a pg_upgrade check for that.

This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE INDEX, so that you can no longer create a situation where a non-hashable column becomes the distribution key. (Fixes github issue https://github.com/greenplum-db/gpdb/issues/6317)

Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ

Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
Co-authored-by: Chris Hajas <chajas@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Simon Gao <sgao@pivotal.io>
Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Yandong Yao <yyao@pivotal.io>
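A short SQL sketch of the syntax and GUC described above; the table and column names are illustrative, not from the commit, while the opclass and GUC names are the ones the commit introduces:

```sql
-- default hash opclass for the datatype is chosen implicitly
create table t_default (c1 int, c2 text) distributed by (c1);

-- explicitly request the legacy (GPDB 5 compatible) hash opclass
create table t_legacy (c1 int, c2 text) distributed by (c1 cdbhash_int4_ops);

-- make CREATE TABLE default to the legacy opclasses, as pg_upgrade does
set gp_use_legacy_hashops = on;
```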

- 15 December 2018, 1 commit

Committed by Heikki Linnakangas
This removes a lot of GPDB-specific code that was used to deal with dynamic scans, and code duplication between nodes dealing with Heap, AO and AOCS tables.

* Resurrect SeqScan node. We had replaced it with TableScan in GPDB. Teach SeqScan to also work on append-only and AOCS tables, and remove TableScan and all the code changes that were made in GPDB earlier to deal with all three table types.

* Merge BitmapHeapScan, BitmapAppendOnlyScan, and BitmapTableScan node types. They're all BitmapHeapScans now. We used to use BitmapTableScans in ORCA-generated plans, and BitmapHeapScan/BitmapAppendOnlyScan in planner-generated plans, and there was no good reason for the difference. The "heap" part in the name is a bit misleading, but I prefer to keep the upstream name, even though it now handles AO tables as well. It's more like the old BitmapTableScan now, which also handled all three table types, but the code is refactored to stay as close to upstream as possible.

* Introduce DynamicBitmapHeapScan. BitmapTableScan used to perform Dynamic scans too; now it's the responsibility of the new DynamicBitmapHeapScan plan node, just like we have DynamicTableScan and DynamicIndexScan as wrappers around SeqScans and IndexScans.

* Get rid of BitmapAppendOnlyPath in the planner, too. Use BitmapHeapPath also for AO tables.

* Refactor the way Dynamic Table Scan works. A Dynamic Table Scan node is now just a thin wrapper around SeqScan. It initializes a new SeqScan executor node for every partition, and lets it do the actual scanning. It now works the same way that I refactored Dynamic Index Scans to work in commit 198f701e. This allowed removing a lot of code that we used to use for both Dynamic Index Scans and Dynamic Table Scans, but is no longer used.

There's now some duplication in the Dynamic* nodes, to walk through the partitions. They all have a function called setPidIndex(), for example, which does the same thing. But I think it's much clearer this way than the previous DynamicController stuff. We could perhaps extract some of the code to common helper functions, but I think this is OK for now.

This also fixes issue #6274. I'm not sure what exactly the bug was, but it was clearly in the Bitmap Table Scan code that is used with ORCA-generated plans. Now that we use the same code for plans generated with the Postgres planner and ORCA, it's not surprising that the bug is gone.

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>

- 07 December 2018, 1 commit

Committed by Ning Yu
When creating a partition table we want the children to have the same numsegments as the parent. As they all set their numsegments to DEFAULT, does this meet our expectation? No, because DEFAULT does not always equal DEFAULT itself: when DEFAULT is set to RANDOM, a different value is returned each time. So we have to align numsegments explicitly.

Also removed an incorrect assert and comment.

- 03 December 2018, 2 commits

Committed by Heikki Linnakangas

Committed by Heikki Linnakangas

- 27 November 2018, 1 commit

Committed by Heikki Linnakangas
In PostgreSQL, a PathKey represents sort ordering, but we have been using it in GPDB to also represent the distribution keys of hash-distributed data in the planner (i.e. the keys in DISTRIBUTED BY of a table, but also when data is redistributed by some other key on the fly). That's been convenient, and there's some precedent for that, since PostgreSQL also uses PathKey to represent GROUP BY columns, which is quite similar to DISTRIBUTED BY. However, there are some differences. The opfamily, strategy and nulls_first fields in PathKey are not applicable to distribution keys. Using the same struct to represent ordering and hash distribution is sometimes convenient, for example when we need to test whether the sort order or grouping is "compatible" with the distribution. But at other times, it's confusing.

To clarify that, introduce a new DistributionKey struct, to represent a hashed distribution. While we're at it, simplify the representation of HashedOJ locus types, by including a List of EquivalenceClasses in DistributionKey, rather than just one EC like a PathKey has. CdbPathLocus now has only one 'distkey' list that is used for both Hashed and HashedOJ locus, and it's a list of DistributionKeys. Each DistributionKey in turn can contain multiple EquivalenceClasses.

Looking ahead, I'm working on a patch to generalize the "cdbhash" mechanism, so that we'd use the normal Postgres hash opclasses for distribution keys, instead of hard-coding support for specific datatypes. With that, the hash operator class or family will be an important part of the distribution key, in addition to the datatype. The plan is to store that also in DistributionKey.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

- 23 November 2018, 1 commit

Committed by Ning Yu
When an Append node contains a SingleQE subpath, we used to put the Append on ALL the segments. However, if the SingleQE is partially distributed, then obviously we cannot put the SingleQE on ALL the segments; this conflict could result in runtime errors or incorrect results. To fix this we should put the Append on the SingleQE's segments. On the other hand, when there are multiple SingleQE subpaths, we should put the Append on the common segments of the SingleQEs.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 19 November 2018, 1 commit

Committed by Adam Lee
This commit adds support for the option `mpp_execute 'MASTER | ANY | ALL SEGMENTS'` on foreign tables.

MASTER is the default: the FDW requests data from the master.

ANY: the FDW requests data from the master or any one segment, depending on which path costs less.

ALL SEGMENTS: the FDW requests data from all segments, so wrappers need to have a policy matching the segments to the data. For instance, file_fdw probes the mpp_execute value, then loads different files based on the segment number. But something like gpfdist on the foreign side doesn't need this, since it hands out a different slice of the data to each request, so all segments can request the same location.
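A hedged sketch of how the option might be used, assuming a file_fdw foreign server named "fileserver" already exists; the table, column and file names are illustrative, not from the commit:

```sql
-- each segment opens the same path locally; file_fdw can consult the
-- mpp_execute value and the segment number to decide what to load
create foreign table ft_example (a int, b text)
    server fileserver
    options (filename '/data/example.csv', format 'csv',
             mpp_execute 'all segments');
```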

- 12 November 2018, 1 commit

Committed by Pengzhou Tang
Previously, when creating an Append node for an inheritance table, if the subpaths had different numbers of segments in gp_distribution_policy, the whole Append node might be assigned a wrong numsegments, so some segments could not get plans and data was lost from the results.

- 05 November 2018, 1 commit

Committed by Heikki Linnakangas
All callers of cdbpathlocus_compare were asking for a strict equality check.

- 12 October 2018, 1 commit

Committed by Heikki Linnakangas
Function Scan materializes the result of each function in a TupleStore, and can be rescanned. Mark it as rescannable in the planner, so that we avoid putting a pointless Materialize node on top of it.

Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
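A minimal illustrative sketch (the table t and its column a are invented, not from the commit) of the situation this helps: a set-returning function on the inner side of a Nested Loop can now be rescanned directly, without a Materialize node stacked on top of the Function Scan:

```sql
-- the inequality join condition forces a Nested Loop, so the function scan
-- on the inner side gets rescanned for every outer row
explain select *
from t
join generate_series(1, 100) as s(g) on s.g > t.a;
```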

- 28 September 2018, 1 commit

Committed by ZhangJackey
There was an assumption in GPDB that a table's data is always distributed on all segments. However this is not always true: for example, when a cluster is expanded from M segments to N (N > M) segments, all the tables are still on M segments. To work around the problem we used to have to alter all the hash-distributed tables to randomly distributed to get correct query results, at the cost of bad performance.

Now we support table data being distributed on a subset of segments. A new column `numsegments` is added to the catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DMLs on M-segment tables; joins between M-segment and N-segment tables are also supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
  from gp_distribution_policy;
 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);
                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```

- 25 September 2018, 1 commit

Committed by Paul Guo
create_unique_path() can be used to convert a semi join to an inner join. Previously, during the semi-join refactor in commit d4ce0921, creating a unique path was disabled for the case where duplicates might be on different QEs. In this patch we enable adding a motion to unique-ify the path, but only if the unique method is not UNIQUE_PATH_NOOP. We don't create a unique path for that case because later on, during plan creation, it is possible to create a motion above this unique path whose subpath is a motion. In that case, the unique path node will be ignored and we will get a motion plan node above a motion plan node, and that is bad. We could further improve that, but not in this patch.

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>

- 21 September 2018, 1 commit

Committed by Heikki Linnakangas
They were all treated the same, with the SeqScan code being duplicated for AppendOnlyScans and AOCSScans. That is a merge hazard: if some code is changed for SeqScans, we would have to remember to manually update the other copies. Small differences in the code had already crept in, although given that everything worked, I guess it had no effect. Or only had a small effect on the computed costs.

To avoid the duplication, use SeqScan for all of them. Also get rid of TableScan as a separate node type, and have the ORCA translator also create SeqScans.

The executor for the SeqScan node can handle heap, AO and AOCS tables, because we're not actually using the upstream SeqScan code for it. We're using the GPDB code in nodeTableScan.c, and a TableScanState, rather than SeqScanState, as the executor node. That's how it worked before this patch already; what this patch changes is that we now use SeqScan *before* the executor phase, instead of SeqScan/AppendOnlyScan/AOCSScan/TableScan.

To avoid having to change all the expected outputs for tests that use EXPLAIN, add code to still print the SeqScan as "Seq Scan", "Table Scan", "Append-only Scan" or "Append-only Columnar Scan", depending on whether the plan was generated by ORCA, and what kind of a table it is.

- 19 September 2018, 1 commit

Committed by Heikki Linnakangas
When building a Sort node to represent the ordering that is preserved by a Motion node, in make_motion(), the call to make_sort_from_pathkeys() would sometimes fail with "could not find pathkey item to sort". This happened when the ordering was over a UNION ALL operation. When building Motion nodes for MergeAppend subpaths, the path keys that represented the ordering referred to the items in the append rel's target list, not the subpaths. In create_merge_append_plan(), where we do a similar thing for each subpath, we correctly passed the 'relids' argument to prepare_sort_from_pathkeys(), so that prepare_sort_from_pathkeys() can match the target list entries of the append relation with the entries of the subpaths. But when creating the Motion nodes for each subpath, we were passing NULL as 'relids' (via make_sort_from_pathkeys()).

At a high level, the fix is straightforward: we need to pass the correct 'relids' argument to prepare_sort_from_pathkeys(), in cdbpathtoplan_create_motion_plan(). However, the current code structure makes that not so straightforward, so this required some refactoring of make_motion() and related functions:

Previously, make_motion() and make_sorted_union_motion() would take a path key list as argument, to represent the ordering, and called make_sort_from_pathkeys() to extract the sort columns, operators etc. After this patch, those functions take arrays of sort columns, operators, etc. directly as arguments, and the caller is expected to do the call to make_sort_from_pathkeys() to get them, or build them through some other means.

In cdbpathtoplan_create_motion_plan(), call prepare_sort_from_pathkeys() directly, rather than the make_sort_from_pathkeys() wrapper, so that we can pass the 'relids' argument. Because prepare_sort_from_pathkeys() is marked as 'static', move cdbpathtoplan_create_motion_plan() from cdbpathtoplan.c to createplan.c, so that it can call it.

Add test case. It's a slightly reduced version of a query that we already had in the 'olap_group' test, but it seems better to be explicit. Revert the change in expected output of 'olap_group', made in commit 28087f4e, which memorized the error in the expected output.

Fixes https://github.com/greenplum-db/gpdb/issues/5695.

Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

- 29 August 2018, 1 commit

Committed by Richard Guo
This keeps it the same as PostgreSQL.

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>

- 24 August 2018, 1 commit

Committed by Jinbao Chen
After the 8.4 merge, we have two restriction lists, 'mergeclause_list' and 'hashclause_list', in the function 'add_paths_to_joinrel'. We use mergeclause_list for the CDB motion in hash join. But some of the keys should not be used as distribution keys. Add a whitelist of which operators are distribution-compatible.

- 15 August 2018, 1 commit

Committed by xiong-gang
* Remove ERRCODE_GP_FEATURE_NOT_SUPPORTED and use ERRCODE_FEATURE_NOT_SUPPORTED instead
* Remove ERROR_INVALID_WINDOW_FRAME_PARAMETER and use ERRCODE_WINDOWING_ERROR instead

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>

- 13 August 2018, 1 commit

Committed by xiong-gang
Replace the function `cdbpath_rows(root, path)` with path->rows; this is more in line with upstream 9.2 and removes a GPDB_92_MERGE_FIXME.

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>

- 03 August 2018, 1 commit

Committed by Karen Huddleston
This reverts commit 4750e1b6.

- 02 August 2018, 1 commit

Committed by Richard Guo
This is the final batch of commits from PostgreSQL 9.2 development, up to the point where the REL9_2_STABLE branch was created, and 9.3 development started on the PostgreSQL master branch.

Notable upstream changes:

* Index-only scan was included in the batch of upstream commits. It allows queries to retrieve data only from indexes, avoiding heap access.
* Group commit was added to work effectively under heavy load. Previously, batching of commits became ineffective as the write workload increased, because of internal lock contention.
* A new fast-path lock mechanism was added to reduce the overhead of taking and releasing certain types of locks which are taken and released very frequently but rarely conflict.
* The new "parameterized path" mechanism was added. It allows inner index scans to use values from relations that are more than one join level up from the scan. This can greatly improve performance in situations where semantic restrictions (such as outer joins) limit the allowed join orderings.
* The SP-GiST (Space-Partitioned GiST) index access method was added to support unbalanced partitioned search structures. For suitable problems, SP-GiST can be faster than GiST in both index build time and search time.
* Checkpoints are now performed by a dedicated background process. Formerly the background writer did both dirty-page writing and checkpointing. Separating this into two processes allows each goal to be accomplished more predictably.
* Custom plans are supported for specific parameter values even when using prepared statements.
* The API for FDWs was improved to provide multiple access "paths" for their tables, allowing more flexibility in join planning.
* The security_barrier option was added for views to prevent optimizations that might allow view-protected data to be exposed to users.
* The range data type was added to store a lower and upper bound belonging to its base data type.
* CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The SELECT query is planned during the execution of the utility. To conform to this change, GPDB executes the utility statement only on the QD and dispatches the plan of the SELECT query to the QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>

- 09 July 2018, 1 commit

Committed by Heikki Linnakangas
Instead of completely disabling the generation of Paths with disabled plan types, add a high penalty to their cost estimates, like in the upstream. This reduces our diff vs. upstream, making future merges more straightforward.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/Az2cDcqf73g/_tY6Yv1kBgAJ

Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: Richard Guo <riguo@pivotal.io>
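A minimal sketch of the practical effect (the table t is hypothetical): with the penalty approach, a "disabled" plan type can still be chosen when it is the only option, instead of the planner failing with "could not devise a query plan for the given query":

```sql
set enable_seqscan = off;
-- t has no indexes, so a Seq Scan (with a heavily penalized cost estimate)
-- is still produced rather than no plan at all
explain select * from t;
```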

- 29 March 2018, 1 commit

Committed by Pengzhou Tang
* Support replicated table in GPDB

Currently, tables are distributed across all segments by hash or random in GPDB. There are requirements to introduce a new table type, called a replicated table, where all segments have a full, duplicate copy of the table data.

To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark a replicated table, and added a new locus type named CdbLocusType_SegmentGeneral to specify the distribution of tuples of a replicated table. CdbLocusType_SegmentGeneral implies data is generally available on all segments but not available on the qDisp, so a plan node with this locus type can be flexibly planned to execute on either a single QE or all QEs. It is similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral node can't be executed on the qDisp. To guarantee this, we try our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning motion for a join, even if the other rel has a bottleneck locus type. A problem is that such a motion may be redundant if the single QE is not promoted to execute on the qDisp in the end, so we need to detect such cases and omit the redundant motion at the end of apply_motion().

We don't reuse CdbLocusType_Replicated since it always implies a broadcast motion below it, and it's not easy to plan such a node as direct dispatch to avoid getting duplicate data.

We don't support replicated tables with inherit/partition by clauses now; the main problem is that update/delete on multiple result relations can't work correctly yet. We can fix this later.

* Allow spi_* to access replicated table on QE

Previously, GPDB didn't allow a QE to access a non-catalog table because the data is incomplete. We can remove this limitation now if it only accesses replicated tables. One problem is that the QE needs to know whether a table is a replicated table; previously, the QE didn't maintain the gp_distribution_policy catalog, so we need to pass policy info to the QE for replicated tables.

* Change schema of gp_distribution_policy to identify replicated table

Previously, we used a magic number -128 in the gp_distribution_policy table to identify replicated tables, which was quite a hack, so we add a new column in gp_distribution_policy to identify replicated tables and partitioned tables. This commit also abandons the old way that used a 1-length-NULL list and a 2-length-NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED FULLY clauses. Besides, this commit refactors the code to make the decision-making of distribution policy clearer.

* Support COPY for replicated table

* Disable row ctid unique path for replicated table

Previously, GPDB used a special Unique path on rowid to address queries like "x IN (subquery)". For example, for "select * from t1 where t1.c2 in (select c2 from t3)", the plan looks like:

   -> HashAggregate
        Group By: t1.ctid, t1.gp_segment_id
        -> Hash Join
             Hash Cond: t2.c2 = t1.c2
             -> Seq Scan on t2
             -> Hash
                  -> Seq Scan on t1

Obviously, the plan is wrong if t1 is a replicated table, because ctid + gp_segment_id can't identify a tuple; in a replicated table, a logical row may have different ctid and gp_segment_id values. So we disable such plans for replicated tables temporarily. It's not the best way, because the rowid unique approach may be cheaper than a normal hash semi join, so we left a FIXME for later optimization.

* ORCA related fix

Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>. Fall back to the legacy query optimizer for queries over replicated tables.

* Adapt pg_dump/gpcheckcat to replicated table

gp_distribution_policy is no longer a master-only catalog; do the same check as for other catalogs.

* Support gpexpand on replicated table && alter the dist policy of replicated table
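A brief SQL sketch of the feature described above, assuming the DISTRIBUTED REPLICATED clause that GPDB uses for this table type; the table and column names are illustrative, not from the commit:

```sql
-- every segment stores a full copy of the table
create table conf_table (id int, val text) distributed replicated;

-- the distribution policy type is recorded in gp_distribution_policy
select localoid::regclass, policytype
  from gp_distribution_policy
 where localoid = 'conf_table'::regclass;
```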

- 09 March 2018, 1 commit

Committed by Heikki Linnakangas
The immediate reason to do this was the "this ‘else’ clause does not guard" gcc warning from create_mergejoin_path(). But while we're at it, might as well clean up the whole file. I spotted one piece of code that looks broken, marked that with a FIXME to make sure we revisit that.

- 08 March 2018, 1 commit

Committed by Heikki Linnakangas
Like in the 'join' regression test:

postgres=# select * from int4_tbl a full join int4_tbl b on true;
ERROR:  Query requires a feature that has been disabled by a configuration setting.
DETAIL:  Could not devise a query plan for the given query.
HINT:  Current settings: optimizer=off

- 06 March 2018, 1 commit

Committed by Heikki Linnakangas
This fixes "unrecognized path type" error with the queries that are added to the test suite. The tests were originally written by Kavinder back in October, but were not added to the test suite then, because they were failing. Co-authored-by: NKavinder Dhaliwal <kavinderd@gmail.com>

- 09 February 2018, 1 commit

Committed by Heikki Linnakangas
This removes much of the GPDB machinery to handle "deduplication paths" within the planner. We will now use the upstream code to build JOIN_SEMI paths, as well as paths where the outer side of the join is first deduplicated (JOIN_UNIQUE_OUTER/INNER).

The old style "join first and deduplicate later" plans can be better in some cases, however. To still be able to generate such plans, add a new JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed by the deduplication step after the join, during planning. This new way of constructing these plans is simpler, and allows removing a bunch of code, and reverting some more code to the way it is in the upstream.

I'm not sure if this can generate the same plans that the old code could, in all cases. In particular, I think the old "late deduplication" mechanism could delay the deduplication further, all the way to the top of the join tree. I'm not sure when that would be useful, though, and the regression suite doesn't seem to contain any such cases (with EXPLAIN). Or maybe I misunderstood the old code. In any case, I think this is good enough.

- 24 January 2018, 1 commit

Committed by Tom Lane
If we're inside a lateral subquery, there may be no unparameterized paths for a particular child relation of an appendrel, in which case we *must* be able to create similarly-parameterized paths for each other child relation, else the planner will fail with "could not devise a query plan for the given query". This means that there are situations where we'd better be able to reparameterize at least one path for each child.

This calls into question the assumption in reparameterize_path() that it can just punt if it feels like it. However, the only case that is known broken right now is where the child is itself an appendrel so that all its paths are AppendPaths. (I think possibly I disregarded that in the original coding on the theory that nested appendrels would get folded together --- but that only happens *after* reparameterize_path(), so it's not excused from handling a child AppendPath.) Given that this code's been like this since 9.3 when LATERAL was introduced, it seems likely we'd have heard of other cases by now if there were a larger problem.

Per report from Elvis Pranskevichus. Back-patch to 9.3.

Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net

- 27 September 2017, 7 commits
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Jul 8 14:03:32 2014 -0400

While the x output of "select x from t group by x" can be presumed unique, this does not hold for "select x, generate_series(1,10) from t group by x", because we may expand the set-returning function after the grouping step. (Perhaps that should be re-thought; but considering all the other oddities involved with SRFs in targetlists, it seems unlikely we'll change it.)

Put a check in query_is_distinct_for() so it's not fooled by such cases. Back-patch to all supported branches.

David Rowley

(cherry picked from commit 2e7469dc8b3bac4fe0f9bd042aaf802132efde85)
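The two queries quoted above, side by side, make the distinction concrete (t is an illustrative table): the first grouped output is unique on x, while the second is not, because the set-returning function expands each grouped row after grouping:

```sql
-- output can be presumed unique on x
select x from t group by x;

-- NOT unique on x: each grouped row is expanded into 10 rows afterwards
select x, generate_series(1, 10) from t group by x;
```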
-
We had a bunch of FIXMEs that we added as part of the subselect merge; all of those FIXMEs are now marked as `GPDB_84_MERGE_FIXME` so that they can be grepped easily.
-
For flattened IN or EXISTS sublinks, if we chose an INNER JOIN path instead of a SEMI JOIN, then we need to apply duplicate suppression. The deduplication can be done in two ways:

1. Post-join dedup: unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes whether we need to go for post-join dedup.

2. Pre-join dedup: unique-ify the rows coming from the rel containing the subquery result, before it is joined with any other rels. join_unique_ininfo in CdbRelDedupInfo denotes whether we need to go for pre-join dedup. semi_operators and semi_rhs_exprs are used for this. We ported a function from 9.5 to compute these in make_outerjoininfo().

Upstream has a completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER paths for this, and deduplication is done in create_unique_path(). GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete for us. Hence we have kept the GPDB-style deduplication mechanism as is in this merge. Post-join dedup has been implemented in previous merge commits.

Ref [#146890743]
-
Committed by Shreedhar Hardikar
0. Fix up post join dedup logic after cherry-pick
0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
0. Update gporca, rangefuncs, eagerfree answer files

1. gporca
   Previously we were generating a Hash Inner Join with a HashAggregate for deduplication. Now we generate a Hash Semi Join, in which case we do not need to deduplicate the inner.

2. rangefuncs
   We updated this answer file during the cherry-pick of e006a24a since there was a change in plan. After these cherry-picks, we are back to the original plan as master. Hence we see the original error.

3. eagerfree
   We are generating a not-very-useful subquery scan node with this change. This is not producing wrong results, but this subquery scan needs to be removed. We will file a follow-up chore to investigate and fix this.

0. We no longer need the helper function `hasSemiJoin()` to check whether this specialInfo list has any SpecialJoinInfos constructed for Semi Join (IN/EXISTS sublink). We have moved that check inside `cdb_set_cheapest_dedup()`.

0. We are not exercising the pre-join-deduplication code path after this cherry-pick. Before this merge, we had three CDB-specific fields in `InClauseInfo` in which we recorded information for pre-join dedup in the case of simple uncorrelated IN sublinks: `try_join_unique`, `sub_targetlist` and `InOperators`. Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need to devise a way to record this information in `SpecialJoinInfo`. We have filed a follow-up story for this.

Ref [#142356521]

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Ekta Khanna
Since `InClauseInfo` and `OuterJoinInfo` are now combined into `SpecialJoinInfo` after merging with e006a24a, this commit removes them from the relevant places and accesses `join_info_list` instead of `in_info_list` and `oj_info_list`.

Previously, `CdbRelDedupInfo` contained a list of `InClauseInfo`s. While making join decisions and during overall join processing, we traversed this list and invoked the CDB-specific functions `cdb_make_rel_dedup_info()` and `cdbpath_dedup_fixup()`. Since `InClauseInfo` is no longer available, `CdbRelDedupInfo` now contains a list of `SpecialJoinInfo`s. All the CDB-specific routines which were previously called for the `InClauseInfo` list are now called if `CdbRelDedupInfo` has a valid `SpecialJoinInfo` list and the join type in the `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()` has been added which traverses the `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.

Ref [#142355175]

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Ekta Khanna
Original flow:

    cdb_flatten_sublinks
      +--> pull_up_IN_clauses
             +--> convert_sublink_to_join

New flow:

    cdb_flatten_sublinks
      +--> pull_up_sublinks

This commit contains the relevant changes for the above flow.

Previously, `try_join_unique` was part of `InClauseInfo`. It was set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`. Now, since `InClauseInfo` is not present, we construct a `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. Later in the flow, we construct a `SpecialJoinInfo` from the `FlattenedSublink` in `deconstruct_sublink_quals_to_rel()`. Hence, `try_join_unique` is added as part of both `FlattenedSublink` and `SpecialJoinInfo`.

Ref [#142355175]

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Ekta Khanna
commit e006a24a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Aug 14 18:48:00 2008 +0000

Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace the old JOIN_IN code, but antijoins are new functionality.) Teach the planner to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL filters are recognized as being anti joins.

Unify the InClauseInfo and OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change, it becomes possible to associate a SpecialJoinInfo with every join attempt, which permits some cleanup of join selectivity estimation. That needs to be taken much further than this patch does, but the next step is to change the API for oprjoin selectivity functions, which seems like material for a separate patch. So for the moment the output size estimates for semi and especially anti joins are quite bogus.

Ref [#142355175]

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-