- 14 March 2019, 1 commit

Committed by Daniel Gustafsson
As we merge with upstream and thereby keep refining the Postgres planner, "legacy planner" is no longer a suitable name. This changes all variations of the spelling (legacy planner, legacy optimizer, legacy query optimizer, etc.) to say "Postgres" rather than "legacy".

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: David Yozie <dyozie@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>

- 11 March 2019, 1 commit

Committed by Ning Yu
This method was introduced to improve the data redistribution performance during gpexpand phase 2; however, benchmark results show that its effect falls short of expectations. For example, when expanding a table from 7 segments to 8, the reshuffle method is only 30% faster than the traditional CTAS method, and when expanding from 4 to 8 segments, reshuffle is even 10% slower than CTAS. When there are indexes on the table the reshuffle performance can be worse, and an extra VACUUM is needed to actually free the disk space. According to our experiments, the bottleneck of the reshuffle method is the tuple deletion operation, which is much slower than the insertion operation used by CTAS.

The reshuffle method does have some benefits: it requires less extra disk space, and it also requires less network bandwidth (similar to the CTAS method with the new JCH reduce method, but less than CTAS + MOD). It can also be faster in some cases; however, as we cannot automatically determine when it is faster, it is not easy to benefit from it in practice. On the other hand, the reshuffle method is less tested and may have bugs in corner cases, so it is not production ready yet.

Given all that, we decided to retire it entirely for now. We might add it back in the future if we can get rid of the slow deletion or find reliable ways to automatically choose between the reshuffle and CTAS methods.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
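For context, a minimal sketch of the CTAS-style redistribution that remains after this change; the table name is hypothetical, and the exact statements gpexpand issues may differ:

```sql
-- Rewrite the table, redistributing its rows across all segments of the
-- expanded cluster (a CTAS-like rewrite rather than an in-place reshuffle).
ALTER TABLE sales SET WITH (REORGANIZE=true);
```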

- 27 February 2019, 1 commit

Committed by Jialun
- Retire GP_POLICY_ALL_NUMSEGMENTS and GP_POLICY_ENTRY_NUMSEGMENTS, unifying them to getgpsegmentCount
- Retire GP_POLICY_MINIMAL_NUMSEGMENTS and GP_POLICY_RANDOM_NUMSEGMENTS
- Change the NUMSEGMENTS-related macros from variable macros to function macros
- Change the default return value of getgpsegmentCount to 1, which represents a singleton PostgreSQL instance in utility mode
- Rename __GP_POLICY_INVALID_NUMSEGMENTS to GP_POLICY_INVALID_NUMSEGMENTS

- 6 February 2019, 1 commit

Committed by Heikki Linnakangas
The code to look up the hash functions for a Reshuffle plan used get_opfamily_proc() instead of the more versatile cdb_hashproc_in_family() function, which is used in other similar places where we need to look up the hash functions for a distribution key, like in makeCdbHashForRelation(). That led to errors if the datatype didn't have a hash function defined directly for the datatype, but only via a binary-coercible cast. Domain and enum types are such cases, for example.

Fixes https://github.com/greenplum-db/gpdb/issues/6901

Reviewed-by: David Kimura <dkimura@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>

- 1 February 2019, 1 commit

Committed by Heikki Linnakangas
Replace the use of the built-in hashing support for built-in datatypes in cdbhash.c with the normal PostgreSQL hash functions. Now is a good time to do this, since we've already made the change to use jump consistent hashing in GPDB 6, so we'll need to deal with the upgrade problems associated with changing the hash functions anyway.

It is no longer enough to track which columns/expressions are used to distribute data; you also need to know the hash function used. For that, a new field is added to gp_distribution_policy, to record the hash operator class used for each distribution key column. In the planner, a new opfamily field is added to DistributionKey, to track that throughout planning.

Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the default hash operator class for the datatype is used. But this patch extends the syntax so that you can specify the operator class explicitly, like "... DISTRIBUTED BY (column opclass)". This is similar to how an operator class can be specified for each column in CREATE INDEX.

To support upgrade, the old hash functions have been converted to special (non-default) operator classes, named cdbhash_*_ops. For example, if you want to use the old hash function for an integer column, you could do "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist of operators that have "compatible" cdbhash functions has been replaced by putting the compatible hash opclasses in the same operator family. For example, all the legacy integer operator classes, cdbhash_int2_ops, cdbhash_int4_ops and cdbhash_int8_ops, are part of the cdbhash_integer_ops operator family.

This removes the pg_database.hashmethod field. The hash method is now tracked on a per-table and per-column basis, using the opclasses, so it's not needed anymore.

To help with upgrade from GPDB 5, this introduces a new GUC called 'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash opclasses instead of the default hash opclasses, if the opclass is not specified explicitly. pg_upgrade will set the new GUC to force the use of legacy hashops when restoring the schema dump. It will also set the GUC on all upgraded databases, as a per-database option, so any new tables created after the upgrade will also use the legacy opclasses. It seems better to be consistent after upgrade, so that collocation between old and new tables works, for example. The idea is that some time after the upgrade, the admin can reorganize all tables to use the default opclasses instead. At that point, he should also clear the GUC on the converted databases. (Or rather, the automated tool that hasn't been written yet should do that.)

ORCA doesn't know about hash operator classes, or the possibility that we might need to use a different hash function for two columns with the same datatype. Therefore, it cannot produce correct plans for queries that mix different distribution hash opclasses for the same datatype in the same query. There are checks in the Query->DXL translation to detect that case and fall back to the planner. As long as you stick to the default opclasses in all tables, we let ORCA create the plan without any regard to them, and use the default opclasses when translating the DXL plan to a Plan tree. We also allow the case that all tables in the query use the "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the two, or using any non-default opclasses, forces ORCA to fall back.

One curiosity with this is the "int2vector" and "aclitem" datatypes. They have a hash opclass, but no b-tree operators. GPDB 4 used to allow them as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit 56e7c16b. Now they are allowed again, so you can specify an int2vector or aclitem column in DISTRIBUTED BY, but it's still pretty useless, because the planner still can't form EquivalenceClasses on them, will treat them as "strewn" distribution, and won't co-locate joins.

The abstime, reltime and tinterval datatypes don't have default hash opclasses. They are being removed completely in PostgreSQL v12, and users shouldn't be using them in the first place, so instead of adding hash opclasses for them now, we accept that they can't be used as distribution key columns anymore. Add a check to pg_upgrade, to refuse the upgrade if they are used as distribution keys in the old cluster. Do the same for the 'money' datatype as well, although that one is not being removed upstream.

The legacy hashing code for anyarray in GPDB 5 was actually broken. It could produce a different hash value for two arrays that are considered equal according to the = operator, if there were differences in e.g. whether the null bitmap was stored or not. Add a check to pg_upgrade, to reject the upgrade if array types were used as distribution keys. The upstream hash opclass for anyarray works, though, so it is OK to use arrays as distribution keys in new tables; we just don't support binary-upgrading them from GPDB 5. (See github issue https://github.com/greenplum-db/gpdb/issues/5467.) The legacy hashing of 'anyrange' had the same problem, but that was new in GPDB 6, so we don't need a pg_upgrade check for that.

This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE INDEX, so that you can no longer create a situation where a non-hashable column becomes the distribution key. (Fixes github issue https://github.com/greenplum-db/gpdb/issues/6317)

Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ

Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
Co-authored-by: Chris Hajas <chajas@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Simon Gao <sgao@pivotal.io>
Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Yandong Yao <yyao@pivotal.io>
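A short illustration of the syntax and GUC described above, using the opclass and GUC names from this commit (the table names are hypothetical):

```sql
-- Default: the datatype's default hash opclass is used.
CREATE TABLE t_new (c1 int) DISTRIBUTED BY (c1);

-- Explicitly request the legacy (pre-GPDB 6) hash function via its opclass.
CREATE TABLE t_legacy (c1 int) DISTRIBUTED BY (c1 cdbhash_int4_ops);

-- Or make CREATE TABLE default to the legacy opclasses, as pg_upgrade does.
SET gp_use_legacy_hashops = on;
CREATE TABLE t_upgraded (c1 int) DISTRIBUTED BY (c1);
```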

- 25 January 2019, 1 commit

Committed by Alexandra Wang
The GPDB_92_MERGE_FIXME asking whether we need a deep copy of the subroot, or whether a memcpy suffices, can be removed: all we care about from the subroot is `parse->rtable`, so creating a deep copy of it is unnecessary.

This commit also removes the `Assert()` which is valid upstream but not in GPDB, since we create a new copy of the subplan if two SubPlans refer to the same initplan. Therefore, when we try to set references for subqueryscans in plans with copies of subplans referring to the same initplan, we cannot directly Assert that the RelOptInfo's subplan is the same as the subqueryscan's subplan. Added a test case for the same, which will ensure we do not merge the Assert back from upstream in future merges.

Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>

- 15 January 2019, 1 commit

Committed by Wang Hao
Some plan node types, such as ModifyTable or MergeAppend, were not covered in assign_plannode_id(), so their child nodes were not assigned a proper plan_node_id. The plan_node_id is required by gpmon and the instrumentation code for monitoring purposes; without a proper plan_node_id assigned, the consistency of the monitoring data is broken.

This commit refactors assign_plannode_id() to use plan_tree_walker. As a result, ModifyTable, MergeAppend and potentially Sequence are covered. Another advantage of using plan_tree_walker is that when new node types are introduced, we don't need to touch assign_plannode_id() anymore; plan_tree_walker takes care of it.

Fixes https://github.com/greenplum-db/gpdb/issues/5247

Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 19 December 2018, 1 commit

Committed by Zhenghua Lyu
Split-update is used for an UPDATE statement on a hash-distributed table's hash columns. A redistribute motion has to be added above the split node in the plan, and this was achieved by marking the split node's flow as strewn. However, if the subplan's flow is an entry, we should not mark it strewn.

- 14 December 2018, 1 commit

Committed by Heikki Linnakangas

- 13 December 2018, 1 commit

Committed by Daniel Gustafsson
The Greenplum-specific error handling via ereport()/elog() calls was in need of a unification effort, as some parts of the code were using a different messaging style from others (and from upstream). This aims at bringing many of the GPDB error calls in line with the upstream error message writing guidelines, and thus makes the user experience of Greenplum more consistent.

The main contributions of this patch are:

* errmsg() messages shall start with a lowercase letter and not end with a period. errhint() and errdetail() shall be complete sentences starting with a capital letter and ending with a period. This attempts to fix this on as many ereport() calls as possible, with too-detailed errmsg() content broken up into details and hints where possible.

* Reindent ereport() calls to be more consistent with the common style used in upstream and most parts of Greenplum:

      ereport(ERROR,
              (errcode(<CODE>),
               errmsg("short message describing error"),
               errhint("Longer message as a complete sentence.")));

* Avoid breaking messages across lines just to keep lines short, since that makes grepping for error messages harder when debugging. This is also the de facto standard in upstream code.

* Convert a few internal error ereport() calls to elog(). There are no doubt more that can be converted, but the low-hanging fruit has been dealt with. Also convert a few elog() calls which are user facing to ereport().

* Update the test files to match the new messages.

Spelling and wording are mostly left for a follow-up commit, as this was getting big enough as it was. The most obvious cases have been handled, but there is work left to be done here.

Discussion: https://github.com/greenplum-db/gpdb/pull/6378

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 3 December 2018, 2 commits

Committed by Heikki Linnakangas
For consistency: this is how we represent column indexes e.g. in Sort, Unique, MergeAppend and many other plan types.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

Committed by Heikki Linnakangas
ORCA generated plans where the "hash filter" in the Result node was set to an empty set of columns. That meant "discard all the rows, on all segments, except one segment". This is used at least with set-returning functions, where we don't care where the function is executed, but it only needs to be executed once. (The planner creates a one-to-many Redistribute Motion plan in that scenario, which makes a lot more sense to me, but doing the same in ORCA would require more invasive surgery than what I'm capable of.)

Instead of executing the subplan and throwing away the result one row at a time, use a Result plan with a One-Off Filter. That's more efficient. Also, it allows removing the Result.hashFilter boolean flag, because the weird case of a hashFilter with zero columns is gone; you can check "hashList != NIL" directly now.

The old method would always choose the same segment, which seems bad for load distribution. The way it was chosen seemed totally accidental too: we initialized the cdbhash object to the initial constant value, and then reduced that into the target segment number using the jump consistent hash algorithm. We computed that for every row, but the result was always the same; on a three-node cluster, the target was always segment 1. Now, we pick a segment at random when generating the plan.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

- 27 November 2018, 1 commit

Committed by Zhenghua Lyu
Previously the reshuffle node's numsegments was always set to the cluster size. Now that we have the flexible gang and dispatch API, we should correct the numsegments field of the reshuffle node, setting it to its lefttree's flow->numsegments.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>

- 23 November 2018, 3 commits

Committed by Heikki Linnakangas

Committed by Heikki Linnakangas

Committed by Pengzhou Tang
Previously, when creating a join path between a CdbLocusType_SingleQE path and a CdbLocusType_SegmentGeneral path, we always added a motion on top of the CdbLocusType_SegmentGeneral path, so that even if the join path is promoted to execute on the QD, the CdbLocusType_SegmentGeneral path can still be executed on the segments:

                join (CdbLocusType_SingleQE)
               /      \
              /        \
    CdbLocusType_SingleQE   Gather Motion
                              \
                               CdbLocusType_SegmentGeneral

For example, with (select * from partitioned_table limit 1) as t1:

    Nested Loop
      -> Gather Motion 1:1
         -> Seq Scan on replicated_table
      -> Materialize
         -> Subquery Scan on t1
            -> Limit
               -> Gather Motion 3:1
                  -> Limit
                     -> Seq Scan on partitioned_table

replicated_table only stores tuples on segments, so without the gather motion the seq scan of replicated_table would not provide tuples.

There is another problem: if the join path is not promoted to the QD, the gather motion might be redundant. For example, with (select * from replicated_table, (select * from partitioned_table limit 1) t1) sub1:

    Gather Motion 3:1
      -> Nested Loop
         -> Seq Scan on partitioned_table_2
         -> Materialize
            -> Broadcast Motion 1:3
               -> Nested Loop
                  -> Gather Motion 1:1   (redundant motion)
                     -> Seq Scan on replicated_table
                  -> Materialize
                     -> Subquery Scan on t1
                        -> Limit
                           -> Gather Motion 3:1
                              -> Limit
                                 -> Seq Scan on partitioned_table

So in apply_motion_mutator() we omit such a redundant motion if it is not gathered to the top slice (QD). sliceDepth == 0 means it is the top slice; however, sliceDepth is shared by both init plans and the main plan, so if the main plan increased the sliceDepth, an init plan might omit the gather motion unexpectedly, producing wrong results. The fix is simply to reset sliceDepth for init plans.

- 22 November 2018, 3 commits

Committed by Heikki Linnakangas
There was some confusion about how NULLs are distributed when a CdbPathLocus is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested that NULLs can be on any segment, but the rest of the code assumed that that's true only for HashedOJ, and that for Hashed, all NULLs are stored on a particular segment. There was a comment in cdbgroup.c that said "Or would HashedOJ ok, too?"; the answer to that is "No!". Given the comment in cdbpathlocus.h, I'm not surprised that the author was not very sure about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that.

There were a few cases where we got that actively wrong. The repartitionPlan() function is used to inject a Redistribute Motion into queries used for CREATE TABLE AS and INSERT, if the "current" locus doesn't match the target table's policy. It did not check for HashedOJ. Because of that, if the query contained FULL JOINs, NULL values might end up on all segments. Code elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table are stored on a single segment, identified by the cdbhash value of a NULL datum. Fix that by adding a check for HashedOJ in repartitionPlan(), and forcing a Redistribute Motion in that case.

CREATE TABLE AS had a similar problem, in the code that decides which distribution key to use if the user didn't specify DISTRIBUTED BY explicitly. The default behaviour is to choose a distribution key that matches the distribution of the query, so that we can avoid adding an extra Redistribute Motion. After fixing repartitionPlan(), there was no correctness problem, but if we chose the key based on a HashedOJ locus, there is no performance benefit because we'd need a Redistribute Motion anyway. So modify the code that chooses the CTAS distribution key to ignore HashedOJ.

While we're at it, refactor the code that chooses the CTAS distribution key by moving it to a separate function. It had become ridiculously deeply indented.

Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

Committed by Heikki Linnakangas
Fix indentation. In the code to generate a NOTICE, remove an if() test for a condition that had already been checked earlier in the function, and use a StringInfo for building the string.

Committed by Ning Yu
Introduced a new debugging extension, gp_debug_numsegments, to get/set the default numsegments used when creating tables.

- gp_debug_get_create_table_default_numsegments() gets the default numsegments.
- gp_debug_set_create_table_default_numsegments(text) sets the default numsegments in text format; valid values are:
  - 'FULL': all the segments;
  - 'RANDOM': a random set of segments each time;
  - 'MINIMAL': the minimal set of segments.
- gp_debug_set_create_table_default_numsegments(integer) sets the default numsegments directly; the valid range is [1, gp_num_contents_in_cluster].
- gp_debug_reset_create_table_default_numsegments(text) and gp_debug_reset_create_table_default_numsegments(integer) reset the default numsegments to the specified value, and the value can be reused later.
- gp_debug_reset_create_table_default_numsegments() resets the default numsegments to the value passed last time; if there was no previous call, the value is 'FULL'.

Refactored the ICG test partial_table.sql to create partial tables with this extension.
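A minimal usage sketch of the functions listed above; the invocation pattern (plain SELECT calls after CREATE EXTENSION) is assumed here, and the table name is hypothetical:

```sql
CREATE EXTENSION gp_debug_numsegments;

-- Inspect the current default.
SELECT gp_debug_get_create_table_default_numsegments();

-- Make subsequently created tables live on the minimal set of segments.
SELECT gp_debug_set_create_table_default_numsegments('MINIMAL');
CREATE TABLE t_partial (c1 int) DISTRIBUTED BY (c1);

-- Restore the previously used default.
SELECT gp_debug_reset_create_table_default_numsegments();
```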

- 13 November 2018, 1 commit

Committed by Jinbao Chen
In 'COPY (SELECT ...) TO file', we generate a query plan, set its dest receiver to copy_dest_receiver, and run the dest receiver on the QD. In 'COPY (SELECT ...) TO file ON SEGMENT', we modify the query plan, delete the gather motion, and let the dest receiver run on the QEs.

Change 'isCtas' in Query to 'parentStmtType' to be able to mark the type of the parent utility statement. Add a CopyIntoClause node to store the COPY information, and add copyIntoClause to PlannedStmt.

In PostgreSQL, we don't need to make a different query plan for a query inside a utility statement, but in Greenplum we do. So we use a field to indicate whether the query is contained in a utility statement, and the type of that utility statement.

The behavior of 'COPY (SELECT ...) TO file ON SEGMENT' is actually very similar to 'SELECT ... INTO ...' and 'CREATE TABLE ... AS SELECT ...'. We use the distribution policy inherent in the query result as the final data distribution policy; failing that, we use the first column in the target list as the key and redistribute. The only difference is that 'copy_dest_receiver' is used instead of 'intorel_dest_receiver'.
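A sketch of the two forms with hypothetical paths; the <SEGID> placeholder in the ON SEGMENT file name follows the usual GPDB convention and is an assumption, not something this commit spells out:

```sql
-- The dest receiver runs on the QD; one file is written on the master host.
COPY (SELECT * FROM sales WHERE amount > 100) TO '/data/sales.csv' CSV;

-- The dest receiver runs on each QE; every segment writes its own file.
COPY (SELECT * FROM sales WHERE amount > 100)
    TO '/data/sales_<SEGID>.csv' ON SEGMENT CSV;
```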

- 7 November 2018, 1 commit

Committed by ZhangJackey
Now that we have partial tables and the flexible gang API, we can allocate gangs according to numsegments. With commit 4eb65a53, GPDB supports tables distributed on partial segments, and with the series of commits (a3ddac06, 576690f2), GPDB supports the flexible gang API. Now is a good time to combine the two new features: the goal is to create gangs only on the necessary segments for each slice. This commit also improves singleQE gang scheduling and does some code cleanup. However, if ORCA is enabled, the behavior is just like before.

The outline of this commit:

* Modify the FillSliceGangInfo API so that gang_size is truly flexible.
* Remove the numOutputSegs and outputSegIdx fields in the motion node. Add a new field isBroadcast to mark whether the motion is a broadcast motion.
* Remove the global variable gp_singleton_segindex and make the singleQE segment_id random (by gp_sess_id).
* Remove the field numGangMembersToBeActive in Slice, because it is now exactly slice->gangsize.
* Modify the message printed if the GUC Test_print_direct_dispatch_info is set.
* An explicit BEGIN now creates a full gang.
* Format code and remove destSegIndex.
* Remove the isReshuffle flag in ModifyTable: it was only used when we want to insert a tuple into a segment that is outside the range of numsegments.

Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>

- 6 November 2018, 1 commit

Committed by Heikki Linnakangas
Avoid looking through domains, array types, etc. on every call. That seems like a more sensible API, since the data types don't change during the lifetime of a CdbHash.

Make cdbhash() more convenient for callers by handling NULLs within the function. This way the callers don't need to do the NULL check and call either cdbhash() or cdbhashnull().

This also fixes the performance issue caused by the syscache lookups reported in https://github.com/greenplum-db/gpdb/issues/5961. The datatype is now checked only once, when the CdbHash object is initialized, instead of for every row.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>

- 29 October 2018, 2 commits

Committed by Heikki Linnakangas
Most callers were passing CurrentMemoryContext, so this makes most callers slightly simpler. The few places that needed to pass a different context now switch to the correct one before calling the GpPolicy*() function.

Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>

Committed by Pengzhou Tang
Previously, when updating a table with update triggers on its distribution column, GPDB reported an error like "ERROR: UPDATE on distributed key column not allowed on relation with update triggers". The current GPDB executor doesn't support statement-level update triggers, and it also skips row-level update triggers, because a split-update actually consists of a delete and an insert. So if the result relation has update triggers, GPDB rejects the UPDATE and errors out, because the triggers would not be functional.

There is an exception for 'ALTER TABLE SET WITH (RESHUFFLE)': RESHUFFLE also uses a split-update node internally to rebalance/expand the table. However, from the user's point of view, ALTER TABLE should not fire any kind of trigger, so we don't need to error out as we do for the UPDATE command.
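A sketch of the rejected case, using hypothetical object names; the error text is the one quoted above:

```sql
CREATE TABLE t (c1 int, c2 int) DISTRIBUTED BY (c1);

CREATE FUNCTION t_noop() RETURNS trigger AS
$$ BEGIN RETURN NEW; END; $$ LANGUAGE plpgsql;

CREATE TRIGGER t_upd BEFORE UPDATE ON t
    FOR EACH ROW EXECUTE PROCEDURE t_noop();

-- Updating the distribution key needs a split-update (delete + insert),
-- which cannot fire the update trigger, so this is rejected:
UPDATE t SET c1 = c1 + 1;
-- ERROR:  UPDATE on distributed key column not allowed on relation with update triggers
```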

- 23 October 2018, 1 commit

Committed by ZhangJackey
Each table has a `numsegments` attribute in the gp_distribution_policy catalog table, indicating that the table's data is distributed on the first N segments. In the common case, `numsegments` equals the total segment count of the cluster. When we add new segments to the cluster, `numsegments` no longer equals the actual segment count, and we need to reshuffle the table data to all segments in 2 steps:

* Reshuffle the table data to all segments
* Update `numsegments`

It is easy to update `numsegments`, so we focus on how to reshuffle the table data. There are 3 types of tables in the Greenplum database, and they are reshuffled in different ways.

For a hash-distributed table, we reshuffle data based on an UPDATE statement. Updating the hash keys of the table will generate a plan like:

    Update
      -> Redistribute Motion
         -> SplitUpdate
            -> SeqScan

We cannot use this plan to reshuffle table data directly. The problem is that we need to know the segment count when the Motion node computes the destination segment. When we compute the destination segment of the deleted tuple, we need the old segment count, which equals `numsegments`; on the other hand, we need the new segment count to compute the destination segment for the inserted tuple. So we have to add a new operator, Reshuffle, to compute the destination segment. It records O and N (O is the count of old segments and N is the count of new segments), and the plan is adjusted like:

    Update
      -> Explicit Motion
         -> Reshuffle
            -> SplitUpdate
               -> SeqScan

It can compute the destination segments directly with O and N. At the same time we change the Motion type to Explicit, so it can send a tuple to the destination segment computed in the Reshuffle node.

With the change of the hash method to jump consistent hash, not all the table data needs to be reshuffled, so we add a new ReshuffleExpr to filter the tuples that need to be reshuffled. This expression computes the destination segment ahead of schedule; if the destination segment is the current segment, the tuple does not need to be reshuffled. With the ReshuffleExpr the plan is adjusted like:

    Update
      -> Explicit Motion
         -> Reshuffle
            -> SplitUpdate
               -> SeqScan
                  |- ReshuffleExpr

When we want to reshuffle a table, we use the SQL `ALTER TABLE xxx SET WITH (RESHUFFLE)`. It actually generates a new UpdateStmt parse tree; the parse tree is similar to the one generated by the SQL `UPDATE xxx SET xxx.aaa = COALESCE(xxx.aaa...) WHERE ReshuffleExpr`. We set a reshuffle flag in the UpdateStmt, so we can distinguish a common update from reshuffling.

In conclusion, we reshuffle a hash-distributed table with the Reshuffle node and the ReshuffleExpr: the ReshuffleExpr filters the tuples that need to be reshuffled, and the Reshuffle node does the real reshuffling work. We can use the same framework to reshuffle randomly distributed and replicated tables.

For a randomly distributed table there are no hash keys, and each old segment needs to move (N - O) / N of its data to the new segments. In the ReshuffleExpr, we generate a random value in [0, N); if the random value is greater than O, the tuple needs to be reshuffled, and the SeqScan node returns this tuple to the Reshuffle node. The Reshuffle node then generates a random value in [O, N), which determines the new segment the tuple needs to be inserted into.

For a replicated table, the table data is the same on all old segments, so no tuples need to be deleted; we only need to copy the tuples on the old segments to the new segments. Thus the ReshuffleExpr does not filter out any tuples, and in the Reshuffle node we ignore the tuple generated for deletion and only return the inserted tuple to the motion.

Let me illustrate this with an example: if there are 3 old segments in the cluster and we add 4 new segments, the segment IDs of the old segments are (0,1,2) and the segment IDs of the new segments are (3,4,5,6). When reshuffling a replicated table, seg#0 is responsible for copying data to seg#3 and seg#6, seg#1 is responsible for copying data to seg#4, and seg#2 is responsible for copying data to seg#5.

Co-authored-by: Ning Yu <nyu@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
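The user-facing entry point, quoted verbatim in the commit (xxx is a placeholder table name):

```sql
-- Move the table's data onto the newly added segments and update its
-- numsegments; internally this is planned as the Reshuffle plan above.
ALTER TABLE xxx SET WITH (RESHUFFLE);
```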

- 20 October 2018, 1 commit

Committed by Heikki Linnakangas
One specialty of a Split Update is that the node needs the *old* values of all the distribution key columns, to compute the distribution hash for each old row so that it can be deleted. That was previously handled at the time the SplitUpdate node was created, by adding any missing Vars for the old values to the subplan's target list, pushing them down through joins and any other plan nodes, all the way down to the Scan node of that relation. That seemed complicated and fragile.

The reason to tackle this right now is that we were seeing failures related to this while working on the PostgreSQL 9.4 merge. It added a test case where a Split Update was done through a security barrier view. The security barrier view added a SubqueryScan to the plan tree, and the mechanism to push through the old attributes couldn't cope with that. I'm sure we could've hacked that to make it work, but this refactoring seems like a better long-term fix.

This patch makes it the responsibility of preprocess_targetlist() to ensure that the old values are made available to the top of the tree, if a Split Update is needed. preprocess_targetlist() seems like the appropriate place, because it already does that for columns that are not modified by the UPDATE. Now that we are making the decision on whether to do a split update in preprocess_targetlist() already, add a flag to PlannerInfo to remember that decision until the point where the ModifyTable node is added to the top of the plan tree.

Also add a test case for an inherited table where some children have a different distribution key, and an UPDATE on some of the children requires a Split Update while on others it doesn't. That was causing me trouble at one point during development, and I'm not sure if there was any existing test to cover it.

- 28 September 2018, 1 commit

Committed by ZhangJackey
There was an assumption in GPDB that a table's data is always distributed on all segments. However, this is not always true: for example, when a cluster is expanded from M segments to N (N > M), all the tables are still on M segments. To work around the problem we used to have to alter all the hash-distributed tables to randomly distributed to get correct query results, at the cost of bad performance.

Now we support table data distributed on a subset of segments. A new column `numsegments` is added to the catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DMLs on M tables; joins between M and N tables are also supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
  from gp_distribution_policy;

 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);

                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   -> Hash Join
        Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
        -> Seq Scan on t1 a
        -> Hash
           -> Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);

                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   -> Hash Join
        Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
        -> Seq Scan on t1 a
        -> Hash
           -> Redistribute Motion 2:1  (slice1; segments: 2)
                Hash Key: b.c1, b.c2
                -> Seq Scan on t2 b
 Optimizer: legacy query optimizer
```

- 27 September 2018, 1 commit

Committed by Paul Guo
As the comment said, this was useful; however, now that we have the upstream add_rte_to_flat_rtable() to handle it, let's remove this call.

- 23 September 2018, 1 commit

Committed by Daniel Gustafsson
getgpsegmentCount() was declared in both cdbvars.h and cdbutil.h. While that avoided another header include in some cases, getgpsegmentCount() is not a variable, and its correct location is cdbutil.h. Remove the prototype from cdbvars.h and update includes as required. Also fix the function comment to match reality, and do some minor tweaking of the debug elog() performed.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>

- 19 September 2018, 1 commit

Committed by Heikki Linnakangas
When building a Sort node to represent the ordering that is preserved by a Motion node, in make_motion(), the call to make_sort_from_pathkeys() would sometimes fail with "could not find pathkey item to sort". This happened when the ordering was over a UNION ALL operation.

When building Motion nodes for MergeAppend subpaths, the path keys that represented the ordering referred to the items in the append rel's target list, not the subpaths'. In create_merge_append_plan(), where we do a similar thing for each subpath, we correctly passed the 'relids' argument to prepare_sort_from_pathkeys(), so that prepare_sort_from_pathkeys() can match the target list entries of the append relation with the entries of the subpaths. But when creating the Motion nodes for each subpath, we were passing NULL as 'relids' (via make_sort_from_pathkeys()).

At a high level, the fix is straightforward: we need to pass the correct 'relids' argument to prepare_sort_from_pathkeys() in cdbpathtoplan_create_motion_plan(). However, the current code structure makes that not so straightforward, so this required some refactoring of make_motion() and related functions. Previously, make_motion() and make_sorted_union_motion() took a path key list as argument to represent the ordering, and called make_sort_from_pathkeys() to extract the sort columns, operators, etc. After this patch, those functions take arrays of sort columns, operators, etc. directly as arguments, and the caller is expected to call make_sort_from_pathkeys() to get them, or build them through some other means. In cdbpathtoplan_create_motion_plan(), call prepare_sort_from_pathkeys() directly, rather than the make_sort_from_pathkeys() wrapper, so that we can pass the 'relids' argument. Because prepare_sort_from_pathkeys() is marked as 'static', move cdbpathtoplan_create_motion_plan() from cdbpathtoplan.c to createplan.c, so that it can call it.

Add a test case. It's a slightly reduced version of a query that we already had in the 'olap_group' test, but it seems better to be explicit. Revert the change in the expected output of 'olap_group', made in commit 28087f4e, which memorized the error in the expected output.

Fixes https://github.com/greenplum-db/gpdb/issues/5695.

Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>

- 18 September 2018, 1 commit

Committed by Daniel Gustafsson
The relation_close() call directly following ereport(ERROR, ...) will never be reached, as the ereport won't return. While closing and cleaning up any used resources is a good thing, they will be handled automatically by the error handler, so remove the call. Also editorialized the error message to fit the error message style guide, and fixed the resulting test fallout.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 15 September 2018, 1 commit

Committed by Daniel Gustafsson
Also do minor style fixups, such as line length and capitalization, on some of the affected lines.

- 10 September 2018, 1 commit

Committed by Lei (Alexandra) Wang
Correct the cost calculation for the SplitUpdate plan. This partly addresses a GPDB_90_MERGE_FIXME introduced in 73801e8. As mentioned in the FIXME, this will not help generate a better plan, because we have no choice other than simply adding the SplitUpdate node. Note that only the cost is adjusted; the width is still incorrect. We will not fix the width for now, because upstream commit 3fc6e2d7 will fix it.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Alexandra Wang <leiwangcheme@gmail.com>

- 3 September 2018, 1 commit

Committed by Heikki Linnakangas
Be more careful to not build a Redistribute Motion on an expression that's not GPDB-hashable. Fixes github issue #4868, as well as a couple of other similar cases that were found while investigating this.

- 21 August 2018, 1 commit

Committed by Taylor Vesely
When the query_planner determines that a relation does not need scanning due to constraint exclusion, it will create a 'dummy' plan for that operation. The split update planning code does not understand this 'dummy' plan shape and fails with an assertion. Instead, because an excluded relation will never return tuples, do not attempt to create a split update at all.

- 3 August 2018, 1 commit

Committed by Karen Huddleston
This reverts commit 4750e1b6.

- 2 August 2018, 1 commit

Committed by Richard Guo
This is the final batch of commits from PostgreSQL 9.2 development, up to the point where the REL9_2_STABLE branch was created and 9.3 development started on the PostgreSQL master branch.

Notable upstream changes:

* Index-only scan was included in this batch of upstream commits. It allows queries to retrieve data only from indexes, avoiding heap access.
* Group commit was added to work effectively under heavy load. Previously, batching of commits became ineffective as the write workload increased, because of internal lock contention.
* A new fast-path lock mechanism was added to reduce the overhead of taking and releasing certain types of locks which are taken and released very frequently but rarely conflict.
* The new "parameterized path" mechanism was added. It allows inner index scans to use values from relations that are more than one join level up from the scan. This can greatly improve performance in situations where semantic restrictions (such as outer joins) limit the allowed join orderings.
* The SP-GiST (Space-Partitioned GiST) index access method was added to support unbalanced partitioned search structures. For suitable problems, SP-GiST can be faster than GiST in both index build time and search time.
* Checkpoints are now performed by a dedicated background process. Formerly the background writer did both dirty-page writing and checkpointing. Separating this into two processes allows each goal to be accomplished more predictably.
* Custom plans are now supported for specific parameter values, even when using prepared statements.
* The API for FDWs was improved to provide multiple access "paths" for their tables, allowing more flexibility in join planning.
* The security_barrier option was added for views, to prevent optimizations that might allow view-protected data to be exposed to users.
* Range data types were added, to store a lower and an upper bound belonging to a base data type.
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility statement. The SELECT query is planned during the execution of the utility. To conform to this change, GPDB executes the utility statement only on the QD and dispatches the plan of the SELECT query to the QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>

- 23 July 2018, 1 commit

Committed by Zhenghua Lyu
Previously, we could not update a distribution column with the Postgres planner, because the OLD tuple and the NEW tuple may belong to different segments. We enable this by borrowing ORCA's logic, namely splitting each update operation into a delete and an insert. The delete operation is hashed on the OLD tuple's attributes, and the insert operation is hashed on the NEW tuple's attributes.

This change includes the following items:

* Push missing OLD attributes down to the subplan tree so that they can be passed up to the top Motion.
* In addition, if the result relation has oids, also put the oid in the targetlist.
* If the result relation is partitioned, treat it specially, because resultRelations contains the partition tables instead of the root table; the same is true for a normal Insert.
* Special treatment for update triggers, because a trigger cannot be executed across segments.
* Special treatment in nodeModifyTable, so that it can process Insert/Delete for update purposes.
* Proper initialization of SplitUpdate.

There are still TODOs:

* We don't handle the cost gracefully, because we add the SplitUpdate node after the plan is generated. A FIXME has already been added for this.
* For the deletion, we could optimize by sending only the distribution columns instead of all columns.

Author: Xiaoran Wang <xiwang@pivotal.io>
Author: Max Yang <myang@pivotal.io>
Author: Shujie Zhang <shzhang@pivotal.io>
Author: Zhenghua Lyu <zlv@pivotal.io>
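A minimal sketch of what this enables, with a hypothetical table; before this commit the Postgres planner rejected the last statement:

```sql
CREATE TABLE accounts (c1 int, c2 text) DISTRIBUTED BY (c1);
INSERT INTO accounts VALUES (1, 'a');

-- Planned as a split update: a delete hashed on the OLD value of c1,
-- plus an insert hashed on the NEW value of c1.
UPDATE accounts SET c1 = 2 WHERE c2 = 'a';
```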

- 11 July 2018, 1 commit

Committed by Pengzhou Tang
To keep it consistent with the CREATE TABLE syntax, CTAS should also disallow duplicate distribution keys; otherwise backup and restore will mess up.
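A sketch of the now-rejected statement, with hypothetical table names:

```sql
-- Already rejected by CREATE TABLE, and now rejected by CTAS as well:
CREATE TABLE t2 AS SELECT * FROM t1 DISTRIBUTED BY (c1, c1);
```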

- 29 May 2018, 1 commit

Committed by Ning Yu
* rpt: reorganize data when altering from/to replicated.

  There was a bug where altering a table from/to replicated had no effect; the root cause is that we neither changed gp_distribution_policy nor reorganized the data. Now we perform the data reorganization by creating a temp table with the new distribution policy and transferring all the data to it.

* rpt: support RETURNING for replicated tables.

  This is to support the syntax below (suppose foo is a replicated table):

      INSERT INTO foo VALUES(1) RETURNING *;
      UPDATE foo SET c2=c2+1 RETURNING *;
      DELETE FROM foo RETURNING *;

  A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN output; in this motion type, data is received from one explicit sender.

* rpt: fix the motion type under an explicit gather motion.

  Consider the query below:

      INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
          RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;

  We used to generate a plan like this:

      Explicit Gather Motion 3:1  (slice2; segments: 3)
        -> Insert
           -> Seq Scan on foo
        SubPlan 1  (slice2; segments: 3)
          -> Gather Motion 3:1  (slice1; segments: 1)
             -> Seq Scan on int8_tbl

  A gather motion is used for the subplan, which is wrong and causes a runtime error. A correct plan looks like this:

      Explicit Gather Motion 3:1  (slice2; segments: 3)
        -> Insert
           -> Seq Scan on foo
        SubPlan 1  (slice2; segments: 3)
          -> Materialize
             -> Broadcast Motion 3:3  (slice1; segments: 3)
                -> Seq Scan on int8_tbl

* rpt: add a test case for a table with both PRIMARY KEY and UNIQUE constraints.

  On a replicated table we can set both PRIMARY KEY and UNIQUE constraints; test cases are added to ensure this feature keeps working during future development.

(cherry picked from commit 72af4af8)