- 27 Nov 2018, 9 commits
-
-
Committed by Heikki Linnakangas
In PostgreSQL, a PathKey represents sort ordering, but in GPDB we have also been using it in the planner to represent the distribution keys of hash-distributed data (i.e. the keys in DISTRIBUTED BY of a table, but also when data is redistributed by some other key on the fly). That's been convenient, and there's some precedent for it, since PostgreSQL also uses PathKey to represent GROUP BY columns, which are quite similar to DISTRIBUTED BY. However, there are some differences: the opfamily, strategy and nulls_first fields in PathKey are not applicable to distribution keys. Using the same struct to represent ordering and hash distribution is sometimes convenient, for example when we need to test whether the sort order or grouping is "compatible" with the distribution. But at other times, it's confusing. To clarify that, introduce a new DistributionKey struct to represent a hashed distribution.

While we're at it, simplify the representation of HashedOJ locus types by including a List of EquivalenceClasses in DistributionKey, rather than just one EC like a PathKey has. CdbPathLocus now has only one 'distkey' list that is used for both Hashed and HashedOJ loci, and it's a list of DistributionKeys. Each DistributionKey in turn can contain multiple EquivalenceClasses.

Looking ahead, I'm working on a patch to generalize the "cdbhash" mechanism, so that we'd use the normal Postgres hash opclasses for distribution keys, instead of hard-coding support for specific datatypes. With that, the hash operator class or family will be an important part of the distribution key, in addition to the datatype. The plan is to store that also in DistributionKey.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by xiong-gang
When EvalPlanQual materializes the slot into a heap tuple, PRIVATE_tts_values points to freed memory. We need to reset PRIVATE_tts_nvalid in ExecMaterializeSlot, to prevent the following ExecFilterJunk from referencing the dangling pointer.
-
Committed by Zhenghua Lyu
Previously the reshuffle node's numsegments was always set to the cluster size. Now that we have a flexible gang & dispatch API, we should correct the numsegments field of the reshuffle node by setting it to its lefttree's flow->numsegments.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
-
Committed by Zhenghua Lyu
When we expand a partially replicated table via `alter table t expand table`, internally we use the split-update framework to implement the expansion. That framework was originally designed for hash-distributed tables. For a replicated table, we do not need the reshuffle_expr (filter condition) at all, because we need to transfer all of the data in a replicated table.
-
Committed by Kalen Krempely
This allows the data to be visible on the segments. The segments should not interpret any transaction id from the master during or after upgrade.

Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
Committed by Kalen Krempely
Without this commit, auxiliary tables such as toast and aoseg are skipped during vacuum when run in utility mode (such as during pg_upgrade).

Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
Committed by Ashwin Agrawal
-
Committed by Ivan Leskin
Add a unit test (and its infrastructure) for 'zstd_compress()'. The test checks whether 'zstd_compress()' returns correct output in case compression fails (compressed data is larger than uncompressed). To do that, 'ZSTD_compressCCtx()' is mocked to always return 'ZSTD_error_dstSize_tooSmall'. Also, an 'ifndef' is added around 'PG_MODULE_MAGIC' in zstd_compression.c
-
Committed by Ivan Leskin
When ZSTD compression is used for AO CO tables, insertion of data may cause an error "Destination buffer is too small". This happens when compressed data is larger than uncompressed input data. This commit adds handling of this situation: do not change output buffer and return size used equal to source size. The caller (e.g., 'cdbappendonlystoragewrite.c') is able to handle such output; in this case, it copies data from input to output itself.
-
- 26 Nov 2018, 4 commits
-
-
Committed by Ning Yu
In CREATE TABLE we used to decide numsegments from the LIKE, INHERITS and DISTRIBUTED BY clauses. However, we do not want partially distributed tables to be created by end users, so change the logic to always create tables with DEFAULT numsegments. We still allow developers to hack the DEFAULT numsegments with the gp_debug_numsegments extension.
-
Committed by Daniel Gustafsson
Commit 226e8867 removed oidcasted_pk and max_content from the SQL query, but didn't remove the arguments. While they don't cause an issue, as they will simply be unused, remove them to avoid confusing readers.

Reviewed-by: Heikki Linnakangas
-
- 25 Nov 2018, 3 commits
-
-
Committed by Daniel Gustafsson
Commit 17f9b7f070dbe17b2844a8b4dd428 in the pgweb repository removed the /static/ portion on all doc URLs, leaving a redirect in place. To avoid incurring a needless redirect, remove the /static/ part in the links to the PostgreSQL documentation. The /static/ URLs stem from a time when there were interactive docs that had comment functionality. These docs were removed a very long time ago, but the static differentiator was left in place until now. Reviewed-by: Mel Kiyama
-
Committed by Heikki Linnakangas
With OpenSSL 1.1.0 and above, CRYPTO_set_id_callback and CRYPTO_set_locking_callback are no-op macros, which rendered id_function() and locking_function() unused. That produced compiler warnings.

Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
Committed by Heikki Linnakangas
I was getting these compiler warnings:

    src/s3log.cpp: In function ‘void _LogMessage(const char*, __va_list_tag*)’:
    src/s3log.cpp:17:42: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         vsnprintf(buf, sizeof(buf), fmt, args);
    src/s3log.cpp: In function ‘void _send_to_remote(const char*, __va_list_tag*)’:
    src/s3log.cpp:27:55: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         size_t len = vsnprintf(buf, sizeof(buf), fmt, args);
    src/s3log.cpp: In function ‘void LogMessage(LOGLEVEL, const char*, ...)’:
    src/s3log.cpp:41:39: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         vfprintf(stderr, fmt, args);

Those are good suggestions. I couldn't figure out the correct way to mark the _LogMessage() and _send_to_remote() local functions, so I decided to inline them into the caller, LogMessage(), instead. They were almost one-liners, and LogMessage() is still very small too, so I don't think there's any significant loss of readability.

A few format strings in debugging messages were treating pthread_self() as a pointer, while others were treating it as the wrong kind of integer. Harmonize by casting it to "uint64_t" and using PRIX64 as the format string. This isn't totally portable: pthread_t can be an arithmetic type or a struct, and casting a struct to an unsigned int won't work. In principle that was a problem before this patch already, but now you should get a compiler error if you try to compile on a platform where pthread_t is not an arithmetic type. I think that's better than silent type confusion.

Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
- 24 Nov 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Old error:

    ERROR: cannot use expression as distribution key, because it is not hashable (cdbmutate.c:1329)

The new error is the same as you get with CREATE TABLE. While we're at it, also change the "can't" contraction to "cannot" in the error message, to follow the PostgreSQL error message guidelines.
-
- 23 Nov 2018, 9 commits
-
-
Committed by BaiShaoqi
Revert four commits:

1. Remove Master/Standby SyncRepWait Greenplum hack: 7f6066ea
2. Add alter system synchronous_standby_names to * when gpinitstandby -n: 1136f2fb
3. Hot fix in gpinitstandby behave test failure: b6c77b2f
4. Remove unused variables, to silence compiler warnings: 88a185a5

The first commit should be reverted because gpinitstandby was not changed to conform to it: if the standby is down and synchronous_standby_names is *, the cluster will not start and will hang. The second, third, and fourth should be reverted because they depend on the first commit.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Shaoqi Bai
Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Pengzhou Tang
Previously, when creating a join path between a CdbLocusType_SingleQE path and a CdbLocusType_SegmentGeneral path, we always add a motion on top of the CdbLocusType_SegmentGeneral path, so that even if the join path is promoted to execute on the QD, the CdbLocusType_SegmentGeneral path can still be executed on the segments:

            join (CdbLocusType_SingleQE)
           /                \
    CdbLocusType_SingleQE   Gather Motion
                                \
                        CdbLocusType_SegmentGeneral

For example, with (select * from partitioned_table limit 1) as t1:

    Nested Loop
      -> Gather Motion 1:1
         -> Seq Scan on replicated_table
      -> Materialize
         -> Subquery Scan on t1
            -> Limit
               -> Gather Motion 3:1
                  -> Limit
                     -> Seq Scan on partitioned_table

replicated_table only stores tuples on segments, so without the gather motion the seq scan of replicated_table doesn't provide tuples.

There is another problem: if the join path is not promoted to the QD, the gather motion might be redundant. For example, with (select * from replicated_table, (select * from partitioned_table limit 1) t1) sub1:

    Gather Motion 3:1
      -> Nested Loop
         -> Seq Scan on partitioned_table_2
         -> Materialize
            -> Broadcast Motion 1:3
               -> Nested Loop
                  -> Gather Motion 1:1 (redundant motion)
                     -> Seq Scan on replicated_table
                  -> Materialize
                     -> Subquery Scan on t1
                        -> Limit
                           -> Gather Motion 3:1
                              -> Limit
                                 -> Seq Scan on partitioned_table

So in apply_motion_mutator(), we omit such a redundant motion if it is not gathered to the top slice (QD). sliceDepth == 0 means it is the top slice; however, sliceDepth is shared by both init plans and the main plan, so if the main plan increased the sliceDepth, an init plan may omit the gather motion unexpectedly, which produces wrong results. The fix is simply to reset sliceDepth for init plans.
-
Committed by Pengzhou Tang
-
Committed by Pengzhou Tang
Implement "ALTER TABLE table EXPAND TABLE" to expand tables. "Expanding" and "Set Distributed By" are actually two different kinds of operations on tables; the old gpexpand used "Set Distributed By" to expand tables for historical reasons, and our early version of expansion was also squashed into "Set Distributed By", which made the code hard to understand and the concepts confusing. This commit separates "Expanding" from "Set Distributed By" entirely and implements "Expanding" with new syntax.

We have two methods to implement data movement, CTAS and RESHUFFLE, depending on how much data needs to move. If fewer than 10000 tuples need to move, choose RESHUFFLE; if the fraction of data to move is less than 30%, also choose RESHUFFLE; otherwise, choose CTAS.

For partitioned tables, we disallow expanding a leaf partition separately, because the root partition cannot have a different numsegments from its leaf partitions. SELECT/UPDATE would be fine if numsegments were inconsistent, but INSERT would cause trouble: data would be inserted in unexpected places.

The new syntax is only supposed to be used by gpexpand and is not exposed to normal users, so there is no need to update the documentation.
-
Committed by Ning Yu
When an Append node contains a SingleQE subpath, we used to put the Append on ALL the segments; however, if the SingleQE is partially distributed then we obviously cannot put the SingleQE on ALL the segments, and this conflict could result in runtime errors or incorrect results. To fix this we should put the Append on the SingleQE's segments. On the other hand, when there are multiple SingleQE subpaths, we should put the Append on the common segments of the SingleQEs.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Ning Yu
There are 3 reshuffle tests: the ao one, the co one, and the heap one. They share almost the same cases, but differ in table names and CREATE TABLE options. There are also some differences introduced when adding regression tests: they were only added in one file but not the others. We want to keep the differences between these tests minimal, so that a regression test for ao also covers the similar case for heap, and so that once we understand one of the test files we have almost the same knowledge of the others. Here is a list of changes to these tests:

- reduce differences in table names by using a schema;
- reduce differences in CREATE TABLE options by setting default storage options;
- simplify the creation of partially distributed tables by using the gp_debug_numsegments extension;
- copy some regression tests to all the tests;
- retire the no-longer-used helper function;
- move the tests into an existing parallel test group.

The pg_regress test framework provides some @@ tokens for ao/co tests; however, we still cannot merge the ao and co tests into one file, as WITH (OIDS) is only supported by ao but not co.
-
- 22 Nov 2018, 13 commits
-
-
Committed by Heikki Linnakangas
When determining the locus for a LEFT or RIGHT JOIN, we can use the outer side's distribution key as is. The EquivalenceClasses from the nullable side are not of interest above the join, and the outer side's distribution key can lead to better plans, because it can be made a Hashed locus rather than HashedOJ. A Hashed locus can be used for grouping, for example, unlike a HashedOJ. This buys back better plans for some INSERT and CTAS queries that started to need Redistribute Motions after the previous commit.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
There was some confusion about how NULLs are distributed when a CdbPathLocus is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested that NULLs can be on any segment. But the rest of the code assumed that that's true only for HashedOJ, and that for Hashed, all NULLs are stored on a particular segment. There was a comment in cdbgroup.c that said "Or would HashedOJ ok, too?"; the answer to that is "No!". Given the comment in cdbpathlocus.h, I'm not surprised that the author was not very sure about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that point.

There were a few cases where we got that actively wrong. The repartitionPlan() function is used to inject a Redistribute Motion into queries used for CREATE TABLE AS and INSERT, if the "current" locus doesn't match the target table's policy. It did not check for HashedOJ. Because of that, if the query contained FULL JOINs, NULL values might end up on all segments. Code elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table are stored on a single segment, identified by the cdbhash value of a NULL datum. Fix that by adding a check for HashedOJ in repartitionPlan(), and forcing a Redistribute Motion.

CREATE TABLE AS had a similar problem, in the code that decides which distribution key to use if the user didn't specify DISTRIBUTED BY explicitly. The default behaviour is to choose a distribution key that matches the distribution of the query, so that we can avoid adding an extra Redistribute Motion. After fixing repartitionPlan(), there was no correctness problem, but if we chose the key based on a HashedOJ locus, there is no performance benefit because we'd need a Redistribute Motion anyway. So modify the code that chooses the CTAS distribution key to ignore HashedOJ. While we're at it, refactor the code that chooses the CTAS distribution key by moving it to a separate function; it had become ridiculously deeply indented.
Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
The case where some, but not all, of the query's distribution keys were present in the result set, was not covered by any existing tests. Per Paul Guo's observation.
-
Committed by Heikki Linnakangas
Fix indentation. In the code to generate a NOTICE, remove if() for condition that we had checked earlier in the function already, and use a StringInfo for building the string.
-
Committed by Heikki Linnakangas
These were left behind by commit 7f6066ea.
-
Committed by Heikki Linnakangas
It returns a simple list of PathKeys, not a list of lists. The code was changed in the 8.3-era merge of equivalence classes already, but we neglected the comment.
-
Committed by Zhenghua Lyu
This commit is the first step in refactoring ATExecSetDistributedBy. Its main purpose is to remove some dead code in this function; in the process we found that some helper functions could also be simplified, so that simplification is included in this commit too.

According to MPP-7770, we should disable changing storage options for now. It is ugly to just throw an error on encountering the `appendonly` option without removing the code, so this commit removes all the related logic. Because the WITH clause can only contain reshuffle|reorganize, we only need new_rel_opts if the table itself is ao|aoco; there is no need to deduce it from the WITH clause. We also remove the unnecessary checks at the start of this function, because these checks have already been done in ATPrepCmd.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
-
Committed by Shaoqi Bai
After commit 22e04dc12df9e0577ba93a75dbef160c8c1ed258, the master will block when the standby master is down. A couple of things need to be done to unblock the master:

1. Run gpinitstandby -n to start the standby master back up.
2. Run psql postgres -c "ALTER SYSTEM SET synchronous_standby_names = '';" and reload the master segment. Note that ALTER SYSTEM SET has to be called again to set synchronous_standby_names back to '*' (and the master config reloaded) to enable synchronous replication again.

Rather than documenting this multi-step process, this commit combines it into a single gpinitstandby -n step.

Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Jimmy Yih
When the standby master is unavailable, the master will not block on commits even though we enable synchronous replication. This is because we have a Greenplum hack which checks if the WAL stream with the standby master is valid. If the stream is invalid, the master will quickly skip the SyncRepWait and continue on its commit. Remove this hack in order to make Master/Standby and Primary/Mirror WAL replication more similar.

Co-authored-by: Shaoqi Bai <sbai@pivotal.io>
-
Committed by Ning Yu
Suppose t1 has numsegments=1 and t2 has numsegments=2; then the query below gets an incorrect plan:

    explain (costs off)
    select * from t2 a join t2 b using(c2)
    union all
    select * from t1 c join t1 d using(c2);

                                  QUERY PLAN
    ------------------------------------------------------------------------
     Gather Motion 1:1 (slice3; segments: 1)
       -> Append
          -> Hash Join
             Hash Cond: (a.c2 = b.c2)
             -> Redistribute Motion 2:2 (slice1; segments: 2)
                Hash Key: a.c2
                -> Seq Scan on t2 a
             -> Hash
                -> Redistribute Motion 2:2 (slice2; segments: 2)
                   Hash Key: b.c2
                   -> Seq Scan on t2 b
          -> Hash Join
             Hash Cond: (c.c2 = d.c2)
             -> Seq Scan on t1 c
             -> Hash
                -> Seq Scan on t1 d
     Optimizer: legacy query optimizer
    (17 rows)

slice2 has a 2:2 redistribute motion to slice3; however, slice3 only has 1 segment. This is because Append's numsegments is decided from the last subpath. To fix the issue we should use the max numsegments of the subpaths for Append. The issue was already fixed in 39856768; we are only adding tests for it now.
-
Committed by Ning Yu
Introduced a new debugging extension, gp_debug_numsegments, to get / set the default numsegments used when creating tables.

- gp_debug_get_create_table_default_numsegments() gets the default numsegments.
- gp_debug_set_create_table_default_numsegments(text) sets the default numsegments in text format; valid values are:
  - 'FULL': all the segments;
  - 'RANDOM': pick a random set of segments each time;
  - 'MINIMAL': the minimal set of segments.
- gp_debug_set_create_table_default_numsegments(integer) sets the default numsegments directly; the valid range is [1, gp_num_contents_in_cluster].
- gp_debug_reset_create_table_default_numsegments(text) or gp_debug_reset_create_table_default_numsegments(integer) resets the default numsegments to the specified value, and the value can be reused later.
- gp_debug_reset_create_table_default_numsegments() resets the default numsegments to the value passed last time; if there is no previous call to it, the value is 'FULL'.

Refactored the ICG test partial_table.sql to create partial tables with this extension.
-
Committed by Daniel Gustafsson
rel_partitioning_is_uniform() and addMCVToHashTable() inserted with HASH_ENTER and subsequently checked the return value for NULL in order to error out on "out of memory". HASH_ENTER, however, doesn't return if it couldn't insert: it errors out itself, so remove the check, as the condition cannot happen. groupHashNew() was using HASH_ENTER_NULL, which does return NULL in out-of-memory situations, but it failed to handle the return value correctly and dereferenced it without a check, risking a null pointer dereference under memory pressure. Fix by using HASH_ENTER instead, as the code clearly expects that behavior.

Reviewed-by: Paul Guo <paulguo@gmail.com>
-
Committed by Heikki Linnakangas
Commit cc2e211f attempted to silence the assertion in vac_update_relstats(), but the assertion it in turn added was hit heavily. The vacuum_appendonly_fill_stats() function, where I added the check for the zero-pages-with-non-zero-tuples combination, is also reached in QD mode, contrary to the comments and the assertion that I added. I'm not sure why we look at the totals in QD mode - AFAICS we just throw them away - but I'm reluctant to start restructuring this code right now. So move the code that zaps reltuples to 0 into vac_update_relstats().

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
- 21 Nov 2018, 1 commit
-
-
Committed by Daniel Gustafsson
In the unlikely event that we reach this codepath with a samplerows value of zero (which, albeit unlikely, could happen), avoid performing a division by zero and instead set the null fraction to zero, as we clearly don't have any more information to go on. The HLL code calls the compute_stats function pointer with zero samplerows, and while that's a different compute_stats function, it's an easy mistake to make when not all functions can handle a division by zero. This is defensive programming prompted by a report that triggered an old bug like this without actually hitting this one, but there is little reason to risk a crash. Suspenders go well with belts.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-