- 22 Nov 2018, 12 commits
-
-
Committed by Heikki Linnakangas
There was some confusion about how NULLs are distributed when a CdbPathLocus is of the Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested that NULLs can be on any segment, but the rest of the code assumed that that's true only for HashedOJ, and that for Hashed, all NULLs are stored on a particular segment. There was a comment in cdbgroup.c that said "Or would HashedOJ ok, too?"; the answer to that is "No!". Given the comment in cdbpathlocus.h, I'm not surprised that the author was not very sure about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that.

There were a few cases where we got that actively wrong. The repartitionPlan() function is used to inject a Redistribute Motion into queries used for CREATE TABLE AS and INSERT, if the "current" locus doesn't match the target table's policy. It did not check for HashedOJ. Because of that, if the query contained FULL JOINs, NULL values might end up on all segments. Code elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table are stored on a single segment, identified by the cdbhash value of a NULL datum. Fix that by adding a check for HashedOJ in repartitionPlan(), and forcing a Redistribute Motion.

CREATE TABLE AS had a similar problem in the code that decides which distribution key to use if the user didn't specify DISTRIBUTED BY explicitly. The default behaviour is to choose a distribution key that matches the distribution of the query, so that we can avoid adding an extra Redistribute Motion. After fixing repartitionPlan(), there was no correctness problem, but if we chose the key based on a HashedOJ locus, there would be no performance benefit because we'd need a Redistribute Motion anyway. So modify the code that chooses the CTAS distribution key to ignore HashedOJ.

While we're at it, refactor the code that chooses the CTAS distribution key by moving it to a separate function. It had become ridiculously deeply indented.
Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests. Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
The case where some, but not all, of the query's distribution keys were present in the result set was not covered by any existing tests. Per Paul Guo's observation.
-
Committed by Heikki Linnakangas
Fix indentation. In the code that generates a NOTICE, remove an if() for a condition that was already checked earlier in the function, and use a StringInfo for building the string.
-
Committed by Heikki Linnakangas
These were left behind by commit 7f6066ea.
-
Committed by Heikki Linnakangas
It returns a simple list of PathKeys, not a list of lists. The code was already changed in the 8.3-era merge of equivalence classes, but we neglected to update the comment.
-
Committed by Zhenghua Lyu
This commit is the first step in refactoring ATExecSetDistributedBy. Its main purpose is to remove some dead code in this function; in the process we found that some helper functions can also be simplified, so those simplifications are included here as well. According to MPP-7770, we should disable changing storage options for now. It is ugly to just throw an error on encountering the `appendonly` option without removing the code, so this commit removes all the related logic. Because the WITH clause can only contain reshuffle|reorganize, we only need new_rel_opts if the table itself is ao|aoco; there is no need to deduce it from the WITH clause. We also remove the unnecessary checks at the start of this function, because these checks have already been done in `ATPrepCmd`. Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
-
Committed by Shaoqi Bai
After commit 22e04dc12df9e0577ba93a75dbef160c8c1ed258, the master will block when the standby master is down. A couple of things need to be done to unblock the master: 1. Run gpinitstandby -n to start the standby master back up. 2. Run psql postgres -c "ALTER SYSTEM SET synchronous_standby_names = '';" and reload the master segment. Note that ALTER SYSTEM SET has to be called again to set synchronous_standby_names back to '*' (and the master config reloaded) to enable synchronous replication again. Instead of documenting that multi-step process, this commit combines it into the single step of running gpinitstandby -n. Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Jimmy Yih
When the standby master is unavailable, the master will not block on commits even though synchronous replication is enabled. This is because we have a Greenplum hack which checks whether the WAL stream with the standby master is valid. If the stream is invalid, the master will quickly skip SyncRepWait and continue with its commit. Remove this hack in order to make Master/Standby and Primary/Mirror WAL replication more similar. Co-authored-by: Shaoqi Bai <sbai@pivotal.io>
-
Committed by Ning Yu
Suppose t1 has numsegments=1 and t2 has numsegments=2; then the query below will have an incorrect plan:

    explain (costs off) select * from t2 a join t2 b using(c2)
    union all select * from t1 c join t1 d using(c2);
                                  QUERY PLAN
    ------------------------------------------------------------------------
     Gather Motion 1:1  (slice3; segments: 1)
       ->  Append
             ->  Hash Join
                   Hash Cond: (a.c2 = b.c2)
                   ->  Redistribute Motion 2:2  (slice1; segments: 2)
                         Hash Key: a.c2
                         ->  Seq Scan on t2 a
                   ->  Hash
                         ->  Redistribute Motion 2:2  (slice2; segments: 2)
                               Hash Key: b.c2
                               ->  Seq Scan on t2 b
             ->  Hash Join
                   Hash Cond: (c.c2 = d.c2)
                   ->  Seq Scan on t1 c
                   ->  Hash
                         ->  Seq Scan on t1 d
     Optimizer: legacy query optimizer
    (17 rows)

slice2 has a 2:2 redistribute motion to slice3, but slice3 only has 1 segment. This is because the Append's numsegments is decided from the last subpath. To fix the issue we should use the max numsegments of the subpaths for the Append. The issue was already fixed in 39856768; we are only adding tests for it now.
-
Committed by Ning Yu
Introduced a new debugging extension gp_debug_numsegments to get / set the default numsegments used when creating tables.

gp_debug_get_create_table_default_numsegments() gets the default numsegments.

gp_debug_set_create_table_default_numsegments(text) sets the default numsegments in text format; valid values are:
- 'FULL': all the segments;
- 'RANDOM': pick a random set of segments each time;
- 'MINIMAL': the minimal set of segments.

gp_debug_set_create_table_default_numsegments(integer) sets the default numsegments directly; the valid range is [1, gp_num_contents_in_cluster].

gp_debug_reset_create_table_default_numsegments(text) or gp_debug_reset_create_table_default_numsegments(integer) resets the default numsegments to the specified value, and the value can be reused later.

gp_debug_reset_create_table_default_numsegments() resets the default numsegments to the value passed last time; if there was no previous call, the value is 'FULL'.

Refactored the ICG test partial_table.sql to create partial tables with this extension.
-
Committed by Daniel Gustafsson
rel_partitioning_is_uniform() and addMCVToHashTable() inserted with HASH_ENTER, and subsequently checked the return value for NULL in order to error out on "out of memory". HASH_ENTER, however, doesn't return if it couldn't insert; it errors out itself, so remove the check as it cannot fire. groupHashNew() was using HASH_ENTER_NULL, which does return NULL in out-of-memory situations, but it failed to correctly handle the return value and dereferenced it without a check, risking a null pointer dereference under memory pressure. Fix by using HASH_ENTER instead, as the code clearly expects that behavior. Reviewed-by: Paul Guo <paulguo@gmail.com>
-
Committed by Heikki Linnakangas
Commit cc2e211f attempted to silence the assertion in vac_update_relstats(), but the assertion it added in turn was hit heavily. The vacuum_appendonly_fill_stats() function, where I added the check for the zero-pages, non-zero-tuples combination, is also reached in QD mode, contrary to the comments and the assertion that I added. I'm not sure why we look at the totals in QD mode - AFAICS we just throw them away - but I'm reluctant to start restructuring this code right now. So move the code that zaps reltuples to 0 into vac_update_relstats(). Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
- 21 Nov 2018, 11 commits
-
-
Committed by Daniel Gustafsson
In the unlikely event that we reach this codepath with a samplerows value of zero (which, albeit unlikely, could happen), avoid performing a division by zero and instead set the null fraction to zero, as we clearly don't have any more information to go on. The HLL code calls the compute_stats function pointer with zero samplerows, and while that's a different compute_stats function, it's an easy mistake to make when not all such functions can handle a division by zero. This is defensive programming, prompted by a report that triggered an old bug like this without actually hitting this one, but there is little reason to take the risk of a crash. Suspenders go well with belts. Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Heikki Linnakangas
There's an assertion in vac_update_relstats() that if num_tuples is non-zero, num_pages must also be non-zero. Makes sense: if the table takes up no space, there can't be any tuples in it. But we hit that case on vacuum of an AO table in the QD node in utility mode. The QD has the total tuple counts of each AO segment in the pg_aoseg table, across the whole cluster, but it doesn't have the physical sizes. So it reported N tuples, but 0 bytes. We started hitting the assertion after commit c0ce2eb9, which fixed the rounding in RelationGuessNumberOfBlocks() so that it now returns 0 blocks for a 0-byte relation. It used to return 1, which masked the problem. Fix by reporting 0 tuples and 0 blocks in the QD. That's a change from the old behaviour, which was to report N tuples and 1 block, but it's consistent with heap tables. Reviewed-by: Jacob Champion <pchampion@pivotal.io> Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Mel Kiyama
-
Committed by Chuck Litzell
* Update DITA docs with Postgres 8.4 merge, iteration 3 changes to SGML files. (PR #3780) * Fix example syntax * Add missing <p> codes and move into sectiondiv
-
Committed by David Yozie
-
Committed by Mel Kiyama
* docs - using dblink as a non-superuser: update installing dblink on 6.0 (uses CREATE EXTENSION); clarify some dblink_connect() information; add using dblink_connect_u() as a non-superuser * docs - using dblink as a non-superuser - review comment updates * docs - using dblink as a non-superuser - more review comment updates
-
Committed by Mel Kiyama
* docs - gpcopy - add option --dest-dbname. Also, add that a list of database names must be enclosed in quotes. * docs - gpcopy --dest-dbname review comment updates. Also, changed the list of multiple db names to specify no spaces between names. * docs - gpcopy --dest-dbname - fix typo
-
Committed by Ashwin Agrawal
Melanie suggested using `declare -F <function_name> > /dev/null && <function_name>` in the pg_rewind test. `declare -F` prints only the name of the function and exits non-zero if the function is not defined, so by and-ing it with the function invocation, the function only runs in run_test if it is defined. This helps avoid coding empty functions when a test doesn't need any implementation for one of them. Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
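The suggested idiom, expanded into a runnable sketch (run_test and setup_before_test are illustrative names, not the actual test framework's):

```shell
#!/bin/bash

# `declare -F name` succeeds (printing only the name, discarded here)
# when a shell function is defined, so `&&` runs the function only if
# it exists -- no empty stub function needed.
run_test() {
    declare -F setup_before_test > /dev/null && setup_before_test
    echo "running test"
}

# Here setup_before_test is defined, so run_test invokes it; a test
# file that omits it would simply skip that step.
setup_before_test() {
    echo "setup"
}

run_test
```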
-
Committed by Ashwin Agrawal
Copy the tail end of an AO file from source to target (with some exceptions) if an xlog record for it is found on the target after the point of divergence. COPY_TAIL is used because AO inserts are not of fixed size, hence the best we can do is copy to the end of the file, starting from the modification offset. The truncate record for AO is ignored, similar to heap, for the same reasons. Tests are added for AO and CO tables to validate that COPY and COPY_TAIL are performed correctly. Co-authored-by: Melanie Plageman <mplageman@pivotal.io> Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Ashwin Agrawal
The XLOG_BTREE_NEWROOT xlog record type doesn't log the update to the metapage, hence pg_rewind misses rewinding the change to the metapage. The redo routine handling XLOG_BTREE_NEWROOT implicitly updates the metapage, which is always hard-coded to block zero. Co-authored-by: David Kimura <dkimura@pivotal.io> Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Ashwin Agrawal
Enhance the test framework to verify pg_rewind by promoting the old master after the rewind and actually running queries against it. Previously, queries were run only against the standby, which didn't undergo pg_rewind, so the tests basically missed validating whether pg_rewind did its job correctly. Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 20 Nov 2018, 10 commits
-
-
Committed by Daniel Gustafsson
There is no restriction in the SQL specification regarding duplicate column aliases in a CTE. Remove the specific check and align us with how such queries work in upstream. Also fix the test fallout to match the new behavior, with slightly tweaked queries to keep output volume down.
-
Committed by Daniel Gustafsson
Scribbling on unused parameters rather than allocating local variables isn't advisable behavior even when it's safe to do. Move to using our own variables instead. Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io> Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
-
Committed by Ning Yu
A temp table is created during data reorganization; its numsegments must be the same as the original table's, otherwise some data will be lost after the reorganization. Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io> Reviewed-by: Zhenghua Lyu <zlv@pivotal.io> Reviewed-by: Shujie Zhang <shzhang@pivotal.io>
-
Committed by BaiShaoqi
Commit 4c54c894 eliminates our divergence in deparsing filter expressions, mostly by adding parentheses. However, only two out of four expectfiles were updated, most likely because the author didn't run alternate systems (i.e. macOS). This leads to one of those duh-it-fails-on-my-mac failures that can be easily fixed. This commit brings the two expectfiles up-to-date. Co-authored-by: Jesse Zhang <sbjesse@gmail.com> Reviewed by Paul Guo, Heikki Linnakangas
-
Committed by Heikki Linnakangas
The total number of rows in the table, stored in pg_class.reltuples, was calculated incorrectly. It was set to the number of rows in the sample, not the (estimated?) total number of rows in the table. We didn't notice because most of the existing tests use small tables, so that the whole table fits in the sample. Put back code like we used to have for collecting the totals. This also papers over the assertion failure we're seeing in the 'aocs' test in the concourse pipeline with ORCA, by changing the plan to one that doesn't crash. That needs a proper fix, of course, but we're tracking it separately as https://github.com/greenplum-db/gpdb/issues/6274.
-
Committed by Jacob Champion
The walrep_2 and gpexpand_1 jobs are paused while we ensure that all relevant tests have been ported from them. But they have been unintentionally blocking the RC gate while paused. Remove them from the gate list.
-
Committed by Heikki Linnakangas
The previous fix (cc6d45c7) didn't work correctly for indexes. Indexes don't have entries in gp_distribution_policy; check the parent table's policy instead.
-
Committed by dyozie
-
Committed by Bradford D. Boyle
Authored-by: Bradford D. Boyle <bboyle@pivotal.io>
-
Committed by Heikki Linnakangas
In commit baa76536, I got this condition right in the acquire_sample_rows() function, but not in acquire_number_of_blocks(). As a result, we sometimes ended up with a sample containing some rows, but relpages = 0. There was an assertion that failed on that combination.
-
- 19 Nov 2018, 7 commits
-
-
Committed by Heikki Linnakangas
The user-visible feature here is to speed up ANALYZE on heap tables by using the block-level sampling method from PostgreSQL, instead of scanning the whole table. This refactors the ANALYZE code to make it possible to call the upstream function to do the block-level sampling again.

A secondary feature is that this enables the ANALYZE callback for Foreign Data Wrappers, introduced in PostgreSQL 9.2. It was disabled in the 9.2 merge, because the function signature had been changed in GPDB, making it cumbersome to call. This patch reverts the acquire-function's signature to the same as in upstream, so the FDW callback now works again.

This also moves the logic to decide which rels to analyze, based on the optimizer_analyze_root_partition and optimizer_analyze_midlevel_partition GUCs and the ROOTPARTITION option, wholly to get_rel_oids() in vacuum.c. It was mostly done there already, but there were some warts in analyze.c. None of the code in analyze.c should now treat partitions differently from normal inherited tables, which helps to minimize the diff vs. upstream. (Except that the HLL code still heavily uses partition-related functions. That's a TODO.)
-
Committed by Heikki Linnakangas
Commit 4a174240 changed the hash method, which changed the distribution of data. It adjusted many test cases that depended on the data distribution, but missed this one. This one isn't failing at the moment, but it was clearly meant to look at the relfrozenxid of the segment that it inserts all the data into. Change the value of the distribution key so that all the data falls into segment 0 again.
-
Committed by Heikki Linnakangas
Doesn't matter much for AO tables, but if this is to be used for heap tables, where the relation size is always a multiple of BLCKSZ, it would be nice to not be off-by-one.
-
Committed by Heikki Linnakangas
The per-column 'toowide_cnt' was only used to check whether there were any too-wide values, so it's redundant with the actual flags. Replace the bool array with a Bitmapset. The Bitmapset is left as NULL when no bits are set, so we can use a NULL check in place of checking toowide_cnt.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
The HLL method of updating the root stats only works if the leaf partitions have already been analyzed. So by always analyzing the leaf partitions first, we can make use of HLL in updating the root stats. Otherwise, it's pure luck whether it works out or not, depending on the order in which the leaves and parents are analyzed.
-
Committed by Hubert Zhang
For background workers, MyProcPort is not initialized; it is used to set the dbname and username when creating a gang. We can use MyDatabaseId and AuthenticatedUserId instead, which are initialized in InitPostgres() for background workers.
-