- 28 Sep 2018, 1 commit
-
-
Committed by ZhangJackey
There was an assumption in GPDB that a table's data is always distributed on all segments. However, this is not always true: for example, when a cluster is expanded from M segments to N (N > M), all the tables are still on M segments. To work around the problem we used to have to alter all the hash distributed tables to randomly distributed to get correct query results, at the cost of bad performance.

Now we support table data being distributed on a subset of segments. A new column `numsegments` is added to catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DMLs on M tables; joins between M and N tables are also supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
  from gp_distribution_policy;
 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);
                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```
-
- 27 Sep 2018, 8 commits
-
-
Committed by Heikki Linnakangas
The proprietary build can install them as normal C language functions, with CREATE FUNCTION, instead. In passing, remove some unused QuickLZ debugging GUCs. This doesn't yet get rid of all references to QuickLZ, unfortunately. The GUC and reloption validation code still needs to know about it, so that it can validate the options read from postgresql.conf when starting up postmaster. For the same reason, you cannot yet add custom compression algorithms, besides quicklz, as an extension. But this is another step in the right direction, anyway.
Co-authored-by: Jimmy Yih <jyih@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
-
Committed by Daniel Gustafsson
The comment states that "small" might be defined by socket.h, and while that's not true for all versions of sys/socket.h, it's still not a good name to use as it's common in Windows headers (should we ever revive a Windows port). Renaming to a non-colliding name is a small price to pay to avoid subtle bugs, so rename and remove the preprocessor dance.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Daniel Gustafsson
The test suite, which was ported over from TINC, was ignoring so much of the memorized output that it more or less didn't test anything (and the ignored blocks were as full of outdated output as one would imagine). The code was also formatted in weird ways and had needless NOTICEs thrown during execution. This refactors the test suite to remove all ignore blocks, removes some utterly pointless tests (there are many more of them left), formats the code to be readable, fixes the output to work and removes some duplicate tests. The remaining bits of the suite are by no means terribly interesting, but it runs fast enough that it's worth keeping the leftovers for now.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Tang Pengzhou
* Change the type of db_descriptors to SegmentDatabaseDescriptor **. A new gang definition may consist of cached segdbDesc and newly created segdbDesc, so there is no need to palloc all segdbDesc structs as new.

* Remove an unnecessary allocate-gang unit test.

* Manage idle segment dbs using CdbComponentDatabases instead of available* lists.

  To support vary-size gangs, we now need to manage segment dbs at a lower granularity. Previously, idle QEs were managed by a bunch of lists like availablePrimaryWriterGang and availableReaderGangsN; this restricted the dispatcher to only creating N-size (N = number of segments) or 1-size gangs.

  CdbComponentDatabases is a snapshot of the segment components within the current cluster; it now maintains a freelist for each segment component. When creating a gang, the dispatcher makes up a gang from each segment component (taken from the freelist, or newly created). When cleaning up a gang, the dispatcher returns idle segment dbs to each segment component.

  CdbComponentDatabases provides a few functions to manipulate segment dbs (SegmentDatabaseDescriptor *):
  * cdbcomponent_getCdbComponents
  * cdbcomponent_destroyCdbComponents
  * cdbcomponent_allocateIdleSegdb
  * cdbcomponent_recycleIdleSegdb
  * cdbcomponent_cleanupIdleSegdbs

  CdbComponentDatabases is also FTS version sensitive, so once the FTS version changes, CdbComponentDatabases destroys all idle segment dbs and allocates QEs in the newly promoted segment. This provides transparent mirror failover to users.

  Since segment dbs (SegmentDatabaseDescriptor *) are managed by CdbComponentDatabases now, we can simplify the memory context management by replacing GangContext & perGangContext with DispatcherContext & CdbComponentsContext.

* Postpone error handling when creating a gang.

  Now that we have AtAbort_DispatcherState, one advantage of it is that we can postpone gang error handling to this function and make the code cleaner.

* Handle FTS version change correctly.

  In some cases, when the FTS version has changed, we can't update the current snapshot of segment components; to be more specific, we can't destroy the current writer segment dbs and create new segment dbs. These cases include:
  * the session has a temp table created;
  * the query needs two-phase commit and a gxid has already been dispatched to segments.

* Replace the <gangId, sliceId> map with a <qeIdentifier, sliceId> map.

  We used to dispatch a <gangId, sliceId> map along with the query to segment dbs so that segment dbs could know which slice they should execute. Now gangId is useless to a segment db, because a segment db can be reused by different gangs, so we need a new way to convey this info. To resolve this, CdbComponentDatabases assigns a unique identifier to each segment db and makes up a bitmap set consisting of segment identifiers for each slice; segment dbs can then go through the slice table and find the right slice to execute.

* Allow the dispatcher to create vary-size gangs and refine AssignGangs().

  Previously, the dispatcher could only create N-size gangs for GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the dispatcher in many ways. One example is direct dispatch: it always created an N-size gang even if it only dispatched the command to one segment. Another example is that some operations may be able to use an N+ size gang, like hash join: if both the inner and outer plans are redistributed, the hash join node can be associated with an N+ size gang to execute.

  This commit changes the API of createGang() so the caller can specify a list of segments (partial or even duplicate segments); CdbComponentDatabases will guarantee each segment has only one writer in a session. With this it also resolves another pain point of AssignGangs(): the caller no longer needs to promote a GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or a GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for replicated tables (see FinalizeSliceTree()).

  With this commit, AssignGangs() is very clear now.
-
Committed by Paul Guo
As the comment said, this was useful; however, now that we have upstream add_rte_to_flat_rtable() to handle that, let's remove this call.
-
Committed by Daniel Gustafsson
Fixes a clang (and probably gcc) compiler warning on an unused variable.
Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
Committed by David Kimura
Until we have replication slots, this will keep enough xlog segments around so that mirrors have an opportunity to reconnect when a checkpoint removes a segment while the mirror is not streaming.
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
-
Committed by Heikki Linnakangas
As far as I can see, the 'is_internal' flag is passed through to a possible object access hook, but it has no other effect. Mark the LOV index and heap created for bitmap indexes, as well as constraints created for exchanged partitions, as 'internal'.
-
- 26 Sep 2018, 13 commits
-
-
Committed by Heikki Linnakangas
I'm not entirely sure what was going on here before. I suspect we had backported some fixes from later upstream versions, and they caused merge conflicts and confusion now. But in any case, I see no reason to deviate from upstream now, so just remove the FIXME.
-
Committed by Heikki Linnakangas
We had backported upstream commits 425bef6ee7 and 2cd72ba42d earlier, but those got partially reverted in the 9.3 merge. Or earlier, or we hadn't backported them completely to begin with - I didn't investigate the exact path of how we got here. In any case, a partial backport is confusing, so take the code around this from the tip of 9.3 stable, so that we have both of those commits fully backported.
-
Committed by Adam Berlin
-
Committed by Adam Berlin
-
Committed by Adam Berlin
-
Committed by Adam Berlin
-
Committed by Asim R P
The functions allow obtaining or removing entries from the shared hash table maintained on the QD. The default size of this hash table is 1000, and entries are removed only after it is filled to capacity. The two functions should be helpful for testing as well as for troubleshooting issues with appendonly tables in production deployments.
Co-authored-by: Jimmy Yih <jyih@pivotal.io>
-
Committed by Asim R P
A segment file that is compacted by vacuum is left in awaiting-drop state on QEs. Such a segment file should not be chosen for new inserts because it will never be considered for reading during scans. This patch fixes a bug in the logic to determine if a segment file is in awaiting-drop state. The precondition for the bug includes a specific interleaving of vacuum and insert transactions on the same appendonly table, manifested in the accompanying test. The fix is to use SnapshotNow instead of an MVCC snapshot: a segment file whose state is updated to awaiting-drop by a vacuum compaction transaction may still be seen as available for inserts through an MVCC snapshot. When a vacuum compaction transaction is in progress, the aoentry for the relation in the appendonly hash cannot be evicted and the need for obtaining state from QEs does not arise.
-
Committed by Asim R P
Spotted while reading.
-
Committed by Asim R P
This commit promotes a few assertions into elog(ERROR) so as to avoid new data being appended to a segment file that is not in available state. Scans on an AO table do not read segment files that are awaiting drop; new data, if inserted into such a segment file, will be lost forever. The accompanying isolation2 test demonstrates a bug that hits these errors. The test uses a newly added UDF to evict an entry from the appendonly hash table. In production, an entry is evicted when the appendonly hash table is filled (default capacity of 1000 entries). Note: the bug will be fixed in a separate patch.
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
-
Committed by David Kimura
-
Committed by David Kimura
The test is included in "installcheck" under the zstd module. It should eventually be included as part of ICW. CAVEAT: the test for zstd, as it turns out, is wrong (nondeterministic) since its inclusion in commit 724f9d27. This move eliminates the need to have a separate "error: zstd not supported" answer file in ICG.
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
-
Committed by David Kimura
After this patch:
- zstd functions are no longer part of the built-in catalog
- zstd is only built when enabled with `--with-zstd` in autoconf
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
-
- 25 Sep 2018, 9 commits
-
-
Committed by Adam Berlin
In GPDB, we only want an autovacuum worker to start once we know there is a database to vacuum. When we changed the default value of `autovacuum_start_daemon` from `true` to `false` for GPDB, we made the behavior of AutoVacuumLauncherMain() be to immediately start an autovacuum worker from the launcher and exit, which is called 'emergency mode'. When 'emergency mode' is running it is possible to continuously start autovacuum workers: within the worker, the PMSIGNAL_START_AUTOVAC_LAUNCHER signal is sent when a database is found that is old enough to be vacuumed, but we only autovacuum non-connectable databases (template0) in GPDB and we do not have logic to filter out connectable databases in the autovacuum worker. This change allows the autovacuum launcher to do more up-front decision making about whether it should start an autovacuum worker, including GPDB-specific rules.
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Paul Guo
create_unique_path() can be used to convert a semi join to an inner join. Previously, during the semi-join refactor in commit d4ce0921, creating a unique path was disabled for the case where duplicates might be on different QEs. In this patch we enable adding a motion to unique_ify the path, but only if the unique method is not UNIQUE_PATH_NOOP. We don't create a unique path for that case because, later on during plan creation, it is possible to create a motion above this unique path whose subpath is a motion; in that case, the unique path node will be ignored and we will get a motion plan node above a motion plan node, and that is bad. We could further improve that, but not in this patch.
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
-
Committed by Daniel Gustafsson
The bkuprestore test was imported along with the source code during the initial open sourcing, but has never been used and hasn't worked in a long time. Rather than trying to save this broken mess, let's remove it and start fresh with a pg_dump TAP test, which is a much better way to test backup/restore.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Jimmy Yih <jyih@pivotal.io>
-
Committed by Dhanashree Kashid
-
Committed by Ashwin Agrawal
Regular fault injection doesn't work for mirrors. Hence, a fault injection mechanism using the SIGUSR2 signal coupled with an on-disk file was coded just for testing. This seems very hacky and intrusive, hence the plan is to get rid of it. Most of the tests using this framework were found not useful, as the majority of the code is upstream. For anything that still needs testing, a better alternative will be explored.
-
Committed by Ashwin Agrawal
Most of the backup-block-related modifications for providing wal_consistency_checking were removed as part of the 9.3 merge. This was mainly done to avoid merge conflicts. The masking functions are still used by the gp_replica_check tool to perform checking between primaries and mirrors, but the online version of checking during each replay of a record was let go. So, this commit cleans up the remaining pieces which are not used. We will get this back in properly working condition when we catch up to upstream.
-
Committed by Ashwin Agrawal
Remove the fault types which have no implementation, or which have an implementation that doesn't seem usable. This helps keep only the working subset of faults. The data corruption fault, for instance, seems pretty useless: even if needed, it can easily be coded for a specific use case using the skip fault, instead of having a special one defined for it. The fault type "fault" is redundant with "error", hence it is removed as well.
-
Committed by Ashwin Agrawal
-
Committed by Dhanashree Kashid
The following commits have been cherry-picked again: b1f543f3, b0359e69, a341621d. The contrib/dblink tests were failing with ORCA after the above commits. The issue has now been fixed in ORCA v3.1.0, hence we re-enable these commits and bump the ORCA version.
-
- 24 Sep 2018, 3 commits
-
-
Committed by Heikki Linnakangas
I couldn't find an easy way to make this assertion work, with the "flattened" range table in 9.3. The information needed for this is zapped away in add_rte_to_flat_rtable(). I think we can live without this assertion.
-
Committed by Heikki Linnakangas
Updating a distribution key column is performed as a "split update", i.e. separate DELETE and INSERT operations, which may happen on different nodes. In case of RETURNING, the DELETE operation was also returning a row, and it was also incorrectly counted in the row count returned to the client, in the command tag (e.g. "UPDATE 2"). Fix, and add a regression test. Fixes https://github.com/greenplum-db/gpdb/issues/5839
-
Committed by Heikki Linnakangas
The reason we needed the pq_getmessage() call marked with the FIXME comment was that we were missing the pq_getmessage() call from ProcessStandbyMessage() that the corresponding upstream version, at the point we're caught up to in the merge, had. I believe the reason it was missing from ProcessStandbyMessage() was that we had earlier backported upstream commit cd19848bd55. That commit removed the pq_getmessage() call from ProcessStandbyMessage(), and added one in ProcessRepliesIfAny() instead. Clarify this by changing the code to match upstream commit cd19848bd55. (Except that we don't have pq_startmsgread() yet; that will arrive when we merge the rest of commit cd19848bd55.)
-
- 23 Sep 2018, 2 commits
-
-
Committed by Daniel Gustafsson
getgpsegmentCount() was defined in both cdbvars.h and cdbutil.h. While this avoided another header include in some cases, getgpsegmentCount() is not a variable and the correct location is cdbutil.h. Remove the prototype from cdbvars.h and update includes as required. Also fix the function comment to match reality and perform minor tweaking of the debug elog().
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
Committed by Daniel Gustafsson
There is already an assertion in getgpsegmentCount() testing the count to be > 0 (and 0 can only be returned in utility mode, which still holds this assertion always true).
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
- 22 Sep 2018, 4 commits
-
-
Committed by Jesse Zhang
Commit 825ca1e3 didn't seem to work well when we hooked up ORCA's memory system to memory accounting: we are tripping multiple asserts in regression tests. The regression test failures seem to suggest we are double-freeing somewhere (or accounting incorrectly). Reverting for now to get master back to green. This reverts commit 825ca1e3.
-
Committed by Taylor Vesely
The memory accounting system generates a new memory account for every execution node initialized in ExecInitNode. The addresses of these memory accounts are stored in the shortLivingMemoryAccountArray. If the memory allocated for shortLivingMemoryAccountArray is full, we repalloc the array with double the number of available entries. After creating approximately 67,000,000 memory accounts, it would need to allocate more than 1GB of memory to increase the array size, and would throw an ERROR, canceling the running query.

PL/pgSQL and SQL functions will create new executors/plan nodes that must be tracked by the memory accounting system. This level of detail is not necessary for tracking memory leaks, and creating a separate memory account for every executor will use a large amount of memory just to track these memory accounts. Instead of tracking millions of individual memory accounts, we consolidate any child executor account into a special 'X_NestedExecutor' account. If explain_memory_verbosity is set to 'detailed' or below, all child executors are consolidated into this account. If more detail is needed for debugging, set explain_memory_verbosity to 'debug', where, as was the previous behavior, every executor will be assigned its own MemoryAccountId.

Originally we tried to remove nested execution accounts after they finish executing, but rolling those accounts over into an 'X_NestedExecutor' account was impracticable to accomplish without the possibility of a future regression. If any accounts created between nested executors are not rolled over to an 'X_NestedExecutor' account, recording which accounts were rolled over can grow in the same way that the shortLivingMemoryAccountArray grows today, and would also grow too large to reasonably fit in memory. And iterating through the SharedHeaders every time we finish a nested executor is not likely to be very performant.

While we were at it, we converted some of the convenience macros dealing with memory accounting for executor/planner nodes into functions, and moved them out of the memory accounting header files into their sole callers' compilation units.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Taylor Vesely
Functions using SQL and PL/pgSQL will plan and execute arbitrary SQL inside a running query. The first time we initialize a plan for an SQL block, the memory accounting system creates a new memory account for each executor/node. In the case that we are executing a cached plan (i.e. plancache.c), the memory accounts will have already been assigned in a previous execution of the plan. As a result, when explain_memory_verbosity is set to 'detail', it is not clear which memory account corresponds to which executor. Instead, move the memoryAccountId into PlanState/QueryDesc, which ensures that every time we initialize an executor, it will be assigned a unique memoryAccountId.
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
The FIXME was added to GPDB in commit f86622d9, which backported the local cache of resource owners attached to LOCALLOCK. I think the comment was added because, in the upstream commit that added the cache, upstream didn't yet have the check guarding the pfree(). It was added later in upstream too, in commit 7e6e3bdd3c, and that had already been backported to GPDB. So it's all right: the guard on the pfree() is a good thing to have, and there's nothing further to do here.
-