- 23 Sep 2017, 3 commits
-
-
Committed by Kavinder Dhaliwal
There are cases where, during execution, a Memory Intensive (MI) operator may not use all of the memory allocated to it. This extra memory (quota - allocated) can be relinquished for other MI nodes to use during execution of the statement. For example, in the plan fragment

    HashJoin
      -> HashAggregate
      -> Hash

the HashJoin operator has an MI operator in both its inner and outer subtrees. If the Hash node uses much less memory than its quota, it will now call MemoryAccounting_DeclareDone(), and the difference between its quota and allocated amount is added to the allocated amount of the RelinquishedPool. This enables HashAggregate to request memory from the RelinquishedPool if it exhausts its own quota, to prevent spilling.

This PR adds two new APIs to the Memory Accounting framework:

MemoryAccounting_DeclareDone(): add the difference between a memory account's quota and its allocated amount to the long-living RelinquishedPool.
MemoryAccounting_RequestQuotaIncrease(): retrieve all relinquished memory by incrementing an operator's operatorMemKb and setting the RelinquishedPool to 0.

Note: This PR introduces the facility for Hash to relinquish memory to the RelinquishedPool memory account, and for the Agg operator (specifically HashAgg) to request an increase to its quota before it builds its hash table. This commit does not apply this paradigm generally to all MI operators.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by sambitesh
Before this cherry-pick, the query below would have errored out:

WITH outermost(x) AS (
  SELECT 1
  UNION (WITH innermost AS (SELECT 2)
         SELECT * FROM innermost
         UNION SELECT 3)
)
SELECT * FROM outermost;

Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Taylor Vesely
In order to view the primary segments' replication stream data from their pg_stat_replication view, we currently need to connect to each primary segment individually via utility mode. To make life easier, we introduce a function that fetches each primary segment's replication stream data, and wrap it with a view named gp_stat_replication. It is now possible to view all the cluster's replication information from the master in a regular psql session. Authors: Taylor Vesely and Jimmy Yih
-
- 22 Sep 2017, 2 commits
-
-
Committed by Kavinder Dhaliwal
Before this commit, all memory allocations made by ORCA/GPOS were a black box to GPDB. However, the groundwork was already in place to allow GPDB's Memory Accounting framework to track memory consumption by ORCA. This commit introduces two new functions, Ext_OptimizerAlloc and Ext_OptimizerFree, which pass their parameters through to gp_malloc and gp_free and do some bookkeeping against the Optimizer memory account. This introduces very little overhead to the GPOS memory management framework. Signed-off-by: Melanie Plageman <mplageman@pivotal.io> Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
Committed by Heikki Linnakangas
We can encounter tuples that belong to later batches even after the first pass. Revert the comment to the way it is in upstream; I had forgotten to update it.
-
- 21 Sep 2017, 9 commits
-
-
Committed by Heikki Linnakangas
The WITH RECURSIVE test case in 'join_gp' would miss some rows, if the hash algorithm (src/backend/access/hash/hashfunc.c) was replaced with the one from PostgreSQL 8.4, or if statement_mem was lowered from 1000 kB to 700 kB. This is what happened:

1. A tuple belongs to batch 0, and is kept in memory while processing batch 0.
2. The outer scan finishes, and we spill the inner batch 0 from memory to a file, with SpillFirstBatch, and start processing batch 1.
3. While processing batch 1, the number of batches is increased, and the tuple that belonged to batch 0, and was already written to batch 0's file, is moved to a later batch.
4. After the first scan is complete, the hash join is re-scanned.
5. We reload batch file 0 into memory. While reloading, we encounter the tuple that now doesn't seem to belong to batch 0, and throw it away.
6. We perform the rest of the re-scan. We have missed any matches to the tuple that was thrown away. It was not part of the later batch files, because in the first pass it was handled as part of batch 0. But in the re-scan, it was not handled as part of batch 0, because nbatch was now larger, so it didn't belong there.

To fix: when, while reloading a batch file, we see a tuple that actually belongs to a later batch file, we write it to that later file. To avoid adding it there multiple times if the hash join is re-scanned multiple times, whenever any tuples are moved while reloading a batch file, we destroy the batch file and re-create it with just the remaining tuples.

This is made a bit complicated by the fact that BFZ temp files don't support appending to a file that's already been rewound for reading. So what we actually do is always re-create the batch file, even if there have been no changes to it. I left comments about that. Ideally, we would either support re-appending to BFZ files, or stop using BFZ workfiles for this altogether (I'm not convinced they're any better than plain BufFiles). But that can be done later.
Fixes github issue #3284
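The failure mode and the fix can be modeled with a toy version of hash-join batching, where a tuple's batch is its hash value modulo nbatch. This is an illustrative Python sketch, not the actual nodeHashjoin.c logic; the function names are invented.

```python
# Toy model: when nbatch grows, a tuple's batch number can change, so a
# tuple reloaded from batch 0's file may now belong to a later batch.

def batch_no(hashvalue, nbatch):
    return hashvalue % nbatch

def reload_batch(batch_files, cur, nbatch):
    """Re-read batch `cur` after nbatch has grown. Tuples that still belong
    here are kept; tuples that now map to a later batch are written to that
    batch's file (the fix) instead of being discarded (the old bug).
    The current file is then re-created with just the remaining tuples."""
    kept = []
    for hashvalue in batch_files[cur]:
        b = batch_no(hashvalue, nbatch)
        if b == cur:
            kept.append(hashvalue)
        else:
            batch_files[b].append(hashvalue)  # move, don't throw away
    batch_files[cur] = kept  # destroy and re-create with remaining tuples
    return kept
```

A hash value of 6 lands in batch 0 when nbatch is 2, but in batch 2 once nbatch grows to 4, which is exactly the tuple the old code threw away on re-scan.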
-
Committed by Heikki Linnakangas
ExecHashTableInsert also increments the counter, so we don't need to do it here. This is harmless AFAICS: the counter isn't used for anything but instrumentation at the moment, but it confused me while debugging.
-
Committed by Heikki Linnakangas
Before, this only worked for cursors declared with DECLARE CURSOR. You got a "there is no parameter $0" error if you tried otherwise. This moves the decision on whether a plan is "simply updatable" from the parser to the planner. Doing it in the parser was awkward, because we only want to do it for queries that are used in a cursor, and for SPI queries we don't know that at that point yet. For some reason, the copy, out, and read functions of CurrentOfExpr were missing the cursor_param field. While we're at it, reorder the code to match upstream. This only makes the required changes to the Postgres planner. ORCA has never supported updatable cursors. In fact, it will fall back to the Postgres planner on any DECLARE CURSOR command, which is why the existing tests have passed even with optimizer=on.
-
Committed by Heikki Linnakangas
There was code in gp_read_error_log(), to "manually" dispatch the call to all the segments, if it was executed in the dispatcher. This was previously necessary, because even though the function was marked with prodataaccess='s', the planner did not guarantee that it's executed in the segments, when called in the targetlist like "SELECT gp_read_error_log('tab')". Now that we have the EXECUTE ON ALL SEGMENTS syntax, and are more rigorous about enforcing that in the planner, this hack is no longer required.
-
Committed by Ning Yu
* resgroup: provide helper funcs for memory usage updates. We used to have complex and duplicated logic to update group & slot memory usage in different contexts; now we provide two helper functions to increase or decrease memory usage in group and slot. Two badly named functions, `attachToSlot()` and `detachFromSlot()`, are retired now.
* resgroup: provide a helper function to unassign a dropped resgroup.
* resgroup: move complex checks into helper functions. Many helper functions were added with descriptive names to increase the readability of lots of complex checks. Also added a pointer to the resource group slot in self.
* resgroup: add helper functions for wait queue operations.
-
Committed by Ashwin Agrawal
The intent of this extra configuration file is to control the synchronization between primary and mirror for WALREP. gp_replication.conf is not designed to work with filerep; for example, scripts like gp_expand will fail, since they directly modify the configuration files instead of going through initdb. Signed-off-by: Xin Zhang <xzhang@pivotal.io>
-
Committed by Heikki Linnakangas
Also change a few regression tests to use the new syntax, instead of gp_toolkit's __gp_localid and __gp_masterid functions.
-
Committed by Heikki Linnakangas
We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting prodataaccess='s'. This exposes the functionality to users via DDL, and adds support for the EXECUTE ON MASTER case. There was discussion on gpdb-dev about also supporting ON MASTER AND ALL SEGMENTS, but that is not implemented yet; there is no handy "locus" in the planner to represent that. There was also discussion about making a gp_segment_id column implicitly available for functions, but that is also not implemented yet.

The old behavior was that if a function was marked as IMMUTABLE, it could be executed anywhere; otherwise it was always executed on the master. For backwards-compatibility, this keeps that behavior for EXECUTE ON ANY (the default), so even if a function is marked as EXECUTE ON ANY, it will always be executed on the master unless it's IMMUTABLE.

There is no support for these new options in ORCA. Using any ON MASTER or ON ALL SEGMENTS functions in a query causes ORCA to fall back. This is the same as with the prodataaccess='s' hack that this replaces, but now that it is more user-visible, it would be nice to teach ORCA about it.

The new options are only supported for set-returning functions, because for a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how the results should be combined. ON MASTER would probably be doable, but there's no need for that right now, so punt. Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can only be used in the FROM clause, or in the target list of a simple SELECT with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY functions, which is the default, work the same as before.
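The dispatch rule described above can be summarized in a few lines. This is a hedged sketch of the decision logic in Python, not GPDB's actual planner code; the function name and return labels are illustrative.

```python
# Illustrative model of where a function runs under the EXECUTE ON options.

def execution_locus(exec_on, is_immutable):
    """exec_on: 'ANY' (default), 'MASTER', or 'ALL SEGMENTS'."""
    if exec_on == "ALL SEGMENTS":
        return "segments"
    if exec_on == "MASTER":
        return "master"
    # EXECUTE ON ANY keeps the old, backwards-compatible behavior:
    # IMMUTABLE functions may run anywhere, everything else on the master.
    return "anywhere" if is_immutable else "master"
```

For example, a volatile EXECUTE ON ANY function still pins to the master, exactly as before this commit.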
-
Committed by Bhuvnesh Chaudhary
If an aggregation query uses aliases that are the same as the table's actual columns, the aliases are propagated up from subqueries, and grouping is applied on the column alias, this may result in inconsistent target lists for the aggregation plan, causing a crash.

CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;

SELECT substr(a, 2) as a
FROM (SELECT ('-'||a)::varchar as a
      FROM (SELECT a FROM t1) t2) t3
GROUP BY a;
-
- 20 Sep 2017, 6 commits
-
-
Committed by Pengzhou Tang
In this commit, we add more detailed memory metrics to the 'memory_usage' column of gp_resgroup_status, including current/available memory usage in a group, current/available memory usage for a slot, and current/available memory usage for the shared part.
-
Committed by Gang Xiong
Previously, waiters waiting on a dropped resource group needed to be reassigned to a new group. To achieve that, ResGroupSlotAcquire had been modified in a way that was complicated and not easy to understand; this commit refines it. Author: Gang Xiong <gxiong@pivotal.io>
-
Committed by Pengzhou Tang
Allow CREATE RESOURCE GROUP and ALTER RESOURCE GROUP to set concurrency to 0, so that after some time there will be no running queries in the group and it can be dropped. On drop, all pending queries are moved to the new resource group assigned to the role; if the role is also dropped, the pending queries are all canceled. We do not, however, allow setting the concurrency of the admin group to zero: superusers run in the admin group, and only superusers can alter resource groups, so once the admin group's concurrency were set to zero, there would be no way to set it again. Signed-off-by: Ning Yu <nyu@pivotal.io>
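The admin-group restriction explained above amounts to one validation rule. Here is a minimal Python sketch of that check; the function and group names are assumptions, not GPDB identifiers.

```python
# Illustrative validation: concurrency 0 is allowed for ordinary groups,
# but not for the admin group, since only superusers (who run in the
# admin group) could ever change it back.

ADMIN_GROUP = "admin_group"

def validate_concurrency(group, concurrency):
    if concurrency < 0:
        raise ValueError("concurrency must be >= 0")
    if concurrency == 0 and group == ADMIN_GROUP:
        raise ValueError("cannot set concurrency of admin group to 0")
```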
-
Committed by Ming LI
Because we don't know the data location of the result of a SELECT query, ON SEGMENT is forbidden.
-
Committed by Richard Guo
This commit makes two changes:
1. Remove the restriction that the sum of memory_spill_ratio and memory_shared_quota must be no larger than 100.
2. Change the range of memory_spill_ratio to [0, 100].
-
Committed by Hubert Zhang
The function FaultInjectorIdentifierStringToEnum(faultName) passed a const string to a non-const parameter, which caused a build warning. On second thought, since we already support injecting faults by fault name without a corresponding fault identifier, it's better to use the fault name instead of the fault enum identifier in the ereport.
-
- 19 Sep 2017, 4 commits
-
-
Committed by Bhuvnesh Chaudhary
GPOS raises exceptions with different severity levels, but they were all being logged to the GPDB logs at LOG severity. This meant users could not turn off logging for GPOS exceptions unless the GPDB log setting was raised above the LOG severity level. This is the initial commit introducing the functionality. If an exception is created without a GPDB severity level, it defaults to LOG severity in GPDB. Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Xin Zhang
Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
-
Committed by Xin Zhang
New API: void set_gp_replication_config(const char *name, const char *value). This function is inspired by the upstream ALTER SYSTEM command's AlterSystemSetConfigFile() from commit 7dfab04a. Once we merge the upstream changes, we can remove this function and use AlterSystemSetConfigFile() directly. Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
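At its core, such a setter rewrites one `name = 'value'` line in a small config file. Here is a rough Python analogue of that behavior; the file handling and quoting are assumptions for illustration, not the exact GPDB implementation.

```python
# Illustrative sketch: replace (or add) a single GUC line in a config file,
# roughly what a set_gp_replication_config-style helper does.

def set_config_value(path, name, value):
    try:
        with open(path) as f:
            # drop any existing line that sets this GUC
            lines = [l for l in f if l.split("=")[0].strip() != name]
    except FileNotFoundError:
        lines = []
    lines.append(f"{name} = '{value}'\n")
    with open(path, "w") as f:
        f.writelines(lines)
```

Calling it twice for the same GUC leaves a single line with the latest value, mirroring how ALTER SYSTEM-style updates are idempotent per setting.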
-
Committed by Xin Zhang
We use this file to store the GUC `synchronous_standby_names`, in order to control the blocking behavior between primary and mirrors, as used in upstream. When this GUC is on, the primary blocks, waiting for commits to be propagated to the mirrors regardless of mirror status. When this GUC is off, the primary just archives and won't wait for the mirrors. gp_replication.conf is now read unconditionally by the GUC parsing logic and needs to be set up by initdb. Refactor set_null_conf() to take a filename so that we don't copy-paste more code. Signed-off-by: Jacob Champion <pchampion@pivotal.io>
-
- 18 Sep 2017, 2 commits
-
-
Committed by Heikki Linnakangas
These shouldn't happen, but Coverity warned about them. GCC would also complain, but I've been compiling with -Wno-maybe-uninitialized lately, because of the noise. Actually, this isn't quite enough: ORCA also needs to mark GPOS_RAISE with the "noreturn" attribute, so that the compiler gets the hint. Opened https://github.com/greenplum-db/gporca/pull/234 about that.
-
Committed by Huan Zhang
Use the fault name instead of the enum as the key of the fault hash table. The GPDB fault injector used the fault enum as the key of the fault hash table. If someone wants to inject a fault into a GPDB extension (a separate repo), she has to hard-code the extension-related fault enums into GPDB core code, which is not good practice. So we simply use the fault name as the hash key, removing the need to hard-code the fault enum. Note that the fault injector API doesn't change.
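The benefit of keying by name is that registration becomes open-ended: an extension can add a fault without touching any central enum. A toy Python illustration (names invented; not the fault injector's real interface):

```python
# Toy model: a name-keyed fault table means extensions can register faults
# without adding enum values to core code.

fault_table = {}

def inject_fault(name, fault_type):
    """Register a fault under its string name."""
    fault_table[name] = {"type": fault_type, "hits": 0}

def check_fault(name):
    """Called at a fault point: fires if a fault was registered for it."""
    entry = fault_table.get(name)
    if entry is None:
        return False
    entry["hits"] += 1
    return True
```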
-
- 17 Sep 2017, 2 commits
-
-
Committed by Heikki Linnakangas
In GPDB, we have so far used a WindowFrame struct to represent the start and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL uses the combination of a frameOptions bitmask and start and end expressions. Refactor to replace the WindowFrame with the upstream representation.
-
Committed by Heikki Linnakangas
This removes pg_window.winframemakerfunc column. It was only used for LEAD/LAG, and only in the Postgres planner. Hardcode the same special handling for LEAD/LAG in planwindow.c instead, based on winkind. This is one step in refactoring the planner and executor further, to replace the GPDB implementation of window functions with the upstream one.
-
- 16 Sep 2017, 3 commits
-
-
Committed by Heikki Linnakangas
Upstream uses >= here. It was changed in GPDB to use > instead of >=, but I don't see how that's more correct or better. I tracked the change in the old pre-open-sourcing repository to this commit:

commit f3e98a1ef5fc5915662077b137c563371ea1c0a4
Date: Mon Apr 6 15:04:33 2009 -0800

    Fixed guc check for ReservedBackends.

    [git-p4: depot-paths = "//cdb2/main/": change = 33269]

So there was no explanation there either of what the alleged problem was.
-
Committed by Heikki Linnakangas
We should call setQryDistributionPolicy() after applyColumnNames(); otherwise the column names specified in CREATE TABLE cannot be used in the DISTRIBUTED BY clause. Add a test case. Fixes github issue #3285.
-
Committed by Kavinder Dhaliwal
Historically, this function was used to special-case a few operators that were not considered memory-intensive. However, it now always returns true. This commit removes the function, and also moves the case for T_FunctionScan in IsMemoryIntensiveOperator into the group that always returns true, as this is its current behavior.
-
- 15 Sep 2017, 9 commits
-
-
Committed by Heikki Linnakangas
The bzip2 library is only used by the gfile/fstream code, used for external tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it was only built on non-Windows systems. Instead of tying it to the platform, use a proper autoconf check and HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2 support on Windows, as well as building without bzip2 on non-Windows systems. That makes it easier to test the otherwise Windows-only codepaths on other platforms. --with-libbz2 is still the default, but you can now use --without-libbz2 if you wish. I'm sure that some regression tests will fail if you actually build the server without libbz2, but I'm not going to address that right now. We have similar problems with other features that are in principle optional, but cause some regression tests to fail. Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable zlib support in gpfdist. Building the server still fails if you use --without-zlib, but at least you can build the client programs without zlib, also on non-Windows systems. Remove obsolete copy of bzlib.h from the repository while we're at it.
-
Committed by Heikki Linnakangas
If the sample of a column consists entirely of "too wide" values, which are left out of the sample when it's passed to the compute_stats function, we pass an empty sample to it. The default compute_stats gets confused by that, and computes the null fraction as 0 / 0 = NaN, so we end up storing NaN as stanullfrac. If all the values in the sample are wide values, then they're surely not NULLs, so the right thing to do is to store stanullfrac = 0. That is a bit inconsistent with the normal compute_stats function, which effectively treats too-wide values as not existing at all, artificially inflating the null fraction. Another inconsistency is that we store stawidth=1024 in this special case, while the normal computation ignores the wide values when computing stawidth. If we wanted to do something about that, we should adjust the normal computation to take those wide values better into account, but that's a different story; at least we now won't store NaN in stanullfrac any longer. Fixes github issue #3259.
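The arithmetic behind the bug is simple: when all sampled values were filtered out as too wide, the sample size is zero and a naive `nulls / sample_size` would be 0/0. A minimal sketch of the fixed rule (illustrative Python, not the analyze code itself):

```python
# Model of the fix: an empty sample here means every value was "too wide",
# and wide values are certainly not NULL, so report 0 instead of NaN.

def null_fraction(null_count, sample_size):
    if sample_size == 0:
        return 0.0  # all values were too wide; none of them were NULL
    return null_count / sample_size
```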
-
Committed by Heikki Linnakangas
This is a bit unfortunate, in case someone is using them. But as it happens, we haven't even mentioned the ALTER TYPE SET DEFAULT ENCODING command in the documentation, so there probably aren't many people using them, and you can achieve the same thing by using the normal, non-alias, names like "varchar" instead of "character varying".
-
Committed by Heikki Linnakangas
This way we don't need the weird half-transformation of WindowDefs. Makes things simpler.
-
Committed by Heikki Linnakangas
The 'location' field is just to give better error messages. It should not be considered when testing whether two nodes are equal. (Note that the COMPARE_LOCATION_FIELD() macro that we now consistently use on the 'location' field is a no-op.) I noticed this while working on a patch that would compare two ColumnRefs to see if they are equal, and could be collapsed to one.
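The rule described above (equality that ignores the purely diagnostic 'location' field) can be sketched like this. Illustrative Python over dict-shaped nodes; the real code compares C node structs via the equalfuncs machinery.

```python
# Toy node equality that ignores 'location', which only exists to improve
# error messages and carries no semantic meaning.

def nodes_equal(a, b):
    ka = {k: v for k, v in a.items() if k != "location"}
    kb = {k: v for k, v in b.items() if k != "location"}
    return ka == kb
```

Two ColumnRefs that differ only in where they appeared in the query text compare equal, which is what lets them be collapsed into one.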
-
Committed by Heikki Linnakangas
While working on the 8.4 merge, I had a bug that tripped an Insist inside a PG_TRY-CATCH. That was very difficult to track down, because of the way the error is logged here: using ereport() includes the filename and line number where the error is re-emitted, not the original place. So all I got was "Unexpected internal error" in the log, with a meaningless filename & line number. This rewrites the way the error is reported so that it preserves the original filename and line number. It will also use the original error level and preserve all the other fields.
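The same pitfall exists in most languages: reporting an error at the catch site hides where it was actually raised. A loose Python analogue (not the elog/ereport machinery; names invented) showing how to keep the original location:

```python
# Loose analogue: read the raise site out of the exception's traceback
# instead of reporting the catch site.
import traceback

def report_with_origin(exc):
    """Format an error message with the file:line where it was raised."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return f"error at {frame.filename}:{frame.lineno}: {exc}"
```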
-
Committed by Ming LI
-
Committed by Zhenghua Lyu
The user can configure a resource group such that some queries' query memory is zero. In such cases, work memory is used instead. And since query_mem's type is uint64, we simply remove the assert in the SPI execution code.
-
Committed by Ashwin Agrawal
Using the gp_segment_configuration catalog table, we can easily find out whether mirrors exist; we don't need a special table to communicate this. Earlier, gp_fault_strategy conveyed 'n' for a mirrorless system, 'f' for replication, and 's' for SAN mirrors. Since support for 's' was removed in 5.0, the only purpose gp_fault_strategy served was to indicate whether the system was mirrored. Hence, delete the gp_fault_strategy table and, where required, use gp_segment_configuration to find the needed info.
-