- 23 Sep, 2017 1 commit
-
-
Committed by Kavinder Dhaliwal
There are cases where, during execution, a Memory Intensive (MI) operator may not use all the memory that is allocated to it. This extra memory (quota - allocated) can be relinquished for other MI nodes to use during execution of a statement. For example:

    -> Hash Join
       -> HashAggregate
       -> Hash

In the above query fragment the HashJoin operator has an MI operator in both its inner and outer subtrees. If the Hash node uses much less memory than it was given as its quota, it will now call MemoryAccounting_DeclareDone(), and the difference between its quota and allocated amount will be added to the allocated amount of the RelinquishedPool. This enables HashAggregate to request memory from the RelinquishedPool if it exhausts its quota, to prevent spilling.

This PR adds two new APIs to the MemoryAccounting framework:

* MemoryAccounting_DeclareDone(): add the difference between a memory account's quota and its allocated amount to the long-living RelinquishedPool.
* MemoryAccounting_RequestQuotaIncrease(): retrieve all relinquished memory by incrementing an operator's operatorMemKb and setting the RelinquishedPool to 0.

Note: This PR introduces the facility for Hash to relinquish memory to the RelinquishedPool memory account and for the Agg operator (specifically HashAgg) to request an increase to its quota before it builds its hash table. This commit does not generally apply this paradigm to all MI operators.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
-
- 22 Sep, 2017 1 commit
-
-
Committed by Kavinder Dhaliwal
Before this commit, all memory allocations made by ORCA/GPOS were a black box to GPDB. However, the groundwork was already in place to let GPDB's Memory Accounting Framework track memory consumption by ORCA. This commit introduces two new functions, Ext_OptimizerAlloc and Ext_OptimizerFree, which pass their parameters through to gp_malloc and gp_free and do some bookkeeping against the Optimizer memory account. This introduces very little overhead to the GPOS memory management framework.

Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
- 21 Sep, 2017 3 commits
-
-
Committed by Ning Yu
* resgroup: provide helper funcs for memory usage updates.

  We used to have complex and duplicated logic to update group and slot memory usage in different contexts; now we provide two helper functions to increase or decrease memory usage in a group and slot. Two badly named functions, `attachToSlot()` and `detachFromSlot()`, are retired.

* resgroup: provide a helper function to unassign a dropped resgroup.

* resgroup: move complex checks into helper functions.

  Many helper functions with descriptive names were added to increase the readability of lots of complex checks. Also added a pointer to the resource group slot in `self`.

* resgroup: add helper functions for wait queue operations.
-
Committed by Ashwin Agrawal
The intent of this extra configuration file is to control the synchronization between primary and mirror for WALREP. The gp_replication.conf is not designed to work with filerep; for example, scripts like gp_expand will fail since they modify the configuration files directly instead of going through initdb.

Signed-off-by: Xin Zhang <xzhang@pivotal.io>
-
Committed by Heikki Linnakangas
We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting prodataaccess='s'. This exposes the functionality to users via DDL, and adds support for the EXECUTE ON MASTER case.

There was discussion on gpdb-dev about also supporting ON MASTER AND ALL SEGMENTS, but that is not implemented yet. There is no handy "locus" in the planner to represent that. There was also discussion about making a gp_segment_id column implicitly available for functions, but that is also not implemented yet.

The old behavior was that if a function was marked as IMMUTABLE, it could be executed anywhere; otherwise it was always executed on the master. For backwards compatibility, this keeps that behavior for EXECUTE ON ANY (the default), so even if a function is marked as EXECUTE ON ANY, it will always be executed on the master unless it's IMMUTABLE.

There is no support for these new options in ORCA. Using any ON MASTER or ON ALL SEGMENTS functions in a query causes ORCA to fall back. This is the same as with the prodataaccess='s' hack that this replaces, but now that it is more user-visible, it would be nice to teach ORCA about it.

The new options are only supported for set-returning functions, because for a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how the results should be combined. ON MASTER would probably be doable, but there's no need for that right now, so punt.

Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can only be used in the FROM clause, or in the target list of a simple SELECT with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY functions, which are the default, work the same as before.
-
- 20 Sep, 2017 5 commits
-
-
Committed by Pengzhou Tang
In this commit, we add more detailed memory metrics to the 'memory_usage' column of gp_resgroup_status, including current/available memory usage for a group, current/available memory usage for a slot, and current/available memory usage for the shared part.
-
Committed by Gang Xiong
Previously, waiters waiting on a dropped resource group needed to be reassigned to a new group. To achieve that, ResGroupSlotAcquire had been modified in a way that was complicated and not easy to understand; this commit refines it.

Author: Gang Xiong <gxiong@pivotal.io>
-
Committed by Pengzhou Tang
Allow CREATE RESOURCE GROUP and ALTER RESOURCE GROUP to set concurrency to 0, so that after some time there will eventually be no running queries and the resource group can be dropped. On drop, all pending queries are moved to the new resource group assigned to the role; if the role is also dropped, the pending queries are all canceled.

Another thing is that we do not allow setting the concurrency of the admin group to zero: superusers are in the admin group and only superusers can alter resource groups, so once the concurrency of the admin group were set to zero, there would be no chance to set it again.

Signed-off-by: Ning Yu <nyu@pivotal.io>
-
Committed by Richard Guo
This commit makes two changes:

1. Remove the restriction that the sum of memory_spill_ratio and memory_shared_quota must be no larger than 100.
2. Change the range of memory_spill_ratio to [0, 100].
-
Committed by Hubert Zhang
Function FaultInjectorIdentifierStringToEnum(faultName) passed a const string to a non-const parameter, which caused a build warning. On second thought, since we now support injecting a fault by fault name without a corresponding fault identifier, it's better to use the fault name instead of the fault enum identifier in the ereport.
-
- 19 Sep, 2017 3 commits
-
-
Committed by Xin Zhang
Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
-
Committed by Xin Zhang
New API:

    void set_gp_replication_config(const char *name, const char *value)

This function is inspired by the upstream ALTER SYSTEM command's AlterSystemSetConfigFile() from commit 7dfab04a. Once we merge the upstream changes, we can remove this function and use AlterSystemSetConfigFile() directly.

Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
-
Committed by Xin Zhang
We use this file to store the GUC value `synchronous_standby_names` in order to control the blocking behavior between primary and mirrors, as used upstream. When this GUC is on, the primary is blocked, waiting for commits to be propagated to the mirrors regardless of mirror status. When this GUC is off, the primary just archives and won't wait for the mirrors.

gp_replication.conf is now read unconditionally by the GUC parsing logic and needs to be set up by initdb. Refactor set_null_conf() to take a filename so that we don't copy-paste more code.

Signed-off-by: Jacob Champion <pchampion@pivotal.io>
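For illustration, a minimal gp_replication.conf carrying the one GUC described above might look like the following. This is a sketch, not a verbatim copy of the file initdb generates, and the '*' value (match any standby) is an assumption:

```
# gp_replication.conf -- illustrative contents only
# When set, commits on the primary wait for propagation to the mirror.
synchronous_standby_names = '*'
```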
-
- 18 Sep, 2017 1 commit
-
-
Committed by Huan Zhang
Use the fault name instead of the enum as the key of the fault hash table. The GPDB fault injector used the fault enum as the key of the fault hash table. If someone wants to inject a fault into a GPDB extension (a separate repo), she has to hard-code the extension-related fault enums into GPDB core code, which is not good practice. So we simply use the fault name as the hash key, removing the need to hard-code the fault enums. Note that the fault injector API doesn't change.
-
- 17 Sep, 2017 1 commit
-
-
Committed by Heikki Linnakangas
In GPDB, we have so far used a WindowFrame struct to represent the start and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL uses the combination of a frameOptions bitmask and start and end expressions. Refactor to replace the WindowFrame with the upstream representation.
-
- 16 Sep, 2017 1 commit
-
-
Committed by Kavinder Dhaliwal
Historically this function was used to special-case a few operators that were not considered memory-intensive. However, it now always returns true. This commit removes the function and also moves the case for T_FunctionScan in IsMemoryIntensiveOperator into the group that always returns true, as this is its current behavior.
-
- 15 Sep, 2017 2 commits
-
-
Committed by Heikki Linnakangas
The bzip2 library is only used by the gfile/fstream code, used for external tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it was only built on non-Windows systems. Instead of tying it to the platform, use a proper autoconf check and HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2 support on Windows, as well as building without bzip2 on non-Windows systems. That makes it easier to test the otherwise Windows-only codepaths on other platforms.

--with-libbz2 is still the default, but you can now use --without-libbz2 if you wish. I'm sure that some regression tests will fail if you actually build the server without libbz2, but I'm not going to address that right now. We have similar problems with other features that are in principle optional, but cause some regression tests to fail.

Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable zlib support in gpfdist. Building the server still fails if you use --without-zlib, but at least you can build the client programs without zlib, also on non-Windows systems.

Remove an obsolete copy of bzlib.h from the repository while we're at it.
-
Committed by Ashwin Agrawal
Using the gp_segment_configuration catalog table, one can easily find whether mirrors exist or not; no special table is needed to communicate the same. Earlier, gp_fault_strategy conveyed 'n' for a mirrorless system, 'f' for replication, and 's' for SAN mirrors. Since support for 's' was removed in 5.0, the only purpose gp_fault_strategy served was indicating a mirrored or non-mirrored system. Hence delete the gp_fault_strategy table and, at the required places, use gp_segment_configuration to find the required info.
-
- 14 Sep, 2017 3 commits
-
-
Committed by Heikki Linnakangas
Although I'm not too familiar with SystemTap, I'm pretty sure that recent versions can do user-space tracing better. I don't think anyone is using these hacks anymore, so remove them.
-
Committed by Ning Yu
* resgroup: move MyResGroupSharedInfo into MyResGroupProcInfo.

  MyResGroupSharedInfo is now replaced with MyResGroupProcInfo->group.

* resgroup: retire resGranted in PGPROC.

  When resGranted == false we must have resSlotId == InvalidSlotId, and when resGranted != false we must have resSlotId != InvalidSlotId, so we can retire resGranted and keep only resSlotId.

* resgroup: rename sharedInfo to group.

  In resgroup.c there used to be both `group` and `sharedInfo` for the same thing; now only `group` is used.

* resgroup: rename MyResGroupProcInfo to self.

  We want to use this variable directly, so a short name is better.
-
Committed by Heikki Linnakangas
This is the spec-compliant spelling, but GPDB has so far only allowed "agg OVER (window)". With this commit, the parens are still allowed, for backwards compatibility. Change the deparsing code to also use the non-parens syntax in view definitions and EXPLAIN. Adjust the expected output of regression tests accordingly.
-
- 13 Sep, 2017 1 commit
-
-
Committed by Jimmy Yih
It was noticed on CentOS 6 that the entries in the char * list would become blank after freeing the raw string obtained from setting the GUC wal_consistency_checking. We should free this string only after we finish using it.

Authors: Jimmy Yih and Taylor Vesely
-
- 12 Sep, 2017 4 commits
-
-
Committed by Pengzhou Tang
The view gp_toolkit.gp_resgroup_status collects resgroup status information on both QD and QEs, both before and after a 300ms delay, to measure the cpu usage. When there are concurrent resgroup drops it might fail due to missing resgroups. This is fixed by holding ExclusiveLock, which blocks the drop operations.

Signed-off-by: Ning Yu <nyu@pivotal.io>
-
Committed by Pengzhou Tang
Previously, in ResGroupSlotRelease(), we set the resWaiting flag to false outside of ResGroupLock. A race condition can occur: the waitProc that was meant to be woken up is canceled just before its resWaiting is set to false, and in the window before resWaiting is set, the waitProc runs another query and waits for a free slot again; then resWaiting is set and the waitProc gets another, unexpected free slot. For example, with rg1 having concurrency = 1:

s1: BEGIN;
s2: BEGIN;   -- blocked
s1: END;     -- set a breakpoint just before resWaiting is set in ResGroupSlotRelease()
s2: cancel it;
s3: BEGIN;   -- will get a free slot.
s2: BEGIN;   -- run BEGIN again and be blocked.
s1: continue;
s2: s2 will get an unexpected free slot

To avoid leaking a resgroup slot, we now set resWaiting under ResGroupLock, so the waitProc has no window in which to re-fetch the slot before resWaiting is set. It's hard to add a regression test: adding a breakpoint in ResGroupSlotRelease may block all other operations, because it holds the ResGroupLock.
-
Committed by Heikki Linnakangas
tqual.c is quite a long file, and there are plenty of legitimate MPP-specific changes in it compared to upstream. Move these functions to a separate file, to reduce our diff footprint. This hopefully makes merging easier in the future. Run pgindent over the new file, and do some manual tidying up. Also, use CStringGetTextDatum() instead of the more complicated dance with DirectFunctionCall and textin(), to convert C strings into text Datums.
-
Committed by Heikki Linnakangas
In the upstream, two different structs are used to represent a window definition: WindowDef in the grammar, which is transformed into WindowClause during parse analysis. In GPDB, we've been using the same struct, WindowSpec, in both stages. Split it up, to match the upstream.

The representation of the window frame, i.e. "ROWS/RANGE BETWEEN ...", was different between the upstream implementation and the GPDB one. We now use the upstream frameOptions+startOffset+endOffset representation in the raw WindowDef parse node, but it's still converted to the WindowFrame representation for the later stages, so WindowClause still uses that. I will switch over the rest of the codebase to the upstream representation as a separate patch. Also, refactor WINDOW clause deparsing to be closer to upstream.

One notable difference is that the old WindowSpec.winspec field corresponds to the winref field in WindowDef and WindowClause, except that the new 'winref' is 1-based, while the old field was 0-based.

Another noteworthy thing is that this forbids specifying "OVER (w ROWS/RANGE BETWEEN ...)" if the window "w" already specified a window frame, i.e. a different ROWS/RANGE BETWEEN. There was one such case in the regression suite, in window_views, and this updates the expected output of that to be an error.
-
- 09 Sep, 2017 1 commit
-
-
Committed by Heikki Linnakangas
This adds the 'winstar' field from the upstream. Also bring in the 'winagg' field while we're at it, although it's only used for an assertion in nodeWindow.c so far.
-
- 08 Sep, 2017 4 commits
-
-
Committed by Ashwin Agrawal
FaultInjector_UpdateHashEntry() was using FaultInjector_InsertHashEntry(), which ends up adding an entry if not present, without incrementing `faultInjectorShmem->faultInjectorSlots`. This causes inconsistency, and also sometimes triggers "FailedAssertion(""!(faultInjectorShmem->faultInjectorSlots == 0)"")" during fault-injector reset, as the counter goes negative. Fix this by using FaultInjector_LookupHashEntry() instead, as that's what FaultInjector_UpdateHashEntry() needs.

Scenario the assertion was hitting:

gpfaultinjector -f all -m async -y resume -r primary -H ALL
gpfaultinjector -f all -m async -y reset -r primary -H ALL
-
Committed by Daniel Gustafsson
Use the available macro for detecting the FTP protocol, and use pstrdup instead of palloc+strncpy. Also fix a spacing issue in an ereport call.
-
Committed by Ashwin Agrawal
These inline functions were producing warnings; based on the discussion at https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/6fgKvN9QpV4/zjysjqIZAgAJ, convert them to macros as upstream does. Add explicit type casts wherever needed, now that DatumGetPointer() returns (char *) instead of (void *).
-
Committed by Ashwin Agrawal
-
- 07 Sep, 2017 8 commits
-
-
Committed by Heikki Linnakangas
It hasn't been implemented, but there is basic support in the grammar, just enough to detect the syntax and throw an error or ignore it. All the rest was dead code.
-
Committed by Heikki Linnakangas
* In ruleutils.c, the ereport() was broken. Use elog() instead, like in the upstream. (elog() is fine for "can't happen" kinds of sanity checks.)
* Remove a few unused local variables.
* Add a missing cast from Plan * to Node *.
-
Committed by Jesse Zhang
We will be less conservative and enable recursive CTE by default on master, rather than keeping recursive CTE hidden as we progress on developing the feature. This reverts the following two commits:

* 280c577a "Set gp_recursive_cte_prototype GUC to true in test"
* 4d5f8087 "Guard Recursive CTE behind a GUC"

Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
-
Committed by Jemish Patel
Previously we were setting the value of `optimizer_join_arity_for_associativity_commutativity` to a very large number, so ORCA would spend a very long time evaluating all possible n-way join combinations to come up with the cheapest join tree to use in the plan. We are reducing this value to `7`, as it does not prove beneficial to spend time and resources evaluating more than 7-way joins in trying to find the cheapest join tree.
-
Committed by Jesse Zhang
Plus minor corrections in spelling and comments.

Signed-off-by: Sam Dash <sdash@pivotal.io>
-
Committed by Kavinder Dhaliwal
While Recursive CTE is still being developed, it will be hidden from users by the GUC gp_recursive_cte_prototype.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
Committed by Tom Lane
get_name_for_var_field didn't have enough context to interpret a reference to a CTE query's output. Fixing this requires separate hacks for the regular deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available context information is quite different. It's pretty nearly parallel to the existing code for SUBQUERY RTEs, though. Also, add code to make sure we qualify a relation name that matches a CTE name; else the CTE will mistakenly capture the reference when reloading the rule. In passing, fix a pre-existing problem with get_name_for_var_field not working on variables in targetlists of SubqueryScan plan nodes. Although latent all along, this wasn't a problem until we made EXPLAIN VERBOSE try to print targetlists. To do this, refactor the deparse_context_for_plan API so that the special case for SubqueryScan is all on ruleutils.c's side. (cherry picked from commit 742fd06d)
-
Committed by Haisheng Yuan
The planner generates plans that don't insert any motion between a WorkTableScan and its corresponding RecursiveUnion, because currently in GPDB motions are not rescannable. For example, an MPP plan for a recursive CTE query may look like:

```
Gather Motion 3:1
  -> Recursive Union
       -> Seq Scan on department
            Filter: name = 'A'::text
       -> Nested Loop
            Join Filter: d.parent_department = sd.id
            -> WorkTable Scan on subdepartment sd
            -> Materialize
                 -> Broadcast Motion 3:3
                      -> Seq Scan on department d
```

In the current solution, the WorkTableScan is always put on the outer side of the topmost Join (the recursive part of the RecursiveUnion), so that we can safely rescan the inner child of the join without worrying about the materialization of a potential underlying motion. This is a heuristic-based plan, not a cost-based plan. Ideally, the WorkTableScan could be placed on either side of the join at any depth, and the plan should be chosen based on the cost of the recursive plan and the number of recursions. But we will leave that for later work.

Note: Hash join is temporarily disabled for plan generation of the recursive part, because if the hash table spills, the batch file is removed as it executes. We have a follow-up story to make a spilled hash table rescannable.

See discussion on the gpdb-dev mailing list: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
-
- 06 Sep, 2017 1 commit
-
-
Committed by Heikki Linnakangas
If a prepared statement, or a cached plan for an SPI query, e.g. from a PL/pgSQL function, contains stable functions, the stable functions were incorrectly evaluated only once at plan time, instead of on every execution of the plan. This happened to not be a problem in queries that contain any parameters, because in GPDB they are re-planned on every invocation anyway, but non-parameter queries were broken.

In the planner, before this commit, when simplifying expressions, we set the transform_stable_funcs flag to true for every query, and evaluated all stable functions at planning time. Change it to false, and also rename it back to 'estimate', as it's called in the upstream. That flag was changed back in 2010, in order to allow partition pruning to work with quals containing stable functions, like TO_DATE. I think back then we always re-planned every query, so that was OK, but we do cache plans now.

To avoid regressing to worse plans, change eval_const_expressions() so that it still evaluates stable functions, even when the 'estimate' flag is off. But when it does so, mark the plan as "one-off", meaning that it must be re-planned on every execution. That gives the old, intended behavior, that such plans are indeed re-planned, but it still allows plans that don't use stable functions to be cached. This seems to fix github issue #2661.

Looking at the direct dispatch code in apply_motion(), I suspect there are more issues like this lurking there. There's a call to planner_make_plan_constant(), modifying the target list in place, and that happens during planning. But this at least fixes the non-direct-dispatch cases, and is a necessary step for fixing any remaining issues.

For some reason, the query now gets planned *twice* for every invocation. That's not ideal, but it was an existing issue for prepared statements with parameters already. So let's deal with that separately.
-