- 24 September 2019, 2 commits

Committed by Heikki Linnakangas
Printing the slice information makes sense for Init Plans, which are dispatched separately, before the main query. But not so much for other Sub Plans, which are just part of the plan tree; there is no dispatching or motion involved at such SubPlans. The SubPlan might *contain* Motions, but we print the slice information for those Motions separately. The slice information was always just the same as the parent node's, which adds no information, and can be misleading if it makes the reader think that there is inter-node communication involved in such SubPlans.
-
Committed by Jimmy Yih
When the gp_use_legacy_hashops GUC was set, CTAS would not assign the legacy hash operator class to the new table. This is because CTAS goes through a different code path and uses the first operator class of the SELECT's result when no distribution key is provided.
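For illustration, a minimal sketch of the scenario above; the table names are invented and the catalog query assumes the GPDB 6 gp_distribution_policy layout with a distclass column:

```
-- hypothetical tables; gp_use_legacy_hashops selects the legacy cdbhash operator classes
SET gp_use_legacy_hashops = on;

CREATE TABLE t_src (id int, val text) DISTRIBUTED BY (id);

-- Before the fix, this CTAS could record a non-legacy operator class for its
-- implicit distribution key, unlike t_src:
CREATE TABLE t_ctas AS SELECT id, val FROM t_src;

-- Compare the operator classes recorded for both tables:
SELECT localoid::regclass AS tbl, distclass
FROM gp_distribution_policy
WHERE localoid IN ('t_src'::regclass, 't_ctas'::regclass);
```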
-
- 23 September 2019, 2 commits

Committed by Zhenghua Lyu
In Greenplum, when estimating costs, most of the time we take a global view, but sometimes we should shift to a local view. Postgres does not suffer from this issue because everything is in one single segment. The function `estimate_hash_bucketsize` comes from Postgres and plays a very important role in the cost model of hash join. It should output a result based on a local view, but its input parameters, such as the number of rows in a table and the ndistinct of the relation, are all taken from a global view (all segments). So we have to compensate for that. The logic is:
1. For a broadcast-like locus, the global ndistinct is the same as the local one, so we compensate with `ndistinct *= numsegments`.
2. When the hash keys are collocated with the locus, each segment holds `ndistinct/numsegments` distinct groups, so no compensation is needed.
3. Otherwise, the locus must be partitioned and not collocated with the hash keys; in these cases we first estimate the local number of distinct groups and then compensate.

Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
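A small worked instance of the compensation cases above, with purely illustrative numbers: assume $N = 3$ segments and a global ndistinct of 300. The number of distinct values a single segment sees is roughly

$$
d_{\text{local}} =
\begin{cases}
300 & \text{broadcast-like locus (every segment sees all groups)}\\
300 / 3 = 100 & \text{hash keys collocated with the locus}\\
\text{estimated locally, then compensated} & \text{otherwise (partitioned, not collocated)}
\end{cases}
$$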
-
Committed by Heikki Linnakangas
Fixes github issue https://github.com/greenplum-db/gpdb/issues/8621
-
- 21 September 2019, 1 commit

Committed by Heikki Linnakangas
I've been wondering for some time why we have disabled constructing Init Plans in queries that are planned in QEs, like in SPI queries that run in user-defined functions. So I removed the diff vs upstream in build_subplan() to see what happens. It turns out it was because we always ran the ExtractParamsFromInitPlans() function in QEs, to get the InitPlan values that the QD sent with the plan, even for queries that were not dispatched from the QD but planned locally. Fix the call in InitPlan to only call ExtractParamsFromInitPlans() for queries that were actually dispatched from the QD, and allow QE-local queries to build Init Plans. Include a new test case, for clarity, even though there were some existing ones that incidentally covered this case.
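As a hedged illustration (function and table names invented), this is the kind of SPI-executed query inside a user-defined function that, when the function runs on a QE, is planned locally and can now turn its uncorrelated sub-select into an Init Plan:

```
CREATE TABLE measurements (id int, reading float8) DISTRIBUTED BY (id);

CREATE OR REPLACE FUNCTION above_avg_count() RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    -- The uncorrelated sub-select below is the candidate Init Plan when this
    -- statement is planned via SPI.
    SELECT count(*) INTO result
    FROM measurements
    WHERE reading > (SELECT avg(reading) FROM measurements);
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```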
-
- 20 September 2019, 1 commit

Committed by Sambitesh Dash
- The corresponding ORCA PR is: https://github.com/greenplum-db/gporca/pull/533
- Change the GUC value OPTIMIZER_UNEXPECTED_FAIL so that we log only unexpected failures.

Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
-
- 19 September 2019, 3 commits

Committed by Gang Xiong
-
Committed by xiong-gang
There was a hang like this: when one QE errors out before 'SetupInterconnect', the QD keeps waiting for the incoming connections to be established and doesn't check the error message from the dispatcher. The other QEs finish and hang in the function 'waitOnOutbound'.

Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Ning Yu <nyu@pivotal.io>

(cherry picked from commit b2101122)
-
Committed by Shreedhar Hardikar
-
- 17 September 2019, 1 commit

Committed by Weinan WANG
* Single set command failed, rollback guc value

In gpdb, the guc set flow is:
1. QD sets the guc
2. QD dispatches the job to all QEs
3. QEs set the guc

For a single set command, gpdb is not 2pc safe: if the set command fails on a QE, only the QD's guc value is rolled back. For gucs with the GUC_GPDB_NEED_SYNC flag, the value is required to be the same across the whole session. To deal with this, record the rolled-back gucs in AbortTransaction into a restore guc list and re-set them when the next query comes. However, if that fails again, destroy all QEs, since we cannot keep the same value in that session. Hopefully the guc value can then be synchronized successfully in a later creategang stage using the '-c' option.
-
- 12 September 2019, 1 commit

Committed by Jinbao Chen
There are some problems with the old multi-stage agg cost model:
1. We used the global group number and global work memory to estimate the number of spilled tuples. But the situation in the first stage agg and the second stage agg is completely different. In the first stage agg, tuples are randomly distributed with respect to the group key, so the number of groups on each segment is almost equal to the number of global groups. In the second stage agg, the distribution key is a subset of the group key, so the number of groups on each segment is equal to (number of global groups / number of segments). The old code could therefore cause huge cost deviations.
2. Using ((group number + input rows) / 2) as the number of spilled tuples is too rough.
3. Using global group number * 1.3 as the output rows of the streaming agg node is very wrong. The output rows of the streaming agg node should be group number * segment number * param.
4. We used numGroups to estimate the initial size of the hash table in the exec node, but numGroups is the global group number.

So we made the following changes:
1. Use a function 'groupNumberPerSegemnt' to estimate the group number per segment in the first stage agg. Use numGroups / segment number as the group number per segment in the second stage agg.
2. Use the function 'spilledGroupNumber' to estimate the number of spilled tuples.
3. Use spilled tuple number * segment number as the output tuple number of the streaming agg node.
4. Use numGroups as the group number per segment.

Also, we have information on the number of tuples in the top N groups, so we can predict the maximum number of tuples on the biggest segment when skew occurs. When we can predict skew, enable the 1-phase agg.

Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
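Restating the per-segment assumptions above as rough relations (a summary of the message, not the exact formulas in the code), with $g$ the global group count and $N$ the number of segments:

$$
g_{\text{seg}}^{\text{(1st stage)}} \approx g, \qquad
g_{\text{seg}}^{\text{(2nd stage)}} \approx \frac{g}{N}, \qquad
\text{rows}_{\text{streaming agg out}} \approx \text{spilled tuples} \times N
$$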
-
- 10 September 2019, 1 commit

Committed by Ashwin Agrawal
If "./pg_regress --init-file init_file uao_ddl/compresstype_row" is executed, it fails. As extra argument `--ao-dir=uao` needs to be specified to pg_regress, to convey, to convert these .source files to row and column .sql/.out files and not use regular standard logic on them. This approach has following shortfalls: - additional developer overhead to remember to add that option - without that option .sql/.out files are still generated but are not usable / incorrect - row and column test files directories will always need same prefix - the check for ao-dir prefix was checked for each and every file in input/output directories, which is unnecessary To improve the situation, modifying the logic in pg_regress. No more need extra argument to pg_regress, instead, presence of special file named "GENERATE_ROW_AND_COLUMN_FILES" conveys to pg_regress to apply special conversion rule on that directory. This way always correct conversion rule is applied and developer don't need to specify any extra option.
-
- 30 August 2019, 2 commits

Committed by Jesse Zhang
This must have been a typo committed in commit 6b79a578. The original commit missed a trailing space in the output file after a newly added comment (which causes a regress diff when optimizer=off). I was going to fix it by adding the trailing space back, but on second thought, I decided to gut the comment from all 3 files (one .sql and two .out files) to prevent future self-loathing. While we were at it, also remove the ever-expanding list of DROP statements that are ignored from the end of gporca.out. Honestly the `DROP` itself kinda offends me, but that's for another day. Let's get the build green for now.
-
Committed by Bhuvnesh Chaudhary
While processing a constraint interval, also consider predicates of the form <cast(ident)> array cmp, otherwise we lose the opportunity to generate implied quals. This corresponds to the ORCA changes.
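A hedged sketch (tables invented) of the predicate shape in question: a cast of a column compared against an array, next to an equality join condition from which an implied qual on the other side can be derived:

```
SELECT *
FROM t1
JOIN t2 ON t1.a = t2.a
WHERE t1.a::bigint = ANY (ARRAY[1, 2, 3]);
-- with the change, the array predicate can also contribute an implied qual on t2.a
```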
-
- 29 August 2019, 2 commits

Committed by Ashwin Agrawal
- arrange them alphabetically
- separate upstream entries from GPDB-specific ones
- add a few missing ones
- also anchor entries so they are ignored only in the current directory or sub-directories
-
Committed by Chris Hajas
Consider the query below:
```
test=# explain select * from foo, jazz where foo.c = jazz.e and jazz.f = 10 and a in (select b+1 from bar);
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice4; segments: 3)  (cost=0.00..1324469.30 rows=1 width=20)
   ->  Hash Join  (cost=0.00..1324469.30 rows=1 width=20)
         Hash Cond: foo.c = jazz.e
         ->  Dynamic Table Scan on foo (dynamic scan id: 1)  (cost=0.00..1324038.29 rows=34 width=12)
               Filter: a = (b + 1) AND ((subplan))
               SubPlan 1
                 ->  Materialize  (cost=0.00..431.00 rows=1 width=4)
                       ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..431.00 rows=3 width=4)
                             ->  Limit  (cost=0.00..431.00 rows=1 width=4)
                                   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
                                         ->  Limit  (cost=0.00..431.00 rows=1 width=4)
                                               ->  Table Scan on bar  (cost=0.00..431.00 rows=34 width=4)
         ->  Hash  (cost=100.00..100.00 rows=34 width=4)
               ->  Partition Selector for foo (dynamic scan id: 1)  (cost=10.00..100.00 rows=34 width=4)
                     ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           ->  Table Scan on jazz  (cost=0.00..431.00 rows=1 width=8)
                                 Filter: f = 10
 Optimizer status: PQO version 3.65.0
(18 rows)
```
Previously, since the subplan was in a qual, we did not populate the qual properly when executing a dynamic table scan node. Thus the subplan attribute in the PlanState of the dynamic table scan was incorrectly set to NULL, causing a later crash. We now populate this similarly to how we do it for dynamic index/bitmap scans.

Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
Co-authored-by: Chris Hajas <chajas@pivotal.io>
Co-authored-by: Ashuka Xue <axue@pivotal.io>
-
- 28 August 2019, 1 commit

Committed by Richard Guo
In GPDB, to use the general plan tree walker/mutator, the context structure needs to begin with a plan_tree_base_prefix field, initialized appropriately. It is needed by plan_tree_walker to recurse into the subplan when visiting a SubPlan node. However, the context structure SubqueryScanWalkerContext fails to do that, so when we recurse through a plan tree to find a subquery scan node and there is a subplan there, we crash. This patch fixes that. It also fixes github issue #8342 as well as the panic in github issue #7279.
-
- 24 August 2019, 2 commits

Committed by Ashwin Agrawal
-
Committed by Ashwin Agrawal
Previously there was a bug in bitmap index creation where AO tuples whose row numbers mapped to a heap TID with a zero offset fetched an incorrect tuple via bitmap indexes. This produced wrong results, hence adding a test to validate that scenario.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/p8oTYMbHFaI/ZEDxFihlDgAJ
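A hedged sketch of the scenario the test exercises (names and row counts invented; the real test loads enough rows for some AO row numbers to map to a heap TID with a zero offset):

```
CREATE TABLE ao_t (id int, flag int) WITH (appendonly=true) DISTRIBUTED BY (id);
INSERT INTO ao_t SELECT i, i % 10 FROM generate_series(1, 100000) i;
CREATE INDEX ao_t_flag_bm ON ao_t USING bitmap (flag);

SET enable_seqscan = off;
SELECT count(*) FROM ao_t WHERE flag = 3;  -- must return the correct rows via the bitmap index
```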
-
- 23 August 2019, 1 commit

Committed by Shreedhar Hardikar
ORCA will now choose between one-stage and two-stage agg based on cost. Note: A number of ICG tests needed updates. In most cases, the plans changed from two-stage to one-stage agg. Some of the tests, e.g workfile/hashagg_spill, are testing some other functionality or otherwise would benefit from forced two-stage aggs. I have added set/reset commands in such cases, to keep the testing unchanged.
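A hedged example of the kind of set/reset guard mentioned above, assuming the tests pin the plan shape through the optimizer_force_multistage_agg GUC (table name invented):

```
SET optimizer_force_multistage_agg = on;   -- keep the old two-stage plan shape for this test
EXPLAIN SELECT c, count(*) FROM t GROUP BY c;
RESET optimizer_force_multistage_agg;
```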
-
- 22 August 2019, 3 commits

Committed by Daniel Gustafsson
This removes unnecessary table distribution clauses (as they match the default), and moves Greenplum-specific tests to gp_constraints to further reduce differences with upstream. When hacking on this, it became clear that the gp_constraints suite wasn't connected to a test schedule, so this also activates the suite as part of greenplum_schedule.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
Committed by Andrew Gierth
Previously, parseCheckAggregates was run before assign_query_collations, but this causes problems if any expression has already had a collation assigned by some transform function (e.g. transformCaseExpr) before parseCheckAggregates runs. The differing collations would cause expressions not to be recognized as equal to the ones in the GROUP BY clause, leading to spurious errors about unaggregated column references. The result was that CASE expr WHEN val ... would fail when "expr" contained a GROUPING() expression or matched one of the group by expressions, and where collatable types were involved; whereas the supposedly identical CASE WHEN expr = val ... would succeed. Backpatch all the way; this appears to have been wrong ever since collations were introduced. Per report from Guillaume Lelarge, analysis and patch by me. Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk (backported from commit 174fab99)
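A hedged illustration of the two supposedly identical forms described above (table invented); before the fix the first form could raise a spurious unaggregated-column error when the column is of a collatable type such as text:

```
CREATE TABLE items (name text, qty int);

-- CASE expr WHEN val ... : previously could fail with collatable types
SELECT CASE name WHEN 'widget' THEN 'w' ELSE 'other' END, sum(qty)
FROM items
GROUP BY name;

-- CASE WHEN expr = val ... : always worked
SELECT CASE WHEN name = 'widget' THEN 'w' ELSE 'other' END, sum(qty)
FROM items
GROUP BY name;
```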
-
Committed by Adam Lee
If a query mixes window functions with aggregate functions or grouping, transformGroupedWindows() divides it into a new query and a sub-query, replacing some expressions with Vars whose varcollid is set to the replaced expressions' original collations. Because assign_query_collations() does not handle a Var's collation, transformGroupedWindows() should be called after it, with the collation information already assigned. Otherwise the replacement Vars will have no collations. Fixes GitHub issue https://github.com/greenplum-db/gpdb/issues/8376
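A hedged illustration (table invented) of the pattern transformGroupedWindows() splits into an outer query over a sub-query: a window function mixed with grouping on a collatable text column, whose replacement Vars previously ended up with no collation:

```
CREATE TABLE emp (dept text, salary int);

SELECT dept,
       sum(salary) AS total,
       rank() OVER (ORDER BY sum(salary) DESC) AS rnk
FROM emp
GROUP BY dept;
```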
-
- 21 August 2019, 1 commit

Committed by Shreedhar Hardikar
ExecRescanMaterial uses the Plan::allParam bitset to determine which param changes it should watch for when deciding whether it needs to destroy the tuplestore and re-execute the subtree. ORCA sets allParam by extracting PARAMs from the plan tree using extract_nodes_walker(). However, that function does not descend into sub-queries via SubPlan nodes, even if the descendIntoSubqueries option is set to true (in the case of ORCA this is sometimes not even possible, because the base node is not available during DXL to plstmt translation, which is done bottom-up). Because of this, extract_nodes_walker() may miss PARAMs present in SubPlan::testexpr and SubPlan::args.

This commit fixes the logic in extract_nodes_walker() to examine the SubPlan node regardless of whether descendIntoSubqueries is set or unset.

This fix was motivated by a wrong-results bug in skip-level correlated subqueries (see the test case in the commit). Because allParam was left empty, any Materialize in the intermediate subplan would retain its contents even if an outer ref had changed, causing wrong results. For the purpose of retrieving PARAMs in such cases, it is not really necessary to descend into subqueries: it is sufficient to look at the args list of the SubPlan, since any PARAM that would affect the materialized results in the intermediate subplan must be passed into the lowest subplan using SubPlan::args.

In the example below, the PARAM may be an outer ref into either subtree:
- subtree (1): the PARAM is passed into the subplan via SubPlan 2 args and so is captured in allParam. Thus Materialize's results are discarded at every rescan of SubPlan 2, as expected.
- subtree (2): the PARAM is not captured. The Materialize's results need not be discarded, since there is no relevant outer ref under it.

-> Result
     Filter: x = (subplan 2)
     -> subtree (1)
     SubPlan 2
       -> Material
            -> Result
                 Filter: x = (subplan 2)
                 -> subtree (2)
                 SubPlan 1
                   -> PARAM used somewhere

Co-authored-by: Chris Hajas <chajas@pivotal.io>
Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
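A hedged sketch (tables invented) of a skip-level correlated subquery of the kind that returned wrong results: the innermost predicate references the outermost relation, skipping the intermediate level, so its value must flow down through the intermediate subplan's args:

```
SELECT *
FROM a
WHERE a.x IN (SELECT b.y
              FROM b
              WHERE b.z IN (SELECT c.w
                            FROM c
                            WHERE c.v = a.x));  -- outer ref to "a", one level skipped
```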
-
- 19 August 2019, 1 commit

Committed by Daniel Gustafsson
As roles are global objects, failure to clean up a created role will cause subsequent test runs to fail because the role already exists (only the database is dropped between test runs of installcheck). Fix by dropping the role explicitly along with its dependent objects. Longer term this should probably become a privileges_gp test to keep the upstream tests as unmodified as we can, but for now let's fix the immediate issue.

Reviewed-by: Shaoqi Bai <sbai@pivotal.io>
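For reference, the cleanup pattern this amounts to, with an invented role name:

```
DROP OWNED BY regress_test_role;   -- drop objects owned by, and privileges granted to, the role
DROP ROLE regress_test_role;
```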
-
- 16 August 2019, 1 commit

Committed by Sambitesh Dash
-
- 15 August 2019, 1 commit

Committed by Richard Guo
When generating a plan for multi-DQA, we need to build a coplan on top of the shared scan for each distinct DQA, and then join the coplans back together. While building the coplan for each DQA, the RelOptInfo and RangeTblEntry arrays of the PlannerInfo are rebuilt and the original arrays are replaced. So, for each DQA other than the first one, we need to restore the arrays to what they were for the original shared scan plan before building its coplan.

Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: pengzhout <ptang@pivotal.io>
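A hedged example (table invented) of a multi-DQA query, i.e. several aggregates each with its own DISTINCT column, where every DQA gets its own coplan over the shared scan:

```
SELECT c,
       count(DISTINCT a),
       count(DISTINCT b)
FROM t
GROUP BY c;
```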
-
- 14 August 2019, 1 commit

Committed by Richard Guo
For a Result pathnode, its parent is set to NULL, so we cannot reference its parent in cost_common_agg() when estimating the cost of aggregation. Instead, we can use the rows in the Path and use 0 as the width. This patch fixes github issue #8357.
-
- 12 August 2019, 4 commits

Committed by Asim R P
The intention behind the warning was to guide the test writer to avoid flaky behavior due to FTS intervening in the middle of a PANIC to mark a segment as down. This is good; however, the way the warning is emitted is not correct. The underlying assumption in the warning implementation (see fbd0f091) is that the InjectFault() function is executed by the same process that created the gp_inject_fault extension. This is not valid in the new libpq-based fault injector, because the fault is injected by a transient fault handler process, which is always different from the process that creates the extension when running regression tests. Evidently, there are several tests injecting 'panic' type faults already and their answer files don't have such warnings. The bitmap_index test modified by this commit was the only test still using the old fault injector interface to inject a 'panic' fault. In the old fault injector, the above-mentioned assumption held and the warning was emitted. Now that we've moved to the new implementation wholesale, the warning must be removed. We should go back to the drawing board for emitting the FTS warning, but that's a separate patch.
-
Committed by Asim R P
Remove NOTICE messages that follow a gp_inject_fault() select statement. Replace the boolean value with the text 'Success:' returned by the new interface. For reference, the following sed script was used to identify 't' following a ' gp_inject_fault ' line:

/^ gp_inject_fault $/{
  $!{
    N
    :again1
    N
    s/ t$/ Success:/
    t again1
  }
}
-
Committed by Asim R P
These tests were invoking the new fault injector interface using its temporary name, gp_inject_fault2(). Now the new interface is the only one present in the repository, named gp_inject_fault().
-
Committed by Weinan WANG
In gpdb, we pre-execute init plans in ExecutorStart to reduce the number of slices. However, init plans that require params from a node at the same level should follow the upstream execution flow. To recognize this init plan pattern, record `extParam` when creating the `subplan` object.

Fixes issue #6953
-
- 05 August 2019, 1 commit

Committed by Weinan WANG
We have a kind of query whose InitPlan executes in ExecutorStart. If such a query is run under EXPLAIN ANALYZE and some memory or disk information needs to be collected in the main plan, the QD will crash. Since queryDesc->showstatctx->stats_gathered is set to true in the ExecSetParamPlan function before ExecutorRun, we only gather InitPlan metrics and leave the other slices' information out, and we then hit a segmentation fault when an execution node prints out its metrics message. To fix this, stats_gathered is only set after the slice 0 metrics information has been collected.

Reproduce process:

create table t(a int);
explain analyze
select a from (
  select a from t where a > (select avg(a) from t)
) as foo
order by a limit 1;

Fixes issue #6951
-
- 03 August 2019, 2 commits

Committed by Adam Berlin
- Before this change, the GUC would set up a pPostCreate hook to be run after the creation of a partition table, but we never actually read from that hook setting, so it is dead code.
-
Committed by Chris Hajas
ICG changes for commit "Add 4 xforms for LOJ index apply on dynamic table scan". These transforms existed for non-partitioned tables, but not for partitioned tables.

Authored-by: Chris Hajas <chajas@pivotal.io>
-
- 02 August 2019, 3 commits

Committed by Abhijit Subramanya
-
Committed by Bhuvnesh Chaudhary
This commit adds the GPDB-side changes required to support GIN indexes with ORCA. It also adds a new test file, gp_gin_indexes, to test the plans produced by ORCA and the planner. GIN indexes are not supported with index expressions or predicate constraints; ORCA does not currently support those for other types of indexes either.
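A hedged example (names invented) of a GIN index that ORCA can now consider; note it is a plain index on a column, not an expression or partial index:

```
CREATE TABLE docs (id int, body tsvector) DISTRIBUTED BY (id);
CREATE INDEX docs_body_gin ON docs USING gin (body);

SET optimizer = on;   -- plan with ORCA
EXPLAIN SELECT id FROM docs WHERE body @@ to_tsquery('greenplum');
```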
-
Committed by David Kimura
ALTER TABLE DROP COLUMN followed by a reorganize leads to loss of the dropped column's encoding settings. When the column's compresstype encoding is incorrect, we can later encounter a block version mismatch error during the block info validation check of the dropped column. One idea was to skip dropped columns when constructing the AOCSScanDesc. However, dropping all columns is a special case that is not easily handled, because it is not equivalent to deleted rows. Instead, the fix is to preserve column encoding settings even for dropped columns.

Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
Co-authored-by: Ivan Leskin <leskin.in@arenadata.io>
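A hedged sketch of the scenario described above (names and sizes invented; whether it reproduces the error depends on data and version):

```
CREATE TABLE aocs_t (a int, b int ENCODING (compresstype=zlib))
WITH (appendonly=true, orientation=column);

INSERT INTO aocs_t SELECT i, i FROM generate_series(1, 100000) i;

ALTER TABLE aocs_t DROP COLUMN b;
ALTER TABLE aocs_t SET WITH (REORGANIZE=true);  -- previously lost b's encoding settings

SELECT count(*) FROM aocs_t;  -- could then trip the block version mismatch check
```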
-
- 31 July 2019, 1 commit

Committed by Adam Lee
AggStates are now pointers allocated in aggcontext with type INTERNAL; just spilling the pointers doesn't decrease memory usage and can leak memory if states are combined without being freed. This commit serializes the aggstates, writes the real data into the file, and frees the memory.
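A hedged illustration (sizes invented) of the situation this addresses: a hash aggregate pushed to spill while using an aggregate whose transition state is an in-memory pointer of type INTERNAL:

```
SET statement_mem = '1MB';   -- small enough to force the hash agg to spill

SELECT g % 100000 AS k, array_agg(g)
FROM generate_series(1, 1000000) g
GROUP BY 1;
```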
-
- 27 July 2019, 1 commit

Committed by Shreedhar Hardikar
Also bump version to 3.60.0.
-