- 08 Sep 2017, 2 commits
-
-
Committed by Ashwin Agrawal
-
This commit removes Query::hasModifyingCTE and ParseState::p_hasModifyingCTE because they are dead code. This change impacts reading and writing `pg_rewrite` rules, which is how views are implemented, and hence won't be backported to 5.0 or earlier. A `pg_upgrade` from 5 to 6 will still work because this change has no DDL surface.
-
- 07 Sep 2017, 20 commits
-
-
Committed by Heikki Linnakangas
Refactor the EDGE_IS_* macros so that they take the current WindowStatePerLevel as argument, rather than a WindowFrameEdge. This will make the transition to the upstream window function implementation a bit easier, because the upstream implementation doesn't have a WindowFrameEdge struct.

Encapsulate access to level_state->is_rows in a macro. This is also in preparation for the upstream implementation, which will carry that information as a flag in the frameOptions bitmask.

Many functions used an 'is_lead' argument, and one used 'is_trail', to keep track of whether we're dealing with the leading or trailing edge. Introduce a little enum with EDGE_TRAIL and EDGE_LEAD values, to avoid having to remember what 'true' or 'false' means in which context.
-
Committed by Heikki Linnakangas
In principle, it makes sense to determine at plan-time, whether the expression needs to be re-evaluated for every row. In practice, it seems simpler to decide that in the executor, when initializing the Window node. This allows removing a bunch of code from the planner, and from the ORCA translator, including the hack to force the expression to be delayed if it was a SubLink. The planner always set the delayed flag, unless the expression was a Const. We can easily and quickly check for that in the executor too. I'm not sure how ORCA decided whether to delay or not, but in some quick testing I cannot come up with a case where it would decide differently.
-
Committed by Heikki Linnakangas
The big difference is that each leaf query is now transformed in one go, like it's done in the upstream, instead of transforming the target list and FROM list first. That partial transformation was causing trouble for another refactoring that I'm working on, which will change the way window functions are handled in parse analysis. This two-pass code is GPDB-specific; PostgreSQL uses a simpler algorithm that works bottom-up, one setop node at a time, to select the column types.
-
Committed by Heikki Linnakangas
It hasn't been implemented, but there is basic support in the grammar, just enough to detect the syntax and throw an error or ignore it. All the rest was dead code.
-
Committed by Richard Guo
Verify the newval for the GUCs 'statement_mem' and 'max_resource_groups' only if they are actually being set. During GPDB startup, one step checks whether all GUCs are valid with their new values, without actually setting them.
-
Currently ORCA does not support index scan on leaf partitions; it only supports index scan when the root table is queried. This commit, along with the corresponding ORCA changes, adds support for using indexes when leaf partitions are queried directly.

When a root table that has indexes (either homogeneous/complete or heterogeneous/partial) is queried, the Relcache Translator sends index information to ORCA. This enables ORCA to generate an alternative plan with a Dynamic Index Scan on all partitions (in the case of a homogeneous index), or a plan with a partial scan, i.e. a Dynamic Table Scan on the leaf partitions that don't have indexes plus a Dynamic Index Scan on the leaf partitions that do (in the case of a heterogeneous index). This is a two-step process in the Relcache Translator, as described below.

Step 1 - Get the list of all index oids. `CTranslatorRelcacheToDXL::PdrgpmdidRelIndexes()` performs this step; it only retrieves indexes on root and regular tables, and bails out for leaf partitions. For a root, the list of index oids is simply the index oids on its leaf partitions. For instance:

```
CREATE TABLE foo (a int, b int, c int, d int)
DISTRIBUTED BY (a)
PARTITION BY RANGE(b)
  (PARTITION p1 START (1) END (10) INCLUSIVE,
   PARTITION p2 START (11) END (20) INCLUSIVE);

CREATE INDEX complete_c ON foo USING btree (c);
CREATE INDEX partial_d ON foo_1_prt_p2 USING btree (d);
```

The index list will look like { complete_c_1_prt_p1, partial_d }. For a complete index, the index oid of the first leaf partition is retrieved. If there are partial indexes, all the partial index oids are retrieved.

Step 2 - Construct the Index Metadata object. `CTranslatorRelcacheToDXL::Pmdindex()` performs this step. For each index oid retrieved in Step 1 above, construct an Index Metadata object (CMDIndexGPDB) to be stored in the metadata cache, so that ORCA can get all the information about the index.
Along with all other information about the index, `CMDIndexGPDB` also contains a flag `fPartial`, which denotes whether the given index is homogeneous (if yes, ORCA will apply it to all partitions selected by the partition selector) or heterogeneous (if yes, the index will be applied only to the appropriate partitions). The process is as follows:

```
Foreach oid in index oid list:
    Get index relation (rel)
    If rel is a leaf partition:
        Get the root rel of the leaf partition
        Get all the indexes on the root (this will be the same list as Step 1)
        Determine if the current index oid is homogeneous or heterogeneous
        Construct CMDIndexGPDB appropriately
        (with fPartial, part constraint, default levels info)
    Else:
        Construct a normal CMDIndexGPDB object.
```

Now, for leaf partitions there is no notion of homogeneous or heterogeneous indexes, since a leaf partition is like a regular table. Hence, in `Pmdindex()` we should not check whether the index is complete. Additionally, whether a given index is homogeneous or heterogeneous needs to be decided from the perspective of the relation we are querying (such as the root or a leaf). Hence, the right place for the `fPartial` flag is the relation metadata object (CMDRelationGPDB), not the independent Index metadata object (CMDIndexGPDB).

This commit makes the following changes to support index scan on leaf partitions along with partial scans.

Relcache Translator: In Step 1, retrieve the index information on the leaf partition and create a list of CMDIndexInfo objects, which contain the index oid and the `fPartial` flag. Step 1 is the place where we know which relation we are querying, which enables us to determine whether the index is homogeneous from the context of that relation.
The relation metadata tag will look like the following after this change.

Before:

```
<dxl:Indexes>
  <dxl:Index Mdid="0.17159874.1.0"/>
  <dxl:Index Mdid="0.17159920.1.0"/>
</dxl:Indexes>
```

After:

```
<dxl:IndexInfoList>
  <dxl:IndexInfo Mdid="0.17159874.1.0" IsPartial="true"/>
  <dxl:IndexInfo Mdid="0.17159920.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
```

A new class `CMDIndexInfo` has been created in ORCA, which contains the index mdid and the `fPartial` flag. For external tables, normal tables, and leaf partitions, the `fPartial` flag will always be false. Hence, the relcache translator will provide the list of indexes defined on leaf partitions when they are queried directly, with `fPartial` always false; and when the root table is queried, `fPartial` will be set appropriately based on the completeness of the index. ORCA will refer to the Relation Metadata for fPartial information, not to the independent Index Metadata object. [Ref #120303669]
-
Committed by Heikki Linnakangas
* In ruleutils.c, the ereport() was broken. Use elog() instead, like in the upstream. (elog() is fine for "can't happen" kind of sanity checks) * Remove a few unused local variables. * Add a missing cast from Plan * to Node *.
-
Committed by Jesse Zhang
We will be less conservative and enable recursive CTE by default on master, while keeping recursive CTE hidden as we progress on developing the feature. This reverts the following two commits: * 280c577a "Set gp_recursive_cte_prototype GUC to true in test" * 4d5f8087 "Guard Recursive CTE behind a GUC" Signed-off-by: Haisheng Yuan <hyuan@pivotal.io> Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
-
Committed by Heikki Linnakangas
In a stand-alone backend ("postgres --single"), you cannot realistically expect any of the infrastructure needed for MPP processing to be present. Let's force a stand-alone backend to run in utility mode, to make sure that we don't try to dispatch queries, participate in distributed transactions, or anything like that, in a stand-alone backend. Fixes github issue #3172, which was one such case where we tried to dispatch a SET command in single-user mode, and got all confused.
-
Committed by Jemish Patel
Previously we were setting the value of `optimizer_join_arity_for_associativity_commutativity` to a very large number, so ORCA would spend a very long time evaluating all possible n-way join combinations to come up with the cheapest join tree to use in the plan. We are reducing this value to `7`, as it does not prove beneficial to spend the time and resources to evaluate more than 7-way joins when trying to find the cheapest join tree.
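As a sketch of the knob this commit tunes (the GUC name is from the commit; the session-level override and the value 10 are illustrative, not part of the change):

```sql
-- Default after this change: ORCA explores join reassociation and
-- commutation only up to 7-way joins.
SHOW optimizer_join_arity_for_associativity_commutativity;

-- Illustrative only: a session with an unusually large join that genuinely
-- benefits from deeper reordering could trade planning time for a
-- potentially better join tree.
SET optimizer_join_arity_for_associativity_commutativity = 10;
```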
-
Committed by Heikki Linnakangas
In commit 563c8c6b, I all but removed the WindowRef.winlevelsup field, but missed that checkExprHasWindFuncs() was relying on it in a subtle way. Even though winlevelsup was always 0 during parsing, checkExprHasWindFuncs() compared it against the "current" nesting level when it recursed into subqueries. Before commit 563c8c6b, it could never actually return true within subqueries, because the node->winlevelsup == context->sublevels_up condition would never be true. But when I removed the winlevelsup field, I erroneously removed that condition altogether, making it always return true in subqueries.

To fix, don't recurse into subqueries in checkExprHasWindFuncs(). There's no point in recursing if it can never return true in a subquery. It doesn't recurse in the upstream either.

Add a test case for the failing case (reduced from TPC-DS query 70) to the 'bfv_olap' test. While we're at it, remove the alternative expected output file for 'bfv_olap', because there was no meaningful (i.e. not-ignored) difference between it and the main expected output file.
-
Committed by Jesse Zhang
Plus minor corrections in spelling and comments. Signed-off-by: Sam Dash <sdash@pivotal.io>
-
Committed by Kavinder Dhaliwal
While recursive CTE is still being developed, it will be hidden from users behind the GUC gp_recursive_cte_prototype. Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
Committed by Tom Lane
the "cteParam" as a proxy for the possibility that the underlying CTE plan depends on outer-level variables or Params, but that doesn't work very well because it sometimes causes calling subqueries to be treated as SubPlans when they could be InitPlans. This is inefficient and also causes the outright failure exhibited in bug #4902. Instead, leave the cteParam out of it and copy the underlying CTE plan's extParams directly. Per bug #4902 from Marko Tiikkaja. (cherry picked from commit 9298d2ff)
-
Committed by Tom Lane
get_name_for_var_field didn't have enough context to interpret a reference to a CTE query's output. Fixing this requires separate hacks for the regular deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available context information is quite different. It's pretty nearly parallel to the existing code for SUBQUERY RTEs, though. Also, add code to make sure we qualify a relation name that matches a CTE name; else the CTE will mistakenly capture the reference when reloading the rule. In passing, fix a pre-existing problem with get_name_for_var_field not working on variables in targetlists of SubqueryScan plan nodes. Although latent all along, this wasn't a problem until we made EXPLAIN VERBOSE try to print targetlists. To do this, refactor the deparse_context_for_plan API so that the special case for SubqueryScan is all on ruleutils.c's side. (cherry picked from commit 742fd06d)
-
Committed by foyzur
To support recursive CTE we need to be able to ReScan a HashJoin as many times as the recursion depth. The HashJoin was previously ReScannable only if it had a single memory-resident batch. Now we support ReScannability for more than one batch. The approach we took is to keep the inner batch files around for longer than a single iteration of the join, if we detect that we need to reuse the batch files for rescanning. This can also improve the performance of the subplan, as we no longer need to materialize and rebuild the hash table; rather, we can just reload the batches from their corresponding batch files. To accomplish reloading of inner batch files, we keep them around even after the outer side has been fully joined, waiting for reuse in subsequent rescans (if rescannability is desired). The corresponding mail thread is here: https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/Rescannability$20of$20HashJoin%7Csort:relevance/gpdb-dev/E5kYU0FwJLg/Cqcxx0fOCQAJ Contributed by Haisheng Yuan, Kavinder Dhaliwal and Foyzur Rahman
-
Committed by Kavinder Dhaliwal
Currently, recursive CTEs do not support the following operations in the recursive term:

- Group By
- Window Functions
- Subqueries with a self-reference
- Distinct

This commit produces an error at the parsing stage whenever any of the above is found in the recursive term of a CTE definition.
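As a hedged illustration of the kind of query the new parse-stage check rejects (the table and column names are made up, and the exact error text is not taken from the commit):

```sql
-- Hypothetical example: a Group By in the recursive term of a CTE
-- would now be rejected during parsing.
WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT max(n) + 1 FROM t GROUP BY n  -- GROUP BY in recursive term: error
)
SELECT * FROM t;
```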
-
Committed by Kavinder Dhaliwal
non-recursive term. Per an example from Dickson S. Guedes.
-
Committed by Kavinder Dhaliwal
This commit ensures that a self-reference to a recursive CTE within a set operation in the recursive term produces an error. For example:

```
WITH RECURSIVE x(n) AS (
    SELECT 1
    UNION ALL
    SELECT n+1 FROM (SELECT * FROM x UNION SELECT * FROM z) foo)
SELECT * FROM x;
```

will produce an error, while

```
WITH RECURSIVE x(n) AS (
    SELECT 1
    UNION ALL
    SELECT n+1 FROM (SELECT * FROM z UNION SELECT * FROM u) foo, x
    WHERE foo.x = x.n)
SELECT * FROM x;
```

will not, because the set operation does not have a self-reference to its CTE.
-
Committed by Haisheng Yuan
The planner generates a plan that doesn't insert any Motion between a WorkTableScan and its corresponding RecursiveUnion, because currently in GPDB Motions are not rescannable. For example, an MPP plan for a recursive CTE query may look like:

```
Gather Motion 3:1
  -> Recursive Union
       -> Seq Scan on department
            Filter: name = 'A'::text
       -> Nested Loop
            Join Filter: d.parent_department = sd.id
            -> WorkTable Scan on subdepartment sd
            -> Materialize
                 -> Broadcast Motion 3:3
                      -> Seq Scan on department d
```

In the current solution, the WorkTableScan is always put on the outer side of the topmost Join (the recursive part of the RecursiveUnion), so that we can safely rescan the inner child of the join without worrying about the materialization of a potential underlying Motion. This is a heuristic-based plan, not a cost-based plan. Ideally, the WorkTableScan could be placed on either side of the join at any depth, and the plan should be chosen based on the cost of the recursive plan and the number of recursions. But we will leave that for later work.

Note: hash join is temporarily disabled for plan generation of the recursive part, because if the hash table spills, the batch file is removed as it executes. We have a follow-up story to make a spilled hash table rescannable. See discussion at the gpdb-dev mailing list: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
-
- 06 Sep 2017, 6 commits
-
-
Committed by Heikki Linnakangas
If a prepared statement, or a cached plan for an SPI query e.g. from a PL/pgSQL function, contains stable functions, the stable functions were incorrectly evaluated only once at plan time, instead of on every execution of the plan. This happened to not be a problem in queries that contain any parameters, because in GPDB they are re-planned on every invocation anyway, but non-parameter queries were broken.

In the planner, before this commit, when simplifying expressions, we set the transform_stable_funcs flag to true for every query, and evaluated all stable functions at planning time. Change it to false, and also rename it back to 'estimate', as it's called in the upstream. That flag was changed back in 2010, in order to allow partition pruning to work with quals containing stable functions, like TO_DATE. I think back then we always re-planned every query, so that was OK, but we do cache plans now.

To avoid regressing to worse plans, change eval_const_expressions() so that it still evaluates stable functions, even when the 'estimate' flag is off. But when it does so, mark the plan as "one-off", meaning that it must be re-planned on every execution. That gives the old, intended, behavior, that such plans are indeed re-planned, but it still allows plans that don't use stable functions to be cached.

This seems to fix github issue #2661. Looking at the direct dispatch code in apply_motion(), I suspect there are more issues like this lurking there. There's a call to planner_make_plan_constant(), modifying the target list in place, and that happens during planning. But this at least fixes the non-direct dispatch cases, and is a necessary step for fixing any remaining issues.

For some reason, the query now gets planned *twice* for every invocation. That's not ideal, but it was an existing issue for prepared statements with parameters already. So let's deal with that separately.
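A minimal sketch of the behavior being fixed (the table name is hypothetical; `now()` stands in for any STABLE function):

```sql
-- now() is STABLE, so in a cached plan it must be evaluated on every
-- execution, not folded into a constant once at plan time.
PREPARE recent AS
    SELECT * FROM events WHERE created_at > now() - interval '5 minutes';

EXECUTE recent;  -- before this fix, now() could be frozen at plan time
-- ... time passes; the cached plan is reused ...
EXECUTE recent;  -- after this fix, such plans are "one-off" and re-planned
```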
-
Committed by Heikki Linnakangas
CdbDispatchPlan() was making a copy of the plan tree, in the same memory context as the old plan tree was in. If the plan came from the plan cache, the copy will also be stored in the CachedPlan context. That means that every execution of the cached plan will leak a copy of the plan tree in the long-lived memory context. Commit 8b693868 fixed this for cached plans being used directly with the extended query protocol, but it did not fix the same issue with plans being cached as part of a user-defined function.

To fix this properly, revert the changes to exec_bind_message, and instead, in CdbDispatchPlan, make the copy of the plan tree in a short-lived memory context.

Aside from the memory leak, it was never a good idea to change the original PlannedStmt's planTree pointer to point to the modified copy of the plan tree. That copy has had all the parameters replaced with their current values, but on the next execution we should do that replacement again. I think that happened to not be an issue, because we had code elsewhere that forced re-planning of all queries anyway. Or maybe it was in fact broken. But in any case, stop scribbling on the original PlannedStmt, which might live in the plan cache, and make a temporary copy that we can freely scribble on in CdbDispatchPlan, used only for the dispatch.
-
Committed by Heikki Linnakangas
They're not really per-portal settings, so it doesn't make much sense to pass them to PortalStart. And most of the callers were passing savedSeqServerHost/Port anyway. Instead, set the "current" host and port in postgres.c, when we receive them from the QD.
-
Committed by Heikki Linnakangas
We don't care about old versions of dtrace anymore. Revert the code to the way it's in the upstream, to reduce our diff footprint.
-
Committed by Heikki Linnakangas
That seems like a very random place to do it (sorry for the pun). The random seed is initialized at backend startup anyway, which ought to be good enough, so just remove the spurious initialization from bfz.c. In passing, improve the debug message to mention which compression algorithm was used.
-
Committed by Heikki Linnakangas
I guess once upon a time this was needed to get better error messages, with error positions, but we rely on the 'location' fields in the parse nodes nowadays. Removing this doesn't affect any of the error messages memorized in the regression tests, so it's not needed anymore.
-
- 05 Sep 2017, 5 commits
-
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Ning Yu
* Simplify tuple serialization in Motion nodes. There is a fast path for tuples that contain no toasted attributes, which writes the raw tuple almost as is. However, the slow path is significantly more complicated, calling each attribute's binary send/receive functions (although there's a fast path for a few built-in datatypes). I don't see any need for calling I/O functions here. We can just write the raw Datum on the wire. If that works for tuples with no toasted attributes, it should work for all tuples, if we just detoast any toasted attributes first. This makes the code a lot simpler, and also fixes a bug with data types that don't have binary send/receive routines. We used to call the regular (text) I/O functions in that case, but didn't handle the resulting cstring correctly. Diagnosis and test case by Foyzur Rahman. Signed-off-by: Haisheng Yuan <hyuan@pivotal.io> Signed-off-by: Ning Yu <nyu@pivotal.io>
-
Committed by Heikki Linnakangas
These are just pro forma, as the location field isn't used for anything after parse analysis, but let's be tidy.
-
Committed by Heikki Linnakangas
Only a top-level transaction can have a distributed transaction ID, so this seems more logical.
-
- 04 Sep 2017, 7 commits
-
-
Committed by Xiaoran Wang
The same code for computing the target segment appeared in both CopyFrom and CopyFromDispatch. Extract it into separate functions. Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Heikki Linnakangas
The planner and the ORCA translator both implemented the same logic to assign external table URIs to segments. But I spotted one case where the logic differed:

```
CREATE EXTERNAL TABLE exttab_with_on_master (i int, j text)
LOCATION ('file://@hostname@@abs_srcdir@/data/exttab_few_errors.data')
ON MASTER FORMAT 'TEXT' (DELIMITER '|');

SELECT * FROM exttab_with_on_master;
ERROR:  'ON MASTER' is not supported by this protocol yet.
```

With ORCA you got a less user-friendly error:

```
set optimizer=on;
set optimizer_enable_master_only_queries = on;
postgres=# explain SELECT * FROM exttab_with_on_master;
ERROR:  External scan error: Could not assign a segment database for external file (CTranslatorDXLToPlStmt.cpp:472)
```

The immediate cause of that was that commit fcf82234 didn't remember to modify the ORCA translator's copy of the same logic. But really, it's silly and error-prone to duplicate the code, so modify ORCA to use the same code that the planner does.
-
Committed by Heikki Linnakangas
This backports the new FUNCDETAIL_WINDOWFUNC return code from PostgreSQL 8.4, and refactors the code to match upstream, as much as feasible. A few error scenarios now give better error messages.
-
Committed by Daniel Gustafsson
-
Committed by Heikki Linnakangas
Simpler that way.
-
Committed by Heikki Linnakangas
We don't need two different functions to check whether an expression contains a window function. Replace both with the variant used in the upstream, contain_window_function().
-
Committed by Heikki Linnakangas
This allows having error positions for more syntax errors, and reduces the diff footprint of our window functions implementation against the one in PostgreSQL 8.4.
-