- 18 Jan 2018, 1 commit
-
-
Submitted by Jesse Zhang
This patch removes codegen wholesale from Greenplum. In addition to reverting the commits involving codegen, we also removed miscellaneous references to the feature and GUC. The following commits from 5.0.0 were reverted (topologically ordered): f38c9064 Support avg aggregate function in codegen 87dcae4c Capture and print error messages from llvm::verifyFunction 65137540 libgpcodegen assert() overrides use GPDB Assert() 81d378b4 GenerateExecVariableList calls regular slot_getattr when the code generation of the latter fails 05a28211 Update Google Test source path used by codegen 22a35fcc Call ereport when code generating float operators 79517271 Support overflow checks for doubles. b5373a1e Fix codegen unittest link problem 7a1a98c9 Print filename and lineno in codegened CreateElog 58eda293 Fix wrong virtual tuple check in codegened slot_getattr bc6faa08 Set llvm_isNull_ptr to false when the result of codegened expression evaluation is not null 8bbbd63f Enhance codegened advance_aggregates with support for null attributes e1fd6072 Abort code generation of expression evaluation trees with unsupported ExprState types 509460ee Support null attributes in codegen expression evaluation framework 739d978d Move enrollment of codegened ExecVariableList to ExecInitNode c05528d1 Fix CHECK_SYMBOL_DEFINED macro in CMakeLists.txt 12cfd7bd Support offset calculation when there is null in the tuple and codegen is enabled 40a2631e Use slot_getattr wrapper function for regular version.(#1194) 613e9fbb Revert "Fix codegen issue by moving slot_getattr to heaptuple.c similar to" ee03f799 Fix cpplint error on advance aggregate 2a65b0aa Fix slot function nullptr issue in expr eval. 3107fc0e Fix for !(expr1 & expr2) and hasnull logic in slot_getatrr codegen. c940c7f6 Fix codegen issue by moving slot_getattr to heaptuple.c similar to Postgres. c8125736 Introduce guc to enable/disable all generator. 
a4f39507 Ensure that codegen unittests pass with GCC 6.2.0 (#1177) 682d0b28 Allow overriding assert() functionality in libgpcodegen in DEBUG mode 3209258a Organize codegen source files and unit tests d2ba88a9 Fix codegen unittests using DatumCastGenerator 020b71a5 Generate code for COUNT aggregate function 0eeec886 Fixing codegen related bugs in InitializeSupportedFunction 41055352 Rewrite PGGenericFuncGenerator to support variable number of types e5318b6b Add AdvanceAggregatesCodegenEnroll mock function 87521715 Codegen advance_aggregates for SUM transition function 2043d95b Use string messages in codegened slot_getattr fallback block f160fa5a Add new GUC to control Codegen Optimization level. 697ffc1a Fix cpplint errors c5e3aed4 Support Virtual Tuples and Memtuples in SlotGetAttrCodegen 5996aaa7 Keep a reference to the CodegenManager in code generators 6833b3c2 Remove unused header and just include what you use in codegen ab1eda87 Allow setting codegen guc to ON only if code generation is supported by the build dcd40712 Use PGFuncGeneratorInfo to codegen pg functions 83869d1c Replace dynamic_cast with dyn_cast for llvm objects 23007017 Decide what to generate for ExecEvalExpr based on PlanState 387d8ce8 Add EXPLAIN CODEGEN to print generated IR to the client c4a9bd27 Introduce Datum to cpp cast, cpp type to Datum cast and normal cast.(#944) 66158dfd Record external function names for useful debugging adab9120 Support variable length attributes in SlotGetAttrCodegen. 335e09aa Proclaim that the codegen'ed version of generated functions are called in debug build 50fd9140 Fix cpplint errors 88a04412 Use ExprTreeGeneratorInfo for expression tree generation. 3b4af3bb Split code generation of ExecVariableList from slot_getattr 8cd9ed9f Support <= operator for date/timestamp data types and some minor refactors. 
e4dccf46 Implement InlineFunction to force inline at call site 71170942 Mock postmaster.o for codegen_framework_unittest.t 09f00581 Codegen expressions that contain plus, minus and mul operators on float8 data types d7fb2f6d Fix codegen unittests on Linux and various compiler warnings while building codegen. 45f2aa96 Fix test and update coding style This closes #874 1b26fbfc Enrolled targetlist for scan and aggregate ebd1d014 Enhance codegen framework to support arbitrary expr tree ec626ce6 Generate and enroll naive ExecEvalExpr in Scan's quals 1186001e Revert "Create naive code-generated version of ExecQual" 6f928a65 Replace RegisterExternalFunction with GetOrRegisterExternalFunction using an unordered_map in codegen_utils 6ae0085b Move ElogWrapper to GpCodegenUtils. d3f80b45 Add verifyfunction for generated llvm function. 7bcf094a Fix codegen compiler error with 8.3 merge aae0ad3d Create naive code-generated version of ExecQual dce266ad Minor code quality fixes in src/backend/codegen d281f340 Support null attributes in code generated ExecVariableList 82fd418e Address a number of cpplint errors in codegen_utils_unittest.cc 887aa48d Add check for CodegenUtils::GetType for bool arrays bb9b92c6 Enhance Base Codegen to do clean up when generation fails b9ef5e3f Fix build error for Annotated types a5cfefd9 Add support for array types in codegen_utils. 2b883384 Fix static_assert call 7b75d9ea This commit generates code for code path: ExecVariableList > slot_getattr > _slot_getsomeattrs > slot_deform_tuple. This code path is executed during scan in simple select queries that do not have a where clause (e.g., select bar from foo;). 6d0a06e8 Fix CodeGen typos and CodeGeneratorManagerCreate function signature in gpcodegen_mock.c 4916a606 Add support for registering vararg external functions with codegen utils. ae4a7754 Integrate codegen framework and make simple external call to slot deform tuple. 
This closes #649 ee5fb851 Renaming code_generator to codegen_utils and CodeGenerator to CodegenUtils. This closes #648 88e9baba Adding GPDB code generation utils Signed-off-by: Jesse Zhang <sbjesse@gmail.com> Signed-off-by: Melanie Plageman <mplageman@pivotal.io> Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
-
- 17 Jan 2018, 1 commit
-
-
Submitted by Heikki Linnakangas
The change in index.c is a behavioral change. The behavior on reindexing shared catalogs now matches upstream again. The rest is just removal of dead code.
-
- 09 Dec 2017, 1 commit
-
-
Submitted by Jacob Champion
Upstream commit 43a57cf3, which significantly changes the API for the HashBitmap (TIDBitmap in Postgres), is about to hit in an upcoming merge. This patch is a joint effort by myself, Max Yang, Xiaoran Wang, Heikki Linnakangas, and Daniel Gustafsson to reduce our diff against upstream and support the incoming API changes with our GPDB-specific customizations. The primary goal of this patch is to support concurrent iterations over a single StreamBitmap or TIDBitmap. GPDB has made significant changes to allow either one of those bitmap types to be iterated over without the caller necessarily needing to know which is which, and we've kept that property here. Here is the general list of changes:

- Cherry-pick the following commit from upstream:

  commit 43a57cf3
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Date: Sat Jan 10 21:08:36 2009 +0000

  Revise the TIDBitmap API to support multiple concurrent iterations over a bitmap. This is extracted from Greg Stark's posix_fadvise patch; it seems worth committing separately, since it's potentially useful independently of posix_fadvise.

- Revert as much as possible of the TIDBitmap API to the upstream version, to avoid unnecessary merge hazards in the future.
- Add a tbm_generic_ version of the API to differentiate upstream's TIDBitmap-only API from ours. Both StreamBitmap and TIDBitmap can be passed to this version of the API.
- Update each section of code to use the new generic API.
- Fix up some memory management issues in bitmap.c that are now exacerbated by our changes.
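The crux of the revised API is that iteration state moves out of the bitmap into a separate iterator object, so several iterations can be in flight over one bitmap at once. A minimal sketch of that shape in C; all names here are invented for illustration and are not the real TIDBitmap/StreamBitmap API:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy bitmap: a sorted list of block numbers. */
typedef struct ToyBitmap {
    int        nblocks;
    const int *blocks;
} ToyBitmap;

/* Iteration state lives outside the bitmap itself, which is the whole
 * point: several iterators can walk one bitmap concurrently. */
typedef struct ToyBitmapIterator {
    const ToyBitmap *tbm;
    int              pos;
} ToyBitmapIterator;

ToyBitmapIterator *toy_begin_iterate(const ToyBitmap *tbm)
{
    ToyBitmapIterator *it = malloc(sizeof(*it));
    it->tbm = tbm;
    it->pos = 0;
    return it;
}

/* Returns the next block number, or -1 when this iterator is done. */
int toy_iterate(ToyBitmapIterator *it)
{
    if (it->pos >= it->tbm->nblocks)
        return -1;
    return it->tbm->blocks[it->pos++];
}

void toy_end_iterate(ToyBitmapIterator *it)
{
    free(it);
}
```

With the pre-43a57cf3 API the position lived inside the bitmap, so a second iteration would have disturbed the first; separating the state is what makes the concurrent-iteration goal above possible.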
-
- 24 Nov 2017, 6 commits
-
-
Submitted by Heikki Linnakangas
-
Submitted by Heikki Linnakangas
-
Submitted by Heikki Linnakangas
This is similar to the old implementation, in that we use "+", "-" to compute the boundaries. Unfortunately it seems unlikely that this would be accepted in the upstream, but at least we have that feature back in GPDB now, the way it used to be. See discussion on pgsql-hackers about that: https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
-
Submitted by Heikki Linnakangas
This is backport from PostgreSQL 9.4. It brings back functionality that we lost with the ripout & replace of the window function implementation. I left out all the code and tests related to COLLATE, because we don't have that feature. Will need to put that back when we merge collation support, in 9.1. commit 8d65da1f Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Mon Dec 23 16:11:35 2013 -0500 Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. 
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane Also includes this fixup: commit cf63c641 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Mon Dec 23 20:24:07 2013 -0500 Fix portability issue in ordered-set patch. Overly compact coding in makeOrderedSetArgs() led to a platform dependency: if the compiler chose to execute the subexpressions in the wrong order, list_length() might get applied to an already-modified List, giving a value we didn't want. Per buildfarm.
-
Submitted by Heikki Linnakangas
This loses the functionality, and leaves all the regression tests that used those functions failing. The plan is to later backport the upstream implementation of those functions from PostgreSQL 9.4. The feature is called "ordered set aggregates" there.
-
Submitted by Heikki Linnakangas
This adds some limitations, and removes some functionality that the old implementation had. These limitations will be lifted, and missing functionality will be added back, in subsequent commits:

* You can no longer have variables in start/end offsets
* RANGE is not implemented (except for UNBOUNDED)
* If you have multiple window functions that require a different sort ordering, the planner is not smart about placing them in a way that minimizes the number of sorts.

This also lifts some limitations that the GPDB implementation had:

* LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a lot of queries that used to throw a "ROWS parameter cannot be negative" error are now passing. That error was an artifact of the way LEAD/LAG were implemented. Those queries contain window function calls like "LEAD(col1, col2 - col3)", and sometimes, with suitable values in col2 and col3, the second argument went negative. That caused the error. The new implementation of LEAD/LAG is OK with a negative argument.
* Aggregate functions with no prelimfn or invprelimfn are now supported as window functions
* Window functions, e.g. rank(), no longer require an ORDER BY. (The output will vary from one invocation to another, though, because the order is then not well defined. This is more annoying on GPDB than on PostgreSQL, because in GPDB the row order tends to vary, as the rows are spread out across the cluster and arrive at the master in unpredictable order)
* NTILE doesn't require the argument expression to be in PARTITION BY
* A window function's arguments may contain references to an outer query.

This changes the OIDs of the built-in window functions to match upstream. Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that until those hard-coded values are fixed in ORCA, the ORCA translator code contains a hack to map the old OIDs to the new ones.
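The negative-offset behavior for LEAD/LAG described above is easy to model: row i reads row i + offset, and a negative offset simply reads backwards, so LEAD with a negative argument behaves like LAG. An invented toy over an in-memory partition, not the executor code:

```c
#include <assert.h>

/* LEAD over an in-memory partition of ints: row i reads row
 * i + offset. Sets *isnull when the target row falls outside
 * the partition, mirroring the SQL NULL result. */
int lead_int(const int *rows, int nrows, int i, int offset, int *isnull)
{
    int target = i + offset;

    if (target < 0 || target >= nrows) {
        *isnull = 1;            /* out of the partition: NULL result */
        return 0;
    }
    *isnull = 0;
    return rows[target];
}
```

Under this model there is nothing special about a negative offset, which is why the old "ROWS parameter cannot be negative" error could be dropped.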
-
- 21 Nov 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
Much of the code and structs used by index scans and bitmap index scans had been fused together and refactored in GPDB, to share code between dynamic index scans and regular ones. However, it would be nice to keep upstream code unchanged as much as possible. To that end, refactor the executor code for dynamic index scans and dynamic bitmap index scans, to reduce the diff vs upstream. The Dynamic Index Scan executor node is now a thin wrapper around the regular Index Scan node, even thinner than before. When a new Dynamic Index Scan begins, we don't do much initialization at that point. When the scan begins, we initialize an Index Scan node for the first partition, and return rows from it until it's exhausted. On the next call, the underlying Index Scan is destroyed, and a new Index Scan node is created, for the next partition, and so on. Creating and destroying the IndexScanState for every partition adds some overhead, but it's not significant compared to all the other overhead of opening and closing the relations, building scan keys etc. Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper for a regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is called, it initializes a BitmapIndexScanState for the current partition, and calls it. On ReScan, the BitmapIndexScan executor node for the old partition is shut down. A Dynamic Bitmap Index Scan differs from a Dynamic Index Scan in that a Dynamic Index Scan is responsible for iterating through all the active partitions, while a Dynamic Bitmap Index Scan works as a slave for the Dynamic Bitmap Heap Scan node above it. It'd be nice to do a similar refactoring for heap scans, but that's for another day.
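The create-drain-destroy loop described above can be sketched with a toy: the "dynamic" state is just the partition cursor plus a per-partition inner scan that is rebuilt whenever the current one runs dry. All names are invented stand-ins, not the actual IndexScanState/DynamicIndexScanState structs:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PartScan {        /* stand-in for a per-partition IndexScanState */
    const int *rows;
    int        nrows;
    int        pos;
} PartScan;

typedef struct DynScan {         /* stand-in for DynamicIndexScanState */
    const int **parts;           /* rows of each partition */
    const int  *partlens;
    int         nparts;
    int         curpart;
    PartScan   *inner;           /* recreated for every partition */
} DynScan;

static PartScan *part_begin(const int *rows, int n)
{
    PartScan *ps = malloc(sizeof(*ps));
    ps->rows = rows;
    ps->nrows = n;
    ps->pos = 0;
    return ps;
}

/* Returns the next row across all partitions, or -1 at end of scan. */
int dyn_next(DynScan *ds)
{
    for (;;) {
        if (ds->inner == NULL) {
            if (ds->curpart >= ds->nparts)
                return -1;                      /* all partitions done */
            ds->inner = part_begin(ds->parts[ds->curpart],
                                   ds->partlens[ds->curpart]);
            ds->curpart++;
        }
        if (ds->inner->pos < ds->inner->nrows)
            return ds->inner->rows[ds->inner->pos++];
        free(ds->inner);                        /* destroy per-partition state */
        ds->inner = NULL;                       /* next call opens next partition */
    }
}
```

As the commit notes, rebuilding the inner state per partition costs a little, but it keeps the wrapper thin and the regular scan node untouched.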
-
- 04 Nov 2017, 3 commits
-
-
Submitted by Heikki Linnakangas
Also move initialization of gpmon packet to single choke point at ExecInitNode(), and sending the packet at ExecReScan() and ExecRestrPos(). A few CheckSendPlanStateGpmonPkt() remain here and there, which I didn't dare to remove. Although I'm pretty sure we could just remove them as well and no-one would notice the difference.
-
Submitted by Heikki Linnakangas
Except at the very top, one node's output is always another node's input, so it seems silly to have separate counters for rows in. The only place where "rows in" was used, was in gpperfmon's calculation of "rows skew". Change that calculation to use "rows out" instead. That's not exactly the same thing, but seems just as good for the purpose of measuring skew.
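One plausible way to compute a "rows skew" figure from per-segment "rows out" counts is to compare the maximum against the average; the commit does not spell out the exact formula gpperfmon uses, so treat this as an illustrative sketch only:

```c
#include <assert.h>

/* A plausible skew measure over per-segment "rows out" counts:
 * 0 when every segment emits the same number of rows, approaching
 * 100 as a single segment dominates. The exact gpperfmon formula
 * may differ; this only illustrates the idea. */
double rows_skew_pct(const long *rows_out, int nsegs)
{
    long max = 0, sum = 0;

    for (int i = 0; i < nsegs; i++) {
        sum += rows_out[i];
        if (rows_out[i] > max)
            max = rows_out[i];
    }
    if (max == 0)
        return 0.0;             /* no rows anywhere: no skew */

    double avg = (double) sum / nsegs;
    return (1.0 - avg / max) * 100.0;
}
```

Whether the counter is "rows in" or "rows out" barely changes such a measure, which matches the commit's point that rows out is "just as good for the purpose of measuring skew".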
-
Submitted by Heikki Linnakangas
setMotionStatsForGpmon() didn't actually do anything. It just set a bunch of local variables. And the structs were simply unused.
-
- 10 Oct 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
We have two implementations of tuplesort: the "regular" one inherited from upstream, in tuplesort.c, and a GPDB-specific tuplesort_mk.c. We had modified all the callers to check the gp_enable_mk_sort GUC, and deal with both of them. However, that makes merging with upstream difficult, and litters the code with the boilerplate to check the GUC and call one of the two implementations. Simplify the callers, by providing a single API that hides the two implementations from the rest of the system. The API is the tuplesort_* functions, as in upstream. This requires some preprocessor trickery, so that tuplesort.c can use the tuplesort_* function names as is, but in the rest of the codebase, calling tuplesort_*() will call a "switcheroo" function that decides which implementation to actually call. While this is more lines of code overall, it keeps all the ugliness confined in tuplesort.h, not littered throughout the codebase.
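The "switcheroo" pattern described above can be shown in miniature: callers keep using the tuplesort_* names, but a macro (which tuplesort.c itself does not see) redirects them to a wrapper that consults the GUC and dispatches to one of the two implementations. Everything below is an invented miniature, not the real tuplesort.h:

```c
#include <assert.h>

static int gp_enable_mk_sort = 1;               /* stand-in for the GUC */

static int tuplesort_impl_kind(void)    { return 0; }  /* "regular" tuplesort.c */
static int tuplesort_mk_impl_kind(void) { return 1; }  /* tuplesort_mk.c */

/* The wrapper that decides, at runtime, which implementation runs. */
static int switcheroo_tuplesort_kind(void)
{
    return gp_enable_mk_sort ? tuplesort_mk_impl_kind()
                             : tuplesort_impl_kind();
}

/* In the real tree a #define like this lives in the header, guarded so
 * that tuplesort.c can still define the un-renamed function itself. */
#define tuplesort_kind() switcheroo_tuplesort_kind()
```

This is the "preprocessor trickery" the commit mentions: the ugliness is confined to the header, while every call site just says tuplesort_*() and never checks the GUC.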
-
- 27 Sep 2017, 1 commit
-
-
Submitted by Bhuvnesh Chaudhary
commit bd3dadda Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Fri Aug 22 00:16:04 2008 +0000 Arrange to convert EXISTS subqueries that are equivalent to hashable IN subqueries into the same thing you'd have gotten from IN (except always with unknownEqFalse = true, so as to get the proper semantics for an EXISTS). I believe this fixes the last case within CVS HEAD in which an EXISTS could give worse performance than an equivalent IN subquery. The tricky part of this is that if the upper query probes the EXISTS for only a few rows, the hashing implementation can actually be worse than the default, and therefore we need to make a cost-based decision about which way to use. But at the time when the planner generates plans for subqueries, it doesn't really know how many times the subquery will be executed. The least invasive solution seems to be to generate both plans and postpone the choice until execution. Therefore, in a query that has been optimized this way, EXPLAIN will show two subplans for the EXISTS, of which only one will actually get executed. There is a lot more that could be done based on this infrastructure: in particular it's interesting to consider switching to the hash plan if we start out using the non-hashed plan but find a lot more upper rows going by than we expected. I have therefore left some minor inefficiencies in place, such as initializing both subplans even though we will currently only use one. Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
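The generate-both-plans-and-decide-at-execution idea boils down to a cost comparison: hashing only pays off when enough probes amortize the cost of building the hash table. A toy comparison with invented cost parameters (the real decision lives in the executor and uses planner cost estimates, not these names):

```c
#include <assert.h>

typedef enum { SUBPLAN_UNHASHED, SUBPLAN_HASHED } SubplanChoice;

/* Pick between the two retained subplans once the expected number of
 * probes is known. All parameters are hypothetical illustration. */
SubplanChoice choose_exists_plan(double expected_probes,
                                 double hash_build_cost,
                                 double probe_cost_unhashed,
                                 double probe_cost_hashed)
{
    double unhashed = expected_probes * probe_cost_unhashed;
    double hashed   = hash_build_cost + expected_probes * probe_cost_hashed;

    return (hashed < unhashed) ? SUBPLAN_HASHED : SUBPLAN_UNHASHED;
}
```

This is why EXPLAIN shows two subplans for such an EXISTS: both are kept around precisely so this choice can be deferred until the probe count is known.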
-
- 26 Sep 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
We had a very simplistic implementation in parse-analysis already, which converted the FILTER WHERE clause into a CASE-WHEN expression. That did not work for non-strict aggregates, and didn't deparse back into a FILTER expression nicely, to name a few problems with it. Replace it with the PostgreSQL implementation.

TODO:
* ORCA support. It now falls back to the Postgres planner.
* I disabled the three-stage DQA plan types if there are any FILTERs
-
- 25 Sep 2017, 2 commits
-
-
Submitted by Heikki Linnakangas
It wasn't very useful. ORCA and Postgres both just stack WindowAgg nodes on top of each other, and no-one's been unhappy about that, so we might as well do that, too. This reduces the difference between GPDB and the upstream implementation, and will hopefully make it smoother to switch. Rename the Window Plan node type to WindowAgg, to match upstream, now that it is fairly close to the upstream version.
-
Submitted by Heikki Linnakangas
To match upstream.
-
- 07 Sep 2017, 1 commit
-
-
Submitted by foyzur
To support RecursiveCTE we need to be able to ReScan a HashJoin as many times as the recursion depth. The HashJoin was previously ReScannable only if it had one memory-resident batch. Now, we support ReScannability for more than one batch. The approach that we took is to keep the inner batch files around for more than the duration of a single iteration of the join, if we detect that we need to reuse the batch files for rescanning. This can also improve the performance of the subplan, as we no longer need to materialize and rebuild the hash table. Rather, we can just reload the batches from their corresponding batch files. To accomplish reloading of inner batch files, we keep the inner batch files around even after the outer side has been joined, as we wait for reuse in a subsequent rescan (if rescannability is desired). The corresponding mail thread is here: https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/Rescannability$20of$20HashJoin%7Csort:relevance/gpdb-dev/E5kYU0FwJLg/Cqcxx0fOCQAJ Contributed by Haisheng Yuan, Kavinder Dhaliwal and Foyzur Rahman
-
- 04 Sep 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
Most notably, move the definition of XmlExpr and friends to where they are in the upstream.
-
- 01 Sep 2017, 1 commit
-
-
Submitted by Daniel Gustafsson
This bumps the copyright years to the appropriate years after not having been updated for some time. Also reformats existing code headers to match the upstream style to ensure consistency.
-
- 31 Aug 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
Plus other minor cleanup.
-
- 16 Aug 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
-
- 09 Aug 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
To match the upstream code.
-
- 15 Jul 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
* Remove PartOidExpr, it's not used in GPDB.

The target lists of DML nodes that ORCA generates include a column for the target partition OID. It can then be referenced by PartOidExprs. ORCA uses these to allow sorting the tuples by partition, before inserting them to the underlying table. That feature is used by HAWQ, where grouping tuples that go to the same output partition is cheaper. Since commit adfad608, which removed the gp_parquet_insert_sort GUC, we don't do that in GPDB, however. GPDB can hold multiple result relations open at the same time, so there is no performance benefit to grouping the tuples first (or at least not enough benefit to counterbalance the cost of a sort). So remove the now unused support for PartOidExpr in the executor.

* Bump ORCA version to 2.37

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>

* Removed acceptedLeaf

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 22 Jun 2017, 1 commit
-
-
Submitted by foyzur
In GPDB the dispatcher dispatches the entire plan tree to each query executor (QX). Each QX deserializes the entire plan tree and starts execution from the root of the plan tree. This begins by calling InitPlan on the QueryDesc, which blindly calls ExecInitNode on the root of the plan. Unfortunately, this is wasteful, in terms of memory and CPU. Each QX is in charge of a single slice. There can be many slices. Looking into plan nodes that belong to other slices, and initializing them (e.g., creating PlanState for such nodes) is clearly wasteful. For large plans, particularly planner plans, in the presence of partitions, this can add up to a significant waste. This PR proposes a fix to solve this problem. The idea is to find the local root for each slice and start ExecInitNode there. There are a few special cases: SubPlans are special, as they appear as expressions but the expression holds the root of the sub plan tree. All the subplans are bundled in plannedstmt->subplans, but confusingly as Plan pointers (i.e., we save the root of the SubPlan expression's Plan tree). Therefore, to find the relevant sub plans, we need to first find the relevant expressions and extract their roots, and then iterate plannedstmt->subplans, but only ExecInitNode on the ones that we can reach from some expression in the current slice. InitPlans are no better, as they can appear anywhere in the Plan tree. Walking from a local motion is not sufficient to find these InitPlans. Therefore, we need to walk from the root of the plan tree and identify all the SubPlans. Note: unlike a regular subplan, an initplan may not appear in the expression as a subplan; rather it will appear as a parameter generator in some other parts of the tree. We need to find these InitPlans and obtain the SubPlan for each InitPlan.
We can then use the SubPlan's setParam to copy precomputed parameter values from estate->es_param_list_info to estate->es_param_exec_vals. We also found that origSliceIdInPlan is highly unreliable and cannot be used as an indicator of a plan node's slice information. Therefore, we precompute each plan node's slice information to correctly determine if a Plan node is alien or not. This makes alien node identification more accurate. In successive PRs, we plan to use the alien memory account balance as a test to see if we successfully eliminated all aliens. We will also use the alien account balance to determine memory savings.
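The find-the-local-root idea above can be sketched with a toy plan tree in which every node carries the slice it runs in: a QE searches for the subtree root belonging to its own slice and initializes only that subtree, treating everything above it as alien. Names and structure are invented; the real fix must also chase SubPlans and InitPlans as described:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int          slice;           /* which slice this plan node runs in */
    struct Node *left, *right;
} Node;

/* Preorder search for the local root of the given slice: the first
 * node encountered that belongs to it. Nodes above the local root are
 * alien to this QE and need not be initialized. */
Node *find_local_root(Node *root, int slice)
{
    if (root == NULL)
        return NULL;
    if (root->slice == slice)
        return root;

    Node *l = find_local_root(root->left, slice);
    if (l != NULL)
        return l;
    return find_local_root(root->right, slice);
}

/* How many nodes ExecInitNode would touch starting from a root. */
int count_nodes(Node *root)
{
    if (root == NULL)
        return 0;
    return 1 + count_nodes(root->left) + count_nodes(root->right);
}
```

The saving is the difference between count_nodes on the global root and on the local root; in real plans with many slices and partitions that gap is what the PR calls "significant waste".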
-
- 17 Jun 2017, 2 commits
-
-
Submitted by Kavinder Dhaliwal
There is an assert failure when Window's child is a HashJoin operator and, while filling its buffer, Window receives a NULL tuple. In this case HashJoin will call ExecEagerFreeHashJoin() since it is done returning tuples. However, Window, once it has returned all the tuples in its input buffer, will call ExecProcNode() on HashJoin. This causes an assert failure in HashJoin that states that ExecHashJoin() should not be called if HashJoin's hashtable has already been released. This commit fixes the above issue by setting a flag in WindowState when Window encounters a NULL tuple while filling its buffer. This flag then guards any subsequent call to ExecProcNode() from fetchCurrentRow()
-
This brought in postgres/postgres@44d5be0 pretty much wholesale, except:

1. We leave `WITH RECURSIVE` for a later commit. The code is brought in, but kept dormant by us bailing early at the parser whenever there is a recursive CTE.
2. We use `ShareInputScan` in the stead of `CteScan`. ShareInputScan is basically the parallel-capable `CteScan`. (See `set_cte_pathlist` and `create_ctescan_plan`)
3. Consequently we do not put the sub-plan for the CTE in a pseudo-initplan: it is directly present in the main plan tree instead, hence we disable `SS_process_ctes` inside `subquery_planner`
4. Another corollary is that all new operators (`CteScan`, `RecursiveUnion`, and `WorkTableScan`) are dead code right now. But they will come to life once we bring in a parallel implementation of `WITH RECURSIVE`

In general this commit reduces the divergence between Greenplum and upstream.

User visible changes: The merge in the parser enables a corner case previously treated as an error: you can now specify fewer columns in your `WITH` clause than the actual projected columns in the body subquery of the `WITH`.

Original commit message:

> Implement SQL-standard WITH clauses, including WITH RECURSIVE.
>
> There are some unimplemented aspects: recursive queries must use UNION ALL
> (should allow UNION too), and we don't have SEARCH or CYCLE clauses.
> These might or might not get done for 8.4, but even without them it's a
> pretty useful feature.
>
> There are also a couple of small loose ends and definitional quibbles,
> which I'll send a memo about to pgsql-hackers shortly. But let's land
> the patch now so we can get on with other development.
>
> Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
> (cherry picked from commit 44d5be0e)
-
- 07 Jun 2017, 1 commit
-
-
Submitted by Melanie Plageman
- Remove iteration specific members of qexec packet
- Remove iterators_history table
- Remove measures used to populate iterators_history
- Remove iterator_aggregate flag

Signed-off-by: Nadeem Ghani <nghani@pivotal.io>
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
-
- 13 Apr 2017, 1 commit
-
-
The variable `outernotreferencedbyinner` is always false, and hence `nl_QuitIfEmptyInner` will never be set to true.
-
- 11 Apr 2017, 1 commit
-
-
Submitted by Shreedhar Hardikar
The commit contains a number of minor refactors and fixes.

- Fix indentation and spelling
- Remove unused variable (pass)
- Reset bloom filter after spilling HashAgg
- Remove dead code in ExecAgg
- Move AggStatus to AggState as HashAggStatus, because the state change algorithm is implemented in nodeAgg
-
- 01 Apr 2017, 2 commits
-
-
Submitted by Heikki Linnakangas
The old mechanism was to scan the complete plan, searching for a pattern with a Join, where the outer side included an Append node. The inner side was duplicated into an InitPlan, with the pg_partition_oid aggregate to collect the OIDs of all the partitions that can match. That was inefficient and broken: if the duplicated plan was volatile, you might choose the wrong partitions. And scanning the inner side twice can obviously be slow, if there are a lot of tuples. Rewrite the way such plans are generated. Instead of using an InitPlan, inject a PartitionSelector node into the inner side of the join. Fixes github issues #2100 and #2116.
-
Submitted by foyzur
GPDB supports range and list partitions. Range partitions are represented as a set of rules. Each rule defines the boundaries of a part. E.g., a rule might say that a part contains all values between (0, 5], where the left bound is 0, exclusive, and the right bound is 5, inclusive. List partitions are defined by a list of values that the part will contain.

ORCA uses the above rule definition to generate expressions that determine which partitions need to be scanned. These expressions are of the following types:

1. Equality predicate as in PartitionSelectorState->levelEqExpressions: if we have a simple equality on the partitioning key (e.g., part_key = 1).
2. General predicate as in PartitionSelectorState->levelExpressions: if we need more complex composition, including non-equality such as part_key > 1.

Note: We also have a residual predicate, which the optimizer currently doesn't use. We are planning to remove this dead code soon.

Prior to this PR, ORCA was treating both range and list partitions as range partitions. This meant that each list part would be converted to a set of list values and each of these values would become a single point range partition. E.g., consider the DDL:

```sql
CREATE TABLE DATE_PARTS (id int, year int, month int, day int, region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
    SUBPARTITION BY LIST (month)
        SUBPARTITION TEMPLATE (
            SUBPARTITION Q1 VALUES (1, 2, 3),
            SUBPARTITION Q2 VALUES (4, 5, 6),
            SUBPARTITION Q3 VALUES (7, 8, 9),
            SUBPARTITION Q4 VALUES (10, 11, 12),
            DEFAULT SUBPARTITION other_months )
( START (2002) END (2012) EVERY (1),
  DEFAULT PARTITION outlying_years );
```

Here we partition the months as a list partition using quarters. So, each list part will contain three months. Now consider a query on this table:

```sql
select * from DATE_PARTS where month between 1 and 3;
```

Prior to this, the ORCA-generated plan would consider each value of Q1 as a separate range part with just one point range.
I.e., we will have 3 virtual parts to evaluate for just one Q1: [1], [2], [3]. This approach is inefficient. The problem is further exacerbated when we have multi-level partitioning. Consider the list part of the above example. We have only 4 rules for 4 different quarters, but we will have 12 different virtual rules (aka constraints). For each such constraint, we will then evaluate the entire subtree of partitions. After this PR, we no longer decompose rules into constraints for list parts and then derive single point virtual range partitions based on those constraints. Rather, the new ORCA changes will use ScalarArrayOp to express selectivity on a list of values. So, the expression for the above SQL will look like 1 <= ANY {month_part} AND 3 >= ANY {month_part}, where month_part will be substituted at runtime with a different list of values for each of the quarterly partitions. We will end up evaluating that expression 4 times with the following lists of values:

Q1: 1 <= ANY {1,2,3} AND 3 >= ANY {1,2,3}
Q2: 1 <= ANY {4,5,6} AND 3 >= ANY {4,5,6}
...

Compare this to the previous approach, where we will end up evaluating 12 different expressions, each time for a single point value:

First constraint of Q1: 1 <= 1 AND 3 >= 1
Second constraint of Q1: 1 <= 2 AND 3 >= 2
Third constraint of Q1: 1 <= 3 AND 3 >= 3
First constraint of Q2: 1 <= 4 AND 3 >= 4
...

The ScalarArrayOp depends on a new type of expression, PartListRuleExpr, that can convert a list rule to an array of values. ORCA specific changes can be found here: https://github.com/greenplum-db/gporca/pull/149
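The ANY semantics that make this work are simple: "c op ANY {array}" is true if the comparison holds for at least one array element. A small illustrative sketch of the month predicate from the example above (invented helper names, not ORCA or executor code):

```c
#include <assert.h>

typedef int (*cmp_fn)(int a, int b);

static int le(int a, int b) { return a <= b; }
static int ge(int a, int b) { return a >= b; }

/* "scalar op ANY {arr}": true if op holds for at least one element. */
int scalar_array_any(int scalar, cmp_fn op, const int *arr, int n)
{
    for (int i = 0; i < n; i++)
        if (op(scalar, arr[i]))
            return 1;
    return 0;
}

/* The predicate from the example:
 * 1 <= ANY {months} AND 3 >= ANY {months} */
int quarter_matches(const int *months, int n)
{
    return scalar_array_any(1, le, months, n)
        && scalar_array_any(3, ge, months, n);
}
```

One such evaluation per quarter replaces the twelve single-point evaluations of the old approach, which is exactly the saving the PR describes.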
-
- 15 Feb 2017, 1 commit
-
-
Submitted by Heikki Linnakangas
-
- 26 Jan 2017, 1 commit
-
-
Leverage the bound for LIMIT with mk sort.
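The idea behind a bounded sort for LIMIT N is that the sort never needs to retain more than the N best rows seen so far, so the rest can be discarded on the fly. A toy insertion-based sketch of that idea, not the mksort code (real implementations typically use a heap):

```c
#include <assert.h>

/* Bounded sort for LIMIT N: keep only the N smallest values seen so
 * far, in sorted order. Toy: assumes bound <= 16. */
typedef struct BoundedSort {
    int bound;       /* the LIMIT */
    int n;           /* values currently held */
    int vals[16];    /* sorted prefix of the N smallest so far */
} BoundedSort;

void bounded_put(BoundedSort *bs, int v)
{
    /* Already full and v can't make the top N: discard immediately. */
    if (bs->n == bs->bound && v >= bs->vals[bs->n - 1])
        return;

    /* Either grow, or overwrite the current largest kept value. */
    int i = (bs->n < bs->bound) ? bs->n++ : bs->n - 1;

    /* Insert v into the sorted prefix. */
    while (i > 0 && bs->vals[i - 1] > v) {
        bs->vals[i] = bs->vals[i - 1];
        i--;
    }
    bs->vals[i] = v;
}
```

The win is that memory stays O(N) regardless of input size, and most inputs are rejected with a single comparison against the current N-th value.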
-
- 13 Jan 2017, 2 commits
-
-
Submitted by Heikki Linnakangas
DynamicTableScanInfo is an extension of EState, so always allocate it in the same memory context. The DynamicTableScanInfo.memoryContext field always pointed to es_query_cxt, so that is what we in fact always did anyway; this just removes the unnecessary abstraction, for simplicity.
-
Submitted by Heikki Linnakangas
Pass the EState that contains it to where it's needed, instead.
-
- 07 Jan 2017, 1 commit
-
-
Submitted by Foyzur Rahman
This reverts commit 48c495a1.
-
- 04 Jan 2017, 1 commit
-
-
Submitted by foyzur
Undo selected partitions before reselecting new partitions to avoid unnecessary leftover partitions from previous selections.

* Adding ICG qp_dpe test to verify that the partitions are reset for each outer tuple.
-
- 20 Dec 2016, 1 commit
-
-
Submitted by Heikki Linnakangas
Remnants of workfile caching code that was removed earlier.
-