- 15 Jul, 2017 1 commit
-
-
Committed by Bhuvnesh

* Do not generate the PartOid expression. In GPDB, PartOidExpr is not used, yet ORCA still generates it. HAWQ, however, uses PartOid for sorting while inserting into Append Only Row / Parquet partitioned tables. This patch uses the storage type (Parquet) and the number of partitions of an Append Only row partitioned table to decide whether PartOid should be generated. GPDB does not support Parquet storage, and its GUC controlling the number of partitions above which a sort should be used is set to INT_MAX, which is practically unreachable; so on GPDB the PartOid expr is never generated, while HAWQ can control its generation through its already existing GUCs (see the sketch after this list).
* Remove PartOid ProjElem from minidump files
* Fix CICGTest
* Fix CDMLTest
* Fix CDirectDispatchTest
* Fix CPhysicalParallelUnionAllTest
* Fix the CCollapseProjectTest test
* Fix the parser for Partition Selector: a Partition Selector node can have another Partition Selector node as its immediate child, and the current parser fails in such cases; this patch fixes that.
* Fix the PartTbl test
* Apply PR feedback
* Apply HSY feedback 1

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>

* Bump ORCA to 2.37

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
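Below is a minimal standalone sketch of that decision, with hypothetical names: `FGeneratePartOidExpr` and its parameters are illustrative, not the actual ORCA/HAWQ symbols or GUCs.

```cpp
#include <climits>
#include <iostream>

// Hypothetical model of the PartOid decision; not the actual ORCA code.
// On GPDB, fParquet is always false and sortThreshold is INT_MAX, so this
// never returns true; HAWQ tunes sortThreshold through its existing GUCs.
bool FGeneratePartOidExpr(bool fParquet, int numPartitions, int sortThreshold)
{
    return fParquet || numPartitions > sortThreshold;
}

int main()
{
    std::cout << std::boolalpha
              << FGeneratePartOidExpr(false, 1000, INT_MAX) << '\n'  // GPDB: false
              << FGeneratePartOidExpr(true, 1000, 100) << '\n';      // HAWQ Parquet: true
}
```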
-
- 11 Jul, 2017 4 commits
-
-
Committed by Venkatesh Raghavan
-
Committed by Venkatesh Raghavan

Enable GPORCA to generate better plans for non-correlated EXISTS subqueries in the WHERE clause. Consider the exists subquery `(select * from bar)`: GPORCA generates an elaborate count-based implementation of it, and if bar is a fact table, the count is going to be expensive.

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.79 rows=400324 width=8)
   ->  Nested Loop  (cost=0.00..1368250.86 rows=133442 width=8)
         Join Filter: true
         ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
               Filter: a = b
         ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
               ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                     ->  Result  (cost=0.00..438.57 rows=1 width=1)
                           Filter: (count((count()))) > 0::bigint
                           ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=8)
                                       ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                             ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
 Optimizer status: PQO version 2.35.1
(14 rows)
```

The planner, on the other hand, uses LIMIT, as shown in the init plan:

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.03..13611.14 rows=1001 width=8)
   ->  Result  (cost=0.03..13611.14 rows=334 width=8)
         One-Time Filter: $0
         InitPlan  (slice3)
           ->  Limit  (cost=0.00..0.03 rows=1 width=0)
                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..0.03 rows=1 width=0)
                       ->  Limit  (cost=0.00..0.01 rows=1 width=0)
                             ->  Seq Scan on bar  (cost=0.00..11072.84 rows=332395 width=0)
         ->  Seq Scan on foo  (cost=0.00..13611.11 rows=334 width=8)
               Filter: a = b
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(12 rows)
```

While GPORCA does not support init plans, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA generates the following plan with a LIMIT clause:

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.73 rows=400324 width=8)
   ->  Nested Loop EXISTS Join  (cost=0.00..1368250.80 rows=133442 width=8)
         Join Filter: true
         ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
               Filter: a = b
         ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
               ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                     ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                           ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=1)
                                 ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                       ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
 Optimizer status: PQO version 2.35.1
(12 rows)
```
-
Committed by Bhunvesh Chaudhary

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Bhunvesh Chaudhary

Each ORCA commit must bump the version; if the version is not bumped, new releases will not be pushed to the ORCA repository. This commit adds a check that validates the version in the current commit against the tag version existing on the repository.

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 06 Jul, 2017 3 commits
-
-
Committed by Omer Arap

There is a tie-breaker in the FBetterThan function in CCostContext: when costs are equal, a Hashed distribution should be favored over a Random one. The code was checking whether the distribution spec of the same context equals both Hashed and Random, a conjunction that is always false. For correct behavior it should check whether this context's distribution is Hashed while the compared one's is Random.
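A self-contained sketch of the corrected tie-breaker; the enum and signature are illustrative stand-ins for CCostContext and CDistributionSpec, not the real API.

```cpp
#include <iostream>

enum class EDist { Hashed, Random, Other };  // simplified stand-in for CDistributionSpec

// Broken form tested this context's spec against both Hashed and Random,
// which can never both hold, so the tie-breaker never fired.
// Fixed form: this context must be Hashed while the other is Random.
bool FBetterThan(double cost, double otherCost, EDist dist, EDist otherDist)
{
    if (cost != otherCost)
        return cost < otherCost;
    return dist == EDist::Hashed && otherDist == EDist::Random;
}

int main()
{
    std::cout << std::boolalpha
              << FBetterThan(42.0, 42.0, EDist::Hashed, EDist::Random) << '\n';  // true
}
```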
-
Committed by Bhuvnesh Chaudhary

There are changes to the regress tests in the GPDB repository, so bump up the minor version.
-
Committed by Bhuvnesh Chaudhary

The CScalarCmp and CScalarIsDistinctFrom operators expect the CScalarIdent operator on the LHS and the CScalarConst operator on the RHS. When that is not the case, the predicate is assigned a default selectivity, which hurts the cardinality estimate. This patch fixes the issue by reordering the children of CScalarCmp and CScalarIsDistinctFrom when CScalarIdent is on the RHS and CScalarConst is on the LHS. Because of the reordering, the comparison operator is also swapped for its commuted form, where supported; if the corresponding comparison operator does not exist or is not supported, the children are not reordered. Only cases of the form CONST = VAR are handled by this patch.

Signed-off-by: Omer Arap <oarap@pivotal.io>
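As a rough standalone illustration of the reordering (the operator table here is an assumption; ORCA looks the commuted operator up in the metadata rather than hard-coding it):

```cpp
#include <string>

// Return the commuted comparison operator, or "" when none exists,
// in which case the children are left in their original order.
std::string StrCommutedOp(const std::string &op)
{
    if (op == "<")  return ">";
    if (op == ">")  return "<";
    if (op == "<=") return ">=";
    if (op == ">=") return "<=";
    if (op == "=" || op == "<>") return op;  // symmetric operators
    return "";
}

// Normalize CONST <op> VAR into VAR <commuted-op> CONST so that
// selectivity estimation sees the ident-on-the-left shape it expects.
bool FNormalize(std::string &op, bool &fConstOnLhs)
{
    if (!fConstOnLhs)
        return false;                 // already VAR <op> CONST
    std::string commuted = StrCommutedOp(op);
    if (commuted.empty())
        return false;                 // no commutator: keep as-is
    op = commuted;
    fConstOnLhs = false;              // children swapped
    return true;
}
```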
-
- 01 Jul, 2017 1 commit
-
-
Committed by Venkatesh Raghavan

Minor cleanup of duplicate code.
-
- 30 Jun, 2017 2 commits
-
-
Committed by Jesse Zhang
Shiny new feature in Concourse 3.3.0 (https://concourse.ci/running-tasks.html#caches) [ci skip]
-
Committed by Heikki Linnakangas

It made the assumption that it's OK to call it on a NULL pointer, which isn't cool with all C++ compilers and options. I'm getting a bunch of warnings like this because of it:

```
/home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.inl:382:3: warning: nonnull argument ‘this’ compared to NULL [-Wnonnull-compare]
   if (NULL == this)
   ^~
```

There are a few other places that produce the same error, but one step at a time. This one matters most because it's in an inline function, so it produces warnings in any code that uses ORCA, like the translator code in GPDB's src/backend/gpopt/ directory, not just in ORCA itself. Since the function is now gone, all references to it also need to be removed from the translator code outside ORCA.

Bump up Orca version to 2.34.2.
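For context, a sketch of why the in-method check had to go and the caller-side pattern that replaces it; `PtrArray` and `UlSafeLength` are stand-ins, not ORCA's actual CDynamicPtrArray API.

```cpp
#include <cstddef>
#include <vector>

struct PtrArray
{
    std::vector<void *> items;
    // Calling a member function through a null pointer is undefined
    // behavior, so the compiler may assume `this` is non-null and flag
    // (or optimize away) an in-method `if (NULL == this)` check.
    size_t UlLength() const { return items.size(); }
};

// Callers that used to rely on the in-method null check now test the
// pointer themselves before dereferencing:
size_t UlSafeLength(const PtrArray *parr)
{
    return (parr != nullptr) ? parr->UlLength() : 0;
}
```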
-
- 28 Jun, 2017 2 commits
-
-
Committed by Bhuvnesh Chaudhary

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Omer Arap

This commit introduces a check that the CTE producer and its matching consumer execute in the same place: if the producer executes on the master, on all segments, or on a single segment, the matching consumer must execute there as well. In rare cases ORCA generates plans that violate this assumption; this commit detects such plans and falls back.

Signed-off-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
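A toy model of the check; the enum, function, and error are illustrative, not the actual ORCA classes, which inspect derived plan properties and raise a fallback exception.

```cpp
#include <stdexcept>

enum class EExecLocality { Master, AllSegments, SingleSegment };

// The consumer reads the producer's shared-scan output in-process, so
// both must be planned onto the same set of processes.
void ValidateCtePlacement(EExecLocality producer, EExecLocality consumer)
{
    if (producer != consumer)
    {
        // In ORCA, a violation like this triggers fallback to the planner.
        throw std::runtime_error(
            "CTE producer and consumer execute in different localities");
    }
}
```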
-
- 21 Jun, 2017 1 commit
-
-
Committed by Heikki Linnakangas
-
- 20 Jun, 2017 2 commits
-
-
Committed by Omer Arap

ORCA's template classes mostly keep their implementation in `.inl` files, while some implementation also lives in `.h` files. This makes the code hard to navigate, since some IDEs do not recognize the formatting of `.inl` files. This commit therefore moves the implementation into the `.h` files wherever applicable. It does not port implementation from `.inl` files for classes that already have a `.cpp` implementation file and whose `.h` holds only function declarations, such as `CUtils.h`.
-
Committed by Venkatesh Raghavan

Histogram intersection depends on the values of the bucket boundaries. For datatypes like text, varchar, etc., ORCA currently uses a hash function to mark bucket boundaries. That function is slightly useful for equality on singleton buckets but nothing more, so the previous join stats computation based on histogram intersection was totally bogus. In this CL, we modify it into an NDV-based (number of distinct values) estimation.
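For reference, the standard NDV-based equi-join estimate that this kind of change relies on, sketched in standalone form (a textbook formula, not necessarily ORCA's exact code):

```cpp
#include <algorithm>
#include <iostream>

// Estimated rows of R JOIN S ON R.a = S.b: on average each distinct
// value matches rows/NDV rows on the other side, capped by the larger NDV.
double DJoinCardEstimate(double rowsOuter, double ndvOuter,
                         double rowsInner, double ndvInner)
{
    return rowsOuter * rowsInner / std::max(ndvOuter, ndvInner);
}

int main()
{
    // 1M-row fact joined to a 10K-row dimension on a key with 10K
    // distinct values on both sides: ~1M output rows.
    std::cout << DJoinCardEstimate(1e6, 1e4, 1e4, 1e4) << '\n';
}
```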
-
- 10 Jun, 2017 1 commit
-
-
Committed by Jesse Zhang
This reverts commit 8c315ea4. [ci skip]
-
- 08 Jun, 2017 1 commit
-
-
Committed by Jemish Patel

[ci skip]

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
- 18 May, 2017 1 commit
-
-
Committed by Venkatesh Raghavan
-
- 16 May, 2017 2 commits
-
-
Committed by Jesse Zhang
[ci skip]
-
Committed by Jesse Zhang
Todd broke our CI in greenplum-db/gpdb@398534a9. [ci skip]
-
- 15 May, 2017 2 commits
-
-
Committed by Venkatesh Raghavan
-
Committed by Venkatesh Raghavan

* Make sure the intent of the traceflags is clear
* Remove double negation where possible
* Update comments
-
- 12 May, 2017 1 commit
-
-
Committed by C.J. Jameson
-
- 09 May, 2017 6 commits
-
-
Committed by Dhanashree Kashid

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
For a query using a correlated subselect as a predicate, such as:

```
CREATE TABLE partitioned_table (a int, pk int)
  DISTRIBUTED BY (a)
  PARTITION BY range(pk) (end(5), end(10));
CREATE TABLE other_table (c int, d int) DISTRIBUTED BY (c);

INSERT INTO partitioned_table VALUES (1, 1), (2, 4), (3, 9);
ANALYZE ROOTPARTITION partitioned_table;

EXPLAIN SELECT pk from partitioned_table
WHERE a > (SELECT d FROM other_table WHERE c = a) AND pk < 12;
```

ORCA will generate a Correlated Nested Loop Left Outer Join, which should translate to a DXL Scan under a DXL SubPlan filter. However, the translation happened in the following order:

1. Translate the outer child (which has a filter) of the correlated NLJ.
2. Build a DXL SubPlan using the inner child, intended to serve as an "additional filter condition" on top of the outer child.
3. Based on the DXL plan for the outer child, decide whether or not to use the additional condition as a filter in the final result.
4. If the outer child was a Physical Sequence, we discarded the condition, assuming that the filter condition is already present in the partition selector.
5. The code to discard the subplan was added in e99325cc because, previously, we always inserted the additional filter as "the second child", on the assumption that every DXL node has a filter child in second place. As it turned out, the DXL Sequence node is a counterexample: it has no filter, and its second child is expected to be a partition selector.
6. We didn't catch this error in e99325cc because the test cases had a trivial additional filter condition of constant true, so dropping it didn't raise any eyebrows.

The generally correct approach, however, is to retain this additional condition, either as an additional DXL Result node on top of the DXL for the outer child or, for appropriate node types, by inlining the condition into existing filters. This patch set fixes that (see the sketch below).

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
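A schematic of the retained-condition logic; every type and helper here is a hypothetical stand-in for the DXL translator's classes, not the real API.

```cpp
// Hypothetical stand-ins for DXL translator types; illustrative only.
struct DxlCond { };

struct DxlNode
{
    bool fHasFilterSlot;   // e.g. false for a DXL Sequence
    DxlCond *filter;
    DxlNode *child;
};

// Merge the condition into the node's own filter slot.
DxlNode *InlineIntoFilter(DxlNode *node, DxlCond *cond)
{
    node->filter = cond;
    return node;
}

// Wrap the node in a Result node that carries the condition.
DxlNode *AddResultNode(DxlNode *node, DxlCond *cond)
{
    return new DxlNode{true, cond, node};
}

// Instead of discarding `cond` when the outer child is a Sequence,
// retain it: inline where a filter slot exists, otherwise wrap.
DxlNode *ApplyAdditionalCond(DxlNode *outer, DxlCond *cond)
{
    if (cond == nullptr)
        return outer;
    return outer->fHasFilterSlot ? InlineIntoFilter(outer, cond)
                                 : AddResultNode(outer, cond);
}
```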
-
We ensure that all three cases of PdxlnCorrelatedNLJoin take ownership of the DXLProperties object.
-
-
-
1. There are cases where the additional scalar condition cannot be combined with the original condition in the DXL plan. Case in point: when the outer child gets translated into a DXL Sequence, we cannot put the "combined condition" into the sequence.
2. Deferring the combination of conditions also gives us an optimization opportunity to reduce double translations.
-
- 30 Apr, 2017 1 commit
-
-
Committed by Jesse Zhang

installcheck has slowly bloated over the last year: this test now runs for about 32 minutes when nothing else is happening in Concourse. Given that we also run the planner ICG in parallel, we'd better err on the safe side and extend the timeout to an hour. [ci skip]
-
- 26 Apr, 2017 9 commits
-
-
Committed by Omer Arap

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Heikki Linnakangas

If an error occurs while serializing an exception's error context, don't recurse into Serialize: serializing the context is likely to just fail again, leading to infinite recursion. Also disable the abort-signal while serializing the error context, likewise to avoid recursing.
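A minimal model of such a re-entrancy guard (the flag and function are illustrative; GPOS keeps its own task-local error state):

```cpp
// Illustrative re-entrancy guard; not the actual GPOS implementation.
static thread_local bool fSerializingError = false;

void SerializeErrorContext()
{
    if (fSerializingError)
        return;                 // serialization already failed once:
                                // bail out instead of recursing
    fSerializingError = true;
    // ... format and write the error context; the abort-signal is
    //     suppressed in this window so a pending abort can't re-enter ...
    fSerializingError = false;
}
```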
-
Committed by Heikki Linnakangas

If an error occurs, the worker is no longer executing the given task. This caused problems later: the Task object had already been destroyed, but m_ptsk was left dangling.
-
Committed by Heikki Linnakangas
The hash table iterator holds a spinlock on the hash table, and there's a GPOS_ASSERT_NO_SPINLOCK in PvMalloc. Writing to the dumper stream can cause allocations, so avoid doing that while iterating.
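The shape of the fix, sketched with standard containers (illustrative only; the real code uses GPOS's hash-table iterator, spinlock, and dumper stream):

```cpp
#include <mutex>
#include <ostream>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<int, std::string> table;
std::mutex tableLock;  // stand-in for the iterator's spinlock

// Copy the entries out while holding the lock, then do the stream
// writes (which may allocate) only after the lock is released.
void DumpTable(std::ostream &os)
{
    std::vector<std::string> snapshot;
    {
        std::lock_guard<std::mutex> guard(tableLock);
        for (const auto &kv : table)
            snapshot.push_back(kv.second);
    }
    for (const auto &s : snapshot)
        os << s << '\n';  // safe: no lock held during I/O
}
```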
-
Committed by Heikki Linnakangas
This isn't strictly necessary, but considerably speeds up the dumping of large queries. This brought down the time needed to process and dump a 600 MB minidump from about 90 s to 30 s on my laptop.
-
Committed by Heikki Linnakangas
Instead of having a fixed-size buffer to serialize minidumps to, refactor the serialization functions to write to a stream. That simplifies the serialization functions, as they no longer need to reserve space in the buffer ahead of time (i.e. the UlpRequiredSpace() functions are gone), reduces memory usage when dumping small queries, and makes it possible to minidump large queries without running out of memory. This removes the arbitrary 16 MB limit on minidump size. I'm not sure it's a good idea to create multi-gigabyte minidumps in practice, but that's a case of "if it hurts, don't do it". At least it's now possible, if you need to.
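The gist of the refactoring in standard C++ terms (illustrative; ORCA uses its own stream abstraction, not std::ostream):

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Before: serialize into a caller-provided fixed buffer, which forced
// every node to pre-compute its size (UlpRequiredSpace-style) and
// imposed a hard cap. After: write directly to a growable stream, so
// there is no size estimation and no fixed limit.
void SerializeNode(std::ostream &os, const std::string &tag)
{
    os << "<dxl:" << tag << "/>";
}

int main()
{
    std::ostringstream minidump;  // grows as needed
    SerializeNode(minidump, "Plan");
    std::cout << minidump.str() << '\n';
}
```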
-
Committed by Omer Arap

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Jesse Zhang

Signed-off-by: Omer Arap <oarap@pivotal.io>
-
Committed by Omer Arap
-