1. 05 Jun 2020, 1 commit
    • Support "NDV-preserving" function and op property (#10247) · a4362cba
      Authored by Hans Zeller
      Orca uses this property for cardinality estimation of joins.
      For example, a join predicate foo join bar on foo.a = upper(bar.b)
      will have a cardinality estimate similar to foo join bar on foo.a = bar.b.
      
      Other functions, like foo join bar on foo.a = substring(bar.b, 1, 1)
      won't be treated that way, since they are more likely to have a greater
      effect on join cardinalities.
      
      Since this is specific to ORCA, we use logic in the translator to determine
      whether a function or operator is NDV-preserving. Right now we consider
      only a very limited set of operators; we may add more at a later time.
      
      Let's assume that we join tables R and S and that f is a function or
      expression that refers to a single column and does not preserve
      NDVs. Let's also assume that p is a function or expression that also
      refers to a single column and that does preserve NDVs:
      
      join predicate       card. estimate                         comment
      -------------------  -------------------------------------  -----------------------------
      col1 = col2          |R| * |S| / max(NDV(col1), NDV(col2))  build an equi-join histogram
      f(col1) = p(col2)    |R| * |S| / NDV(col2)                  use NDV-based estimation
      f(col1) = col2       |R| * |S| / NDV(col2)                  use NDV-based estimation
      p(col1) = col2       |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      p(col1) = p(col2)    |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      otherwise            |R| * |S| * 0.4                        this is an unsupported pred
      Note that adding casts to these expressions is ok, as well as switching left and right side.
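      The selectivity rules in the table above can be sketched as follows. This is an illustrative Python sketch, not ORCA's API; the side classifications ("col", "p", "f") and the function name are hypothetical.

```python
def join_card(card_r, card_s, ndv1, ndv2, lhs, rhs):
    """Sketch of the table above. lhs/rhs classify each side of the
    equality predicate: 'col' (bare column), 'p' (NDV-preserving expr),
    or 'f' (other expr)."""
    if {lhs, rhs} <= {"col", "p"}:
        # both sides preserve NDVs: divide by the larger NDV
        # (col1 = col2 would additionally use an equi-join histogram)
        return card_r * card_s / max(ndv1, ndv2)
    if "f" in (lhs, rhs) and ({lhs, rhs} & {"col", "p"}):
        # one side destroys NDVs: divide by the preserved side's NDV only
        ndv_preserved = ndv2 if lhs == "f" else ndv1
        return card_r * card_s / ndv_preserved
    # otherwise: unsupported predicate, fixed default selectivity
    return card_r * card_s * 0.4
```

      Switching the left and right sides of a predicate gives symmetric results, matching the note above.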
      
      Here is a list of expressions that we currently treat as NDV-preserving:
      
      coalesce(col, const)
      col || const
      lower(col)
      trim(col)
      upper(col)
      
      One more note: We need the NDVs of the inner side of Semi and
      Anti-joins for cardinality estimation, so only normal columns and
      NDV-preserving functions are allowed in that case.
      
      This is a port of these GPDB 5X and GPOrca PRs:
      https://github.com/greenplum-db/gporca/pull/585
      https://github.com/greenplum-db/gpdb/pull/10090
      
      This is take 2, after reverting the first attempt due to a merge conflict that
      caused a test to fail.
  2. 04 Jun 2020, 2 commits
    • Revert "Support "NDV-preserving" function and op property (#10225)" · 898e66b8
      Authored by Jesse Zhang
      Regression test "gporca" started failing after merging d565edac.
      
      This reverts commit d565edac.
    • Support "NDV-preserving" function and op property (#10225) · d565edac
      Authored by Hans Zeller
      Orca uses this property for cardinality estimation of joins.
      For example, a join predicate foo join bar on foo.a = upper(bar.b)
      will have a cardinality estimate similar to foo join bar on foo.a = bar.b.
      
      Other functions, like foo join bar on foo.a = substring(bar.b, 1, 1)
      won't be treated that way, since they are more likely to have a greater
      effect on join cardinalities.
      
      Since this is specific to ORCA, we use logic in the translator to determine
      whether a function or operator is NDV-preserving. Right now we consider
      only a very limited set of operators; we may add more at a later time.
      
      Let's assume that we join tables R and S and that f is a function or
      expression that refers to a single column and does not preserve
      NDVs. Let's also assume that p is a function or expression that also
      refers to a single column and that does preserve NDVs:
      
      join predicate       card. estimate                         comment
      -------------------  -------------------------------------  -----------------------------
      col1 = col2          |R| * |S| / max(NDV(col1), NDV(col2))  build an equi-join histogram
      f(col1) = p(col2)    |R| * |S| / NDV(col2)                  use NDV-based estimation
      f(col1) = col2       |R| * |S| / NDV(col2)                  use NDV-based estimation
      p(col1) = col2       |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      p(col1) = p(col2)    |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      otherwise            |R| * |S| * 0.4                        this is an unsupported pred
      Note that adding casts to these expressions is ok, as well as switching left and right side.
      
      Here is a list of expressions that we currently treat as NDV-preserving:
      
      coalesce(col, const)
      col || const
      lower(col)
      trim(col)
      upper(col)
      
      One more note: We need the NDVs of the inner side of Semi and
      Anti-joins for cardinality estimation, so only normal columns and
      NDV-preserving functions are allowed in that case.
      
      This is a port of these GPDB 5X and GPOrca PRs:
      https://github.com/greenplum-db/gporca/pull/585
      https://github.com/greenplum-db/gpdb/pull/10090
  3. 03 Jun 2020, 1 commit
    • Refactoring the DbgPrint and OsPrint methods (#10149) · b3fdede6
      Authored by Hans Zeller
      * Make DbgPrint and OsPrint methods on CRefCount
      
      Create a single DbgPrint() method on the CRefCount class. Also create
      a virtual OsPrint() method, making some objects derived from CRefCount
      easier to print from the debugger.
      
      Note that not all the OsPrint methods had the same signature; some
      additional OsPrintxxx() methods were generated to accommodate that.
      
      * Making print output easier to read, print some stuff on demand
      
      Required columns in required plan properties are always the same
      for a given group. Also, equivalent expressions in required distribution
      properties are important in certain cases, but in most cases they
      disrupt the display and make it harder to read.
      
      Added two traceflags, EopttracePrintRequiredColumns and
      EopttracePrintEquivDistrSpecs that have to be set to print this
      information. If you want to go back to the old display, use these
      options when running gporca_test: -T 101016 -T 101017
      
      * Add support for printing alternative plans
      
      A new method, CEngine::DbgPrintExpr() can be called from
      COptimizer::PexprOptimize, to allow printing of the best plan
      for different contexts. This is only enabled in debug builds.
      
      To use this:
      
      - run an MDP using gporca_test, using a debug build
      - print out memo after optimization (-T 101006 -T 101010)
      - set a breakpoint near the end of COptimizer::PexprOptimize()
      - if, after looking at the contents of memo, you want to see
        the optimal plan for context c of group g, do the following:
        p eng.DbgPrintExpr(g, c)
      
      You could also get the same info from the memo printout, but it
      would take a lot longer.
  4. 30 May 2020, 1 commit
    • Penalize cross products in Orca's DPv2 algorithm more accurately (#10029) · 457bb928
      Authored by Chris Hajas
      Previously, the DPv2 transform (exhaustive2) penalized cross joins for
      the later joins chosen in the greedy phase, but not for the first join,
      so in some cases a cross join was selected first. This often produced a
      poor join order and went against the intent of the alternative being
      generated, which is to minimize cross joins.
      
      We also increase the default cross-join penalty from 5 to 1024, the
      value we use in the cost model during the optimization stage.
      
      The greedy alternative also wasn't kept in the heap, so we include that now too.
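      A minimal sketch of the corrected behavior. The names below are illustrative, not the actual DPv2 code; the point is that the cross-join penalty now applies to the first greedy pick as well as the later ones.

```python
CROSS_JOIN_PENALTY = 1024  # raised from 5 per this commit

def pair_cost(card_left, card_right, has_join_pred):
    # penalize a pair with no connecting join predicate (a cross product)
    cost = card_left * card_right
    return cost if has_join_pred else cost * CROSS_JOIN_PENALTY

def greedy_first_pick(pairs):
    # pairs: (card_left, card_right, has_join_pred) tuples.
    # Apply the penalty when choosing the FIRST join, not only later ones.
    return min(pairs, key=lambda p: pair_cost(*p))
```

      With the penalty applied uniformly, a small cross product no longer beats a larger but predicate-connected join for the first pick.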
  5. 28 May 2020, 1 commit
    • Log fewer errors (#10100) · fba77702
      Authored by Sambitesh Dash
      This is a continuation of commit 456b2b31 in GPORCA, adding more errors
      to the list of those that are not written to the log file. We also remove
      the code that wrote to std::cerr, which produced an unsightly log
      message; instead, we record whether the error was unexpected in another
      log message that we already generate.
  6. 22 May 2020, 1 commit
    • Let configure set C++14 mode (#10147) · b371e592
      Authored by Chris Hajas
      Commit 649ee57d "Build ORCA with C++14: Take Two (#10068)" left
      behind a major FIXME: a hard-coded CXXFLAGS in gporca.mk. At the very
      least this looks completely out of place aesthetically. But more
      importantly, this is problematic in several ways:
      
      1. It leaves the language mode for part of the code base
      (src/backend/gpopt "ORCA translator") unspecified. The ORCA translator
      closely collaborates with ORCA and directly uses much of the interfaces
      from ORCA. There is a non-hypothetical risk of non-subtle
      incompatibilities. This is obscured by the fact that GCC and upstream
      Clang both default to gnu++14 after their respective 6.0 releases.
      Apple Clang from Xcode 11, however, reacts as if the default were still
      gnu++98:
      
      > In file included from CConfigParamMapping.cpp:20:
      > In file included from ../../../../src/include/gpopt/config/CConfigParamMapping.h:19:
      > In file included from ../../../../src/backend/gporca/libgpos/include/gpos/common/CBitSet.h:15:
      > In file included from ../../../../src/backend/gporca/libgpos/include/gpos/common/CDynamicPtrArray.h:15:
      > ../../../../src/backend/gporca/libgpos/include/gpos/common/CRefCount.h:68:24: error: expected ';' at end of declaration list
      >                         virtual ~CRefCount() noexcept(false)
      >                                             ^
      >                                             ;
      
      2. It potentially conflicts with other parts of the code base. Namely,
      when configured with gpcloud, we might have -std=c++11 and -std=gnu++14
      in the same breath, which is highly undesirable or an outright error.
      
      3. Idiomatically, language-standard selection should modify CXX, not
      CXXFLAGS, in the same vein as how AC_PROG_CC_C99 modifies CC.
      
      We already had a precedent for setting the compiler up in C++11 mode
      (albeit for gpcloud, a less-used component). This patch leverages the
      same mechanism to set up CXX in C++14 mode.
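      A hypothetical configure.ac fragment showing this mechanism. AX_CXX_COMPILE_STDCXX is the autoconf-archive macro; its use here is an assumption for illustration, not necessarily the exact macro this patch uses.

```m4
dnl Select C++14 by modifying CXX rather than CXXFLAGS, in the same
dnl vein as AC_PROG_CC_C99 modifies CC.
AC_PROG_CXX
dnl [ext] requests the GNU dialect (-std=gnu++14); [mandatory] makes
dnl configure fail if the compiler cannot provide it. The chosen switch
dnl is appended to CXX, so every C++ compile line sees the same mode.
AX_CXX_COMPILE_STDCXX([14], [ext], [mandatory])
```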
      Authored-by: Chris Hajas <chajas@pivotal.io>
  7. 19 May 2020, 1 commit
    • Build ORCA with C++14: Take Two (#10068) · 649ee57d
      Authored by Jesse Zhang
      This patch makes the minimal changes to build ORCA with C++14. This
      should address the grievance that ORCA cannot build with the default
      Xerces C++ (3.2 or newer, which is built with GCC 8.3 in the default
      C++14 mode) headers from Debian. I've kept the CMake build system in
      sync with the main Makefile. I've also made sure that all ORCA tests
      pass.
      
      This patch set also enables ORCA in Travis so the community gets
      compilation coverage.
      
      === FIXME / near-term TODOs:
      
      What's _not_ included in this patch, but would be nice to have soon (in
      descending order of importance):
      
      1. -std=gnu++14 ought to be done in "configure", not in a Makefile. This
      is not a pedantic aesthetic issue; sooner or later we'll run into this
      problem, especially if we're mixing multiple things built in C++.
      
      2. Clean up the Makefiles and move most CXXFLAGS override into autoconf.
      
      3. Those noexcept(false) seem excessive; we should benefit from
      conditionally marking more code "noexcept", at least in production.
      
      4. Detecting whether Xerces was generated (either by autoconf or CMake)
      with a compiler that's effectively running post-C++11
      
      5. Work around a GCC 9.2 bug that crashes the loading of minidumps (I've
      tested with GCC 6 to 10). Last I checked, the bug has been fixed in GCC
      releases 10.1 and 9.3.
      
      [resolves #9923]
      [resolves #10047]
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Hans Zeller <hzeller@pivotal.io>
      Reviewed-by: Ashuka Xue <axue@pivotal.io>
      Reviewed-by: David Kimura <dkimura@pivotal.io>
  8. 18 May 2020, 1 commit
    • VPATH fix for ORCA-related Makefile. · afd31921
      Authored by Jesse Zhang
      This commit fixes up a host of top_builddir vs top_srcdir confusion,
      uncovered by running a VPATH build (with ORCA enabled, of course).
      
      I've also taken this opportunity to slightly eliminate some duplication,
      using Makefile inclusion.
      
      After this commit, a VPATH build should compile.
      
      This resolves #10071.
  9. 14 May 2020, 2 commits
    • Address PR Feedback · d90ceb45
      Authored by Ashuka Xue
    • Allow stats estimation for text-like types only for histograms containing singleton buckets · ecefcc1c
      Authored by Ashuka Xue
      In commit `Improve statistics calculation for exprs like "var = ANY
      (ARRAY[...])"`, we improved the performance of cardinality estimation
      for ArrayCmp. However, that change caused ArrayCmp expressions with
      text-like types to fall back to NDV-based cardinality estimation even
      when valid histograms were present.
      
      This commit re-enables using histograms for text-like types provided it
      is safe to do so.
      
      The following are removed because non-singleton buckets are not valid for text:
      - src/backend/gporca/data/dxl/minidump/CTE-12.mdp
      - src/backend/gporca/data/dxl/statistics/Join-Statistics-Text-Input.xml
      - src/backend/gporca/data/dxl/statistics/Join-Statistics-Text-Output.xml
      Co-authored-by: Ashuka Xue <axue@pivotal.io>
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
  10. 13 May 2020, 1 commit
    • Removing xerces patch (#10091) · 2448be9b
      Authored by Hans Zeller
      The scripts we use in Concourse pipelines download Apache xerces-c-3.1.2 and then apply a patch that is part of our source code tree. Abhijit has pointed out that this is no longer necessary. This commit removes the patch and uses the vanilla xerces-c-3.1.2 source code instead.
      
      Eventually, we want to stop including xerces into our releases and rely on the natively installed xerces. See also https://github.com/greenplum-db/gpdb/pull/10068.
  11. 12 May 2020, 1 commit
    • Limit DPE stats to groups with unresolved partition selectors (#9988) · cfc83810
      Authored by Hans Zeller
      DPE stats are computed when we have a dynamic partition selector that's
      applied on another child of a join. The current code continues to use
      DPE stats even for the common ancestor join and nodes above it, but
      those nodes aren't affected by the partition selector.
      
      Regular Memo groups pick the best expression among several to compute
      stats, which makes row count estimates more reliable. We don't have
      that luxury with DPE stats, therefore they are often less reliable.
      
      By minimizing the places where we use DPE stats, we should overall get
      more reliable row count estimates with DPE stats enabled.
      
      The fix also ignores DPE stats with row counts greater than the group
      stats. Partition selectors eliminate certain partitions, therefore
      it is impossible for them to increase the row count.
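      The two rules above can be sketched as follows (illustrative names, not ORCA's actual functions):

```python
def effective_row_estimate(dpe_rows, group_rows, has_unresolved_part_selector):
    # Above the common ancestor join there is no unresolved partition
    # selector, so fall back to the regular group stats.
    if not has_unresolved_part_selector:
        return group_rows
    # A partition selector can only eliminate partitions, so DPE stats
    # must never exceed the group stats; ignore them when they do.
    return min(dpe_rows, group_rows)
```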
  12. 09 May 2020, 13 commits
  13. 06 May 2020, 3 commits
    • Drop trailing slashes in "subdir". · f90dd34f
      Authored by Jesse Zhang
      While functionally harmless, the trailing slashes contradict our coding
      conventions. They also lead to double slashes when we compute the
      linking command.
      
      While we're at it, also correct the trailing slashes in subdir for 3
      pre-existing Makefiles (fsync, heap_checksum, and walrep regress tests).
    • Use CXXFLAGS instead of CPPFLAGS for -std and -W*. · d51ecabe
      Authored by Jesse Zhang
      Semantically, those are CXXFLAGS, not CPPFLAGS (which typically carry
      -D and -I options). Practically, this becomes a problem when we try to
      turn off certain warnings: CPPFLAGS come after CXXFLAGS, and the
      order-sensitive -Werror and -Wno-* flags won't mix well.
    • Remove repetition in gpos Makefile · 20719067
      Authored by Jesse Zhang
      In a forthcoming commit we're going to tweak some of these flags. To
      prevent duplication, consolidate them through Makefile inclusion first.
      This also addresses a FIXME left from commit 3f3f4a57 "Fix configure
      and cmake to build ORCA with debug".
  14. 29 Apr 2020, 2 commits
    • Improve statistics calculation for exprs like "var = ANY (ARRAY[...])" · e25bcf4e
      Authored by Shreedhar Hardikar
      Implements an algorithm in MakeHistArrayCmpAnyFilter() using CStatsPredArrayCmp:
      1. Construct a histogram with the same bucket boundaries as present in the
         base_histogram.
         This is better than using a singleton bucket per point because, in
         that case, the frequency of each bucket is so small that it is often
         less than CStatistics::Epsilon and may be treated as 0, leading to
         cardinality misestimation. Using the same buckets as base_histogram
         also aids in joining histograms later.
      2. Compute the frequency for each bucket based on the number of points (NDV)
         present within each bucket boundary. NB: the points must be de-duplicated
         beforehand to prevent double counting.
      3. Join this "dummy_histogram" with the base_histogram to determine the buckets
         from base_histogram that should be selected (using MakeJoinHistogram)
      4. Compute and adjust the resultant scale factor for the filter.
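      Steps 1 and 2 might look roughly like this. This is an illustrative sketch; the bucket and histogram representations are simplified lists, not ORCA's CHistogram/CBucket classes.

```python
from bisect import bisect_right

def dummy_histogram_freqs(boundaries, constants):
    # Step 1: reuse the base histogram's bucket boundaries;
    # boundaries[i] and boundaries[i+1] delimit bucket i.
    # Step 2: weight each bucket by the number of distinct constants
    # falling into it; de-duplicate first to prevent double counting.
    points = sorted(set(constants))
    counts = [0] * (len(boundaries) - 1)
    for p in points:
        i = bisect_right(boundaries, p) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = len(points)
    return [c / total for c in counts]  # per-bucket frequency
```

      Step 3 would then join this dummy histogram with base_histogram (MakeJoinHistogram) to pick out the matching buckets, and step 4 adjusts the resulting scale factor.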
      Co-authored-by: Ashuka Xue <axue@pivotal.io>
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
    • [Refactor] Rename functions for clarity and add DbgPrints · bce754a1
      Authored by Ashuka Xue
      Functions renamed:
      - CHistogram::Buckets -> GetNumBuckets
      - CHistogram::ParseDXLToBucketsArray -> GetBuckets
      
      Implemented DbgPrint for:
      - CBucket
      - CHistogram
      Co-authored-by: Ashuka Xue <axue@pivotal.io>
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
  15. 20 Apr 2020, 1 commit
    • Do not push Volatile funcs below aggs · 885ca8a9
      Authored by Sambitesh Dash
      Consider the scenario below
      
      ```
      create table tenk1 (c1 int, ten int);
      create temp sequence ts1;
      explain select * from (select distinct ten from tenk1) ss where ten < 10 + nextval('ts1') order by 1;
      ```
      
      The filter outside the subquery is a candidate to be pushed below the
      'distinct' in the subquery. But since 'nextval' is a volatile function,
      we should not push it.
      
      Volatile functions give different results with each execution. We don't
      want aggs to use the result of a volatile function before it is
      necessary. We do this for all aggs, both DISTINCT and GROUP BY.
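      A sketch of the guard. The volatility classification below is illustrative; in GPDB it comes from the function catalog (pg_proc.provolatile), not a hard-coded set.

```python
VOLATILE_FUNCS = {"nextval", "random", "timeofday"}  # illustrative subset

def can_push_below_agg(funcs_in_predicate):
    # A filter may be pushed below a DISTINCT or GROUP BY only if it
    # contains no volatile function: a volatile function may return a
    # different result on every call, so evaluating it earlier (and a
    # different number of times) changes the query's behavior.
    return not any(f in VOLATILE_FUNCS for f in funcs_in_predicate)
```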
      
      Also see commit 6327f25d.
  16. 15 Apr 2020, 2 commits
  17. 10 Apr 2020, 2 commits
    • Handle opfamilies/opclasses for distribution in ORCA · e7ec9f11
      Authored by Shreedhar Hardikar
      GPDB 6 introduced a mechanism to distribute tables on columns using a
      custom hash opclass, instead of using cdbhash. Before this commit, ORCA
      ignored the distribution opclass, while the translator ensured that
      only queries in which all tables were distributed by either their
      default or default "legacy" opclasses were allowed.
      
      However, in case of tables distributed by legacy or default opclasses,
      but joined using a non-default opclass operator, ORCA would produce an
      incorrect plan, giving wrong results.
      
      This commit fixes that bug by introducing support for distributed tables
      using non-default opfamilies/opclasses. But, even though the support is
      implemented, it is not fully enabled at this time. The logic to fallback
      to planner when the plan contains tables distributed with non-default
      non-legacy opclasses remains. Our intention is to support it fully in
      the future.
      
      How does this work?
      For hash joins, capture the opfamily of each hash joinable operator. Use
      that to create hash distribution spec requests for either side of the
      join.  Scan operators derive a distribution spec based on opfamily
      (corresponding to the opclass) of each distribution column.  If there is
      a mismatch between distribution spec requested/derived, add a Motion
      Redistribute node using the distribution function from the requested
      hash opfamily.
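      The matching rule can be sketched as follows (an illustrative sketch, simplified to a single distribution column; the names are not ORCA's):

```python
def motion_needed(derived_col, derived_opfamily,
                  requested_col, requested_opfamily):
    # A scan derives its hash distribution spec from the opclass of its
    # distribution column; a hash join requests a spec based on the
    # opfamily of its '=' operator. Any mismatch, in column or in
    # opfamily, means the rows are not already distributed the way the
    # join needs them, so a Motion Redistribute (using the requested
    # opfamily's hash function) must be added.
    return (derived_col, derived_opfamily) != (requested_col,
                                               requested_opfamily)
```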
      
      The commit consists of several sub-sections:
      - Capture distr opfamilies in CMDRelation and related classes
      
        For each distribution column of the relation, track the opfamily of
        "opclass" used in the DISTRIBUTED BY clause. This information is then
        relayed to CTableDescriptor & CPhysicalScan.
      
        Also support this in other CMDRelation subclasses: CMDRelationCTAS
        (via CLogicalCTAS) & CMDRelationExternalGPDB.
      
      - Capture hash opfamily of CMDScalarOp using gpdb::GetCompatibleHashOpFamily().
        This is needed to determine distribution spec requests from joins.
      
      - Track hash opfamilies of join predicates
      
        This commit extends the caching of join keys in Hash/Merge joins by
        also caching the corresponding hash opfamilies of the '=' operators
        used in those predicates.
      
      - Track opfamily in CDistributionSpecHashed.
      
        This commit also constructs CDistributionSpecHashed with opfamily
        information that was previously cached in CScalarGroup in the case of
        HashJoins.
        It also includes the compatibility checks that reject distributions
        specs with mismatched opfamilies in order to produce Redistribute
        motions.
      
      - Capture default distribution (hash) opfamily in CMDType
      - Handle legacy opfamilies in CMDScalarOp & CMDType
      - Handle opfamilies in HashExprList Expr->DXL translation
      
      ORCA-side notes:
      1. To ensure correctness, equivalent classes can only be determined over
         a specific opfamily. For example, the expression `a = b` implies a &
         b belong to an equiv classes only for the opfamily `=` belongs to.
         Otherwise expression `b |=| c` can be used to imply a & c belong to
         the same equiv class, which is incorrect, as the opfamily of `=` and
         `|=|` differ.
         For this commit, determine equiv classes only for default opfamilies.
         This will ensure correct behavior for the majority of cases.
      2. This commit does *not* implement similar features for merge joins.
         That is left for future work.
      3. This commit introduces two traceflags:
         - EopttraceConsiderOpfamiliesForDistribution: If this is off,
           opfamilies is ignored and set to NULL. This mimics behavior before
           this PR. Ctest MDPs are run this way.
         - EopttraceUseLegacyOpfamilies: Set if ANY distribution col in the
           query uses a legacy opfamily/opclass. MDCache getters will then
           return legacy opfamilies instead of the default opfamilies for all
           queries.
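      Note 1 above can be sketched with a union-find keyed by (opfamily, column), so that `a = b` merges a and b only within the opfamily that `=` belongs to; the `|=|` example then cannot leak equivalences across opfamilies. This is an illustrative sketch, not ORCA's equivalence-class machinery.

```python
class EquivClasses:
    def __init__(self):
        self.parent = {}

    def find(self, key):
        # union-find without path compression, for brevity
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            key = self.parent[key]
        return key

    def add_equality(self, opfamily, col1, col2):
        # 'col1 = col2' merges the columns only within this opfamily
        r1 = self.find((opfamily, col1))
        r2 = self.find((opfamily, col2))
        if r1 != r2:
            self.parent[r2] = r1

    def equivalent(self, opfamily, col1, col2):
        return self.find((opfamily, col1)) == self.find((opfamily, col2))
```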
      
      What new information is captured from GPDB?
      1. Opfamily of each distribution column in CMDRelation,
         CMDRelationCtasGPDB & CMDRelationExternalGPDB
      2. Compatible hash opfamily of each CMDScalarOp using
         gpdb::GetCompatibleHashOpFamily()
      3. Default distribution (hash) opfamily of every type.
         This may be NULL for some types. It is needed for certain operators
         (e.g., HashAgg) that request a distribution spec that cannot be
         inferred in any other way: we cannot derive it, cannot get it from
         any scalar op, etc. See GetDefaultDistributionOpfamilyForType()
      4. Legacy opfamilies for types & scalar operators.
         Needed for supporting tables distributed by legacy opclasses.
      
      Other GPDB side changes:
      
      1. HashExprList no longer carries the type of the expression (it is
         inferred from the expr instead). However, it now carries the hash
         opfamily to use when deriving the distribution hash function. To
         maintain compatibility with older versions, the opfamily is used only
         if EopttraceConsiderOpfamiliesForDistribution is set, otherwise,
         default hash distribution function of the type of the expr is used.
      2. Don't worry about left & right types in get_compatible_hash_opfamily()
      3. Consider COERCION_PATH_RELABELTYPE as binary coercible for ORCA.
      4. EopttraceUseLegacyOpfamilies is set if any table is distributed by a
         legacy opclass.
    • Revert "Fallback when citext op non-citext join predicate is present" · 0410b7ba
      Authored by Shreedhar Hardikar
      This reverts commit 3e45f064.
  18. 09 Apr 2020, 1 commit
  19. 08 Apr 2020, 3 commits
    • Merging Orca .editorconfig into gpdb file · 1093ef02
      Authored by Hans Zeller
    • Fix a couple of ORCA assertions · 5a658c09
      Authored by Chris Hajas
      These were exposed when running ICW with ORCA asserts enabled.
      
      In DeriveJoinStats, EopLogicalFullOuterJoin is also a valid logical join
      operator. In IDatum, we need to check that doubles are within some
      epsilon as we're not passing in the full 64 bit IEEE value to ORCA.
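      An epsilon comparison of that kind might look like this. The tolerance below is illustrative, not ORCA's actual constant.

```python
EPSILON = 1e-10  # illustrative tolerance

def doubles_equal(a, b):
    # The value crossing into ORCA is not the full 64-bit IEEE double,
    # so exact equality can fail spuriously; compare within a relative
    # epsilon instead (scaled by the magnitudes of the operands).
    return abs(a - b) <= EPSILON * max(1.0, abs(a), abs(b))
```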
      
      After fixing the assertion, we would need to regenerate the mdp for
      MinCardinalityNaryJoin. However, there is no DDL/query for this test,
      so it is difficult to update. Since it also didn't seem to provide
      much value, we're removing it.
    • C
      45719ddd