1. 14 March 2019 (1 commit)
  2. 12 March 2019 (1 commit)
  3. 19 January 2019 (1 commit)
  4. 12 January 2019 (1 commit)
  5. 29 December 2018 (1 commit)
    • Call executor nodes the same, whether generated by planner or ORCA. · 455b9a19
      Committed by Heikki Linnakangas
      We used to call some node types different names in EXPLAIN output,
      depending on whether the plan was generated by ORCA or the Postgres
      planner. Also, a Bitmap Heap Scan used to be called differently when the
      table was an AO or AOCS table, but only in planner-generated plans. There
      was some historical justification for this, because they used to
      be different executor node types, but commit db516347 removed the last
      such differences.
      
      Full list of renames:
      
      Table Scan -> Seq Scan
      Append-only Scan -> Seq Scan
      Append-only Columnar Scan -> Seq Scan
      Dynamic Table Scan -> Dynamic Seq Scan
      Bitmap Table Scan -> Bitmap Heap Scan
      Bitmap Append-Only Row-Oriented Scan -> Bitmap Heap Scan
      Bitmap Append-Only Column-Oriented Scan -> Bitmap Heap Scan
      Dynamic Bitmap Table Scan -> Dynamic Bitmap Heap Scan
      455b9a19
  6. 15 December 2018 (1 commit)
  7. 13 December 2018 (1 commit)
    • Reporting cleanup for GPDB-specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum-specific error handling via ereport()/elog() calls was
      in need of a unification effort, as some parts of the code were using a
      different messaging style from others (and from upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus making the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with a capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with overly detailed
        errmsg() content broken up into details and hints where possible (see
        the illustrative example after this list).
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages across lines just to keep lines short, since
        that makes grepping for error messages harder when debugging. This is
        also the de facto standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low-hanging fruit has
        been dealt with. Also convert a few user-facing elog() calls to
        ereport().
      
      * Update the testfiles to match the new messages.
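
      As an illustration of the errmsg()/errdetail() conventions described in
      the first bullet above, here is a hypothetical before/after conversion
      (the message text is invented for illustration and does not come from
      the patch):

      	/* before: capitalized errmsg ending with a period */
      	ereport(ERROR,
      			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
      			 errmsg("Queries of this kind are not supported on this table type.")));

      	/* after: lowercase errmsg, no trailing period; errdetail is a complete sentence */
      	ereport(ERROR,
      			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
      			 errmsg("queries of this kind are not supported on this table type"),
      			 errdetail("Only heap tables are supported by this code path.")));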
      
      Spelling and wording are mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled,
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      56540f11
  8. 03 November 2018 (2 commits)
  9. 25 September 2018 (1 commit)
  10. 22 September 2018 (1 commit)
    • Change pretty-printing of expressions in EXPLAIN to match upstream. · 4c54c894
      Committed by Heikki Linnakangas
      We had changed this in GPDB to print fewer parens. That's fine and dandy,
      but it hardly seems worth it to carry a diff vs upstream for this. Which
      format is better is a matter of taste. The extra parens make some
      expressions more clear, but OTOH, it's unnecessarily verbose for simple
      expressions. Let's follow the upstream on this.
      
      These changes were made to GPDB back in 2006, as part of backporting
      EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any
      explanation for this particular change in output in that commit message.
      
      It's nice to match upstream, to make merging easier. However, this won't
      make much difference to that: almost all EXPLAIN plans in regression
      tests are different from upstream anyway, because GPDB needs Motion nodes
      for most queries. But every little helps.
      4c54c894
  11. 21 September 2018 (2 commits)
  12. 08 September 2018 (1 commit)
    • Introduce optimizer GUC to enable generating streaming material · 635c2e0f
      Committed by Dhanashree Kashid
      Previously, while optimizing nestloop joins, ORCA always generated a
      blocking materialize node (cdb_strict=true). Although this conservative
      approach ensured that the join node produced by ORCA would always be
      deadlock safe, we sometimes produced slow-running plans.
      
      ORCA now has the capability of producing a blocking materialize only when
      needed, by detecting a motion hazard in the nestloop join. A streaming
      materialize is generated when there is no motion hazard.
      
      This commit adds a GUC to control this behavior. When set to off, we
      fall back to the old behavior of always producing a blocking materialize.
      
      Also bump the statement_mem for a test in segspace. After this change,
      for the test query, we produce a streaming spool, which changes the number
      of operator groups in the memory quota calculation, and the query fails with
      `ERROR:  insufficient memory reserved for statement`. Bump
      statement_mem by 1MB to test the fault injection.
      
      Also bump the ORCA version to 2.72.0.
      Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
      635c2e0f
  13. 06 September 2018 (1 commit)
  14. 01 September 2018 (1 commit)
    • Add test that ORCA generates a correct equivalence class · 27127b47
      Committed by Sambitesh Dash
      Given a query like below:
      
      SELECT Count(*)
      FROM   (SELECT *
              FROM   (SELECT tab_2.cd AS CD1,
                             tab_2.cd AS CD2
                      FROM   tab_1
                             LEFT JOIN tab_2
                                    ON tab_1.id = tab_2.id) f
              UNION ALL
              SELECT region,
                     code
              FROM   tab_3)a;
      
      Previously, ORCA produced an incorrect filter, (cd2 = cd), on top of the
      project list generated for producing an alias. This led to incorrect
      results, as column 'cd' is produced by the nullable side of the LOJ
      (tab_2) and such a filter produces NULL output.
      Ensure ORCA produces a correct equivalence class by considering the
      nullable columns.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      27127b47
  15. 31 August 2018 (2 commits)
    • Replace GPDB versions of some numeric aggregates with upstream's. · 325e6fcd
      Committed by Heikki Linnakangas
      Among other things, this fixes the inaccuracy of integer avg() and sum()
      functions. (i.e. fixes https://github.com/greenplum-db/gpdb/issues/5525)
      
      The upstream versions are from PostgreSQL 9.6, using the 128-bit math
      from the following commit:
      
      commit 959277a4
      Author: Andres Freund <andres@anarazel.de>
      Date:   Fri Mar 20 10:26:17 2015 +0100
      
          Use 128-bit math to accelerate some aggregation functions.
      
          On platforms where we support 128bit integers, use them to implement
          faster transition functions for sum(int8), avg(int8),
          var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
          numeric as a transition type.
      
          In some synthetic benchmarks this has been shown to provide significant
          speedups.
      
          Bumps catversion.
      
          Discussion: 544BB5F1.50709@proxel.se
          Author: Andreas Karlsson
          Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
              Oskari Saarenmaa, David Rowley
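
      As a rough, illustration-only sketch of the 128-bit accumulator idea the
      upstream commit describes (the struct and function names here are
      invented; they are not the actual PostgreSQL symbols):

      	#ifdef __SIZEOF_INT128__
      	/* Wide transition state: sums int8 inputs without a per-row numeric allocation. */
      	typedef struct Int8SumAvgState
      	{
      		long long	N;		/* number of input rows seen so far */
      		__int128	sum;	/* running sum of the int8 inputs */
      	} Int8SumAvgState;

      	static void
      	int8_sum_avg_accum(Int8SumAvgState *state, long long newval)
      	{
      		state->N++;
      		state->sum += newval;
      	}
      	#endif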
      325e6fcd
    • Rename "prelim function" to "combine function", to match upstream. · b8545d57
      Committed by Heikki Linnakangas
      The GPDB "prelim" functions did the same things as the "combine"
      functions introduced in PostgreSQL 9.6. This commit includes just the
      catalog changes, to essentially search & replace "prelim" with
      "combine". I did not yet pick the planner and executor changes that were
      made as part of this in the upstream.
      
      Also replace the GPDB implementations of float8_amalg() and
      float8_regr_amalg() with the upstream float8_combine() and
      float8_regr_combine(). They do the same thing, but let's use upstream
      functions where possible.
      
      Upstream commits:
      commit a7de3dc5
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Wed Jan 20 13:46:50 2016 -0500
      
          Support multi-stage aggregation.
      
          Aggregate nodes now have two new modes: a "partial" mode where they
          output the unfinalized transition state, and a "finalize" mode where
          they accept unfinalized transition states rather than individual
          values as input.
      
          These new modes are not used anywhere yet, but they will be necessary
          for parallel aggregation.  The infrastructure also figures to be
          useful for cases where we want to aggregate local data and remote
          data via the FDW interface, and want to bring back partial aggregates
          from the remote side that can then be combined with locally generated
          partial aggregates to produce the final value.  It may also be useful
          even when neither FDWs nor parallelism are in play, as explained in
          the comments in nodeAgg.c.
      
          David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
          Linnakangas, Haribabu Kommi, and me.
      
      commit af025eed
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Fri Apr 8 13:44:50 2016 -0400
      
          Add combine functions for various floating-point aggregates.
      
          This allows parallel aggregation to use them.  It may seem surprising
          that we use float8_combine for both float4_accum and float8_accum
          transition functions, but that's because those functions differ only
          in the type of the non-transition-state argument.
      
          Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
      b8545d57
  16. 25 August 2018 (1 commit)
    • Add tests ensuring correct handling of full and left outer joins · 17de967d
      Committed by Dhanashree Kashid
      1. Add a test for a full outer join query on varchar columns
      In such a scenario, the planner expects a RelabelType node on top of the
      varchar column while looking up a Sort operator. Please refer to commit
      fab435e for more details. Add a test for such queries and disable hashjoin
      to make sure that the planner is able to generate a plan with a merge join
      successfully.
      
      2. Add a test for a query with an Agg and a left outer join
      This test is to ensure that ORCA produces correct results by performing
      a two-stage aggregation on top of a co-located join. A corresponding plan
      test has been added in the ORCA test suite.
      17de967d
  17. 15 August 2018 (1 commit)
  18. 03 August 2018 (1 commit)
  19. 02 August 2018 (1 commit)
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints are now performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans are now supported for specific parameter values even when using
        prepared statements.
      
      * The FDW API was improved to let foreign data wrappers provide multiple access
        "paths" for their tables, allowing more flexibility in join planning.
      
      * The security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data types were added to store a lower and upper bound belonging to a
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on the QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  20. 19 June 2018 (1 commit)
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Committed by Omer Arap
      This commit introduces an end-to-end, scalable solution to generate
      statistics for root partitions. This is done by merging the
      statistics of the leaf partition tables to generate the statistics of the
      root partition. The ability to merge leaf table statistics for
      the root table makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze will create a sample for each partition, as in the
      previous version. While analyzing the sample and generating statistics
      for the partition, it will also create a `hyperloglog_counter` data
      structure and add values from the sample to the `hyperloglog_counter`,
      along with metadata such as the number of multiples and the sample size.
      Once the entire sample is processed, analyze will save the
      `hyperloglog_counter` as a byte array in the `pg_statistic` catalog table.
      We reserve a slot for the `hyperloglog_counter` in the table and signify
      this with a specific statistic kind, `STATISTIC_KIND_HLL`. We only keep
      the `hyperloglog_counter` in the catalog for the leaf partitions. If
      the user chooses to run a FULL scan for HLL, we signify the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. Initially, we check whether all the partitions have been
      analyzed properly and have all their statistics available to us in the
      `pg_statistic` catalog table. If a partition has no tuples, we consider
      it analyzed even though it has no catalog entry.
      If for some reason a single partition is not analyzed, we fall back to
      the original analyze algorithm, which requires acquiring a sample for the
      root partition and calculating statistics based on that sample.
      
      Merging the null fraction and average width from leaf partition statistics
      is trivial and does not involve significant challenges. We calculate
      them first. The remaining statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCV) and their frequencies, termed most common
      frequencies (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
      Hyperloglog provides a functionality to merge multiple
      `hyperloglog_counter`s into one and calculate the number of distinct
      values using the aggregated `hyperlog_counter`. This aggregated
      `hyperlog_counter` is sufficient only if the user chooses to run full
      scan for hyperloglog. In the sample based approach, without the
      hyperloglog algorithm, derivation of number of distinct values is not
      possible. Hyperloglog enables us to merge the `hyperloglog_counter`s
      from each partition and calculate the NDV on the merged
      `hyperloglog_counter` with an acceptable error rate. However, it does
      not give us the ultimate NDV of the root partition, it provides us the
      NDV of the union of the samples from each partition.
      
      The rest of the NDV interpolation depends on four metrics, based on the
      formula used in Postgres: the NDV in the sample, the number of values that
      appear more than once in the sample, the sample size, and the total rows in
      the table. Using these values, the algorithm calculates the approximate NDV
      for the table. While merging the statistics from the leaf partitions, with
      the help of hyperloglog we can accurately generate the NDV of the sample,
      the sample size and the total rows; however, the number of multiples in the
      accumulated sample is unknown, since we do not have access to the
      accumulated sample at this point.
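
      For reference, the Postgres interpolation referred to here is, as best
      recalled (see compute_scalar_stats() in analyze.c; stated as an
      approximation rather than the exact code), the estimator

      	\hat{D} = \frac{n \cdot d}{(n - f_1) + f_1 \cdot n / N}

      where n is the sample size, N the total row count, d the NDV of the
      sample, and f_1 = d - nMultiple the number of values seen exactly once
      in the sample.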
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the aggregated
      sample for the root (which itself is unavailable) requires the
      NDV, the number of multiples, and the size of each leaf sample.
      The NDV of each sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and sample size for each
      partition are saved in the partition's `hyperloglog_counter` during leaf
      statistics gathering, to be used in the merge.
      
      Estimating the number of multiples in the aggregate sample for the root
      partition is a two-step process. First, we estimate the number of values
      that reside in more than one partition's sample. Then, we estimate the
      number of multiples that exist uniquely in a single partition. Finally,
      we add these values to estimate the overall number of multiples in the
      aggregate sample of the root partition.
      
      To count the number of values that exist uniquely in one single
      partition, we utilize hyperloglog functionality. We can easily estimate
      how many values appear only in a specific partition _i_. We call the NDV
      of the aggregate of all partitions `NDV_all`, and the NDV of the
      aggregate of all partitions but _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` gives the values that appear in only one
      partition. The rest of the values contribute to the overall number of
      multiples in the root's aggregated sample; we call their count
      `nMultiple_inter`, the number of values that appear in more than one
      partition.
      
      However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple copies of it. We need a
      way to account for these values as well. Recall that we also account for
      the number of multiples that exist uniquely in a partition sample. We
      already know the number of multiples inside each partition sample;
      however, we need to normalize this value by the proportion of the number
      of values unique to the partition sample to the number of distinct values
      of the partition sample. The normalized value is partition sample i's
      contribution to the overall calculation of nMultiple.
      
      Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
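
      Putting the pieces above together, with notation introduced here for
      readability only (m_i is the number of multiples in partition i's sample,
      d_i the NDV of that sample, and u_i the values seen only in partition i;
      these are not identifiers from the code):

      	u_i = NDV_{all} - NDV_{minus\_i}
      	nMultiple_{inter} = NDV_{all} - \sum_i u_i
      	normalized\_m_i = m_i \cdot (u_i / d_i)
      	nMultiple_{root} = nMultiple_{inter} + \sum_i normalized\_m_i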
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from the 4.3 version of
      Greenplum DB. The algorithm is straightforward: we convert each MCV's
      frequency into a count and add the counts up if the value appears in more
      than one partition. After every possible candidate's count has been
      calculated, we sort the candidate values and pick the top ones, as defined
      by `default_statistics_target`. 4.3 blindly picked the top values with the
      highest counts; we instead incorporated the same logic used in current
      Greenplum and Postgres, and test whether a value is a real MCV. Therefore,
      even after the merge, the logic aligns with Postgres.
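
      A minimal, self-contained sketch of the count-and-pick-top-N step (the
      types and helpers here are invented for this write-up; the real code
      operates on Datums and pg_statistic slots rather than plain doubles):

      	#include <stdlib.h>

      	typedef struct McvCandidate
      	{
      		double	value;	/* candidate MCV value */
      		double	count;	/* count summed across all leaf partitions */
      	} McvCandidate;

      	/* Convert a leaf MCV frequency into an absolute count for that leaf. */
      	static double
      	mcv_freq_to_count(double freq, double leaf_reltuples)
      	{
      		return freq * leaf_reltuples;
      	}

      	/* qsort comparator: descending by merged count. */
      	static int
      	cmp_count_desc(const void *a, const void *b)
      	{
      		double	ca = ((const McvCandidate *) a)->count;
      		double	cb = ((const McvCandidate *) b)->count;

      		return (cb > ca) - (cb < ca);
      	}

      	/* Sort merged candidates by count and keep at most stats_target of them. */
      	static int
      	pick_top_mcvs(McvCandidate *cands, int ncands, int stats_target)
      	{
      		qsort(cands, ncands, sizeof(McvCandidate), cmp_count_desc);
      		return ncands < stats_target ? ncands : stats_target;
      	}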
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge
      the histograms from the leaf partitions. In 4.3 we use a priority queue to
      merge the histograms from the leaf partitions. However, that approach is
      naive and loses important statistical information. In Postgres, the
      histogram is calculated over the values that did not qualify as MCVs. The
      4.3 merge logic for histograms did not take this into consideration, so
      significant statistical information was lost while merging the MCV values.
      
      We introduce a novel approach to feed the MCVs from the leaf partitions
      that did not qualify as a root MCV into the histogram merge logic. To
      fully utilize the previously implemented priority queue logic, we
      treat non-qualified MCVs as the histograms of so-called `dummy`
      partitions. To be more precise, if an MCV m1 is a non-qualified MCV, we
      create a histogram [m1, m1] that has only one bucket, whose bucket size is
      the count of this non-qualified MCV. When we merge the histograms of the
      leaf partitions and these dummy partitions, the merged histogram does not
      lose this statistical information.
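
      A tiny sketch of the dummy-partition encoding (types invented here for
      illustration; the real code builds these in terms of the existing
      histogram merge structures):

      	/* A non-qualified root MCV becomes a one-bucket "dummy" histogram. */
      	typedef struct DummyHistogram
      	{
      		double	lo;		/* bucket lower bound: the MCV value m1 */
      		double	hi;		/* bucket upper bound: also m1 */
      		double	weight;	/* bucket size: the merged count of m1 */
      	} DummyHistogram;

      	static DummyHistogram
      	dummy_histogram_from_mcv(double m1, double count)
      	{
      		DummyHistogram h = { m1, m1, count };

      		return h;
      	}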
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      9c1b1ae3
  21. 05 April 2018 (1 commit)
    • Fix fallback test in gporca.sql · 34878131
      Committed by Dhanashree Kashid
      This test was added to check the logging of ORCA fall-back messages. The
      query contains the CUBE grouping extension, which is currently not
      supported by ORCA, causing ORCA to fall back to the planner with the
      following log messages:
      
      LOG:  NOTICE,"Feature not supported by the Pivotal Query Optimizer:
      Cube",
      LOG:  Planner produced plan :0
      
      The planner-generated plan contains a Shared Scan node. During execution,
      an extra log message is sometimes generated indicating that the Shared
      Scan writer is still waiting for an acknowledgement from the Shared Scan
      readers:
      
      LOG: SISC WRITER (shareid=0, slice=1): notify still wait for an answer,
      errno 4
      
      The query returns successfully; however, this intermittently generated log
      message causes the test to fail.
      This commit fixes the flake by converting it to an EXPLAIN test, which
      is sufficient to demonstrate the fall-back logging.
      34878131
  22. 14 February 2018 (1 commit)
  23. 09 February 2018 (1 commit)
    • Fix more whitespace in tests, mostly in expected output. · 93b92ca4
      Committed by Heikki Linnakangas
      Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes
      the -w option from pg_regress's "diff" invocation. That commit will fix
      all the PostgreSQL regression tests to pass without it, but we need to
      also fix all the GPDB tests. That's what this commit does.
      
      I did much of this in commit 06a2bb64, but now that we're about to
      actually merge that, more cases popped up.
      
      Co-Author: Daniel Gustafsson <dgustafsson@pivotal.io>
      93b92ca4
  24. 18 January 2018 (1 commit)
    • Fix whitespace in tests, mostly in expected output. · 06a2bb64
      Committed by Heikki Linnakangas
      Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes
      the -w option from pg_regress's "diff" invocation. That commit will fix
      all the PostgreSQL regression tests to pass without it, but we need to
      also fix all the GPDB tests. That's what this commit does.
      06a2bb64
  25. 06 January 2018 (1 commit)
  26. 05 January 2018 (1 commit)
    • Set search_path and stop dropping schema in gporca test · c7ab6924
      Committed by Jesse Zhang
      The `gporca` regression test suite uses a schema but doesn't really
      switch `search_path` to the schema that's meant to encapsulate most of
      the objects it uses. This has led to multiple instances where we:
        1. Either used a table from another namespace by accident;
        2. Or leaked objects into the public namespace that other tests in
        turn accidentally depended on.
      
      As we were about to add a few user-defined types and casts to the test
      suite, we want to (at last) ensure that all future additions are scoped
      to the namespace.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      
      Closes #4238
      c7ab6924
  27. 21 December 2017 (2 commits)
  28. 13 December 2017 (1 commit)
    • Ensure that ORCA is not called on any process other than the master QD · 916f460f
      Committed by Shreedhar Hardikar
      We don't want to use the optimizer for planning queries in SQL, PL/pgSQL,
      etc. functions when that is done on the segments.
      
      ORCA excels at complex queries, most of which will access distributed
      tables. We can't run such queries from the segment slices anyway,
      because they require dispatching a query within another, which is not
      allowed in GPDB. Note that this restriction also applies to non-QD
      master slices. Furthermore, ORCA doesn't currently support PL/*
      statements (relevant when they are planned on the segments).
      
      For these reasons, restrict using ORCA to the master QD processes
      only.
      
      Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.")
      and separate out the gporca fault injector tests into the newly added
      gporca_faults.sql so that the rest can run in a parallel group.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      916f460f
  29. 04 December 2017 (1 commit)
    • Fix pipeline failures caused by 0dfd0ebc. · d79a2c7f
      Committed by Shreedhar Hardikar
      Move the gporca regression test out of the parallel group so that the
      gp_fault_injector functionality works correctly.
      Also, as it turns out, ORCA is sometimes used to run PL/pgSQL queries
      even when the GUC optimizer is set to off. So when gporca sets up the
      gp_fault_injector, the fault gets activated later on in the parallel group
      that the qp_functions_in_from test is part of. So, reset the fault in
      gporca just in case.
      d79a2c7f
  30. 02 December 2017 (2 commits)
    • Support optimization interrupts in ORCA · 0dfd0ebc
      Committed by Shreedhar Hardikar
      To support that, this commit adds 2 new ORCA APIs:
      - SignalInterruptGPOPT(), which notifies ORCA that an abort is requested
        (must be called from the signal handler)
      - ResetInterruptsGPOPT(), which resets ORCA's state to before the
        interruption, so that the next query can run normally (needs to be
        called only on the QD)
      
      Also check for interrupts right after ORCA returns.
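
      A minimal sketch of the intended flow, based only on the description
      above (the handler and call-site names are placeholders, and the
      signatures of the two new APIs are assumed here, not copied from the
      code):

      	/* Assumed prototypes of the two new ORCA APIs. */
      	extern void SignalInterruptGPOPT(void);
      	extern void ResetInterruptsGPOPT(void);

      	/* In the QD's cancel/die signal handler: forward the request to ORCA. */
      	static void
      	handle_cancel_signal(int signo)
      	{
      		(void) signo;
      		SignalInterruptGPOPT();		/* tell ORCA an abort was requested */
      		/* ... existing signal-handler bookkeeping continues here ... */
      	}

      	/*
      	 * On the QD, after the call into ORCA returns, check for pending
      	 * interrupts right away; once the query (or its error recovery) is
      	 * done, call ResetInterruptsGPOPT() so the next query runs normally.
      	 */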
      0dfd0ebc
    • Update Planner answer file for gporca · f18a3a59
      Committed by Dhanashree
      This was missed in commit 407b2880
      f18a3a59
  31. 13 November 2017 (1 commit)
  32. 17 October 2017 (1 commit)
    • Add support for "order none" directive to atmsort. · 5390d8b7
      Committed by Heikki Linnakangas
      This allows overriding the heuristic on whether a query has an ORDER BY.
      
      Use the directive in one of the queries in the 'gporca' test, which
      contains a subquery with an ORDER BY that fools atmsort's usual
      heuristic. The overall order of the query is not well-defined, even though
      there is an ORDER BY in the subquery. The current implementation of
      DISTINCT in fact always also sorts the output, which is why this test
      is passing, but that is about to be relaxed soon, when we merge upstream
      commit 63247bec.
      5390d8b7
  33. 27 September 2017 (2 commits)
    • Implement CDB-like pre-join deduplication · efb2777a
      Committed by Dhanashree Kashid, Ekta Khanna and Omer Arap
      For flattened IN or EXISTS sublinks, if we choose an INNER JOIN path instead
      of a SEMI JOIN, then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. Post-join dedup:
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo
      denotes whether we need to go for post-join dedup.
      
      2. Pre-join dedup:
      unique-ify the rows coming from the rel containing the subquery result,
      before that rel is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes whether we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has a completely different implementation of this. Upstream explores
      JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER paths, and deduplication is done in
      create_unique_path(). GPDB does this differently, since JOIN_UNIQUE_INNER and
      JOIN_UNIQUE_OUTER are obsolete for us. Hence we have kept the GPDB-style
      deduplication mechanism as-is in this merge.
      
      Post-join dedup has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • CDB-specific changes and other fix-ups after merging e549722a · e5f6e826
      Committed by Shreedhar Hardikar
      0. Fix up post-join dedup logic after the cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update the gporca, rangefuncs, and eagerfree answer files
      	1. gporca
      	Previously we were generating a Hash Inner Join with a
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join, in which case we do not need to deduplicate the
      	inner side.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan, as on master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results,
      	but this subquery scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need the helper function `hasSemiJoin()` to check whether
      the SpecialJoinInfo list has any SpecialJoinInfos constructed for a Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`.
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB-specific
      fields in `InClauseInfo` in which we recorded information for
      pre-join dedup in the case of simple uncorrelated IN sublinks:
      `try_join_unique`, `sub_targetlist` and `InOperators`.
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
  34. 21 July 2017 (1 commit)