- 14 March 2019 (1 commit)

Committed by Daniel Gustafsson

As we merge with upstream and thereby keep refining the Postgres planner, "legacy planner" is no longer a suitable name. This changes all variations of the spelling (legacy planner, legacy optimizer, legacy query optimizer, etc.) to say "Postgres" rather than "legacy".

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Reviewed-by: David Yozie <dyozie@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>

- 12 March 2019 (1 commit)

Committed by Taylor Vesely

This commit is part of Add Partitioned Indexes (#7047). After adding INTERNAL_AUTO, and dependencies between partitioned indexes, many tests that assumed we need to manually delete indexes added to leaf partitions need updating.

Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>

- 19 January 2019 (1 commit)

Committed by Daniel Gustafsson

Running ANALYZE with the HLL computation produces a lot of LOG messages which are geared more towards troubleshooting than general-purpose log files. Fold these under ANALYZE VERBOSE to avoid cluttering up logfiles on production systems unless explicitly asked for.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 12 January 2019 (1 commit)

Committed by Bhuvnesh Chaudhary

This reverts commit dbece3da. A performance regression was observed for TPC-DS queries due to cardinality misestimation; the impacted TPC-DS queries were 174, 111 and 104.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>

- 29 December 2018 (1 commit)

Committed by Heikki Linnakangas

We used to call some node types by different names in EXPLAIN output, depending on whether the plan was generated by ORCA or the Postgres planner. Also, a Bitmap Heap Scan used to be called differently when the table was an AO or AOCS table, but only in planner-generated plans. There was some historical justification for this, because they used to be different executor node types, but commit db516347 removed the last such differences.

Full list of renames:

- Table Scan -> Seq Scan
- Append-only Scan -> Seq Scan
- Append-only Columnar Scan -> Seq Scan
- Dynamic Table Scan -> Dynamic Seq Scan
- Bitmap Table Scan -> Bitmap Heap Scan
- Bitmap Append-Only Row-Oriented Scan -> Bitmap Heap Scan
- Bitmap Append-Only Column-Oriented Scan -> Bitmap Heap Scan
- Dynamic Bitmap Table Scan -> Dynamic Bitmap Heap Scan
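
For illustration, a minimal sketch of what the rename means in practice (the table name is made up, and the exact plan shape will vary):

```sql
-- Hypothetical append-only table; before this change its scan was labelled
-- "Append-only Scan" in planner-generated EXPLAIN output.
CREATE TABLE ao_sales (id int, amount numeric)
    WITH (appendonly = true) DISTRIBUTED BY (id);

EXPLAIN SELECT * FROM ao_sales;
-- The scan node now reports as "Seq Scan on ao_sales" regardless of whether the
-- table is heap, AO, or AOCS, and regardless of planner vs. ORCA.
```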

- 15 December 2018 (1 commit)

Committed by Shreedhar Hardikar

Also bump ORCA's version to 3.16.0.

Co-authored-by: Hans Zeller <hzeller@pivotal.io>

- 13 December 2018 (1 commit)

Committed by Daniel Gustafsson

The Greenplum-specific error handling via ereport()/elog() calls was in need of a unification effort, as some parts of the code were using a different messaging style from others (and from upstream). This aims at bringing many of the GPDB error calls in line with the upstream error message writing guidelines and thus makes the user experience of Greenplum more consistent. The main contributions of this patch are:

* errmsg() messages shall start with a lowercase letter and not end with a period. errhint() and errdetail() shall be complete sentences starting with a capital letter and ending with a period. This attempts to fix this on as many ereport() calls as possible, with overly detailed errmsg() content broken up into details and hints where possible.

* Reindent ereport() calls to be more consistent with the common style used in upstream and most parts of Greenplum:

      ereport(ERROR,
              (errcode(<CODE>),
               errmsg("short message describing error"),
               errhint("Longer message as a complete sentence.")));

* Avoid breaking messages across long lines, since that makes grepping for error messages harder when debugging. This is also the de facto standard in upstream code.

* Convert a few internal error ereport() calls to elog(). There are no doubt more that can be converted, but the low-hanging fruit has been dealt with. Also convert a few user-facing elog() calls to ereport().

* Update the test files to match the new messages.

Spelling and wording are mostly left for a follow-up commit, as this was getting big enough as it was. The most obvious cases have been handled, but there is work left to be done here.

Discussion: https://github.com/greenplum-db/gpdb/pull/6378
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

- 03 November 2018 (2 commits)

Committed by Bhuvnesh Chaudhary

Committed by Bhuvnesh Chaudhary

Co-authored-by: Sambitesh Dash <sdash@pivotal.io>

- 25 September 2018 (1 commit)

Committed by Dhanashree Kashid

The following commits have been cherry-picked again: b1f543f3, b0359e69, a341621d. The contrib/dblink tests were failing with ORCA after the above commits; the issue has now been fixed in ORCA v3.1.0. Hence we re-enable these commits and bump the ORCA version.

- 22 September 2018 (1 commit)

Committed by Heikki Linnakangas

We had changed this in GPDB to print fewer parens. That's fine and dandy, but it hardly seems worth it to carry a diff vs. upstream for this. Which format is better is a matter of taste: the extra parens make some expressions clearer, but OTOH they are unnecessarily verbose for simple expressions. Let's follow the upstream on this.

These changes were made to GPDB back in 2006, as part of backporting EXPLAIN-related patches from PostgreSQL 8.2, but I didn't see any explanation for this particular change in output in that commit message.

It's nice to match upstream, to make merging easier. However, this won't make much difference to that: almost all EXPLAIN plans in regression tests are different from upstream anyway, because GPDB needs Motion nodes for most queries. But every little helps.

- 21 September 2018 (2 commits)

Committed by Dhanashree Kashid

Revert the following commits related to ORCA version 3.0.0: b1f543f3, b0359e69, a341621d.

Committed by Sambitesh Dash

When ON, ORCA will optimize DML queries by enforcing a non-master gather whenever possible. When OFF, a gather on the master will be enforced instead. The default value will be ON. Also add new tests to ensure sane behavior when this optimization is turned on, and fix the existing tests.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

- 08 September 2018 (1 commit)

Committed by Dhanashree Kashid

Previously, while optimizing nestloop joins, ORCA always generated a blocking Materialize node (cdb_strict=true). While this conservative approach ensured that the join node produced by ORCA was always deadlock safe, we sometimes produced slow-running plans.

ORCA now has the capability to produce a blocking Materialize only when needed, by detecting a motion hazard in the nestloop join. A streaming Materialize is generated when there is no motion hazard. This commit adds a GUC to control this behavior; when set to off, we fall back to the old behavior of always producing a blocking Materialize.

Also bump the statement_mem for a test in segspace. After this change, for the test query, we produce a streaming spool, which changes the number of operator groups in the memory quota calculation, and the query fails with: `ERROR: insufficient memory reserved for statement`. Bump the statement_mem by 1MB to test the fault injection.

Also bump the ORCA version to 2.72.0.

Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
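
A sketch of how this might be exercised; the commit message does not name the GUC, so `optimizer_enable_streaming_material` below is an assumed name used only for illustration, and the tables are hypothetical:

```sql
-- Assumed GUC name; the entry above only says "a guc" was added.
SET optimizer_enable_streaming_material = on;   -- streaming Materialize when no motion hazard
-- SET optimizer_enable_streaming_material = off;  -- always-blocking Materialize, the old behavior

-- Hypothetical tables; with a nestloop join ORCA decides between a streaming
-- and a blocking Materialize on the inner side based on motion hazard.
EXPLAIN SELECT * FROM small_t s JOIN big_t b ON s.id = b.id;
```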

- 06 September 2018 (1 commit)

Committed by Omer Arap

This commit adds more log messages and updates existing log messages to increase logging verbosity.

Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>

- 01 September 2018 (1 commit)

Committed by Sambitesh Dash

Given a query like the one below:

    SELECT Count(*)
    FROM (SELECT *
          FROM (SELECT tab_2.cd AS CD1, tab_2.cd AS CD2
                FROM tab_1 LEFT JOIN tab_2 ON tab_1.id = tab_2.id) f
          UNION ALL
          SELECT region, code FROM tab_3) a;

Previously, ORCA produced an incorrect filter, (cd2 = cd), on top of the project list generated for producing an alias. This led to incorrect results, since column 'cd' is produced by the nullable side of the LOJ (tab_2), and such a filter evaluates to NULL for those rows, wrongly eliminating them. Ensure ORCA produces correct equivalence classes by considering the nullable columns.

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
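
For a runnable reproduction of the query above, minimal table definitions might look like this (assumed; the commit does not give the DDL):

```sql
-- Assumed minimal definitions so the query above type-checks; region/code are
-- made integers to match cd1/cd2 across the UNION ALL.
CREATE TABLE tab_1 (id int) DISTRIBUTED BY (id);
CREATE TABLE tab_2 (id int, cd int) DISTRIBUTED BY (id);
CREATE TABLE tab_3 (region int, code int) DISTRIBUTED BY (region);
-- With rows in tab_1 that have no match in tab_2, the spurious (cd2 = cd)
-- filter would drop the NULL-extended rows and undercount.
```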

- 31 August 2018 (2 commits)

Committed by Heikki Linnakangas

Among other things, this fixes the inaccuracy of the integer avg() and sum() functions (i.e. fixes https://github.com/greenplum-db/gpdb/issues/5525). The upstream versions are from PostgreSQL 9.6, using the 128-bit math from the following commit:

    commit 959277a4
    Author: Andres Freund <andres@anarazel.de>
    Date:   Fri Mar 20 10:26:17 2015 +0100

    Use 128-bit math to accelerate some aggregation functions.

    On platforms where we support 128bit integers, use them to implement
    faster transition functions for sum(int8), avg(int8), var_*(int2/int4),
    stdev_*(int2/int4). Where not supported, continue to use numeric as a
    transition type. In some synthetic benchmarks this has been shown to
    provide significant speedups.

    Bumps catversion.

    Discussion: 544BB5F1.50709@proxel.se
    Author: Andreas Karlsson
    Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund, Oskari Saarenmaa, David Rowley
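
A small illustration of the aggregates this touches (table name made up; the change is about internal transition-state arithmetic, so results simply stop being inaccurate, the SQL is unchanged):

```sql
-- Illustrative table; sum(int8) and avg(int8) now use a 128-bit transition
-- state on platforms that support it, and numeric elsewhere.
CREATE TABLE big_ints (v bigint) DISTRIBUTED RANDOMLY;
INSERT INTO big_ints SELECT generate_series(1, 100000);

SELECT sum(v), avg(v) FROM big_ints;
```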

Committed by Heikki Linnakangas

The GPDB "prelim" functions did the same things as the "combine" functions introduced in PostgreSQL 9.6. This commit includes just the catalog changes, essentially a search & replace of "prelim" with "combine". I did not yet pick the planner and executor changes that were made as part of this in the upstream.

Also replace the GPDB implementation of float8_amalg() and float8_regr_amalg() with the upstream float8_combine() and float8_regr_combine(). They do the same thing, but let's use upstream functions where possible.

Upstream commits:

    commit a7de3dc5
    Author: Robert Haas <rhaas@postgresql.org>
    Date:   Wed Jan 20 13:46:50 2016 -0500

    Support multi-stage aggregation.

    Aggregate nodes now have two new modes: a "partial" mode where they output
    the unfinalized transition state, and a "finalize" mode where they accept
    unfinalized transition states rather than individual values as input.

    These new modes are not used anywhere yet, but they will be necessary for
    parallel aggregation. The infrastructure also figures to be useful for
    cases where we want to aggregate local data and remote data via the FDW
    interface, and want to bring back partial aggregates from the remote side
    that can then be combined with locally generated partial aggregates to
    produce the final value. It may also be useful even when neither FDWs nor
    parallelism are in play, as explained in the comments in nodeAgg.c.

    David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki Linnakangas,
    Haribabu Kommi, and me.

    commit af025eed
    Author: Robert Haas <rhaas@postgresql.org>
    Date:   Fri Apr 8 13:44:50 2016 -0400

    Add combine functions for various floating-point aggregates.

    This allows parallel aggregation to use them. It may seem surprising that
    we use float8_combine for both float4_accum and float8_accum transition
    functions, but that's because those functions differ only in the type of
    the non-transition-state argument.

    Haribabu Kommi, reviewed by David Rowley and Tomas Vondra

- 25 August 2018 (1 commit)

Committed by Dhanashree Kashid

1. Add a test for a full outer join query on varchar columns (see the sketch after this list). In such a scenario, the planner expects a RelabelType node on top of the varchar column while looking up the Sort operator; please refer to commit fab435e for more details. Add a test for such queries and disable hashjoin to make sure that the planner is able to generate a plan with a merge join successfully.

2. Add a test for a query with an Agg and a left outer join. This test ensures that ORCA produces correct results by performing a two-stage aggregation on top of a co-located join. A corresponding plan test has been added in the ORCA test suite.
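
A sketch of the shape of the first test (table and column names are assumed; the commit only describes the scenario):

```sql
-- Assumed names; the point is varchar join keys in a FULL OUTER JOIN.
CREATE TABLE fo_a (name varchar(20)) DISTRIBUTED BY (name);
CREATE TABLE fo_b (name varchar(20)) DISTRIBUTED BY (name);

SET enable_hashjoin = off;   -- push the planner towards a merge join
EXPLAIN SELECT * FROM fo_a FULL OUTER JOIN fo_b ON fo_a.name = fo_b.name;
-- The planner must handle the RelabelType over the varchar columns when it
-- looks up the sort operator for the merge join (see commit fab435e above).
RESET enable_hashjoin;
```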

- 15 August 2018 (1 commit)

Committed by David Kimura

The purpose of this refactor is to more closely align the GUC with Postgres. It started as a suggestion in https://github.com/greenplum-db/gpdb/pull/4790. There are still differences, particularly around when this GUC can be set: in GPDB it can be set by anyone at any time (PGC_USERSET), whereas in Postgres it is limited to postmaster restart (PGC_POSTMASTER). This difference was kept on purpose until we have more buy-in, as it is a bigger change for the end user.

Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>

- 03 August 2018 (1 commit)

Committed by Karen Huddleston

This reverts commit 4750e1b6.

- 02 August 2018 (1 commit)

Committed by Richard Guo

This is the final batch of commits from PostgreSQL 9.2 development, up to the point where the REL9_2_STABLE branch was created and 9.3 development started on the PostgreSQL master branch.

Notable upstream changes:

* Index-only scans were included in this batch of upstream commits. They allow queries to retrieve data only from indexes, avoiding heap access.
* Group commit was added to work effectively under heavy load. Previously, batching of commits became ineffective as the write workload increased, because of internal lock contention.
* A new fast-path lock mechanism was added to reduce the overhead of taking and releasing certain types of locks which are taken and released very frequently but rarely conflict.
* The new "parameterized path" mechanism was added. It allows inner index scans to use values from relations that are more than one join level up from the scan. This can greatly improve performance in situations where semantic restrictions (such as outer joins) limit the allowed join orderings.
* The SP-GiST (Space-Partitioned GiST) index access method was added to support unbalanced partitioned search structures. For suitable problems, SP-GiST can be faster than GiST in both index build time and search time.
* Checkpoints are now performed by a dedicated background process. Formerly the background writer did both dirty-page writing and checkpointing. Separating this into two processes allows each goal to be accomplished more predictably.
* Custom plans are now supported for specific parameter values even when using prepared statements.
* The API for FDWs was improved to let them provide multiple access "paths" for their tables, allowing more flexibility in join planning.
* The security_barrier option was added for views, to prevent optimizations that might allow view-protected data to be exposed to users.
* Range data types were added to store a lower and upper bound belonging to their base data type.
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility statement. The SELECT query is planned during the execution of the utility. To conform to this change, GPDB executes the utility statement only on the QD and dispatches the plan of the SELECT query to the QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
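
To illustrate one of the user-visible pieces of this merge, index-only scan, here is a sketch (names are illustrative; whether the planner actually picks the index-only path depends on statistics and visibility-map coverage):

```sql
-- Illustrative table and index.
CREATE TABLE events (id int, payload text) DISTRIBUTED BY (id);
CREATE INDEX events_id_idx ON events (id);
VACUUM ANALYZE events;   -- keeps the visibility map current so the index-only path qualifies

EXPLAIN SELECT id FROM events WHERE id BETWEEN 100 AND 200;
-- With the 9.2 merge the plan can now be an Index Only Scan on events_id_idx,
-- answering the query from the index without touching the heap.
```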

- 19 June 2018 (1 commit)

Committed by Omer Arap

This commit introduces an end-to-end scalable solution for generating statistics for root partitions, by merging the statistics of the leaf partition tables to generate the statistics of the root partition. The ability to merge leaf table statistics for the root table makes ANALYZE incremental and stable.

**CHANGES IN LEAF TABLE STATS COLLECTION:**

Incremental analyze will create a sample for each partition as in the previous version. While analyzing the sample and generating statistics for the partition, it will also create a `hyperloglog_counter` data structure and add values from the sample to the `hyperloglog_counter`, along with metadata such as the number of multiples and the sample size. Once the entire sample is processed, analyze will save the `hyperloglog_counter` as a byte array in the `pg_statistic` catalog table. We reserve a slot for the `hyperloglog_counter` in the table and mark it with a specific statistic kind, `STATISTIC_KIND_HLL`. We only keep the `hyperloglog_counter` in the catalog for the leaf partitions. If the user chooses to run a FULL scan for HLL, we mark the kind as `STATISTIC_KIND_FULLHLL`.

**MERGING LEAF STATISTICS**

Once all the leaf partitions are analyzed, we analyze the root partition. Initially, we check whether all the partitions have been analyzed properly and all the statistics are available to us in the `pg_statistic` catalog table. If a partition has no tuples, we consider it analyzed even though it has no catalog entry. If for some reason a single partition is not analyzed, we fall back to the original analyze algorithm, which acquires a sample for the root partition and calculates statistics based on that sample.

Merging the null fraction and average width from the leaf partition statistics is trivial and does not involve significant challenges, so we calculate them first. The remaining statistics are:

- Number of distinct values (NDV)
- Most common values (MCV), and their frequencies, termed most common frequencies (MCF)
- Histograms that represent the distribution of the data values in the table

**Merging NDV:**

HyperLogLog provides the ability to merge multiple `hyperloglog_counter`s into one and calculate the number of distinct values using the aggregated `hyperloglog_counter`. This aggregated counter is sufficient on its own only if the user chooses to run a full scan for HyperLogLog. In the sample-based approach, deriving the number of distinct values is not possible without the HyperLogLog algorithm. HyperLogLog enables us to merge the `hyperloglog_counter`s from each partition and calculate the NDV on the merged counter with an acceptable error rate. However, this does not directly give us the ultimate NDV of the root partition; it gives us the NDV of the union of the samples from each partition. The rest of the NDV interpolation follows the formula used in Postgres, which depends on four metrics: the NDV in the sample, the number of multiple values in the sample, the sample size, and the total rows in the table. Using these values, the algorithm calculates the approximate NDV for the table. While merging the statistics from the leaf partitions, HyperLogLog lets us accurately derive the NDV of the sample, the sample size, and the total rows; however, the number of multiples in the accumulated sample is unknown, since we do not have access to the accumulated sample at this point.

_Number of Multiples_

Our approach to estimating the number of multiples in the aggregated sample (which itself is unavailable) for the root requires the NDVs, the number of multiples, and the sample size of each leaf sample. The NDV of each sample is trivial to calculate using the partition's `hyperloglog_counter`. The number of multiples and the sample size for each partition are saved in the partition's `hyperloglog_counter` during leaf statistics gathering, to be used in the merge.

Estimating the number of multiples in the aggregate sample for the root partition is a two-step process. First, we estimate the number of values that reside in more than one partition's sample. Then, we estimate the number of multiples that exist uniquely in a single partition. Finally, we add these values to estimate the overall number of multiples in the aggregate sample of the root partition.

To count the number of values that exist uniquely in one single partition, we utilize HyperLogLog functionality: we can easily estimate how many values appear only in a specific partition i. We call the NDV of the overall aggregate of all partitions `NDV_all`, and the NDV of the aggregate of all partitions except i `NDV_minus_i`. The difference between `NDV_all` and `NDV_minus_i` gives the values that appear in only one partition. The rest of the values contribute to the overall number of multiples in the root's aggregated sample; we call their count `nMultiple_inter`, the number of values that appear in more than one partition.

However, that is not enough: even if a value resides in only one partition, that partition might contain multiple copies of it. We need a way to express the possibility that such values exist, which is why we also account for the number of multiples unique to each partition's sample. We already know the number of multiples inside a partition sample, but we need to normalize this value by the proportion of values unique to the partition sample relative to the number of distinct values of the partition sample. The normalized value is partition sample i's contribution to the overall calculation of the nMultiple. Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the `normalized_m_i` of each partition sample.

**Merging MCVs:**

We utilize the merge functionality imported from the 4.3 version of Greenplum DB. The algorithm is straightforward: we convert each MCV's frequency into a count and add the counts up if the value appears in more than one partition. After every possible candidate's count has been calculated, we sort the candidate values and pick the top ones, the number of which is defined by `default_statistics_target`. 4.3 previously picked the values with the highest counts blindly; we, however, incorporate the same logic used in current Greenplum and Postgres and test whether a value is a real MCV by running the same checks. Therefore, even after the merge, the logic fully aligns with Postgres.

**Merging Histograms:**

One of the main novel contributions of this commit is how we merge the histograms from the leaf partitions. In 4.3 we use a priority queue to merge the histograms from the leaf partitions. However, that approach is naive and loses important statistical information. In Postgres, the histogram is calculated over the values that did not qualify as MCVs. The 4.3 merge logic for histograms did not take this into consideration, and significant statistical information was lost while merging the MCV values.

We introduce a novel approach to feed the MCVs from the leaf partitions that did not qualify as root MCVs into the histogram merge logic. To fully utilize the previously implemented priority-queue logic, we treat non-qualified MCVs as histograms of so-called "dummy" partitions. More precisely, if an MCV m1 is a non-qualified MCV, we create a histogram [m1, m1] with a single bucket whose size is the count of this non-qualified MCV. When we merge the histograms of the leaf partitions and these dummy partitions, the merged histogram does not lose any statistical information.

Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
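
As a usage sketch of the incremental flow described above (partition names are illustrative; `ANALYZE ROOTPARTITION` is the GPDB form for analyzing only the root partition):

```sql
-- Illustrative leaf partition names. Each leaf ANALYZE stores an HLL counter
-- (STATISTIC_KIND_HLL) alongside its regular stats in pg_statistic.
ANALYZE sales_1_prt_jan;
ANALYZE sales_1_prt_feb;

-- Root statistics can then be derived by merging the leaf statistics
-- (NDV via merged HLL counters, merged MCVs, merged histograms) instead of
-- re-sampling the whole partitioned table.
ANALYZE ROOTPARTITION sales;
```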

- 05 April 2018 (1 commit)

Committed by Dhanashree Kashid

This test was added to check the logging of ORCA fallback messages. The query contains the CUBE grouping extension, which is currently not supported by ORCA, causing ORCA to fall back to the planner with the following log messages:

    LOG: NOTICE,"Feature not supported by the Pivotal Query Optimizer: Cube",
    LOG: Planner produced plan :0

The planner-generated plan contains a Shared Scan node. During its execution, sometimes an extra log message is generated indicating that the Shared Scan writer is waiting for an acknowledgement from the Shared Scan readers:

    LOG: SISC WRITER (shareid=0, slice=1): notify still wait for an answer, errno 4

The query returns successfully, but this intermittently generated log message causes the test to fail. This commit fixes the flake by converting this to an EXPLAIN test, which is sufficient to demonstrate the fallback logging.

- 14 February 2018 (1 commit)

Committed by sambitesh

- 09 February 2018 (1 commit)

Committed by Heikki Linnakangas

Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes the -w option from pg_regress's "diff" invocation. That commit will fix all the PostgreSQL regression tests to pass without it, but we need to also fix all the GPDB tests, which is what this commit does. I did much of this in commit 06a2bb64, but now that we're about to actually merge that, more cases have popped up.

Co-Author: Daniel Gustafsson <dgustafsson@pivotal.io>

- 18 January 2018 (1 commit)

Committed by Heikki Linnakangas

Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes the -w option from pg_regress's "diff" invocation. That commit will fix all the PostgreSQL regression tests to pass without it, but we need to also fix all the GPDB tests, which is what this commit does.

- 06 January 2018 (1 commit)

Committed by Sambitesh Dash

Instead of assuming that casts are always binary coercible (and hence that we could get away with just dropping them), translate casts in ORCA plans into either a RelabelType or a FuncExpr.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
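
A small illustration of the two cast categories the translation now distinguishes (these are generic PostgreSQL casts, not specific plans from this commit):

```sql
-- Binary-coercible cast: varchar -> text needs no conversion function, so the
-- translated expression carries a RelabelType node.
SELECT 'abc'::varchar(10)::text;

-- Non-binary-coercible cast: int4 -> numeric goes through a cast function, so
-- it becomes a FuncExpr rather than being silently dropped.
SELECT 42::int4::numeric;
```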

- 05 January 2018 (1 commit)

Committed by Jesse Zhang

The `gporca` regression test suite uses a schema but doesn't really switch `search_path` to the schema that's meant to encapsulate most of the objects it uses. This has led to multiple instances where we:

1. either used a table from another namespace by accident; or
2. leaked objects into the public namespace that other tests in turn accidentally depended on.

As we were about to add a few user-defined types and casts to the test suite, we want to (at last) ensure that all future additions are scoped to the namespace.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
Closes #4238
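
The hygiene being enforced, as a sketch (the schema name below is illustrative; the suite's actual schema name is not shown in this entry):

```sql
-- Illustrative schema name.
CREATE SCHEMA orca_suite;
SET search_path TO orca_suite;

-- Objects created by the tests now land in orca_suite rather than leaking into
-- public, and lookups cannot accidentally pick up tables from other suites.
CREATE TABLE foo (a int, b int) DISTRIBUTED BY (a);
```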

- 21 December 2017 (2 commits)

Committed by Haisheng Yuan

This reverts commit 4fac169fb1204de54a05ac14fba1a5e4d9f82c08.

Committed by Haisheng Yuan

- 13 December 2017 (1 commit)

Committed by Shreedhar Hardikar

We don't want to use the optimizer for planning queries in SQL, PL/pgSQL, etc. functions when that is done on the segments. ORCA excels in complex queries, most of which will access distributed tables. We can't run such queries from the segment slices anyway, because they require dispatching a query within another, which is not allowed in GPDB. Note that this restriction also applies to non-QD master slices. Furthermore, ORCA doesn't currently support PL/* statements (relevant when they are planned on the segments). For these reasons, restrict using ORCA to the master QD processes only.

Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.") and separate out the gporca fault injector tests into the newly added gporca_faults.sql, so that the rest can run in a parallel group.

Signed-off-by: Jesse Zhang <sbjesse@gmail.com>

- 04 December 2017 (1 commit)

Committed by Shreedhar Hardikar

Move the gporca regression test out of the parallel group so that the gp_fault_injector functionality works correctly. Also, as it turns out, ORCA is sometimes used to run PL/pgSQL queries even when the GUC optimizer is set to off. So when gporca sets up the gp_fault_injector, the fault gets activated later on in the parallel group that qp_functions_in_from is part of. So, reset the fault in gporca just in case.

- 02 December 2017 (2 commits)

Committed by Shreedhar Hardikar

To support that, this commit adds two new ORCA APIs:

- SignalInterruptGPOPT(), which notifies ORCA that an abort is requested (must be called from the signal handler)
- ResetInterruptsGPOPT(), which resets ORCA's state to before the interruption, so that the next query can run normally (needs to be called only on the QD)

Also check for interrupts right after ORCA returns.

Committed by Dhanashree

This was missed in commit 407b2880.

- 13 November 2017 (1 commit)

Committed by Heikki Linnakangas

This test would only produce the LOG lines memorized in the expected output if log_statement='all' was set. Remove that assumption by temporarily setting log_statement (and log_min_duration_statement), as is done in some earlier tests in the same file.
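
The pattern follows the earlier tests in the same file: set the logging GUCs locally around the statements whose LOG lines the expected output memorizes. A sketch (the statements under test are not shown here):

```sql
SET log_statement = 'all';
SET log_min_duration_statement = -1;
-- ... run the statements whose LOG output the expected file records ...
RESET log_statement;
RESET log_min_duration_statement;
```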

- 17 October 2017 (1 commit)

Committed by Heikki Linnakangas

This allows overriding the heuristic on whether a query has an ORDER BY. Use the directive in one of the queries in the 'gporca' test, which contains a subquery with an ORDER BY that fools atmsort's usual heuristic. The overall order of the query is not well defined, even though there is an ORDER BY in the subquery. The current implementation of DISTINCT in fact always also sorts the output, which is why this test is passing, but that is about to be relaxed soon, when we merge upstream commit 63247bec.

- 27 September 2017 (2 commits)

For flattened IN or EXISTS sublinks, if we choose an INNER JOIN path instead of a SEMI JOIN, then we need to apply duplicate suppression. The deduplication can be done in two ways:

1. Post-join dedup: unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes whether we need to go for post-join dedup.
2. Pre-join dedup: unique-ify the rows coming from the rel containing the subquery result, before it is joined with any other rels. join_unique_ininfo in CdbRelDedupInfo denotes whether we need to go for pre-join dedup. semi_operators and semi_rhs_exprs are used for this; we ported a function from 9.5 to compute these in make_outerjoininfo().

Upstream has a completely different implementation of this: it explores JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER paths, and deduplication is done in create_unique_path(). GPDB does this differently, since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete for us. Hence we have kept the GPDB-style deduplication mechanism as is in this merge. Post-join dedup has been implemented in previous merge commits.

Ref [#146890743]

Committed by Shreedhar Hardikar

- Fix up post-join dedup logic after cherry-pick
- Fix pull_up_sublinks_jointree_recurse returning garbage relids
- Update gporca, rangefuncs, eagerfree answer files

  1. gporca: Previously we were generating a Hash Inner Join with a HashAggregate for deduplication. Now we generate a Hash Semi Join, in which case we do not need to deduplicate the inner side (see the sketch below).
  2. rangefuncs: We updated this answer file during the cherry-pick of e006a24a since there was a change in plan. After these cherry-picks, we are back to the original plan as on master; hence we see the original error.
  3. eagerfree: We are generating a not-very-useful Subquery Scan node with this change. This is not producing wrong results, but this Subquery Scan needs to be removed. We will file a follow-up chore to investigate and fix this.

- We no longer need the helper function `hasSemiJoin()` to check whether the SpecialJoinInfo list has any SpecialJoinInfos constructed for a Semi Join (IN/EXISTS sublink); we have moved that check inside `cdb_set_cheapest_dedup()`.
- We are not exercising the pre-join-deduplication code path after this cherry-pick. Before this merge, we had three CDB-specific fields in `InClauseInfo` in which we recorded information for pre-join dedup in case of simple uncorrelated IN sublinks: `try_join_unique`, `sub_targetlist` and `InOperators`. Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need to devise a way to record this information in `SpecialJoinInfo`. We have filed a follow-up story for this.

Ref [#142356521]

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
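
The gporca answer-file change described in item 1 above, sketched with assumed table names (the commit describes the plan change, not a specific query):

```sql
-- Assumed tables for an uncorrelated IN sublink that gets flattened.
CREATE TABLE orders (id int, cust int) DISTRIBUTED BY (id);
CREATE TABLE vip (cust int) DISTRIBUTED BY (cust);

EXPLAIN SELECT * FROM orders o WHERE o.cust IN (SELECT v.cust FROM vip v);
-- Before: Hash Inner Join with a HashAggregate on the inner side to deduplicate.
-- After:  Hash Semi Join, which stops at the first inner match, so no explicit
--         deduplication node is needed.
```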

- 21 July 2017 (1 commit)

Committed by Venkatesh Raghavan

Arguments to a function scan can themselves contain a subquery that creates new rtable entries. Therefore, first translate all arguments of the FunctionScan before setting the FunctionScan's scanrelid.
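
The kind of query this handles, sketched with assumed names:

```sql
-- Assumed table; the point is that the function scan's argument itself contains
-- a subquery, which adds range-table entries during translation.
CREATE TABLE limits (n int) DISTRIBUTED BY (n);

SELECT g
FROM generate_series(1, (SELECT max(n) FROM limits)) AS g;
-- ORCA must translate the argument subquery (and the rtable entries it adds)
-- before assigning the FunctionScan's scanrelid.
```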