1. 14 March 2019 (1 commit)
  2. 12 March 2019 (1 commit)
  3. 19 January 2019 (1 commit)
  4. 12 January 2019 (1 commit)
  5. 29 December 2018 (1 commit)
    • Call executor nodes the same, whether generated by planner or ORCA. · 455b9a19
      Committed by Heikki Linnakangas
      We used to call some node types different names in EXPLAIN output,
      depending on whether the plan was generated by ORCA or the Postgres
      planner. Also, a Bitmap Heap Scan used to be called differently when the
      table was an AO or AOCS table, but only in planner-generated plans. There
      was some historical justification for this, because they used to
      be different executor node types, but commit db516347 removed the last
      such differences.
      
      Full list of renames:
      
      Table Scan -> Seq Scan
      Append-only Scan -> Seq Scan
      Append-only Columnar Scan -> Seq Scan
      Dynamic Table Scan -> Dynamic Seq Scan
      Bitmap Table Scan -> Bitmap Heap Scan
      Bitmap Append-Only Row-Oriented Scan -> Bitmap Heap Scan
      Bitmap Append-Only Column-Oriented Scan -> Bitmap Heap Scan
      Dynamic Bitmap Table Scan -> Dynamic Bitmap Heap Scan
      455b9a19
  6. 15 December 2018 (1 commit)
  7. 13 December 2018 (1 commit)
    • Reporting cleanup for GPDB-specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum-specific error handling via ereport()/elog() calls was
      in need of a unification effort, as some parts of the code were using a
      different messaging style from others (and from upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus making the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with a capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with overly detailed
        errmsg() content broken up into details and hints where possible (see
        the illustrative example after this list).
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages across lines just to keep lines short, since
        that makes grepping for error messages harder when debugging. This is
        also the de facto standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low-hanging fruit has
        been dealt with. Also convert a few user-facing elog() calls to
        ereport().
      
      * Update the testfiles to match the new messages.
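
      As an illustration of the errmsg()/errdetail() conventions described in
      the first bullet above, here is a hypothetical before/after conversion
      (the message text is invented for illustration and does not come from
      the patch):

      	/* before: capitalized errmsg ending with a period */
      	ereport(ERROR,
      			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
      			 errmsg("Queries of this kind are not supported on this table type.")));

      	/* after: lowercase errmsg, no trailing period; errdetail is a complete sentence */
      	ereport(ERROR,
      			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
      			 errmsg("queries of this kind are not supported on this table type"),
      			 errdetail("Only heap tables are supported by this code path.")));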
      
      Spelling and wording are mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled,
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      56540f11
  8. 03 November 2018 (2 commits)
  9. 25 September 2018 (1 commit)
  10. 22 September 2018 (1 commit)
    • Change pretty-printing of expressions in EXPLAIN to match upstream. · 4c54c894
      Committed by Heikki Linnakangas
      We had changed this in GPDB to print fewer parens. That's fine and dandy,
      but it hardly seems worth it to carry a diff vs upstream for this. Which
      format is better is a matter of taste. The extra parens make some
      expressions more clear, but OTOH, it's unnecessarily verbose for simple
      expressions. Let's follow the upstream on this.
      
      These changes were made to GPDB back in 2006, as part of backporting
      EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any
      explanation for this particular change in output in that commit message.
      
      It's nice to match upstream, to make merging easier. However, this won't
      make much difference to that: almost all EXPLAIN plans in regression
      tests are different from upstream anyway, because GPDB needs Motion nodes
      for most queries. But every little helps.
      4c54c894
  11. 21 September 2018 (2 commits)
  12. 08 September 2018 (1 commit)
    • Introduce optimizer GUC to enable generating streaming material · 635c2e0f
      Committed by Dhanashree Kashid
      Previously, while optimizing nestloop joins, ORCA always generated a
      blocking materialize node (cdb_strict=true). Although this conservative
      approach ensured that the join node produced by ORCA would always be
      deadlock safe, we sometimes produced slow-running plans.
      
      ORCA now has the capability of producing a blocking materialize only when
      needed, by detecting a motion hazard in the nestloop join. A streaming
      materialize is generated when there is no motion hazard.
      
      This commit adds a GUC to control this behavior. When set to off, we
      fall back to the old behavior of always producing a blocking materialize.
      
      Also bump the statement_mem for a test in segspace. After this change,
      for the test query, we produce a streaming spool, which changes the number
      of operator groups in the memory quota calculation, and the query fails with
      `ERROR:  insufficient memory reserved for statement`. Bump
      statement_mem by 1MB to test the fault injection.
      
      Also bump the ORCA version to 2.72.0.
      Signed-off-by: Abhijit Subramanya <asubramanya@pivotal.io>
      635c2e0f
  13. 06 September 2018 (1 commit)
  14. 01 September 2018 (1 commit)
    • Add test that ORCA generates a correct equivalence class · 27127b47
      Committed by Sambitesh Dash
      Given a query like below:
      
      SELECT Count(*)
      FROM   (SELECT *
              FROM   (SELECT tab_2.cd AS CD1,
                             tab_2.cd AS CD2
                      FROM   tab_1
                             LEFT JOIN tab_2
                                    ON tab_1.id = tab_2.id) f
              UNION ALL
              SELECT region,
                     code
              FROM   tab_3)a;
      
      Previously, ORCA produced an incorrect filter, (cd2 = cd), on top of the
      project list generated for producing an alias. This led to incorrect
      results, as column 'cd' is produced by the nullable side of the LOJ
      (tab_2) and such a filter produces NULL output.
      Ensure ORCA produces a correct equivalence class by considering the
      nullable columns.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      27127b47
  15. 31 August 2018 (2 commits)
    • Replace GPDB versions of some numeric aggregates with upstream's. · 325e6fcd
      Committed by Heikki Linnakangas
      Among other things, this fixes the inaccuracy of integer avg() and sum()
      functions. (i.e. fixes https://github.com/greenplum-db/gpdb/issues/5525)
      
      The upstream versions are from PostgreSQL 9.6, using the 128-bit math
      from the following commit:
      
      commit 959277a4
      Author: Andres Freund <andres@anarazel.de>
      Date:   Fri Mar 20 10:26:17 2015 +0100
      
          Use 128-bit math to accelerate some aggregation functions.
      
          On platforms where we support 128bit integers, use them to implement
          faster transition functions for sum(int8), avg(int8),
          var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
          numeric as a transition type.
      
          In some synthetic benchmarks this has been shown to provide significant
          speedups.
      
          Bumps catversion.
      
          Discussion: 544BB5F1.50709@proxel.se
          Author: Andreas Karlsson
          Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
              Oskari Saarenmaa, David Rowley
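
      As a rough, illustration-only sketch of the 128-bit accumulator idea the
      upstream commit describes (the struct and function names here are
      invented; they are not the actual PostgreSQL symbols):

      	#ifdef __SIZEOF_INT128__
      	/* Wide transition state: sums int8 inputs without a per-row numeric allocation. */
      	typedef struct Int8SumAvgState
      	{
      		long long	N;		/* number of input rows seen so far */
      		__int128	sum;	/* running sum of the int8 inputs */
      	} Int8SumAvgState;

      	static void
      	int8_sum_avg_accum(Int8SumAvgState *state, long long newval)
      	{
      		state->N++;
      		state->sum += newval;
      	}
      	#endif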
      325e6fcd
    • Rename "prelim function" to "combine function", to match upstream. · b8545d57
      Committed by Heikki Linnakangas
      The GPDB "prelim" functions did the same things as the "combine"
      functions introduced in PostgreSQL 9.6. This commit includes just the
      catalog changes, to essentially search & replace "prelim" with
      "combine". I did not yet pick the planner and executor changes that were
      made as part of this in the upstream.
      
      Also replace the GPDB implementations of float8_amalg() and
      float8_regr_amalg() with the upstream float8_combine() and
      float8_regr_combine(). They do the same thing, but let's use upstream
      functions where possible.
      
      Upstream commits:
      commit a7de3dc5
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Wed Jan 20 13:46:50 2016 -0500
      
          Support multi-stage aggregation.
      
          Aggregate nodes now have two new modes: a "partial" mode where they
          output the unfinalized transition state, and a "finalize" mode where
          they accept unfinalized transition states rather than individual
          values as input.
      
          These new modes are not used anywhere yet, but they will be necessary
          for parallel aggregation.  The infrastructure also figures to be
          useful for cases where we want to aggregate local data and remote
          data via the FDW interface, and want to bring back partial aggregates
          from the remote side that can then be combined with locally generated
          partial aggregates to produce the final value.  It may also be useful
          even when neither FDWs nor parallelism are in play, as explained in
          the comments in nodeAgg.c.
      
          David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
          Linnakangas, Haribabu Kommi, and me.
      
      commit af025eed
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Fri Apr 8 13:44:50 2016 -0400
      
          Add combine functions for various floating-point aggregates.
      
          This allows parallel aggregation to use them.  It may seem surprising
          that we use float8_combine for both float4_accum and float8_accum
          transition functions, but that's because those functions differ only
          in the type of the non-transition-state argument.
      
          Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
      b8545d57
  16. 25 August 2018 (1 commit)
    • Add tests ensuring correct handling of full and left outer joins · 17de967d
      Committed by Dhanashree Kashid
      1. Add a test for a full outer join query on varchar columns
      In such a scenario, the planner expects a RelabelType node on top of the
      varchar column while looking up a Sort operator. Please refer to commit
      fab435e for more details. Add a test for such queries and disable hashjoin
      to make sure that the planner is able to generate a plan with a merge join
      successfully.
      
      2. Add a test for a query with an Agg and a left outer join
      This test is to ensure that ORCA produces correct results by performing
      a two-stage aggregation on top of a co-located join. A corresponding plan
      test has been added in the ORCA test suite.
      17de967d
  17. 15 August 2018 (1 commit)
  18. 03 August 2018 (1 commit)
  19. 02 August 2018 (1 commit)
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints are now performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans are now supported for specific parameter values even when using
        prepared statements.
      
      * The FDW API was improved to let foreign data wrappers provide multiple access
        "paths" for their tables, allowing more flexibility in join planning.
      
      * The security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data types were added to store a lower and upper bound belonging to a
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on the QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  20. 19 June 2018 (1 commit)
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Committed by Omer Arap
      This commit introduces an end-to-end, scalable solution to generate
      statistics for root partitions. This is done by merging the
      statistics of the leaf partition tables to generate the statistics of the
      root partition. The ability to merge leaf table statistics for
      the root table makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze will create a sample for each partition, as in the
      previous version. While analyzing the sample and generating statistics
      for the partition, it will also create a `hyperloglog_counter` data
      structure and add values from the sample to the `hyperloglog_counter`,
      along with metadata such as the number of multiples and the sample size.
      Once the entire sample is processed, analyze will save the
      `hyperloglog_counter` as a byte array in the `pg_statistic` catalog table.
      We reserve a slot for the `hyperloglog_counter` in the table and signify
      this with a specific statistic kind, `STATISTIC_KIND_HLL`. We only keep
      the `hyperloglog_counter` in the catalog for the leaf partitions. If
      the user chooses to run a FULL scan for HLL, we signify the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. Initially, we check whether all the partitions have been
      analyzed properly and have all their statistics available to us in the
      `pg_statistic` catalog table. If a partition has no tuples, we consider
      it analyzed even though it has no catalog entry.
      If for some reason a single partition is not analyzed, we fall back to
      the original analyze algorithm, which requires acquiring a sample for the
      root partition and calculating statistics based on that sample.
      
      Merging the null fraction and average width from leaf partition statistics
      is trivial and does not involve significant challenges. We calculate
      them first. The remaining statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCV) and their frequencies, termed most common
      frequencies (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
      Hyperloglog provides a functionality to merge multiple
      `hyperloglog_counter`s into one and calculate the number of distinct
      values using the aggregated `hyperlog_counter`. This aggregated
      `hyperlog_counter` is sufficient only if the user chooses to run full
      scan for hyperloglog. In the sample based approach, without the
      hyperloglog algorithm, derivation of number of distinct values is not
      possible. Hyperloglog enables us to merge the `hyperloglog_counter`s
      from each partition and calculate the NDV on the merged
      `hyperloglog_counter` with an acceptable error rate. However, it does
      not give us the ultimate NDV of the root partition, it provides us the
      NDV of the union of the samples from each partition.
      
      The rest of the NDV interpolation depends on four metrics, based on the
      formula used in Postgres: the NDV in the sample, the number of values that
      appear more than once in the sample, the sample size, and the total rows in
      the table. Using these values, the algorithm calculates the approximate NDV
      for the table. While merging the statistics from the leaf partitions, with
      the help of hyperloglog we can accurately generate the NDV of the sample,
      the sample size and the total rows; however, the number of multiples in the
      accumulated sample is unknown, since we do not have access to the
      accumulated sample at this point.
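
      For reference, the Postgres interpolation referred to here is, as best
      recalled (see compute_scalar_stats() in analyze.c; stated as an
      approximation rather than the exact code), the estimator

      	\hat{D} = \frac{n \cdot d}{(n - f_1) + f_1 \cdot n / N}

      where n is the sample size, N the total row count, d the NDV of the
      sample, and f_1 = d - nMultiple the number of values seen exactly once
      in the sample.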
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the aggregated
      sample for the root (which itself is unavailable) requires the
      NDV, the number of multiples, and the size of each leaf sample.
      The NDV of each sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and sample size for each
      partition are saved in the partition's `hyperloglog_counter` during leaf
      statistics gathering, to be used in the merge.
      
      Estimating the number of multiples in the aggregate sample for the root
      partition is a two-step process. First, we estimate the number of values
      that reside in more than one partition's sample. Then, we estimate the
      number of multiples that exist uniquely in a single partition. Finally,
      we add these values to estimate the overall number of multiples in the
      aggregate sample of the root partition.
      
      To count the number of values that exist uniquely in one single
      partition, we utilize hyperloglog functionality. We can easily estimate
      how many values appear only in a specific partition _i_. We call the NDV
      of the aggregate of all partitions `NDV_all`, and the NDV of the
      aggregate of all partitions but _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` gives the values that appear in only one
      partition. The rest of the values contribute to the overall number of
      multiples in the root's aggregated sample; we call their count
      `nMultiple_inter`, the number of values that appear in more than one
      partition.
      
      However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple copies of it. We need a
      way to account for these values as well. Recall that we also account for
      the number of multiples that exist uniquely in a partition sample. We
      already know the number of multiples inside each partition sample;
      however, we need to normalize this value by the proportion of the number
      of values unique to the partition sample to the number of distinct values
      of the partition sample. The normalized value is partition sample i's
      contribution to the overall calculation of nMultiple.
      
      Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
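
      Putting the pieces above together, with notation introduced here for
      readability only (m_i is the number of multiples in partition i's sample,
      d_i the NDV of that sample, and u_i the values seen only in partition i;
      these are not identifiers from the code):

      	u_i = NDV_{all} - NDV_{minus\_i}
      	nMultiple_{inter} = NDV_{all} - \sum_i u_i
      	normalized\_m_i = m_i \cdot (u_i / d_i)
      	nMultiple_{root} = nMultiple_{inter} + \sum_i normalized\_m_i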
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from the 4.3 version of
      Greenplum DB. The algorithm is straightforward: we convert each MCV's
      frequency into a count and add the counts up if the value appears in more
      than one partition. After every possible candidate's count has been
      calculated, we sort the candidate values and pick the top ones, as defined
      by `default_statistics_target`. 4.3 blindly picked the top values with the
      highest counts; we instead incorporated the same logic used in current
      Greenplum and Postgres, and test whether a value is a real MCV. Therefore,
      even after the merge, the logic aligns with Postgres.
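
      A minimal, self-contained sketch of the count-and-pick-top-N step (the
      types and helpers here are invented for this write-up; the real code
      operates on Datums and pg_statistic slots rather than plain doubles):

      	#include <stdlib.h>

      	typedef struct McvCandidate
      	{
      		double	value;	/* candidate MCV value */
      		double	count;	/* count summed across all leaf partitions */
      	} McvCandidate;

      	/* Convert a leaf MCV frequency into an absolute count for that leaf. */
      	static double
      	mcv_freq_to_count(double freq, double leaf_reltuples)
      	{
      		return freq * leaf_reltuples;
      	}

      	/* qsort comparator: descending by merged count. */
      	static int
      	cmp_count_desc(const void *a, const void *b)
      	{
      		double	ca = ((const McvCandidate *) a)->count;
      		double	cb = ((const McvCandidate *) b)->count;

      		return (cb > ca) - (cb < ca);
      	}

      	/* Sort merged candidates by count and keep at most stats_target of them. */
      	static int
      	pick_top_mcvs(McvCandidate *cands, int ncands, int stats_target)
      	{
      		qsort(cands, ncands, sizeof(McvCandidate), cmp_count_desc);
      		return ncands < stats_target ? ncands : stats_target;
      	}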
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge
      the histograms from the leaf partitions. In 4.3 we use a priority queue to
      merge the histograms from the leaf partitions. However, that approach is
      naive and loses important statistical information. In Postgres, the
      histogram is calculated over the values that did not qualify as MCVs. The
      4.3 merge logic for histograms did not take this into consideration, so
      significant statistical information was lost while merging the MCV values.
      
      We introduce a novel approach to feed the MCVs from the leaf partitions
      that did not qualify as a root MCV into the histogram merge logic. To
      fully utilize the previously implemented priority queue logic, we
      treat non-qualified MCVs as the histograms of so-called `dummy`
      partitions. To be more precise, if an MCV m1 is a non-qualified MCV, we
      create a histogram [m1, m1] that has only one bucket, whose bucket size is
      the count of this non-qualified MCV. When we merge the histograms of the
      leaf partitions and these dummy partitions, the merged histogram does not
      lose this statistical information.
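
      A tiny sketch of the dummy-partition encoding (types invented here for
      illustration; the real code builds these in terms of the existing
      histogram merge structures):

      	/* A non-qualified root MCV becomes a one-bucket "dummy" histogram. */
      	typedef struct DummyHistogram
      	{
      		double	lo;		/* bucket lower bound: the MCV value m1 */
      		double	hi;		/* bucket upper bound: also m1 */
      		double	weight;	/* bucket size: the merged count of m1 */
      	} DummyHistogram;

      	static DummyHistogram
      	dummy_histogram_from_mcv(double m1, double count)
      	{
      		DummyHistogram h = { m1, m1, count };

      		return h;
      	}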
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      9c1b1ae3
  21. 05 April 2018 (1 commit)
    • Fix fallback test in gporca.sql · 34878131
      Committed by Dhanashree Kashid
      This test was added to check the logging of ORCA fall-back messages. The
      query contains the CUBE grouping extension, which is currently not
      supported by ORCA, causing ORCA to fall back to the planner with the
      following log messages:
      
      LOG:  NOTICE,"Feature not supported by the Pivotal Query Optimizer:
      Cube",
      LOG:  Planner produced plan :0
      
      The planner-generated plan contains a Shared Scan node. During execution,
      an extra log message is sometimes generated indicating that the Shared
      Scan writer is still waiting for an acknowledgement from the Shared Scan
      readers:
      
      LOG: SISC WRITER (shareid=0, slice=1): notify still wait for an answer,
      errno 4
      
      The query returns successfully; however, this intermittently generated log
      message causes the test to fail.
      This commit fixes the flake by converting it to an EXPLAIN test, which
      is sufficient to demonstrate the fall-back logging.
      34878131
  22. 14 February 2018 (1 commit)
  23. 09 February 2018 (1 commit)
    • Fix more whitespace in tests, mostly in expected output. · 93b92ca4
      Committed by Heikki Linnakangas
      Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes
      the -w option from pg_regress's "diff" invocation. That commit will fix
      all the PostgreSQL regression tests to pass without it, but we need to
      also fix all the GPDB tests. That's what this commit does.
      
      I did much of this in commit 06a2bb64, but now that we're about to
      actually merge that, more cases popped up.
      
      Co-Author: Daniel Gustafsson <dgustafsson@pivotal.io>
      93b92ca4
  24. 18 January 2018 (1 commit)
    • Fix whitespace in tests, mostly in expected output. · 06a2bb64
      Committed by Heikki Linnakangas
      Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes
      the -w option from pg_regress's "diff" invocation. That commit will fix
      all the PostgreSQL regression tests to pass without it, but we need to
      also fix all the GPDB tests. That's what this commit does.
      06a2bb64
  25. 06 January 2018 (1 commit)
  26. 05 January 2018 (1 commit)
    • Set search_path and stop dropping schema in gporca test · c7ab6924
      Committed by Jesse Zhang
      The `gporca` regression test suite uses a schema but doesn't really
      switch `search_path` to the schema that's meant to encapsulate most of
      the objects it uses. This has led to multiple instances where we:
        1. Either used a table from another namespace by accident;
        2. Or leaked objects into the public namespace that other tests in
        turn accidentally depended on.
      
      As we were about to add a few user-defined types and casts to the test
      suite, we want to (at last) ensure that all future additions are scoped
      to the namespace.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      
      Closes #4238
      c7ab6924
  27. 21 December 2017 (2 commits)
  28. 13 December 2017 (1 commit)
    • Ensure that ORCA is not called on any process other than the master QD · 916f460f
      Committed by Shreedhar Hardikar
      We don't want to use the optimizer for planning queries in SQL, PL/pgSQL,
      etc. functions when that is done on the segments.
      
      ORCA excels at complex queries, most of which will access distributed
      tables. We can't run such queries from the segment slices anyway,
      because they require dispatching a query within another, which is not
      allowed in GPDB. Note that this restriction also applies to non-QD
      master slices. Furthermore, ORCA doesn't currently support PL/*
      statements (relevant when they are planned on the segments).
      
      For these reasons, restrict using ORCA to the master QD processes
      only.
      
      Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.")
      and separate out the gporca fault injector tests into the newly added
      gporca_faults.sql so that the rest can run in a parallel group.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      916f460f
  29. 04 December 2017 (1 commit)
    • Fix pipeline failures caused by 0dfd0ebc. · d79a2c7f
      Committed by Shreedhar Hardikar
      Move the gporca regression test out of the parallel group so that the
      gp_fault_injector functionality works correctly.
      Also, as it turns out, ORCA is sometimes used to run PL/pgSQL queries
      even when the GUC optimizer is set to off. So when gporca sets up the
      gp_fault_injector, the fault gets activated later on in the parallel group
      that the qp_functions_in_from test is part of. So, reset the fault in
      gporca just in case.
      d79a2c7f
  30. 02 December 2017 (2 commits)
    • Support optimization interrupts in ORCA · 0dfd0ebc
      Committed by Shreedhar Hardikar
      To support that, this commit adds 2 new ORCA APIs:
      - SignalInterruptGPOPT(), which notifies ORCA that an abort is requested
        (must be called from the signal handler)
      - ResetInterruptsGPOPT(), which resets ORCA's state to before the
        interruption, so that the next query can run normally (needs to be
        called only on the QD)
      
      Also check for interrupts right after ORCA returns.
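
      A minimal sketch of the intended flow, based only on the description
      above (the handler and call-site names are placeholders, and the
      signatures of the two new APIs are assumed here, not copied from the
      code):

      	/* Assumed prototypes of the two new ORCA APIs. */
      	extern void SignalInterruptGPOPT(void);
      	extern void ResetInterruptsGPOPT(void);

      	/* In the QD's cancel/die signal handler: forward the request to ORCA. */
      	static void
      	handle_cancel_signal(int signo)
      	{
      		(void) signo;
      		SignalInterruptGPOPT();		/* tell ORCA an abort was requested */
      		/* ... existing signal-handler bookkeeping continues here ... */
      	}

      	/*
      	 * On the QD, after the call into ORCA returns, check for pending
      	 * interrupts right away; once the query (or its error recovery) is
      	 * done, call ResetInterruptsGPOPT() so the next query runs normally.
      	 */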
      0dfd0ebc
    • Update Planner answer file for gporca · f18a3a59
      Committed by Dhanashree
      This was missed in commit 407b2880
      f18a3a59
  31. 13 November 2017 (1 commit)
  32. 17 October 2017 (1 commit)
    • Add support for "order none" directive to atmsort. · 5390d8b7
      Committed by Heikki Linnakangas
      This allows overriding the heuristic on whether a query has an ORDER BY.
      
      Use the directive in one of the queries in the 'gporca' test, which
      contains a subquery with an ORDER BY that fools atmsort's usual
      heuristic. The overall order of the query is not well-defined, even though
      there is an ORDER BY in the subquery. The current implementation of
      DISTINCT in fact always also sorts the output, which is why this test
      is passing, but that is about to be relaxed soon, when we merge upstream
      commit 63247bec.
      5390d8b7
  33. 27 September 2017 (2 commits)
    • Implement CDB-like pre-join deduplication · efb2777a
      Committed by Dhanashree Kashid, Ekta Khanna and Omer Arap
      For flattened IN or EXISTS sublinks, if we choose an INNER JOIN path instead
      of a SEMI JOIN, then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. Post-join dedup:
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo
      denotes whether we need to go for post-join dedup.
      
      2. Pre-join dedup:
      unique-ify the rows coming from the rel containing the subquery result,
      before that rel is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes whether we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has a completely different implementation of this. Upstream explores
      JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER paths, and deduplication is done in
      create_unique_path(). GPDB does this differently, since JOIN_UNIQUE_INNER and
      JOIN_UNIQUE_OUTER are obsolete for us. Hence we have kept the GPDB-style
      deduplication mechanism as-is in this merge.
      
      Post-join dedup has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • CDB-specific changes and other fix-ups after merging e549722a · e5f6e826
      Committed by Shreedhar Hardikar
      0. Fix up post-join dedup logic after the cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update the gporca, rangefuncs, and eagerfree answer files
      	1. gporca
      	Previously we were generating a Hash Inner Join with a
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join, in which case we do not need to deduplicate the
      	inner side.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan, as on master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results,
      	but this subquery scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need the helper function `hasSemiJoin()` to check whether
      the SpecialJoinInfo list has any SpecialJoinInfos constructed for a Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`.
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB-specific
      fields in `InClauseInfo` in which we recorded information for
      pre-join dedup in the case of simple uncorrelated IN sublinks:
      `try_join_unique`, `sub_targetlist` and `InOperators`.
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
  34. 21 July 2017 (1 commit)