1. 28 September 2018, 4 commits
    • Order active window clauses for greater reuse of Sort nodes. · 3f0d46f7
      Committed by Daniel Gustafsson
      This is a backport of the below commit from postgres 12dev, which in turn
      is a patch that was influenced by an optimization from the previous version
      of the Greenplum Window code. The idea is to order the Sort nodes based on
      sort prefixes, such that sorts can be reused by subsequent nodes.
      
      As this uses EXPLAIN in the test output, a new expected file is added for
      ORCA output even though the patch only touches the postgres planner.
      
        commit 728202b6
        Author: Andrew Gierth <rhodiumtoad@postgresql.org>
        Date:   Fri Sep 14 17:35:42 2018 +0100
      
          Order active window clauses for greater reuse of Sort nodes.
      
          By sorting the active window list lexicographically by the sort clause
          list but putting longer clauses before shorter prefixes, we generate
          more chances to elide Sort nodes when building the path.
      
          Author: Daniel Gustafsson (with some editorialization by me)
          Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
          Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
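
      As a minimal sketch of the effect (hypothetical empsalary table), consider two
      window clauses where one's ORDER BY is a prefix of the other's:

      ```sql
      -- w1 needs a sort on (depname, salary, empno); w2 only needs (depname, salary),
      -- which is a prefix, so both WindowAgg nodes can be fed from a single Sort.
      SELECT avg(salary) OVER (PARTITION BY depname ORDER BY salary, empno) AS w1,
             rank()      OVER (PARTITION BY depname ORDER BY salary)        AS w2
      FROM empsalary;
      ```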
    • Remove unnecessary code for the first ORDER BY column in window agg. · e70f73e0
      Committed by Heikki Linnakangas
      The purpose of this code was to treat the first ORDER BY column in a
      window agg, like "ROW_NUMBER() OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING
      AND 2 FOLLOWING)", the same way as volatile expressions, and add it to
      the target list as is. That was to ensure that it would be available for
      computing the window bounds. But upstream commit a2099360, merged as
      part of the 9.3 merge, got rid of the distinction between volatile and
      non-volatile expressions, so we no longer need to treat the first ORDER BY
      column any differently either.
    • Move code marked with FIXME to make_windowInputTargetList(). · fa4a2ccb
      Committed by Heikki Linnakangas
      make_windowInputTargetList() seems like a better place for this code,
      as suggested by the FIXME comment that was left here in the 9.3 merge.
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      Committed by ZhangJackey
      There was an assumption in gpdb that a table's data is always distributed
      on all segments. However, this is not always true: for example, when a
      cluster is expanded from M segments to N (N > M), all the tables are still
      on M segments. To work around the problem we used to have to alter all the
      hash-distributed tables to randomly distributed to get correct query
      results, at the cost of bad performance.
      
      Now we support distributing a table's data on only a subset of the segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DMLs on tables distributed on M
      segments; joins between tables on M and N segments are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
  2. 27 September 2018, 1 commit
  3. 23 September 2018, 1 commit
  4. 22 September 2018, 2 commits
    • Revert "Add DEBUG mode to the explain_memory_verbosity GUC" · 984cd3b9
      Committed by Jesse Zhang
      Commit 825ca1e3 didn't seem to work well when we hooked up ORCA's memory
      system to memory accounting: we are tripping multiple asserts in
      regression tests. The regression test failures seem to suggest we are
      double-freeing somewhere (or accounting incorrectly). Reverting for now
      to get master back to green.
      
      This reverts commit 825ca1e3.
    • Add DEBUG mode to the explain_memory_verbosity GUC · 825ca1e3
      Committed by Taylor Vesely
      The memory accounting system generates a new memory account for every
      execution node initialized in ExecInitNode. The addresses of these memory
      accounts are stored in the shortLivingMemoryAccountArray. If the memory
      allocated for shortLivingMemoryAccountArray is full, we will repalloc
      the array with double the number of available entries.
      
      After creating approximately 67000000 memory accounts, it will need to
      allocate more than 1GB of memory to increase the array size, and throw
      an ERROR, canceling the running query.
      
      PL/pgSQL and SQL functions will create new executors/plan nodes that
      must be tracked by the memory accounting system. This level of detail is
      not necessary for tracking memory leaks, and creating a separate memory
      account for every executor will use a large amount of memory just to track
      these memory accounts.
      
      Instead of tracking millions of individual memory accounts, we
      consolidate any child executor account into a special 'X_NestedExecutor'
      account. If explain_memory_verbosity is set to 'detailed' or below,
      consolidate all child executors into this account.
      
      If more detail is needed for debugging, set explain_memory_verbosity to
      'debug', where, as was the previous behavior, every executor will be
      assigned its own MemoryAccountId.
      
      Originally we tried to remove nested execution accounts after they
      finish executing, but rolling over those accounts into an
      'X_NestedExecutor' account was impracticable to accomplish without the
      possibility of a future regression.
      
      If any accounts are created between nested executors that are not rolled
      over to an 'X_NestedExecutor' account, recording which accounts are
      rolled over can grow in the same way that the
      shortLivingMemoryAccountArray is growing today, and would also grow too
      large to reasonably fit in memory.
      
      If we were to iterate through the SharedHeaders every time that we
      finish a nested executor, it would not likely be very performant.
      
      While we were at it, convert some of the convenience macros dealing with
      memory accounting for executor / planner nodes into functions, and move
      them out of the memory accounting header files into the sole callers'
      compilation units.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
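
      As a hedged usage sketch of the new GUC value (the function name is hypothetical;
      the GUC and its values are those described above):

      ```sql
      -- 'detailed' rolls nested executors into a single X_NestedExecutor account;
      -- 'debug' restores one MemoryAccountId per executor for deep debugging.
      SET explain_memory_verbosity = 'debug';
      EXPLAIN ANALYZE SELECT my_plpgsql_fn(i) FROM generate_series(1, 1000) AS g(i);
      ```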
  5. 05 September 2018, 3 commits
  6. 03 September 2018, 1 commit
  7. 31 August 2018, 1 commit
    • Rename "prelim function" to "combine function", to match upstream. · b8545d57
      Committed by Heikki Linnakangas
      The GPDB "prelim" functions did the same things as the "combine"
      functions introduced in PostgreSQL 9.6. This commit includes just the
      catalog changes, to essentially search & replace "prelim" with
      "combine". I did not pick the planner and executor changes that were
      made as part of this in the upstream, yet.
      
      Also replace the GPDB implementation of float8_amalg() and
      float8_regr_amalg() with the upstream float8_combine() and
      float8_regr_combine(). They do the same thing, but let's use upstream
      functions where possible.
      
      Upstream commits:
      commit a7de3dc5
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Wed Jan 20 13:46:50 2016 -0500
      
          Support multi-stage aggregation.
      
          Aggregate nodes now have two new modes: a "partial" mode where they
          output the unfinalized transition state, and a "finalize" mode where
          they accept unfinalized transition states rather than individual
          values as input.
      
          These new modes are not used anywhere yet, but they will be necessary
          for parallel aggregation.  The infrastructure also figures to be
          useful for cases where we want to aggregate local data and remote
          data via the FDW interface, and want to bring back partial aggregates
          from the remote side that can then be combined with locally generated
          partial aggregates to produce the final value.  It may also be useful
          even when neither FDWs nor parallelism are in play, as explained in
          the comments in nodeAgg.c.
      
          David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
          Linnakangas, Haribabu Kommi, and me.
      
      commit af025eed
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Fri Apr 8 13:44:50 2016 -0400
      
          Add combine functions for various floating-point aggregates.
      
          This allows parallel aggregation to use them.  It may seem surprising
          that we use float8_combine for both float4_accum and float8_accum
          transition functions, but that's because those functions differ only
          in the type of the non-transition-state argument.
      
          Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
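
      For context, a sketch of what a combine function is for, written in upstream
      PostgreSQL 9.6 CREATE AGGREGATE syntax (the aggregate below is hypothetical;
      this commit itself only renames the catalog entries):

      ```sql
      -- Two-stage aggregation: segments emit partial transition states, and the
      -- combine function (formerly GPDB's "prelim" function) merges them.
      CREATE AGGREGATE my_sum(float8) (
          sfunc       = float8pl,
          stype       = float8,
          combinefunc = float8pl
      );
      ```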
  8. 03 August 2018, 1 commit
  9. 02 August 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans are now supported for specific parameter values even when using
        prepared statements.
      
      * The FDW API was improved to let FDWs provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * The security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data type was added to store a lower and upper bound belonging to its
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
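
      A small, hedged illustration of two of the 9.2 features listed above (table and
      column names are hypothetical):

      ```sql
      -- tsrange is one of the new range types; the count below may be answered
      -- with an index-only scan on the primary-key index, avoiding heap access.
      CREATE TABLE events (id int PRIMARY KEY, during tsrange);
      SELECT count(*) FROM events WHERE id BETWEEN 100 AND 200;
      ```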
  10. 09 July 2018, 1 commit
  11. 12 May 2018, 1 commit
  12. 03 May 2018, 1 commit
    • Add Global Deadlock Detector. · 03915d65
      Committed by Zhenghua Lyu
      To prevent distributed deadlocks, Greenplum DB holds an exclusive table lock
      for UPDATE and DELETE commands, so concurrent updates to the same table are
      effectively disabled.
      
      We add a backend process to do global deadlock detection so that we do not
      lock the whole table while doing UPDATE/DELETE; this will help improve the
      concurrency of Greenplum DB.
      
      The core idea of the algorithm is to divide locks into two types:
      
      - Persistent: the lock can only be released after the transaction is over (abort/commit)
      - Non-persistent: all other cases
      
      This PR’s implementation adds a persistent flag to the LOCK, and the rules for setting it are:
      
      - Xid locks are always persistent
      - Tuple locks are never persistent
      - A relation lock is persistent if the relation has been closed with the NoLock
        parameter, otherwise it is not persistent
      - Other types of locks are not persistent
      
      For more details, please refer to the code and the README.
      
      There are several known issues to pay attention to:
      
      - This PR’s implementation only cares about the locks that can be shown
        in the view pg_locks.
      - This PR’s implementation does not support AO tables. We keep upgrading
        the locks for AO tables.
      - This PR’s implementation does not take network waits into account.
        Thus we cannot detect the deadlock of GitHub issue #2837.
      - SELECT FOR UPDATE still locks the whole table.
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
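
      A hedged sketch of the behavior change (the GUC name is an assumption, see the
      README added by this commit; the table is hypothetical):

      ```sql
      -- Previously, session 2 would block on the table-level lock taken by session 1.
      -- With the global deadlock detector enabled, UPDATE/DELETE take row-level locks.
      SHOW gp_enable_global_deadlock_detector;                  -- assumed GUC name
      UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- session 1
      UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- session 2, not blocked
      ```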
  13. 02 May 2018, 1 commit
    • Re-enable MIN/MAX optimization. · 362fc756
      Committed by Heikki Linnakangas
      I'm not sure why it's been disabled. It's not very hard to make it work, so
      let's do it. Might not be a very common query type, but if you happen to
      have a query where it helps, it helps a lot.
      
      This adds a GUC, gp_enable_minmax_optimization, to enable/disable the
      optimization. There's no such GUC in upstream, but we need at least a flag
      in PlannerConfig for it, so that we can disable the optimization for
      correlated subqueries, along with some other optimizer tricks. Seems best
      to also have a GUC for it, for consistency with other flags in
      PlannerConfig.
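
      A minimal sketch of the class of query this helps (t1 from the examples above;
      the exact plan depends on available indexes):

      ```sql
      -- With the optimization enabled, min()/max() can be planned as a
      -- "LIMIT 1 over an ordered index scan" instead of a full scan plus aggregate.
      SET gp_enable_minmax_optimization = on;
      EXPLAIN SELECT min(c1) FROM t1;
      ```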
  14. 29 March 2018, 2 commits
    • Support replicated table in GPDB · 7efe3204
      Committed by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash or
      randomly. There is a requirement to introduce a new table type, called a
      replicated table, where every segment holds the full, duplicate table data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark
      a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to describe
      the distribution of tuples of a replicated table.  CdbLocusType_SegmentGeneral implies
      data is generally available on all segments but not on the qDisp, so a plan node with
      this locus type can be flexibly planned to execute on either a single QE or all QEs. It is
      similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral
      node can't be executed on the qDisp. To guarantee this, we try our best to add a gather motion
      on top of a CdbLocusType_SegmentGeneral node when planning motions for a join, even if the other
      rel has a bottleneck locus type. A problem is that such a motion may be redundant if the single QE
      is not promoted to execute on the qDisp in the end, so we need to detect such cases and omit the
      redundant motion at the end of apply_motion(). We don't reuse CdbLocusType_Replicated since
      it always implies a broadcast motion below it, and it's not easy to plan such a node as direct
      dispatch to avoid getting duplicate data.
      
      We don't support replicated tables with an inherit/partition-by clause yet; the main
      problem is that update/delete on multiple result relations can't work correctly now.
      We can fix this later.
      
      * Allow spi_* to access replicated tables on QEs
      
      Previously, GPDB didn't allow a QE to access non-catalog tables because
      the data is incomplete; we can remove this limitation now if it only
      accesses replicated tables.
      
      One problem is that a QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we
      need to pass the policy info to QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated table
      
      Previously, we used a magic number -128 in the gp_distribution_policy table
      to identify a replicated table, which is quite a hack, so we add a new column
      in gp_distribution_policy to identify replicated tables and partitioned
      tables.
      
      This commit also abandons the old way that used a 1-length-NULL list and
      a 2-length-NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.
      
      Besides, this commit refactors the code to make the decision-making for
      the distribution policy clearer.
      
      * Support COPY for replicated tables
      
      * Disable the row-ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to address queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t3), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
                  ->  Hash Join
                        Hash Cond: t2.c2 = t1.c2
                      ->  Seq Scan on t2
                      ->  Hash
                          ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because ctid
        + gp_segment_id can't identify a tuple: in a replicated table, a logical
        row may have a different ctid and gp_segment_id on each segment. So we
        disable such plans for replicated tables temporarily. It's not the best
        way, because the rowid-unique path may be cheaper than a normal hash
        semi-join, so we left a FIXME for later optimization.
      
      * ORCA related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog; do the
        same checks as for other catalogs.
      
      * Support gpexpand on replicated table && alter the dist policy of replicated table
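
      A brief, hedged sketch of the new table type (the fact table is hypothetical):

      ```sql
      -- Every segment stores a full copy of the replicated table, so joining it to
      -- a hash-distributed table needs no Motion on the replicated side.
      CREATE TABLE dim_conf (id int, val text) DISTRIBUTED REPLICATED;
      EXPLAIN SELECT * FROM facts f JOIN dim_conf d ON f.id = d.id;
      ```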
    • Remove FIXME about group_id in Distinct HashAgg · 2b25c663
      Committed by Dhanashree Kashid
      With the 8.4 merge, the planner considers using HashAgg to implement
      DISTINCT. At the end of planning, we replace the expressions in the
      targetlist of certain operators (including Agg) with OUTER references
      to the targetlist of its lefttree (see set_plan_refs() >
      set_upper_references()).
      But, as per the code, when grouping() or group_id() are
      present in the target list of the Agg, it skips the replacement, and this
      is problematic when the Agg is implementing DISTINCT.
      
      It seems that the Agg's targetlist need not compute grouping() or
      group_id() when its lefttree is computing it. In that case, it may
      simply refer to it. This would then also apply to the other operators:
      WindowAgg, Result & PartitionSelector.
      
      However, the Repeat node needs to compute these functions at each stage
      because group_id is derived from RepeatState::repeat_count. Thus, it
      cannot be replaced by an OUTER reference.
      
      Hence, this commit removes the special case for these functions for all
      operators except Repeat. Then, a DISTINCT HashAgg produces the correct
      results.
      Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>
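
      A hedged sketch of the query shape in question (hypothetical table): a DISTINCT
      on top of a grouping-sets Agg whose target list contains grouping()/group_id():

      ```sql
      -- After this change, the DISTINCT HashAgg refers to grouping()/group_id()
      -- computed by its child instead of recomputing them, and returns correct results.
      SELECT DISTINCT c1, c2, grouping(c1), group_id()
      FROM t1
      GROUP BY GROUPING SETS ((c1), (c2));
      ```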
  15. 16 March 2018, 1 commit
    • Remove GPDB_84_MERGE_FIXME in planner.c and prepunion.c · 74546663
      Committed by Shreedhar Hardikar
      These were related to choosing the right arguments to send to GPDB's
      make_agg() and cost_agg() methods for queries containing DISTINCT or set
      operations.
      
      Hash aggregation, when used to implement a DISTINCT (in either form) in
      the query, is not related to grouping sets, and thus the arguments to
      num_nullcols, input_grouping, grouping and rollup_gs_times should be 0.
      
      However, since SetOp uses the upstream TupleHashTable while HashAgg uses
      GPDB's HHashTable implementation, the hash table size calculations
      should be computed differently. This is fixed in this commit.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
  16. 09 February 2018, 1 commit
    • Refactor the way Semi-Join plans are constructed. · d4ce0921
      Committed by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such plans, add a new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code could,
      in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top of
      the join tree. I'm not sure when that would be useful, though, and the
      regression suite doesn't seem to contain any such cases (with EXPLAIN). Or
      maybe I misunderstood the old code. In any case, I think this is good
      enough.
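
      A hedged example of the affected query class (tables from the examples above):

      ```sql
      -- The same IN sublink can now be planned as JOIN_SEMI, JOIN_UNIQUE_INNER/OUTER,
      -- or JOIN_DEDUP_SEMI (an inner join followed by a deduplication step).
      EXPLAIN SELECT * FROM t1 WHERE t1.c2 IN (SELECT c2 FROM t3);
      ```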
  17. 02 February 2018, 1 commit
    • Remove extra planner pass to remove "trivial" Result nodes. · c613cabf
      Committed by Heikki Linnakangas
      Instead, avoid creating such Result nodes in the first place, by making
      plan_pushdown_tlist() check if the Result node would have any work to do.
      
      With this, you get Result nodes in some cases where the old code could zap
      them away. But on the other hand, this can avoid inserting Result nodes, not
      only on top of Appends, but on top of any node. This can be seen in the
      included expected output changes: some test queries lose a Result, some
      gain one. So performance-wise this is about a wash, but this is simpler.
      
      The reason to do this right now is that we ran into issues with the
      "zapping" code while working on the 9.0 merge. I'm sure we could fix those
      issues, but let's do this rather than spend time debugging and fixing the
      zapping code with the merge.
  18. 13 December 2017, 4 commits
    • Reword comment to avoid nested comments · 8105f067
      Committed by Daniel Gustafsson
      The comment added in 916f460f created a nested comment structure
      by accident, which triggered a warning in clang for -Wcomment. Reword
      the comment slightly to make the compiler happy.
      
      planner.c:194:15: warning: '/*' within block comment [-Wcomment]
               * support pl/* statements (relevant when they are planned on the segments).
                           ^
    • Fix storage test failures caused by 916f460f · 0d3ae2a0
      Committed by Shreedhar Hardikar
      The default value of Gp_role is set to GP_ROLE_DISPATCH, which means
      auxiliary processes inherit this value. FileRep does the same, but also
      executes queries using SPI on the segment, which means Gp_role ==
      GP_ROLE_DISPATCH is not a sufficient check for the master QD.
      
      So, bring back the check on GpIdentity.
      
      Author: Asim R P <apraveen@pivotal.io>
      Author: Shreedhar Hardikar <shardikar@pivotal.io>
    • Rename querytree_safe_for_segment to querytree_safe_for_qe · 32f099fd
      Committed by Shreedhar Hardikar
      The original name was deceptive because this check is also done for QE
      slices that run on master. For example:
      
      EXPLAIN SELECT * FROM func1_nosql_vol(5), foo;
      
                                               QUERY PLAN
      --------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.30..1.37 rows=4 width=12)
         ->  Nested Loop  (cost=0.30..1.37 rows=2 width=12)
               ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=8)
               ->  Materialize  (cost=0.30..0.33 rows=1 width=4)
                     ->  Broadcast Motion 1:3  (slice1)  (cost=0.00..0.30 rows=3 width=4)
                           ->  Function Scan on func1_nosql_vol  (cost=0.00..0.26 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (8 rows)
      
      Note that in the plan, the function func1_nosql_vol() will be executed on a
      master slice with Gp_role as GP_ROLE_EXECUTE.
      
      Also, update output files.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
    • Ensure that ORCA is not called on any process other than the master QD · 916f460f
      Committed by Shreedhar Hardikar
      We don't want to use the optimizer for planning queries in SQL, pl/pgSQL
      etc. functions when that is done on the segments.
      
      ORCA excels in complex queries, most of which will access distributed
      tables. We can't run such queries from the segment slices anyway
      because they require dispatching a query within another - which is not
      allowed in GPDB. Note that this restriction also applies to non-QD
      master slices.  Furthermore, ORCA doesn't currently support pl/*
      statements (relevant when they are planned on the segments).
      
      For these reasons, restrict to using ORCA on the master QD processes
      only.
      
      Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.")
      and separate out gporca fault injector tests in newly added
      gporca_faults.sql so that the rest can run in a parallel group.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
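
      A hedged illustration of the scope of the change (names borrowed from the EXPLAIN
      example in the entry above; behavior described, not verified output):

      ```sql
      -- The top-level query on the master QD may still be planned by ORCA, but any
      -- SQL issued from inside pl/* functions running on segment (or non-QD master)
      -- slices is planned by the legacy planner after this change.
      SET optimizer = on;
      SELECT * FROM func1_nosql_vol(5), foo;
      ```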
  19. 12 December 2017, 1 commit
    • Replace usage of deprecated error codes · fd0a1b75
      Committed by Daniel Gustafsson
      These error codes were marked as deprecated in September 2007 but
      the code didn't get the memo. Extend the deprecation into the code
      and actually replace the usage. Ten years seems like long enough notice,
      so also remove the renames; the odds of anyone using these in code
      which compiles against a 6X tree should be low (and easily fixed).
  20. 30 November 2017, 1 commit
  21. 24 November 2017, 7 commits
    • Backport upstream comment updates · 122e817b
      Committed by Heikki Linnakangas
      commit 96f990e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Jul 13 20:23:09 2011 -0400
      
          Update some comments to clarify who does what in targetlist creation.
      
          No code changes; just avoid blaming query_planner for things it doesn't
          really do.
    • Backport upstream bugfix related to Window functions. · 411a033c
      Committed by Heikki Linnakangas
      The test case added to the regression suite actually seems to work on
      GPDB even without this, but nevertheless seems like a good idea to pick
      it now, since we have the code it affected. Also, I'm about to backport
      more stuff that depends on this.
      
      commit c1d9579d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 12 18:23:55 2011 -0400
      
          Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
      
          Regular aggregate functions in combination with, or within the arguments
          of, window functions are OK per spec; they have the semantics that the
          aggregate output rows are computed and then we run the window functions
          over that row set.  (Thus, this combination is not really useful unless
          there's a GROUP BY so that more than one aggregate output row is possible.)
          The case without GROUP BY could fail, as recently reported by Jeff Davis,
          because sloppy construction of the Agg node's targetlist resulted in extra
          references to possibly-ungrouped Vars appearing outside the aggregate
          function calls themselves.  See the added regression test case for an
          example.
      
          Fixing this requires modifying the API of flatten_tlist and its underlying
          function pull_var_clause.  I chose to make pull_var_clause's API for
          aggregates identical to what it was already doing for placeholders, since
          the useful behaviors turn out to be the same (error, report node as-is, or
          recurse into it).  I also tightened the error checking in this area a bit:
          if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
          that was a long time ago, so complain instead of ignoring them.
      
          Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
          that it only occurs in a basically-useless corner case, it doesn't seem
          worth the risks of changing a function API in a minor release.  There might
          be third-party code using pull_var_clause.
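
      A hedged sketch of the query shape the upstream fix covers (hypothetical table):
      a plain aggregate with no GROUP BY combined with a window function, so the window
      function runs over the single aggregate output row:

      ```sql
      -- Valid per spec: aggregates are computed first, then window functions run
      -- over the aggregate output rows (here, exactly one row).
      SELECT sum(salary) AS total,
             rank() OVER (ORDER BY sum(salary)) AS r
      FROM empsalary;
      ```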
    • Cherry-pick change to pull_var_clause() API. · bd3ab7bd
      Committed by Heikki Linnakangas
      We would get this later in PostgreSQL 8.4, but I'm about to cherry-pick
      more commits now that depend on this.
      
      Upstream commit:
      
      commit 1d97c19a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 19 19:46:33 2009 +0000
      
          Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
          Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
          term) seems to be to ignore the PlaceHolderVar and examine its argument
          instead.  In support of this, change the API of pull_var_clause() to allow
          callers to request recursion into PlaceHolderVars.  Currently
          estimate_num_groups() is the only customer for that behavior, but where
          there's one there may be others.
    • Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a
      Committed by Heikki Linnakangas
      This is similar to the old implementation, in that we use "+", "-" to
      compute the boundaries.
      
      Unfortunately it seems unlikely that this would be accepted in the
      upstream, but at least we have that feature back in GPDB now, the way it
      used to be. See discussion on pgsql-hackers about that:
      https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
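
      A minimal sketch of the re-implemented feature (value-based frames, computed with
      the type's "+" and "-" operators):

      ```sql
      -- Each row's frame covers the rows whose x lies within [x - 2, x + 2].
      SELECT x,
             sum(x) OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s
      FROM generate_series(1, 10) AS t(x);
      ```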
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
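
      A minimal sketch of the backported syntax:

      ```sql
      -- The ORDER BY inside the aggregate call controls the order in which values
      -- are fed to the transition function.
      SELECT array_agg(x ORDER BY x DESC) FROM generate_series(1, 5) AS t(x);
      ```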
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
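
      For reference, a hedged sketch of the PostgreSQL 9.4 "ordered set aggregate"
      syntax that is planned as the replacement (hypothetical table; not available
      at this point):

      ```sql
      -- The upstream replacement for the removed percentile functionality.
      SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY c2) FROM t1;
      ```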
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a lot of
        queries that used to throw a "ROWS parameter cannot be negative" error
        are now passing. That error was an artifact of the way LEAD/LAG were
        implemented. Those queries contain window function calls like "LEAD(col1,
        col2 - col3)", and sometimes with suitable values in col2 and col3, the
        second argument went negative. That caused the error. The new implementation
        of LEAD/LAG is OK with a negative argument.
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The output
        will vary from one invocation to another, though, because the order is
        then not well defined. This is more annoying on GPDB than on PostgreSQL,
        because in GPDB the row order tends to vary because the rows are spread
        out across the cluster and will arrive at the master in unpredictable
        order.)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OID to the new ones.
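
      A hedged sketch of one of the lifted limitations (a computed, possibly negative
      LEAD/LAG offset):

      ```sql
      -- A negative lag offset now simply looks forward instead of raising
      -- "ROWS parameter cannot be negative".
      SELECT x, lag(x, -1) OVER (ORDER BY x) FROM generate_series(1, 5) AS t(x);
      ```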
  22. 23 November 2017, 2 commits
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      If, in step 1, the result was already focused on a single segment, we
      would make a note that no Gather is needed, and not add one in step 4.
      However, the DISTINCT processing might add a Redistribute Motion node, so
      that the final result is not focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code stands,
      but we saw such a case on the "window functions rewrite" branch we've been
      working on. There, the sort order/distribution of the input can be changed
      to process window functions. But even if this isn't actively broken right
      now, it seems more robust to change the logic so that 'must_gather' means
      'at the end, the result must end up on a single node', instead of 'we must
      add a Gather node'. The test that this adds exercises this issue after
      the window functions rewrite, but right now it passes with or without these
      code changes. But might as well add it now.
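
      A hedged sketch of the plan shape under discussion (t1 from the examples above):
      DISTINCT plus ORDER BY, where the final result must end up on a single node:

      ```sql
      EXPLAIN SELECT DISTINCT c2 FROM t1 ORDER BY c2;
      ```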
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test, to a new 'gp_aggregates' test. But I forgot to add the new test file
      to the test schedule, so it was not run. Oops. Add it to the schedule now.
      The tests in 'gp_aggregates' cover this bug.
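
      A hedged example of the combination that was broken (hypothetical columns on the
      t1 table used earlier):

      ```sql
      -- DISTINCT applied on top of window function output; planning this correctly
      -- requires distinct_pathkeys to be set properly.
      SELECT DISTINCT rank() OVER (PARTITION BY c1 ORDER BY c2) FROM t1;
      ```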
  23. 21 October 2017, 1 commit
    • Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72
      Committed by Heikki Linnakangas
      If a CREATE TABLE AS query contained an ORDER BY, the planner put a Motion
      node on top of the plan that focuses all the rows on a single node.
      However, that was confused with the redistribute motion that CREATE TABLE
      AS is supposed to put at the top, to distribute the rows according to
      the DISTRIBUTED BY of the table. This used to work before commit
      7e268107, because we used to not add an explicit Motion node on top of
      the plan for ORDER BY, but we just changed the sort-order information in
      the Flow.
      
      I have a nagging feeling that the apply_motion code isn't dealing with
      Motion on top of a Motion node correctly, because I would've expected to
      get a plan like that without this fix. Perhaps apply_motion silently
      refuses to add a Motion node on top of an existing Motion? That'd be a
      silly plan, of course, and fortunately the planner doesn't create such
      plans, so I'm not going to dig deeper into that right now.
      
      The test case is a simplified version from one of the
      "mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
      those tests over from TINC to the main suite. We only run those tests
      in the concourse pipeline with "set optimizer=on", so it didn't catch
      this issue with optimizer=off.
      
      Fixes github issue #3577.
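
      A hedged sketch of the affected statement shape (table names hypothetical; GPDB's
      CTAS accepts a DISTRIBUTED BY clause after the query):

      ```sql
      -- The final Motion must redistribute by (c1), not gather everything to one
      -- node just because of the ORDER BY.
      CREATE TABLE t1_sorted AS
          SELECT * FROM t1 ORDER BY c2
          DISTRIBUTED BY (c1);
      ```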