1. 30 November 2017, 1 commit
  2. 24 November 2017, 7 commits
    • Backport upstream comment updates · 122e817b
      Committed by Heikki Linnakangas
      commit 96f990e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Jul 13 20:23:09 2011 -0400
      
          Update some comments to clarify who does what in targetlist creation.
      
          No code changes; just avoid blaming query_planner for things it doesn't
          really do.
    • Backport upstream bugfix related to Window functions. · 411a033c
      Committed by Heikki Linnakangas
      The test case added to the regression suite actually seems to work on
      GPDB even without this, but it nevertheless seems like a good idea to
      pick it now, since we have the code it affected. Also, I'm about to
      backport more stuff that depends on this.
      
      commit c1d9579d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 12 18:23:55 2011 -0400
      
          Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
      
          Regular aggregate functions in combination with, or within the arguments
          of, window functions are OK per spec; they have the semantics that the
          aggregate output rows are computed and then we run the window functions
          over that row set.  (Thus, this combination is not really useful unless
          there's a GROUP BY so that more than one aggregate output row is possible.)
          The case without GROUP BY could fail, as recently reported by Jeff Davis,
          because sloppy construction of the Agg node's targetlist resulted in extra
          references to possibly-ungrouped Vars appearing outside the aggregate
          function calls themselves.  See the added regression test case for an
          example.
      
          Fixing this requires modifying the API of flatten_tlist and its underlying
          function pull_var_clause.  I chose to make pull_var_clause's API for
          aggregates identical to what it was already doing for placeholders, since
          the useful behaviors turn out to be the same (error, report node as-is, or
          recurse into it).  I also tightened the error checking in this area a bit:
          if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
          that was a long time ago, so complain instead of ignoring them.
      
          Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
          that it only occurs in a basically-useless corner case, it doesn't seem
          worth the risks of changing a function API in a minor release.  There might
          be third-party code using pull_var_clause.
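
      For illustration, a minimal sketch of the failing shape (hypothetical
      table t with column x; not the actual regression case): with no GROUP
      BY, the aggregate produces a single row, and the window function runs
      over that one-row set.

          -- Aggregate within a window function's arguments, without GROUP BY.
          -- Sloppy Agg targetlist construction used to leak ungrouped
          -- references to x outside the aggregate call itself.
          SELECT rank() OVER (ORDER BY sum(x)) FROM t;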
    • Cherry-pick change to pull_var_clause() API. · bd3ab7bd
      Committed by Heikki Linnakangas
      We would get this later with the PostgreSQL 8.4 merge, but I'm about to
      cherry-pick more commits now that depend on this.
      
      Upstream commit:
      
      commit 1d97c19a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 19 19:46:33 2009 +0000
      
          Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
          Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
          term) seems to be to ignore the PlaceHolderVar and examine its argument
          instead.  In support of this, change the API of pull_var_clause() to allow
          callers to request recursion into PlaceHolderVars.  Currently
          estimate_num_groups() is the only customer for that behavior, but where
          there's one there may be others.
    • Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a
      Committed by Heikki Linnakangas
      This is similar to the old implementation, in that we use "+", "-" to
      compute the boundaries.
      
      Unfortunately it seems unlikely that this would be accepted in the
      upstream, but at least we have that feature back in GPDB now, the way it
      used to be. See discussion on pgsql-hackers about that:
      https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
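
      A sketch of what this re-enables (hypothetical table quotes, with
      columns ts and price); the frame boundary is computed with the
      ordering type's "+" and "-" operators:

          SELECT ts,
                 avg(price) OVER (ORDER BY ts
                                  RANGE BETWEEN interval '1 day' PRECEDING
                                        AND CURRENT ROW)
          FROM quotes;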
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
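
      A minimal sketch of the backported syntax (hypothetical table t,
      assuming array_agg is available):

          -- The order in which values are fed to the aggregate is now
          -- well-defined:
          SELECT array_agg(name ORDER BY name) FROM t;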
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a
        lot of queries that used to throw a "ROWS parameter cannot be negative"
        error are now passing. That error was an artifact of the way LEAD/LAG
        were implemented. Those queries contain window function calls like
        "LEAD(col1, col2 - col3)", and sometimes, with suitable values in col2
        and col3, the second argument went negative. That caused the error. The
        new implementation of LEAD/LAG is OK with a negative argument. (See the
        sketch after this entry.)
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The output
        will vary from one invocation to another, though, because the order is
        then not well defined. This is more annoying on GPDB than on PostgreSQL,
        because in GPDB the row order tends to vary as the rows are spread
        out across the cluster and arrive at the master in unpredictable
        order.)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OIDs to the new ones.
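
      The LEAD/LAG sketch mentioned above, with the hypothetical columns from
      the commit message; when col2 - col3 goes negative, this used to error
      out and now does not:

          SELECT lead(col1, col2 - col3) OVER (ORDER BY col1) FROM t;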
  3. 23 November 2017, 2 commits
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      If, in step 1, the result was already focused on a single segment, we
      would note that no Gather is needed, and not add one in step 4.
      However, the DISTINCT processing might add a Redistribute Motion node, so
      that the final result is no longer focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code stands,
      but we saw such a case on the "window functions rewrite" branch we've been
      working on. There, the sort order/distribution of the input can be changed
      to process window functions. But even if this isn't actively broken right
      now, it seems more robust to change the logic so that 'must_gather' means
      'at the end, the result must end up on a single node', instead of 'we must
      add a Gather node'. The test that this adds exercises this issue after
      the window functions rewrite, but right now it passes with or without
      these code changes. But we might as well add it now.
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test to a new 'gp_aggregates' test. But I forgot to add the new test file
      to the test schedule, so it was not run. Oops. Add it to the schedule now.
      The tests in 'gp_aggregates' cover this bug.
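
      The shape this fixes, as a minimal sketch (hypothetical table t):

          -- DISTINCT applied over window function output; planning this
          -- requires distinct_pathkeys to be set correctly.
          SELECT DISTINCT rank() OVER (ORDER BY x) FROM t;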
  4. 21 October 2017, 1 commit
    • Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72
      Committed by Heikki Linnakangas
      If a CREATE TABLE AS query contained an ORDER BY, the planner put a Motion
      node on top of the plan that focuses all the rows on a single node.
      However, that was confused with the redistribute motion that CREATE TABLE
      AS is supposed to put on top, to distribute the rows according to
      the DISTRIBUTED BY of the table. This used to work before commit
      7e268107, because we used to not add an explicit Motion node on top of
      the plan for ORDER BY; we just changed the sort-order information in
      the Flow.
      
      I have a nagging feeling that the apply_motion code isn't dealing with a
      Motion on top of a Motion node correctly, because I would've expected to
      get a plan like that without this fix. Perhaps apply_motion silently
      refuses to add a Motion node on top of an existing Motion? That'd be a
      silly plan, of course, and fortunately the planner doesn't create such
      plans, so I'm not going to dig deeper into that right now.
      
      The test case is a simplified version of one of the
      "mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
      those tests over from TINC to the main suite. We only run those tests
      in the concourse pipeline with "set optimizer=on", so it didn't catch
      this issue with optimizer=off.
      
      Fixes github issue #3577.
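
      A minimal sketch of the affected statement shape (hypothetical tables);
      the ORDER BY wants a focusing Motion, while the DISTRIBUTED BY wants a
      redistributing Motion on top:

          CREATE TABLE sorted_copy AS
              SELECT * FROM src ORDER BY b
          DISTRIBUTED BY (a);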
  5. 13 October 2017, 1 commit
    • Remove superfluous pathkey canonicalization · 7913e231
      Committed by Jesse Zhang
      `make_pathkeys_for_sortclauses` with a `true` last argument promises to
      canonicalize the returned path keys. We somehow cargo-culted a few
      unnecessary `canonicalize_pathkeys` calls immediately after those calls.
      
      This commit removes such superfluous calls to `canonicalize_pathkeys`.
      Signed-off-by: Max Yang <myang@pivotal.io>
  6. 12 October 2017, 1 commit
  7. 27 September 2017, 8 commits
    • Remove dead code around JoinExpr::subqfromlist. · f16deabd
      Committed by Shreedhar Hardikar
      This was used to keep information about the subquery join tree for
      pulled-up sublinks, for use later in deconstruct_recurse().  With the
      upstream subselect merge, a JoinExpr is constructed at pull-up time
      itself, so this is no longer needed, since the subquery join tree
      information is available in the constructed JoinExpr.
      
      Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
      However, since GPDB differs from upstream by treating SEMI joins as
      INNER joins for internal join planning, this commit also updates
      inner_join_rels correctly for SEMI joins (see regression test).
      
      Also remove unused function declaration for not_null_inner_vars().
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Improve pull_up_subqueries logic w.r.t PlaceHolderVar · da29e67a
      Committed by Ekta Khanna
      commit c59d8dd4
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Apr 28 21:31:16 2009 +0000
      
          Improve pull_up_subqueries logic so that it doesn't insert unnecessary
          PlaceHolderVar nodes in join quals appearing in or below the lowest
          outer join that could null the subquery being pulled up.  This improves
          the planner's ability to recognize constant join quals, and probably
          helps with detection of common sort keys (equivalence classes) as well.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Refrain from creating the planner's placeholder_list · 695c9fdf
      Committed by Ekta Khanna
      commit 31468d05
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Oct 22 20:17:52 2008 +0000
      
          Dept of better ideas: refrain from creating the planner's placeholder_list
          until vars are distributed to rels during query_planner() startup.  We don't
          really need it before that, and not building it early has some advantages.
          First, we don't need to put it through the various preprocessing steps, which
          saves some cycles and eliminates the need for a number of routines to support
          PlaceHolderInfo nodes at all.  Second, this means one less unused plan for any
          sub-SELECT appearing in a placeholder's expression, since we don't build
          placeholder_list until after sublink expansion is complete.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Add a concept of "placeholder" variables to the planner · 2b5c8201
      Committed by Bhuvnesh Chaudhary
      commit e6ae3b5d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Oct 21 20:42:53 2008 +0000
      
          Add a concept of "placeholder" variables to the planner.  These are variables
          that represent some expression that we desire to compute below the top level
          of the plan, and then let that value "bubble up" as though it were a plain
          Var (ie, a column value).
      
          The immediate application is to allow sub-selects to be flattened even when
          they are below an outer join and have non-nullable output expressions.
          Formerly we couldn't flatten because such an expression wouldn't properly
          go to NULL when evaluated above the outer join.  Now, we wrap it in a
          PlaceHolderVar and arrange for the actual evaluation to occur below the outer
          join.  When the resulting Var bubbles up through the join, it will be set to
          NULL if necessary, yielding the correct results.  This fixes a planner
          limitation that's existed since 7.1.
      
          In future we might want to use this mechanism to re-introduce some form of
          Hellerstein's "expensive functions" optimization, ie place the evaluation of
          an expensive function at the most suitable point in the plan tree.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
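
      A sketch of the kind of query this enables flattening for (hypothetical
      tables a and b): the constant 'b-side' must become NULL for a-rows with
      no match, so it is wrapped in a PlaceHolderVar and evaluated below the
      outer join.

          SELECT a.id, ss.tag
          FROM a LEFT JOIN (SELECT id, 'b-side' AS tag FROM b) ss
                 ON a.id = ss.id;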
    • Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e
      Committed by Ekta Khanna
      commit 19e34b62
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Aug 17 01:20:00 2008 +0000
      
          Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
          level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
          can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
          only to the nullable side of the outer join, so that it can effectively
          be pushed down into the nullable side.)  Per request from Kevin Grittner.
      
          In passing, fix a bug in the initial implementation of EXISTS pullup:
          it would Assert if the EXIST's WHERE clause used a join alias variable.
          Since we haven't yet flattened join aliases when this transformation
          happens, it's necessary to include join relids in the computed set of
          RHS relids.
      
      Ref [#142356521]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
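
      A sketch of the newly handled shape (hypothetical tables): an EXISTS at
      the top level of an inner join's ON clause can now be pulled up.

          SELECT *
          FROM a JOIN b
            ON a.id = b.id
           AND EXISTS (SELECT 1 FROM c WHERE c.x = b.x);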
    • Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Committed by Ekta Khanna
      After merging with e006a24a, Anti Semi Joins will
      be denoted by `JOIN_ANTI` instead of `JOIN_LASJ`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • CDBlize the cherry-pick e006a24a · 0feb1bd9
      Committed by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now `InClauseInfo` is not present, and we construct a
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. Later
      in the flow, we construct a `SpecialJoinInfo` from the `FlattenedSublink`
      in `deconstruct_sublink_quals_to_rel()`. Hence, add `try_join_unique` to
      both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Committed by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
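
      As a sketch of the query shapes involved (hypothetical tables a and b):

          -- Converted to a semi join:
          SELECT * FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

          -- Converted to an anti join:
          SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

          -- Also recognized as an anti join:
          SELECT * FROM a LEFT JOIN b ON b.id = a.id WHERE b.id IS NULL;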
  8. 25 September 2017, 1 commit
    • Remove row order information from Flow. · 7e268107
      Committed by Heikki Linnakangas
      A Motion node often needs to "merge" the incoming streams, to preserve the
      overall sort order. Instead of carrying sort order information throughout
      the later stages of planning, in the Flow struct, pass it as an argument
      directly to make_motion() and other functions, where a Motion node is
      created. This simplifies things.
      
      To make that work, we can no longer rely on apply_motion() to add the final
      Motion on top of the plan, when the (sub-)query contains an ORDER BY. That's
      because we no longer have that information available at apply_motion(). Add
      the Motion node in grouping_planner() instead, where we still have that
      information, as a path key.
      
      When I started to work on this, this also fixed a bug where the sortColIdx
      of a plan's Flow node could refer to the wrong resno. A test case for that
      is included. However, that case was since fixed by other coincidental
      changes to partition elimination, so now this is just refactoring.
  9. 21 September 2017, 2 commits
    • Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Committed by Heikki Linnakangas
      It only worked for cursors declared with DECLARE CURSOR before. You got
      a "there is no parameter $0" error if you tried. This moves the decision
      on whether a plan is "simply updatable" from the parser to the planner.
      Doing it in the parser was awkward, because we only want to do it for
      queries that are used in a cursor, and for SPI queries, we don't know it
      at that time yet.
      
      For some reason, the copy, out and read functions of CurrentOfExpr were missing
      the cursor_param field. While we're at it, reorder the code to match
      upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
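
      A minimal sketch of what now works (hypothetical table t and function
      name):

          CREATE FUNCTION bump() RETURNS void AS $$
          DECLARE
              cur CURSOR FOR SELECT * FROM t;
              rec RECORD;
          BEGIN
              OPEN cur;
              FETCH cur INTO rec;
              -- This used to fail with: there is no parameter $0
              UPDATE t SET x = x + 1 WHERE CURRENT OF cur;
              CLOSE cur;
          END;
          $$ LANGUAGE plpgsql;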
    • Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Committed by Heikki Linnakangas
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as
      IMMUTABLE, it could be executed anywhere. Otherwise it was always executed
      on the master. For backwards-compatibility, this keeps that behavior for
      EXECUTE ON ANY (the default), so even if a function is marked as EXECUTE ON
      ANY, it will always be executed on the master unless it's IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER or
      ON ALL SEGMENTS function in a query causes ORCA to fall back. This is the
      same as with the prodataaccess='s' hack that this replaces, but now that it
      is more user-visible, it would be nice to teach ORCA about it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
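
      A sketch of the new DDL and the placement rules (hypothetical function
      name):

          CREATE FUNCTION master_version() RETURNS SETOF text AS
          $$ SELECT version() $$ LANGUAGE sql EXECUTE ON MASTER;

          SELECT * FROM master_version();       -- allowed
          SELECT master_version();              -- allowed (no FROM clause)
          -- SELECT master_version() FROM foo;  -- rejected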
  10. 17 September 2017, 1 commit
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Committed by Heikki Linnakangas
      In GPDB, we have so far used a WindowFrame struct to represent the start
      and end window bound, in a ROWS/RANGE BETWEEN clause, while PostgreSQL
      uses the combination of a frameOptions bitmask and start and end
      expressions. Refactor to replace the WindowFrame with the upstream
      representation.
  11. 12 September 2017, 1 commit
    • Split WindowSpec into separate before and after parse-analysis structs. · 789f443d
      Committed by Heikki Linnakangas
      In the upstream, two different structs are used to represent a window
      definition: WindowDef in the grammar, which is transformed into
      WindowClause during parse analysis. In GPDB, we've been using the same
      struct, WindowSpec, in both stages. Split it up, to match the upstream.
      
      The representation of the window frame, i.e. "ROWS/RANGE BETWEEN ..." was
      different between the upstream implementation and the GPDB one. We now use
      the upstream frameOptions+startOffset+endOffset representation in raw
      WindowDef parse node, but it's still converted to the WindowFrame
      representation for the later stages, so WindowClause still uses that. I
      will switch over the rest of the codebase to the upstream representation as
      a separate patch.
      
      Also, refactor WINDOW clause deparsing to be closer to upstream.
      
      One notable difference is that the old WindowSpec.winspec field corresponds
      to the winref field in WindowDef and WindowClause, except that the new
      'winref' is 1-based, while the old field was 0-based.
      
      Another noteworthy thing is that this forbids specifying "OVER (w
      ROWS/RANGE BETWEEN ...", if the window "w" already specified a window frame,
      i.e. a different ROWS/RANGE BETWEEN. There was one such case in the
      regression suite, in window_views, and this updates the expected output of
      that to be an error.
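
      The newly forbidden case, as a sketch (hypothetical table t):

          -- Error: "w" already specifies a frame, and the OVER clause
          -- tries to override it with a different one.
          SELECT sum(x) OVER (w ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND CURRENT ROW)
          FROM t
          WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);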
  12. 06 September 2017, 1 commit
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Committed by Heikki Linnakangas
      If a prepared statement, or a cached plan for an SPI query e.g. from a
      PL/pgSQL function, contains stable functions, the stable functions were
      incorrectly evaluated only once at plan time, instead of on every execution
      of the plan. This happened to not be a problem in queries that contain any
      parameters, because in GPDB, they are re-planned on every invocation
      anyway, but non-parameter queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated all
      stable functions at planning time. Change it to false, and also rename it
      back to 'estimate', as it's called in the upstream. That flag was changed
      back in 2010, in order to allow partition pruning to work with qual
      containing stable functions, like TO_DATE. I think back then, we always
      re-planned every query, so that was OK, but we do cache plans now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so that
      it still does evaluate stable functions, even when the 'estimate' flag is
      off. But when it does so, mark the plan as "one-off", meaning that it must
      be re-planned on every execution. That gives the old, intended, behavior,
      that such plans are indeed re-planned, but it still allows plans that don't
      use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
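
      A sketch of the affected pattern (hypothetical table events): now() is
      stable, so it must be evaluated at execution time, not frozen into the
      cached plan.

          PREPARE recent AS
              SELECT * FROM events
              WHERE created_at > now() - interval '1 hour';
          EXECUTE recent;  -- must see the now() of this execution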
  13. 04 September 2017, 1 commit
  14. 01 September 2017, 1 commit
  15. 21 August 2017, 1 commit
    • Move ORCA invocation into standard_planner · d5dbbfd9
      Committed by Daniel Gustafsson
      The way ORCA was tied into the planner, running a planner_hook
      was not supported in the intended way. This commit moves ORCA
      into standard_planner() instead of planner(), and leaves the hook
      for extensions to make use of, with or without ORCA. Since the
      intention with the optimizer GUC is to replace the planner in
      postgres, while keeping the planning process, this allows
      planner extensions to co-operate with that.
      
      In order to reduce the Greenplum footprint in upstream postgres
      source files for future merges, the ORCA functions are moved to
      their own file.
      
      Also adds a memaccounting class for planner hooks since they
      otherwise ran in the planner scope, as well as a test for using
      planner_hooks.
  16. 09 August 2017, 1 commit
  17. 01 August 2017, 1 commit
    • Choose segment randomly to serve as singleton reader gang · 06f56fe8
      Committed by Pengzhou Tang
      This fixes a typo that caused segment 0 to always be assigned as
      the singleton reader. It existed for a long time with no
      functional issue, but may result in a performance issue somehow.
      
      Besides, root->config->cdbpath_segments is tuneable via the GUC
      gp_segments_for_planner, so gp_singleton_segindex may point to
      an invalid segment; we use the real segment count instead to
      avoid a mismatch.
  18. 17 June 2017, 1 commit
    • Merge 8.4 CTE (sans recursive) · 41c3b698
      This brought in postgres/postgres@44d5be0 pretty much wholesale, except:
      
      1. We leave `WITH RECURSIVE` for a later commit. The code is brought in,
          but kept dormant by us bailing early at the parser whenever there is
          a recursive CTE.
      2. We use `ShareInputScan` instead of `CteScan`. ShareInputScan is
          basically the parallel-capable `CteScan`. (See `set_cte_pathlist`
          and `create_ctescan_plan`)
      3. Consequently we do not put the sub-plan for the CTE in a
          pseudo-initplan: it is directly present in the main plan tree
          instead, hence we disable `SS_process_ctes` inside
          `subquery_planner`
      4. Another corollary is that all new operators (`CteScan`,
          `RecursiveUnion`, and `WorkTableScan`) are dead code right now. But
          they will come to life once we bring in the parallel implementation
          of `WITH RECURSIVE`
      
      In general this commit reduces the divergence between Greenplum and
      upstream.
      
      User visible changes:
      The merge in the parser enables a corner case previously treated as an
      error: you can now specify fewer columns in your `WITH` clause than the
      actual projected columns in the body subquery of the `WITH` (see the
      sketch after this entry).
      
      Original commit message:
      
      > Implement SQL-standard WITH clauses, including WITH RECURSIVE.
      >
      > There are some unimplemented aspects: recursive queries must use UNION ALL
      > (should allow UNION too), and we don't have SEARCH or CYCLE clauses.
      > These might or might not get done for 8.4, but even without them it's a
      > pretty useful feature.
      >
      > There are also a couple of small loose ends and definitional quibbles,
      > which I'll send a memo about to pgsql-hackers shortly.  But let's land
      > the patch now so we can get on with other development.
      >
      > Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
      >
      
      (cherry picked from commit 44d5be0e)
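
      The newly allowed corner case, as a sketch:

          -- Fewer alias columns in the WITH clause than the subquery
          -- projects; the second column keeps its own name, b.
          WITH w(a) AS (SELECT 1 AS a, 2 AS b)
          SELECT a, b FROM w;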
  19. 01 June 2017, 1 commit
    • Fixup subplans referring to same plan_id · d0aea184
      Committed by Bhuvnesh Chaudhary
      During parallelization of nodes in cdbparallelize, if there are
      any subplan nodes in the plan which refer to the same plan_id,
      the parallelization step breaks, as a node must be processed only
      once by it. This patch fixes the issue by generating a new
      subplan node in the glob's subplans list, and updating the plan_id
      of the subplan to refer to the newly created node.
  20. 11 May 2017, 1 commit
  21. 26 April 2017, 1 commit
    • Add expansion support to HHashTable to optimize HashAgg · 3bb360de
      Committed by Shreedhar Hardikar
      When creating an HHashTable, instead of using the available memory as the
      sole basis for determining the number of buckets, it now computes nbuckets
      as a function of the estimated groups/entries given by the planner. To
      prevent performance degradation when the statistics are off, the
      hash table expands by doubling the number of buckets and rehashing all
      the entries, until it is out of memory.
      If more space is needed, the HHashTable spills to disk as before, but it
      can now accurately allocate buckets when the spill files are reloaded,
      based on the number of entries spilled.
      
      This commit also makes other minor fixes:
        - Change calcHashAggTableSizes() signature to make it reusable
        - Keep track of in-memory entries in the HT
        - Add tests for when it overflows multiple times
        - Estimate the overhead per entry in the hash table more accurately
        - Refactor statistics collection for EXPLAIN ANALYZE
  22. 14 April 2017, 1 commit
    • Cherry-pick upstream commit eaf1b5d3 "SS_finalize_plan" · 4697811d
      Committed by Dhanashree Kashid and Jemish Patel
      This is a cherry-pick of the following upstream commit:
      
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Jul 10 02:14:03 2008 +0000
      
          Tighten up SS_finalize_plan's computation of valid_params to exclude Params of
          the current query level that aren't in fact output parameters of the current
          initPlans.  (This means, for example, output parameters of regular subplans.)
          To make this work correctly for output parameters coming from sibling
          initplans requires rejiggering the API of SS_finalize_plan just a bit:
          we need the siblings to be visible to it, rather than hidden as
          SS_make_initplan_from_plan had been doing.  This is really part of my response
          to bug #4290, but I concluded this part probably shouldn't be back-patched,
          since all that it's doing is to make a debugging cross-check tighter.
      
          (cherry picked from commit eaf1b5d3)
  23. 01 April 2017, 1 commit
    • Use PartitionSelectors for partition elimination, even without ORCA. · e378d84b
      Committed by Heikki Linnakangas
      The old mechanism was to scan the complete plan, searching for a pattern
      with a Join, where the outer side included an Append node. The inner
      side was duplicated into an InitPlan, with the pg_partition_oid aggregate
      to collect the OIDs of all the partitions that can match. That was
      inefficient and broken: if the duplicated plan was volatile, you might
      choose the wrong partitions. And scanning the inner side twice can obviously
      be slow, if there are a lot of tuples.
      
      Rewrite the way such plans are generated. Instead of using an InitPlan,
      inject a PartitionSelector node into the inner side of the join.
      
      Fixes github issues #2100 and #2116.
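
      A sketch of a plan shape that benefits (hypothetical tables, with sales
      partitioned by date_id): the PartitionSelector injected into the inner
      side of the join (the dates scan) lets the Append over the sales
      partitions skip partitions that cannot match.

          SELECT *
          FROM sales s JOIN dates d ON s.date_id = d.date_id
          WHERE d.year = 2017;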
  24. 31 March 2017, 1 commit
    • Remove unused Append.hasXslice fields and code to set it. · 068f0d53
      Committed by Heikki Linnakangas
      It was added back in 2010, as part of a patch to:
      
      commit d0ca3d8a4333db510ac68145c30ff917626d2037
      Author: kentj <a@b>
      Date:   Mon Jan 25 13:50:14 2010 -0800
      
          MPP-7734: initialize executor nodes for the active slice instead of the
          whole tree. Postponding the initialization of the subplans of an Append
          node to the time when it is processed.
          Send init gpmon packages for the subnodes of the Append node even
          though we don't initialize them.
      
          [git-p4: depot-paths = "//cdb2/main/": change = 43428]
      
      However, it was reverted only a few weeks later:
      
      commit 5cb7e64a093dfcc1bcbdca5ed74c261e9c56d3a3
      Author: kentj <a@b>
      Date:   Tue Feb 16 16:40:49 2010 -0800
      
          MPP-8031, MPP-7734: revert back the Append changes, since it breaks the
          explain analyze. It is hard to fix the explain analyze with the existing
          Append changes. It needs some more thinking.
      
          [git-p4: depot-paths = "//cdb2/main/": change = 45078]
      
      The revert removed all use of the flag, but left the flag behind. Remove
      it.
  25. 02 March 2017, 1 commit
    • Add a GUC, to produce a message at INFO level when ORCA falls back. · fff8e621
      Committed by Heikki Linnakangas
      We have a bunch of existing tests that test whether ORCA falls back. They
      set optimizer_log_failure='all' and client_min_messages='log', and grep
      the output for "Planner" or "Planner produced plan". That's error-prone.
      
      This new GUC makes that kind of tests easier and more robust. You can
      simply set the GUC, and if there are no extra INFO messages in the output,
      ORCA didn't fall back.
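
      Intended usage, as a sketch (the GUC name and INFO text shown here are
      assumptions, for illustration only):

          SET optimizer_trace_fallback = on;
          SELECT count(*) FROM my_table;
          -- INFO:  GPORCA failed to produce a plan, falling back to planner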