1. 08 Sep 2017 (2 commits)
  2. 07 Sep 2017 (20 commits)
    • Refactor EDGE_* macros in nodeWindow.c. · ec8cbc2c
      Committed by Heikki Linnakangas
      Refactor the EDGE_IS_* macros so that they take the current
      WindowStatePerLevel as argument, rather than a WindowFrameEdge. This will
      make the transition to the upstream window function implementation a bit
      easier, because the upstream implementation doesn't have a WindowFrameEdge
      struct.
      
      Encapsulate access to level_state->is_rows in a macro. This is also in
      preparation for the upstream implementation, which will contain that
      information as a flag in the frameOptions bitmask.
      
      Many functions used an 'is_lead' argument, and one used 'is_trail', to
      keep track of whether we're dealing with the leading or trailing edge.
      Introduce a little enum with EDGE_TRAIL and EDGE_LEAD values, to avoid
      having to remember what 'true' or 'false' means in which context.
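      The refactoring above can be sketched roughly as follows. This is a minimal,
      hypothetical illustration: only the EDGE_TRAIL/EDGE_LEAD enum and the
      WindowStatePerLevel name come from the commit; the struct fields and the
      `edge_is_bound` helper are stand-ins, not GPDB's actual macros:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Instead of a bare 'is_lead' boolean, pass an explicit enum value. */
      typedef enum
      {
          EDGE_TRAIL,
          EDGE_LEAD
      } WindowEdgeKind;

      /* Hypothetical stand-in for GPDB's WindowStatePerLevel. */
      typedef struct
      {
          bool trail_is_bound;
          bool lead_is_bound;
      } WindowStatePerLevel;

      /* The macro-equivalent now takes the per-level state plus the edge kind,
       * rather than a WindowFrameEdge struct. */
      static bool
      edge_is_bound(const WindowStatePerLevel *level_state, WindowEdgeKind kind)
      {
          return (kind == EDGE_LEAD) ? level_state->lead_is_bound
                                     : level_state->trail_is_bound;
      }

      int
      main(void)
      {
          WindowStatePerLevel ls = { .trail_is_bound = true, .lead_is_bound = false };

          assert(edge_is_bound(&ls, EDGE_TRAIL));
          assert(!edge_is_bound(&ls, EDGE_LEAD));
          return 0;
      }
      ```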
    • Decide whether a window edge is "delayed" later, in the executor. · 23b262f2
      Committed by Heikki Linnakangas
      In principle, it makes sense to determine at plan-time, whether the
      expression needs to be re-evaluated for every row. In practice, it seems
      simpler to decide that in the executor, when initializing the Window node.
      This allows removing a bunch of code from the planner, and from the ORCA
      translator, including the hack to force the expression to be delayed if it
      was a SubLink.
      
      The planner always set the delayed flag, unless the expression was a Const.
      We can easily and quickly check for that in the executor too. I'm not sure
      how ORCA decided whether to delay or not, but in some quick testing I
      cannot come up with a case where it would decide differently.
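      The executor-side rule described above ("delayed unless the expression is a
      Const") can be sketched like this; the node tags and the function name are
      hypothetical stand-ins for GPDB's real node machinery, not its actual API:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical stand-in tags for GPDB expression node types. */
      typedef enum
      {
          T_Const,
          T_Param,
          T_OpExpr,
          T_SubLink
      } NodeTag;

      /* A frame edge expression must be re-evaluated for every row ("delayed")
       * unless it is a plain constant. Everything else, including the SubLink
       * case the planner used to special-case, is treated as delayed. */
      static bool
      edge_is_delayed(NodeTag edge_expr_tag)
      {
          return edge_expr_tag != T_Const;
      }

      int
      main(void)
      {
          assert(!edge_is_delayed(T_Const));
          assert(edge_is_delayed(T_OpExpr));
          assert(edge_is_delayed(T_SubLink));
          return 0;
      }
      ```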
    • Refactor code that selects a common type for columns in a UNION query. · 352362a6
      Committed by Heikki Linnakangas
      The big difference is that each leaf query is now transformed in one go,
      like it's done in the upstream, instead of transforming the target list
      and FROM list first. That partial transformation was causing trouble for
      another refactoring that I'm working on, which will change the way window
      functions are handled in parse analysis.
      
      This two-pass code is GPDB-specific, PostgreSQL uses a simpler algorithm
      that works bottom-up, one setop node at a time, to select the column types.
    • Remove remnants of "EXCLUDE [CURRENT ROW|GROUP|TIES|NO OTHERS]" syntax. · 646cdc60
      Committed by Heikki Linnakangas
      It hasn't been implemented, but there is basic support in the grammar,
      just enough to detect the syntax and throw an error or ignore it. All the
      rest was dead code.
    • Verify the newval for GUC 'statement_mem' and 'max_resource_groups' only if... · 7c322d0c
      Committed by Richard Guo
      Verify the newval for GUC 'statement_mem' and 'max_resource_groups' only if they are actually being set.
      
      During gpdb startup, one step checks that all GUCs are valid with their new values, but without actually setting them.
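      The fix can be sketched with a hypothetical assign-hook in the doit/no-doit
      style that this era of GPDB uses for GUCs. Everything here is a stand-in
      (function name, validation logic), not the actual GPDB hook; it only
      illustrates running validation solely when the value is actually applied:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Stand-in for the real GUC variable. */
      static int max_resource_groups = 100;

      /* Hypothetical assign hook: during the startup pass (doit = false) the
       * value is only parsed, not verified against other state; verification
       * happens only when the value is actually being set (doit = true). */
      static bool
      assign_max_resource_groups(int newval, bool doit)
      {
          if (doit)
          {
              if (newval < 0)
                  return false;       /* reject invalid value */
              max_resource_groups = newval;
          }
          return true;
      }

      int
      main(void)
      {
          /* startup pass: accepted without verification, value not applied */
          assert(assign_max_resource_groups(-1, false));
          assert(max_resource_groups == 100);

          /* actually setting: verified and applied */
          assert(!assign_max_resource_groups(-1, true));
          assert(assign_max_resource_groups(42, true));
          assert(max_resource_groups == 42);
          return 0;
      }
      ```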
    • Enable ORCA to use IndexScan on Leaf Partitions · dae6849f
      Currently ORCA does not support index scan on leaf partitions. It only supports
      index scan if we query the root table. This commit, along with the corresponding
      ORCA changes, adds support for using indexes when leaf partitions are queried
      directly.
      
      When a root table that has indexes (either homogenous/complete or
      heterogenous/partial) is queried, the Relcache Translator sends index
      information to ORCA. This enables ORCA to generate an alternative plan with
      Dynamic Index Scan on all partitions (in the case of a homogenous index), or a
      plan with a partial scan, i.e. Dynamic Table Scan on the leaf partitions that
      don't have indexes plus Dynamic Index Scan on the leaf partitions that do (in
      the case of a heterogenous index).
      
      This is a two step process in Relcache Translator as described below:
      
      Step 1 - Get list of all index oids
      
      `CTranslatorRelcacheToDXL::PdrgpmdidRelIndexes()` performs this step and it
      only retrieves indexes on root and regular tables; for leaf partitions it bails
      out.
      
      For the root, the list of index oids is simply the index oids on its leaf
      partitions. For instance:
      
      ```
      CREATE TABLE foo ( a int, b int, c int, d int) DISTRIBUTED by (a) PARTITION
      BY RANGE(b) (PARTITION p1 START (1) END (10) INCLUSIVE, PARTITION p2 START (11)
      END (20) INCLUSIVE);
      
      CREATE INDEX complete_c on foo USING btree (c); CREATE INDEX partial_d on
      foo_1_prt_p2 using btree(d);
      ```
      The index list will look like: { complete_c_1_prt_p1, partial_d }
      
      For a complete index, the index oid of the first leaf partition is retrieved.
      If there are partial indexes, all the partial index oids are retrieved.
      
      Step 2 - Construct Index Metadata object
      
      `CTranslatorRelcacheToDXL::Pmdindex()` performs this step.
      
      For each index oid retrieved in Step 1 above, construct an Index Metadata
      object (CMDIndexGPDB) to be stored in the metadata cache, so that ORCA can get
      all the information about the index.
      Along with all other information about the index, `CMDIndexGPDB` also contains
      a flag `fPartial` which denotes if the given index is homogenous (if yes, ORCA
      will apply it to all partitions selected by partition selector) or heterogenous
      (if yes, the index will be applied to only appropriate partitions).
      The process is as follows:
      ```
      	Foreach oid in index oid list :
      		Get index relation (rel)
      		If rel is a leaf partition :
      			Get the root rel of the leaf partition
      			Get all	the indexes on the root (this will be same list as step #1)
      			Determine if the current index oid is homogenous or heterogenous
      			Construct CMDIndexGPDB based appropriately (with fPartial, part constraint,
      			defaultlevels info)
      		Else:
      			Construct a normal CMDIndexGPDB object.
      ```
      
      Now for leaf partitions, there is no notion of homogenous or heterogenous
      indexes, since a leaf partition is like a regular table. Hence in `Pmdindex()`
      we should not check whether the index is complete or not.
      
      Additionally, whether a given index is homogenous or heterogenous needs to be
      decided from the perspective of the relation we are querying (such as the root
      or a leaf).
      
      Hence the right place for the `fPartial` flag is the relation metadata object
      (CMDRelationGPDB), not the independent index metadata object (CMDIndexGPDB).
      This commit makes the following changes to support index scan on leaf
      partitions, along with partial scans:
      
      Relcache Translator:
      
      In Step 1, retrieve the index information on the leaf partition and create a
      list of CMDIndexInfo objects, each containing the index oid and the `fPartial`
      flag. Step 1 is the place where we know which relation we are querying, which
      enables us to determine whether or not the index is homogenous from the
      context of that relation.
      
      The relation metadata tag will look like the following after this change:
      
      Before:
      ```
      	<dxl:Indexes>
      		<dxl:Index Mdid="0.17159874.1.0"/>
      		<dxl:Index Mdid="0.17159920.1.0"/>
      	</dxl:Indexes>
      ```
      
      After:
      ```
      	<dxl:IndexInfoList>
      		<dxl:IndexInfo Mdid="0.17159874.1.0" IsPartial="true"/>
      		<dxl:IndexInfo Mdid="0.17159920.1.0" IsPartial="false"/>
      	</dxl:IndexInfoList>
      
      ```
      
      A new class `CMDIndexInfo` has been created in ORCA, which contains the index
      mdid and the `fPartial` flag. For external tables, normal tables and leaf
      partitions, the `fPartial` flag will always be false.
      
      Hence, at the end, the relcache translator will provide the list of indexes
      defined on leaf partitions when they are queried directly, with `fPartial`
      always false. And when the root table is queried, `fPartial` will be set
      appropriately based on the completeness of the index. ORCA will refer to the
      relation metadata for `fPartial` information, not to the independent index
      metadata object.
      
      [Ref ##120303669]
    • Fix a compilation error and some warnings introduced by the recursive CTEs. · 7cb69995
      Committed by Heikki Linnakangas
      * In ruleutils.c, the ereport() was broken. Use elog() instead, like in
        the upstream. (elog() is fine for "can't happen" kind of sanity checks)
      
      * Remove a few unused local variables.
      
      * Add a missing cast from Plan * to Node *.
    • Un-hide recursive CTE on master [#150861534] · 20152cbf
      Committed by Jesse Zhang
      We will be less conservative and enable recursive CTE by default on master,
      rather than keeping it hidden, as we progress on developing the feature.
      
      This reverts the following two commits:
      * 280c577a "Set gp_recursive_cte_prototype GUC to true in test"
      * 4d5f8087 "Guard Recursive CTE behind a GUC"
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
    • Force a stand-alone backend to run in utility mode. · 814d82d6
      Committed by Heikki Linnakangas
      In a stand-alone backend ("postgres --single"), you cannot realistically
      expect any of the infrastructure needed for MPP processing to be present.
      Let's force a stand-alone backend to run in utility mode, to make sure
      that we don't try to dispatch queries, participate in distributed
      transactions, or anything like that, in a stand-alone backend.
      
      Fixes github issue #3172, which was one such case where we tried to
      dispatch a SET command in single-user mode, and got all confused.
    • Evaluate lesser joins to produce best join tree · 6ad94ff2
      Committed by Jemish Patel
      Previously we were setting the value of
      `optimizer_join_arity_for_associativity_commutativity` to a very large
      number, so ORCA would spend a very long time evaluating all possible n-way
      join combinations to come up with the cheapest join tree to use in the plan.
      
      We are reducing this value to `7`, as it does not prove beneficial to spend
      time and resources evaluating anything beyond 7-way joins in trying to find
      the cheapest join tree.
    • Fix check for using window functions in WHERE clause. · ad093812
      Committed by Heikki Linnakangas
      In commit 563c8c6b, I all but removed the WindowRef.winlevelsup field, but
      missed that checkExprHasWindFuncs() was relying on it in a subtle way. Even
      though winlevelsup was always 0 during parsing, checkExprHasWindFuncs()
      compared it against the "current" nesting level, when it recursed into
      subqueries. Before commit 563c8c6b, it could never actually return true
      within subqueries, because the node->winlevelsup == context_sublevels_up
      condition would never be true. But when I removed the winlevelsup field,
      I erroneously removed that condition altogether, making it always return true
      in subqueries.
      
      To fix, don't recurse into subqueries in checkExprHasWindFuncs(). There's no
      point in recursing, if it can never return true in a subquery. It doesn't
      recurse in the upstream either.
      
      Add a test case for the failing case (reduced from TPC-DS query 70), to
      the 'bfv_olap' test. While we're at it, remove alternative expected output
      file for 'bfv_olap', because there was no meaningful (i.e. not-ignored)
      difference between that and the main expected output file.
    • Set gp_recursive_cte_prototype GUC to true in test · 280c577a
      Committed by Jesse Zhang
      Plus minor corrections in spelling and comments.
      Signed-off-by: Sam Dash <sdash@pivotal.io>
    • Guard Recursive CTE behind a GUC · 4d5f8087
      Committed by Kavinder Dhaliwal
      While recursive CTE is still being developed, it will be hidden from users by
      the GUC gp_recursive_cte_prototype.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
    • Fix handling of changed-Param signaling for CteScan plan nodes. We were using · 2e598035
      Committed by Tom Lane
      the "cteParam" as a proxy for the possibility that the underlying CTE plan
      depends on outer-level variables or Params, but that doesn't work very well
      because it sometimes causes calling subqueries to be treated as SubPlans when
      they could be InitPlans.  This is inefficient and also causes the outright
      failure exhibited in bug #4902.  Instead, leave the cteParam out of it and
      copy the underlying CTE plan's extParams directly.  Per bug #4902 from
      Marko Tiikkaja.
      
      (cherry picked from commit 9298d2ff)
    • Fix up ruleutils.c for CTE features. The main problem was that · 8e4b2f67
      Committed by Tom Lane
      get_name_for_var_field didn't have enough context to interpret a reference to
      a CTE query's output.  Fixing this requires separate hacks for the regular
      deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available
      context information is quite different.  It's pretty nearly parallel to the
      existing code for SUBQUERY RTEs, though.  Also, add code to make sure we
      qualify a relation name that matches a CTE name; else the CTE will mistakenly
      capture the reference when reloading the rule.
      
      In passing, fix a pre-existing problem with get_name_for_var_field not working
      on variables in targetlists of SubqueryScan plan nodes.  Although latent all
      along, this wasn't a problem until we made EXPLAIN VERBOSE try to print
      targetlists.  To do this, refactor the deparse_context_for_plan API so that
      the special case for SubqueryScan is all on ruleutils.c's side.
      
      (cherry picked from commit 742fd06d)
    • Supporting ReScan of HashJoin with Spilled HashTable (#2770) · 391e9ea7
      Committed by foyzur
      To support RecursiveCTE we need to be able to ReScan a HashJoin as many times as the recursion depth. The HashJoin was previously ReScannable only if it had one memory-resident batch. Now, we support ReScannability for more than one batch. The approach we took is to keep the inner batch files around for longer than a single iteration of the join, if we detect that we need to reuse the batch files for rescanning. This can also improve the performance of the subplan, as we no longer need to materialize and rebuild the hash table. Rather, we can just reload the batches from their corresponding batch files.
      
      To accomplish reloading of inner batch files, we keep the inner batch files around even if the outer is joined as we wait for the reuse in subsequent rescan (if rescannability is desired).
      
      The corresponding mail thread is here: https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/Rescannability$20of$20HashJoin%7Csort:relevance/gpdb-dev/E5kYU0FwJLg/Cqcxx0fOCQAJ
      
      Contributed by Haisheng Yuan, Kavinder Dhaliwal and Foyzur Rahman
    • Error out when parsing certain keywords in a recursive CTE · 3f5cf5c7
      Committed by Kavinder Dhaliwal
      Currently recursive CTEs do not support the following operations in the
      recursive term:
      
      - Group By
      - Window Functions
      - Subqueries with a self-reference
      - Distinct
      
      This commit produces an error in the parsing stage whenever any of the
      above is found in the recursive term of a CTE definition
    • Improve behavior of WITH RECURSIVE with an untyped literal in the · 5c3d4f55
      Committed by Kavinder Dhaliwal
      non-recursive term.  Per an example from Dickson S. Guedes.
    • Error out when self-ref set operation in recursive term · 2168ecc5
      Committed by Kavinder Dhaliwal
      This commit ensures that if there is ever a self-reference to a recursive CTE
      within a set operation in the recursive term, an error will be produced.
      
      For example
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * FROM x UNION SELECT * FROM z)foo)
      SELECT * FROM x;
      
      Will produce an error, while
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * from z UNION SELECT * FROM u)foo, x where foo.x = x.n)
      SELECT * FROM x;
      
      Will not, because the set operation does not have a self-reference to its
      CTE.
    • Bring in recursive CTE to GPDB · fd61a4ca
      Committed by Haisheng Yuan
      The planner generates a plan that doesn't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because currently in GPDB
      Motions are not rescannable. For example, an MPP plan for a recursive CTE
      query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      For the current solution, the WorkTableScan is always put on the outer side of
      the topmost Join (the recursive part of the RecursiveUnion), so that we can
      safely rescan the inner child of the join without worrying about the
      materialization of a potential underlying motion. This is a heuristic-based
      plan, not a cost-based plan.
      
      Ideally, the WorkTableScan can be placed on either side of the join with any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: The hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is removed as
      it executes. We have a follow-up story to enable a spilled hash table to be
      rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
  3. 06 Sep 2017 (6 commits)
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Committed by Heikki Linnakangas
      If a prepared statement, or a cached plan for an SPI query e.g. from a
      PL/pgSQL function, contains stable functions, the stable functions were
      incorrectly evaluated only once at plan time, instead of on every execution
      of the plan. This happened to not be a problem in queries that contain any
      parameters, because in GPDB, they are re-planned on every invocation
      anyway, but non-parameter queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated all
      stable functions at planning time. Change it to false, and also rename it
      back to 'estimate', as it's called in the upstream. That flag was changed
      back in 2010, in order to allow partition pruning to work with quals
      containing stable functions, like TO_DATE. I think back then, we always
      re-planned every query, so that was OK, but we do cache plans now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so that
      it still does evaluate stable functions, even when the 'estimate' flag is
      off. But when it does so, mark the plan as "one-off", meaning that it must
      be re-planned on every execution. That gives the old, intended, behavior,
      that such plans are indeed re-planned, but it still allows plans that don't
      use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
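      The caching rule described above can be sketched as follows. All types and
      names here are hypothetical stand-ins, not GPDB's actual plancache
      structures; the sketch only shows the rule that folding a stable function
      marks the plan "one-off" so it is re-planned on every execution:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Stand-in for the const-simplification context. */
      typedef struct
      {
          bool folded_stable_func;  /* did we evaluate a stable function? */
      } SimplifyContext;

      /* Stand-in for a cached plan entry. */
      typedef struct
      {
          bool one_off;             /* must be re-planned on every execution */
      } CachedPlanStub;

      /* Build a "plan": if const-simplification evaluated a stable function
       * (e.g. folding current_date), the result is marked one-off. */
      static CachedPlanStub
      build_plan(bool uses_stable_func)
      {
          SimplifyContext cxt = { .folded_stable_func = uses_stable_func };

          return (CachedPlanStub) { .one_off = cxt.folded_stable_func };
      }

      /* Only plans that folded nothing stable may be reused from the cache. */
      static bool
      plan_is_reusable(const CachedPlanStub *plan)
      {
          return !plan->one_off;
      }

      int
      main(void)
      {
          CachedPlanStub immutable_plan = build_plan(false);
          CachedPlanStub stable_plan = build_plan(true);

          assert(plan_is_reusable(&immutable_plan));
          assert(!plan_is_reusable(&stable_plan));
          return 0;
      }
      ```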
    • Fix reuse of cached plans in user-defined functions. · 2f4d8554
      Committed by Heikki Linnakangas
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the original
      PlannedStmt's planTree pointer to point to the modified copy of the plan
      tree. That copy has had all the parameters replaced with their current
      values, but on the next execution, we should do that replacement again. I
      think that happened to not be an issue, because we had code elsewhere that
      forced re-planning of all queries anyway. Or maybe it was in fact broken.
      But in any case, stop scribbling on the original PlannedStmt, which might
      live in the plan cache, and make a temporary copy that we can freely
      scribble on in CdbDispatchPlan, that's only used for the dispatch.
    • Refactor the way seqserver host and port are stored. · 208a3cad
      Committed by Heikki Linnakangas
      They're not really per-portal settings, so it doesn't make much sense
      to pass them to PortalStart. And most of the callers were passing
      savedSeqServerHost/Port anyway. Instead, set the "current" host and port
      in postgres.c, when we receive them from the QD.
    • Mark Abort/Commit/Transaction as static again. · 5fac1a58
      Committed by Heikki Linnakangas
      We don't care about old versions of dtrace anymore. Revert the code to
      the way it's in the upstream, to reduce our diff footprint.
    • Don't initialize random seed when creating a temporary file. · be894afd
      Committed by Heikki Linnakangas
      That seems like a very random place to do it (sorry for the pun). The
      random seed is initialized at backend startup anyway, that ought to be
      good enough, so just remove the spurious initialization from bfz.c.
      
      In passing, improve the debug message to mention which compression
      algorithm was used.
    • Remove unnecessary parse-analysis error position callback. · b325dc8e
      Committed by Heikki Linnakangas
      I guess once upon a time this was needed to get better error messages,
      with error positions, but we rely on the 'location' fields in the parse
      nodes nowadays. Removing this doesn't affect any of the error messages
      memorized in the regression tests, so it's not needed anymore.
  4. 05 Sep 2017 (5 commits)
  5. 04 Sep 2017 (7 commits)