1. 07 Sep 2017, 5 commits
    • Error out when self-ref set operation in recursive term · 2168ecc5
      Kavinder Dhaliwal committed
      This commit ensures that if the recursive term contains a self-reference
      to the recursive CTE within a set operation, an error will be produced.
      
      For example
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * FROM x UNION SELECT * FROM z)foo)
      SELECT * FROM x;
      
      Will produce an error, while
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * FROM z UNION SELECT * FROM u) foo, x WHERE foo.x = x.n)
      SELECT * FROM x;
      
      Will not, because the set operation does not contain a self-reference to
      the CTE.
    • Bring in recursive CTE to GPDB · fd61a4ca
      Haisheng Yuan committed
      The planner generates a plan that doesn't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a
      recursive CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always placed on the outer
      side of the topmost Join (the recursive part of the RecursiveUnion), so
      that we can safely rescan the inner child of the join without worrying
      about the materialization of a potential underlying Motion. This is a
      heuristic-based plan, not a cost-based plan.
      
      Ideally, the WorkTableScan could be placed on either side of the join at
      any depth, and the plan should be chosen based on the cost of the
      recursive plan and the number of recursions. But we will leave that for
      later work.
      
      Note: Hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is
      removed as it executes. We have a follow-up story to make a spilled hash
      table rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
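      
      For reference, a minimal sketch of the kind of query that could produce
      a plan like the one above; the table and column names are taken from the
      plan shown, while the exact query text is an assumption:
      ```
      WITH RECURSIVE subdepartment AS (
          -- non-recursive term: the starting department
          SELECT * FROM department WHERE name = 'A'
          UNION ALL
          -- recursive term: join the worktable back against department
          SELECT d.* FROM department d, subdepartment sd
          WHERE d.parent_department = sd.id
      )
      SELECT * FROM subdepartment;
      ```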
    • gp_era: change usage from md5 to sha256 · c13a9177
      Marbin Tan committed
      There is a bug in Python 2.7 where you can't use hashlib.md5() on a
      system that has FIPS mode enabled. Python 2.7 will segfault if you run
      the following:
      `python -c "import ssl; import hashlib; m = hashlib.md5(); m.update('abc');"`
      
      Use sha256 instead as a workaround for the Python 2.7 md5 issue.
      
      gp_era saves the hashed value into a file which gets read when creating
      a new mirror. It's mainly used to check whether any segments are out of
      sync with the new era file.
    • Add missing subselect test case with CTE [#150338742] · 4765e971
      Haisheng Yuan and Jesse Zhang committed
      Commit 038c36b6 from Postgres 8.3 was
      merged into Greenplum in a453004e. Commit 038c36b6 is a partial backport
      of commit 688aafa1 from Postgres 8.4. What's partial about 038c36b6 is
      the omission of a test case containing a CTE: a whole-row variable can
      refer either to an aliased `FROM` clause or to a CTE. The CTE case was
      omitted because upstream 8.3 didn't have CTEs.
      
      The non-CTE test case was slightly modified to add an `ORDER BY` clause,
      because atmsort is confused by the `ORDER BY` inside the subselect.
      Semantically we expect the differ to canonicalize (sort) the output
      before comparison, since the sort order of a subselect is not preserved
      according to the SQL standard; but in this case atmsort believes the
      output is already sorted, by virtue of the presence of `ORDER BY`, even
      though it's within the subselect.
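      
      To make that concrete, here is a hedged sketch of the test case being
      described, modeled on the upstream subselect regression test (the exact
      text may differ; int4_tbl and f1 are the standard regress table and
      column). The sub-select keeps f1 only as a resjunk column for GROUP BY /
      ORDER BY, and the whole-row reference q must not include it:
      ```
      -- whole-row variable over an aliased FROM-clause subselect
      SELECT q FROM (SELECT max(f1) FROM int4_tbl GROUP BY f1 ORDER BY f1) q;
      
      -- the CTE flavor that upstream 8.3 could not express
      WITH q AS (SELECT max(f1) FROM int4_tbl GROUP BY f1 ORDER BY f1)
      SELECT q FROM q;
      ```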
      
      Original commit message of 688aafa1 is enclosed:
      
      > Fix whole-row Var evaluation to cope with resjunk columns (again).
      >
      > When a whole-row Var is reading the result of a subquery, we need it to
      > ignore any "resjunk" columns that the subquery might have evaluated for
      > GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      > 68e40998, but that fix only covered
      > whole-row Vars of named composite types, not those of RECORD type; and it
      > was mighty klugy anyway, since it just assumed without checking that any
      > extra columns in the result must be resjunk.  A proper fix requires getting
      > hold of the subquery's targetlist so we can actually see which columns are
      > resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      > the bullet and add some infrastructure to make that possible.
      >
      > Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      > Back-patch to all supported branches.  In 8.3, also back-patch commit
      > 292176a1, which for some reason I had
      > not done at the time, but it's a prerequisite for this change.
      
      (cherry picked from commit 688aafa15d8d83077c686d2b5b88226528e29840)
  2. 06 Sep 2017, 13 commits
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Heikki Linnakangas committed
      If a prepared statement, or a cached plan for an SPI query (e.g. from a
      PL/pgSQL function), contained stable functions, the stable functions
      were incorrectly evaluated only once at plan time, instead of on every
      execution of the plan. This happened not to be a problem for queries
      that contain any parameters, because in GPDB they are re-planned on
      every invocation anyway, but parameter-less queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated
      all stable functions at planning time. Change it to false, and also
      rename it back to 'estimate', as it's called in the upstream. That flag
      was changed back in 2010, in order to allow partition pruning to work
      with quals containing stable functions, like TO_DATE. I think back then
      we always re-planned every query, so that was OK, but we do cache plans
      now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so
      that it still evaluates stable functions, even when the 'estimate' flag
      is off. But when it does so, mark the plan as "one-off", meaning that it
      must be re-planned on every execution. That gives the old, intended
      behavior that such plans are indeed re-planned, while still allowing
      plans that don't use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
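      
      A hedged illustration of the class of query that was broken (the table
      and column names here are invented): a parameter-less prepared statement
      whose stable function must reflect the time of each execution, not the
      time the plan was built.
      ```
      CREATE TABLE events (id int, created timestamptz);
      PREPARE recent AS
          SELECT count(*) FROM events WHERE created > now() - interval '1 hour';
      EXECUTE recent;  -- now() must be evaluated for this execution...
      -- some time later
      EXECUTE recent;  -- ...and again here, not frozen at plan time
      ```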
    • Fix reuse of cached plans in user-defined functions. · 2f4d8554
      Heikki Linnakangas committed
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the
      original PlannedStmt's planTree pointer to point to the modified copy of
      the plan tree. That copy has had all the parameters replaced with their
      current values, but on the next execution we should do that replacement
      again. I think that happened not to be an issue, because we had code
      elsewhere that forced re-planning of all queries anyway. Or maybe it was
      in fact broken. But in any case, stop scribbling on the original
      PlannedStmt, which might live in the plan cache, and make a temporary
      copy in CdbDispatchPlan that we can freely scribble on, and that is only
      used for the dispatch.
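      
      A hedged sketch of the situation being fixed (the function and table
      names are invented): a user-defined function whose cached plan is
      dispatched on every call; before this fix, each call leaked a copy of
      the plan tree into the long-lived CachedPlan context.
      ```
      -- assumes a table "events" already exists
      CREATE FUNCTION count_events() RETURNS bigint AS $$
      BEGIN
          -- the plan for this query is cached after the first call and
          -- re-dispatched on every subsequent call of the function
          RETURN (SELECT count(*) FROM events);
      END;
      $$ LANGUAGE plpgsql;
      
      SELECT count_events();  -- each call re-uses the cached plan
      SELECT count_events();
      ```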
    • 2f15ab8c
    • Refactor the way seqserver host and port are stored. · 208a3cad
      Heikki Linnakangas committed
      They're not really per-portal settings, so it doesn't make much sense
      to pass them to PortalStart. And most of the callers were passing
      savedSeqServerHost/Port anyway. Instead, set the "current" host and port
      in postgres.c, when we receive them from the QD.
    • Remove useless system_catalog TINC tests. · 0e9380b3
      Heikki Linnakangas committed
      All of these queries were wrapped in gpdiff ignore-blocks. What's the
      point?
    • Mark Abort/Commit/Transaction as static again. · 5fac1a58
      Heikki Linnakangas committed
      We don't care about old versions of dtrace anymore. Revert the code to
      the way it's in the upstream, to reduce our diff footprint.
    • 66842386
    • Add migrated cs_walrep CCP tests to pipeline ALL group · cdba4245
      Jimmy Yih committed
      [ci skip]
    • Migrate cs-walrepl-multinode from Pulse to CCP · 1b960a73
      Jimmy Yih committed
    • Reorder TINC walrep_2 to fix ordering test failure · 1a6797e9
      Jimmy Yih committed
      Also remove some useless Makefile targets.
    • Add TINC support with CCP · 2aea56b7
      Jimmy Yih committed
      TINC tests are planned to be migrated over to run natively in
      Concourse using CCP. This commit adds the task and script files needed
      to create the new TINC jobs.
    • Don't initialize random seed when creating a temporary file. · be894afd
      Heikki Linnakangas committed
      That seems like a very random place to do it (sorry for the pun). The
      random seed is initialized at backend startup anyway, and that ought to
      be good enough, so just remove the spurious initialization from bfz.c.
      
      In passing, improve the debug message to mention which compression
      algorithm was used.
    • Remove unnecessary parse-analysis error position callback. · b325dc8e
      Heikki Linnakangas committed
      I guess once upon a time this was needed to get better error messages,
      with error positions, but we rely on the 'location' fields in the parse
      nodes nowadays. Removing this doesn't affect any of the error messages
      memorized in the regression tests, so it's not needed anymore.
  3. 05 Sep 2017, 8 commits
  4. 04 Sep 2017, 14 commits
    • Use GetOptions for options parsing in get_ereport · c637a0d0
      Daniel Gustafsson committed
      When adding the GPTest version printing, it became clear that not
      only was the existing version printing broken, the options parsing
      was too. See the sample execution below:
      
        ./get_ereport.pl -version
        Use of uninitialized value $ARGV[0] in pattern match (m//) at ./get_ereport.pl line 99.
        Missing argument in sprintf at ./get_ereport.pl line 163.
        ./get_ereport.pl version 0.
      
      So while in there, this commit fixes both: the options are now
      properly parsed with GetOptions() using pass_through, and the version
      is printed using the GPTest module.
    • Move version printing to common module for Perl code · 7d64740b
      Daniel Gustafsson committed
      The Perl code in src/test/regress was using a mix of not printing the
      version at all, printing it wrong (due to us not using CVS anymore), or
      using a hardcoded string. Implement a new module for common test code
      called GPTest.pm which abstracts this (for now it's the only thing it
      does, but this might/will change, hence the name). The module is created
      by autoconf so that it pulls in GP_VERSION from there.
      
      While there, simplify the version output in gpdiff, which included the
      version of the system diff command - somewhat uninteresting information,
      as it's not something that changes very often, and it just cluttered up
      the output.
      
      This removes the MakeMaker support, but since we have no intention of
      packaging these programs into a CPAN module, it seems pointless to carry
      that format around.
    • Refactor Greenplum specific testcode to a new file · 01304a76
      Daniel Gustafsson committed
      regress.c is an upstream file, and all Greenplum additions can
      cause conflicts as we merge with PostgreSQL. This refactors all
      GPDB specific code into a new file, regress_gp.c, to keep the
      upstream file as close to upstream as possible (with backports).
      The new file gets compiled and loaded just like regress.c, so
      no change in how it works.
      
      Also remove an unused function, perform rudimentary codereview
      on the Greenplum tests and massage regress.c slightly to make
      it closer to upstream.
    • Refactor copy's target segment computing function · 36f2f6d6
      Xiaoran Wang committed
      The same code for computing the target segment appears in both CopyFrom
      and CopyFromDispatch. Extract that code into separate functions.
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
    • Share external URL-mapping code between planner and ORCA. · cbb8ea18
      Heikki Linnakangas committed
      The planner and the ORCA translator both implemented the same logic for
      assigning external table URIs to segments. But I spotted one case where
      the logic differed:
      
      CREATE EXTERNAL TABLE exttab_with_on_master( i int, j text )
      LOCATION ('file://@hostname@@abs_srcdir@/data/exttab_few_errors.data') ON MASTER FORMAT 'TEXT' (DELIMITER '|');
      
      SELECT * FROM exttab_with_on_master;
      ERROR:  'ON MASTER' is not supported by this protocol yet.
      
      With ORCA you got a less user-friendly error:
      
      set optimizer=on;
      set optimizer_enable_master_only_queries = on;
      postgres=# explain SELECT * FROM exttab_with_on_master;
      ERROR:  External scan error: Could not assign a segment database for external file (CTranslatorDXLToPlStmt.cpp:472)
      
      The immediate cause of that was that commit fcf82234 didn't remember to
      modify the ORCA translator's copy of the same logic. But really, it's silly
      and error-prone to duplicate the code, so modify ORCA to use the same code
      that the planner does.
    • Further refactoring of ParseFuncOrColumn and func_get_detail. · 5a7563cc
      Heikki Linnakangas committed
      This backports the new FUNCDETAIL_WINDOWFUNC return code from PostgreSQL
      8.4, and refactors the code to match upstream, as much as feasible. A few
      error scenarios now give better error messages.
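      
      One hedged example of such a scenario (the table name is invented, and
      the exact message wording is an assumption): calling a window function
      without an OVER clause is now diagnosed as such during parse analysis.
      ```
      SELECT row_number() FROM some_table;
      -- error at parse analysis: row_number() is a window function and
      -- requires an OVER clause (exact wording may differ)
      ```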
    • Fix typo in copy.c comment · 102aac6f
      Daniel Gustafsson committed
    • Replace custom expandable buffer implementation with StringInfo. · 38f354aa
      Heikki Linnakangas committed
      Simpler that way.
    • Replace redundant functions with contain_window_function() from PG 8.4 · 232ecfc3
      Heikki Linnakangas committed
      We don't need two different functions to check whether an expression
      contains a window function. Replace both with the variant used in
      the upstream, contain_window_function().
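      
      A hedged illustration of what such a check looks for (the table and
      column names are invented):
      ```
      SELECT sum(x) OVER (PARTITION BY grp) FROM some_table;  -- contains a window function
      SELECT sum(x) FROM some_table;                          -- plain aggregate, does not
      ```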
    • Cherry-pick locate_windowfunc() from PostgreSQL 8.4. · 3fc342d6
      Heikki Linnakangas committed
      This allows having error positions for more syntax errors, and reduces
      the diff footprint of our window functions implementation against the
      one in PostgreSQL 8.4.
    • Handle the failure in AssignResGroupOnMaster() · 931d5d57
      xiong-gang committed
      Because AssignResGroupOnMaster() is called before the transaction is
      actually started, a failure there won't cause a transaction abort, so we
      need to handle the error to prevent leaking the slot.
      Signed-off-by: Zhenghua Lyu <zlv@pivotal.io>
    • Cosmetic fixes, to reduce diff vs upstream. · 74fdbc5d
      Heikki Linnakangas committed
      Most notably, move the definition of XmlExpr and friends to where they are
      in the upstream.
    • Rename checkExprHasWindFuncs to checkExprHasWindowFuncs to match upstream. · e94a339a
      Heikki Linnakangas committed
      Also move the function to where it is in the upstream.
      
      To reduce our diff footprint.
    • Remove overly-complicated SzAllocate function. · 4212fdad
      Heikki Linnakangas committed
      There was only one caller, and it provided no memory pool. The fault
      injection was also unused AFAICS.