1. 21 Sep 2017, 1 commit
    • Fix multistage aggregation plan targetlists · ad166563
      Bhuvnesh Chaudhary authored
      If an aggregation query uses aliases that match the table's actual
      column names, and those aliases are propagated up through subqueries
      with grouping applied on the alias, the aggregation plan may end up
      with inconsistent targetlists and crash.
      
      ```
      CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      SELECT substr(a, 2) as a
      FROM
          (SELECT ('-'||a)::varchar as a
              FROM (SELECT a FROM t1) t2
          ) t3
      GROUP BY a;
      ```
      ad166563
  2. 20 Sep 2017, 1 commit
    • Cherry-pick psprintf() function from upstream, and use it. · 12c7f256
      Heikki Linnakangas authored
      This makes constructing strings a lot simpler, and less scary. I changed
      many places in GPDB code to use the new psprintf() function, where it
      seemed to make the most sense. A lot of code remains that could use it,
      but there's no urgency.
      
      I avoided changing upstream code to use it yet, even where it would make
      sense, to avoid introducing unnecessary merge conflict.
      
      The biggest changes are in cdbbackup.c, where the code to count the
      buffer sizes was really complex. I also refactored the #ifdef
      USE_DDBOOST blocks so that there is less repetition between the
      USE_DDBOOST and !USE_DDBOOST blocks; that should make it easier to
      catch, at compilation time, bugs that affect the !USE_DDBOOST case when
      compiling with USE_DDBOOST, and vice versa. I also switched to using
      pstrdup() instead of strdup() in a few places, to avoid memory leaks.
      (Although the way cdbbackup works, it only gets launched once per
      connection, so it didn't really matter in practice.)
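      The appeal of psprintf() is that the caller never sizes a buffer by
      hand. A rough sketch of the idea (not the upstream implementation,
      which allocates with palloc() in a memory context; plain malloc() is
      used here only to keep the sketch standalone):
      ```
      #include <stdarg.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* psprintf-style helper: format into a freshly allocated buffer
       * sized to fit, so callers never count bytes by hand. */
      static char *
      sketch_psprintf(const char *fmt, ...)
      {
          va_list args;
          int     needed;
          char   *buf;

          /* First pass: ask vsnprintf how many bytes the result needs. */
          va_start(args, fmt);
          needed = vsnprintf(NULL, 0, fmt, args);
          va_end(args);
          if (needed < 0)
              return NULL;

          buf = malloc((size_t) needed + 1);
          if (buf == NULL)
              return NULL;

          /* Second pass: format into the exactly-sized buffer. */
          va_start(args, fmt);
          vsnprintf(buf, (size_t) needed + 1, fmt, args);
          va_end(args);
          return buf;
      }
      ```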
      12c7f256
  3. 14 Sep 2017, 1 commit
  4. 12 Sep 2017, 2 commits
    • Fix wrong results for NOT-EXISTS sublinks with aggs & LIMIT · d8c7b947
      Shreedhar Hardikar authored
      During NOT EXISTS sublink pullup, we created a one-time false filter
      whenever the sublink contained aggregates, without checking limitcount.
      However, when the sublink contains an aggregate with LIMIT 0, we must
      not generate such a filter, as it produces incorrect results.
      
      Added regress test.
      
      Also, initialize all the members of IncrementVarSublevelsUp_context
      properly.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      d8c7b947
    • Refactor adding explicit distribution motion logic · 75339b9f
      Bhuvnesh Chaudhary authored
      nMotionNodes tracks the number of Motion nodes in a plan, and each plan
      node maintains its own nMotionNodes. Counting the Motions in a plan
      node by traversing the tree and adding up the nMotionNodes found in
      nested plans gives an incorrect count. So instead of using
      nMotionNodes, use a boolean flag to track whether the subtree,
      excluding the initplans, contains a Motion node.
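      The difference can be sketched with a hypothetical plan-node struct
      (illustrative names, not the real GPDB Plan fields): a bottom-up
      boolean directly answers the question the planner cares about, where
      summing cached per-subtree counters would double-count.
      ```
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical plan node, for illustration only: 'is_motion' marks
       * a Motion node, and left/right are child subplans. */
      typedef struct PlanSketch
      {
          bool                is_motion;
          struct PlanSketch  *left;
          struct PlanSketch  *right;
      } PlanSketch;

      /* Bottom-up check: does this subtree contain a Motion?  A single
       * boolean answers the real question directly, instead of summing
       * per-node counters that each already cover a whole subtree. */
      static bool
      subtree_contains_motion(const PlanSketch *node)
      {
          if (node == NULL)
              return false;
          return node->is_motion ||
                 subtree_contains_motion(node->left) ||
                 subtree_contains_motion(node->right);
      }
      ```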
      75339b9f
  5. 08 Sep 2017, 1 commit
  6. 07 Sep 2017, 2 commits
    • Force a stand-alone backend to run in utility mode. · abedbc23
      Heikki Linnakangas authored
      In a stand-alone backend ("postgres --single"), you cannot realistically
      expect any of the infrastructure needed for MPP processing to be present.
      Let's force a stand-alone backend to run in utility mode, to make sure
      that we don't try to dispatch queries, participate in distributed
      transactions, or anything like that, in a stand-alone backend.
      
      Fixes github issue #3172, which was one such case where we tried to
      dispatch a SET command in single-user mode, and got all confused.
      abedbc23
    • Bring in recursive CTE to GPDB · 546fa1f6
      Haisheng Yuan authored
      The planner generates plans that don't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a
      recursive CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always put on the outer
      side of the topmost Join (the recursive part of the RecursiveUnion), so
      that we can safely rescan the inner child of the join without worrying
      about the materialization of a potential underlying motion. This is a
      heuristic-based plan, not a cost-based plan.
      
      Ideally, the WorkTableScan can be placed on either side of the join with any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is
      removed as it executes. We have a follow-up story to make spilled hash
      tables rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
      546fa1f6
  7. 06 Sep 2017, 1 commit
    • Fix reuse of cached plans in user-defined functions. · 9fc02221
      Heikki Linnakangas authored
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the original
      PlannedStmt's planTree pointer to point to the modified copy of the plan
      tree. That copy has had all the parameters replaced with their current
      values, but on the next execution, we should do that replacement again. I
      think that happened to not be an issue, because we had code elsewhere that
      forced re-planning of all queries anyway. Or maybe it was in fact broken.
      But in any case, stop scribbling on the original PlannedStmt, which might
      live in the plan cache, and make a temporary copy that we can freely
      scribble on in CdbDispatchPlan, that's only used for the dispatch.
      9fc02221
  8. 05 Sep 2017, 1 commit
    • Simplify tuple serialization in Motion nodes. · 11e4aa66
      Ning Yu authored
      * Simplify tuple serialization in Motion nodes.
      
      There is a fast-path for tuples that contain no toasted attributes,
      which writes the raw tuple almost as is. However, the slow path is
      significantly more complicated, calling each attribute's binary
      send/receive functions (although there's a fast-path for a few
      built-in datatypes). I don't see any need for calling I/O functions
      here. We can just write the raw Datum on the wire. If that works
      for tuples with no toasted attributes, it should work for all tuples,
      if we just detoast any toasted attributes first.
      
      This makes the code a lot simpler, and also fixes a bug with data
      types that don't have binary send/receive routines. We used to
      call the regular (text) I/O functions in that case, but didn't handle
      the resulting cstring correctly.
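      The simplified slow path boils down to: detoast if needed, then write
      the raw bytes with a length prefix. A standalone sketch with
      hypothetical types (not the real Datum/toast machinery):
      ```
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical attribute value for illustration: 'data'/'len' hold
       * the raw bytes; 'toasted' marks values that would need detoasting. */
      typedef struct AttrSketch
      {
          const unsigned char *data;
          uint32_t             len;
          int                  toasted;
      } AttrSketch;

      /* Stand-in for detoasting: the real code expands a toasted value to
       * its full inline form first; here we just clear the flag. */
      static AttrSketch
      detoast_sketch(AttrSketch a)
      {
          a.toasted = 0;
          return a;
      }

      /* Serialize one attribute as <length><raw bytes>: the idea behind
       * the simplified Motion path, with no per-type binary send function.
       * Returns the number of bytes written. */
      static size_t
      serialize_attr(AttrSketch a, unsigned char *out)
      {
          if (a.toasted)
              a = detoast_sketch(a);
          memcpy(out, &a.len, sizeof(a.len));
          memcpy(out + sizeof(a.len), a.data, a.len);
          return sizeof(a.len) + a.len;
      }
      ```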
      
      Diagnosis and test case by Foyzur Rahman.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      11e4aa66
  9. 01 Sep 2017, 3 commits
    • Fix Copyright and file headers across the tree · ed7414ee
      Daniel Gustafsson authored
      This bumps the copyright years to the appropriate years after not
      having been updated for some time.  Also reformats existing code
      headers to match the upstream style to ensure consistency.
      ed7414ee
    • Set errcode on AO checksum errors · 60f9ac3d
      Daniel Gustafsson authored
      The missing errcode makes the ereport call include the line number
      of the invocation from the .c file, which not only isn't too useful
      but also causes the tests to fail when adding or removing code in the
      file.
      60f9ac3d
    • Always read return value from stat calls · 742e5415
      Daniel Gustafsson authored
      {f}stat() can fail, and reading the stat buffer without checking the
      return status is bad hygiene. Always test the return value and take
      the appropriate error path in case of a stat error.
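      The pattern the commit enforces, as a small illustrative helper (not
      GPDB code): the stat buffer is consulted only after stat() reports
      success, since on failure its contents are undefined.
      ```
      #include <stdio.h>
      #include <sys/stat.h>

      /* Return the size of 'path', or -1 on stat() failure.  The stat
       * buffer is read only after the call has been checked for success. */
      static long long
      file_size_checked(const char *path)
      {
          struct stat st;

          if (stat(path, &st) != 0)
          {
              perror("stat");
              return -1;
          }
          return (long long) st.st_size;
      }
      ```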
      742e5415
  10. 30 Aug 2017, 2 commits
    • Remove misc unused code. · 37d2a5b3
      Heikki Linnakangas authored
      'nuff said.
      37d2a5b3
    • Eliminate '#include "utils/resowner.h"' from lock.h · 6b25c0a8
      Heikki Linnakangas authored
      It was getting in the way of backporting commit 9b1b9446f5 from PostgreSQL,
      which added an '#include "storage/lock.h"' to resowner.h, forming a cycle.
      
      The include was only needed for the declaration of the awaitedOwner
      global variable. Replace "ResourceOwner" with the equivalent "struct
      ResourceOwnerData *" to avoid it.
      
      This revealed a bunch of other files that were relying on resowner.h
      being indirectly included through lock.h. Include resowner.h directly
      in those files.
      
      The ResPortalIncrement.owner field was not used for anything, so instead
      of including resowner.h in that file, just remove the field that needed
      it.
      6b25c0a8
  11. 29 Aug 2017, 1 commit
    • Perform resource group operations only when it's initialized · 939208b5
      Pengzhou Tang authored
      The resource group is enabled but not initialized on auxiliary
      processes and on special backends like ftsprobe and filerep.
      Previously we performed resource group operations whether or not the
      resource group was initialized, which led to unexpected errors.
      939208b5
  12. 28 Aug 2017, 3 commits
    • Avoid side effects in assertions · 288dde95
      Daniel Gustafsson authored
      An assertion with a side effect may alter the main codepath when
      the tree is built with --enable-cassert, which in turn may lead
      to subtle differences due to compiler optimizations and/or straight
      bugs in the side effect. Rewrite the assertions without side
      effects to leave the main codepath intact.
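      A minimal illustration (names made up): the side effect runs
      unconditionally, and the assertion only inspects the stored result, so
      builds compiled with NDEBUG take the same codepath.
      ```
      #include <assert.h>

      /* Counter lets us observe whether an expression actually ran. */
      static int calls = 0;

      static int
      next_value(void)
      {
          return ++calls;
      }

      /* Wrong pattern: assert(next_value() > 0) would make the call
       * disappear entirely when asserts are compiled out.  The safe
       * rewrite performs the side effect unconditionally and asserts
       * only on the stored result. */
      static int
      safe_pattern(void)
      {
          int v = next_value();   /* side effect always happens */

          assert(v > 0);          /* assertion only inspects the result */
          return v;
      }
      ```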
      288dde95
    • Optimize `COPY TO ON SEGMENT` result processing · 266355d3
      Adam Lee authored
      Don't send nonsense '\n' characters just for counting; let segments
      report how many rows were processed instead.
      Signed-off-by: Ming LI <mli@apache.org>
      266355d3
    • Check distribution key restriction for `COPY FROM ON SEGMENT` · 65321259
      Xiaoran Wang authored
      When using `COPY FROM ON SEGMENT`, we copy data from a local file
      directly into the table on the segment. While copying, we apply the
      distribution policy to each record to compute its target segment. If
      the target segment ID isn't equal to the current segment ID, we report
      an error to preserve the distribution key restriction.
      
      Because the segment has no metadata about the table's distribution and
      partition policies, we copy the distribution policy of the main table
      from master to segment in the query plan. When the parent table and a
      partitioned sub-table have different distribution policies, it is
      difficult to check the distribution key restriction across all
      sub-tables. In that case, we report an error.
      
      In case a partitioned table's distribution policy is RANDOMLY and
      differs from the parent table's, the user can use the GUC
      `gp_enable_segment_copy_checking` to disable this check.
      
      The distribution key restriction is checked as follows:

      1) Table isn't partitioned:
          Compute the data's target segment. If the data doesn't belong to
          this segment, report an error.

      2) Table is partitioned and the partitioned table's distribution
      policy is the same as the main table's:
          Compute the data's target segment. If the data doesn't belong to
          this segment, report an error.

      3) Table is partitioned and the partitioned table's distribution
      policy differs from the main table's:
          Checking is not supported; report an error.
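      Cases 1) and 2) above reduce to the same per-row test, sketched here
      with a placeholder hash-to-segment mapping (GPDB's real cdbhash policy
      differs):
      ```
      #include <stdint.h>

      /* Placeholder mapping from a distribution-key hash to a segment. */
      static int
      target_segment(uint32_t key_hash, int num_segments)
      {
          return (int) (key_hash % (uint32_t) num_segments);
      }

      /* Returns 0 if the row may be loaded on 'current_segment', -1 if it
       * violates the distribution key restriction (the real code reports
       * an error instead of returning). */
      static int
      check_row_placement(uint32_t key_hash, int num_segments,
                          int current_segment)
      {
          return target_segment(key_hash, num_segments) == current_segment
                 ? 0 : -1;
      }
      ```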
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
      Signed-off-by: Ming LI <mli@apache.org>
      Signed-off-by: Adam Lee <ali@pivotal.io>
      65321259
  13. 19 Aug 2017, 3 commits
    • Calculate checksum during persistent table reset_all. · 19082615
      Ashwin Agrawal authored
      When `gp_persistent_reset_all()` is called, the gp_relation_node index
      is truncated, but the checksum calculation was missed when writing the
      meta-page. Ideally, a following `gp_persistent_build_all()` fixes
      this, correctly rebuilding the index and calculating the checksum.
      Still, it is better to get a proper message like "ERROR: Did not find
      gp_relation_node entry for relation name....." on operations after
      reset_all than a page verification failure.
      19082615
    • Add assert checking to "UnderLock" functions · af8dc9c2
      Jacob Champion authored
      Signed-off-by: Taylor Vesely <tvesely@pivotal.io>
      af8dc9c2
    • Introduce LW locks to protect filespace and tablespace hash tables. · 677c3eac
      Taylor Vesely authored
      Adds two new LW locks to protect the filespace and tablespace hash
      tables, disambiguating them from PersistentObjLock. Previously,
      PersistentObjLock was overloaded to protect these hash tables along
      with the persistent heap tables. If two backends try to flush the same
      dirty buffer, a deadlock could arise: backend 1 holds
      PersistentObjLock and requests the io_in_progress lock of the buffer
      to be evicted, while backend 2 holds the io_in_progress lock on the
      same buffer and attempts to obtain a file path. Because file paths
      live in hash tables protected by PersistentObjLock, backend 2 requests
      PersistentObjLock and blocks on backend 1.
      Signed-off-by: Asim R P <apraveen@pivotal.io>
      677c3eac
  14. 18 Aug 2017, 1 commit
    • Fix check for all-Const target list, in single-row-insert dispatch. · 2b497f09
      Heikki Linnakangas authored
      If you have a simple insert, like "INSERT INTO foo VALUES ('bar')", we
      evaluate the target list (i.e. 'bar') in the master, and route the
      insert to the correct partition and segment based on the constants.
      However, there was a mismatch between allConstantValuesClause() and
      what its callers assumed. The callers assumed that if
      allConstantValuesClause() returned true, the target list contained
      only Const nodes. But in reality, allConstantValuesClause() also
      returned true if the target list contained non-volatile function
      expressions that could be evaluated to produce a constant result.

      Fix the mismatch by making allConstantValuesClause() stricter, so that
      it only returns true if all the entries are true Consts.
      
      Fixes github issue #285, reported by @liruto.
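      The stricter check amounts to scanning the target list and accepting
      only genuine Const nodes. A sketch with stand-in node tags (not the
      real PostgreSQL NodeTag enum):
      ```
      #include <stdbool.h>
      #include <stddef.h>

      /* Minimal stand-ins for planner node tags, for illustration only. */
      typedef enum { T_CONST_SKETCH, T_FUNCEXPR_SKETCH } NodeTagSketch;

      typedef struct ExprSketch
      {
          NodeTagSketch tag;
      } ExprSketch;

      /* Strict version of the check: true only when every target-list
       * entry is a plain Const.  A stable function call would evaluate to
       * a constant too, but callers assume real Const nodes, so anything
       * else must return false. */
      static bool
      all_entries_are_const(const ExprSketch *tlist, size_t n)
      {
          for (size_t i = 0; i < n; i++)
              if (tlist[i].tag != T_CONST_SKETCH)
                  return false;
          return true;
      }
      ```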
      2b497f09
  15. 17 Aug 2017, 1 commit
    • Remove unused Plan.plan_parent_node_id field. · 5c155847
      Heikki Linnakangas authored
      This allows removing all the code in CTranslatorDXLToPlStmt that tracked
      the parent of each call.
      
      I found the plan node IDs awkward, when I was hacking on
      CTranslatorDXLToPlStmt. I tried to make a change where a function would
      construct a child Plan node first, and a Result node on top of that, but
      only if necessary, depending on the kind of child plan. The parent plan
      node IDs made it impossible to construct a part of Plan tree like that, in
      a bottom-up fashion, because you always had to pass the parent's ID when
      constructing a child node. Now that is possible.
      5c155847
  16. 12 Aug 2017, 1 commit
    • Inline helper function. · f29262bd
      Heikki Linnakangas authored
      The pattern of palloc'ing a Datum and isnull array is ubiquitous, no point
      in hiding it behind a function, especially when the function only has one
      caller.
      f29262bd
  17. 11 Aug 2017, 7 commits
  18. 09 Aug 2017, 7 commits
    • Include gp-libpq-int.h in cdbcopy.c · fdb5d6c3
      Pengzhou Tang authored
      cf7cddf7 conflicts with cc38f526: struct PQExpBufferData is needed by
      struct SegmentDatabaseDescriptor, so bring gp-libpq-int.h back.
      fdb5d6c3
    • Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Pengzhou Tang authored
      The whole cdb directory was shipped to end users, and all header files
      included by cdb*.h also need to be shipped to make checkinc.py pass.
      However, exposing gp_libpq_fe/*.h would confuse customers because
      those headers are almost the same as libpq/*. As Heikki suggested, we
      should keep gp_libpq_fe/* unchanged. So, to make the system work, we
      include gp-libpq-fe.h and gp-libpq-int.h directly in the .c files that
      need them.
      cf7cddf7
    • Add debug info for interconnect network timeout · 9a9cd48b
      Pengzhou Tang authored
      It was very difficult to tell whether the interconnect was stuck in
      the resend phase or whether there was UDP resend latency within the
      interconnect. To improve this, this commit records a debug message
      every Gp_interconnect_debug_retry_interval retries when
      gp_log_interconnect is set to DEBUG.
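      The rate-limiting idea, as a hedged sketch (names are illustrative,
      not the actual GUC variables):
      ```
      #include <stdio.h>

      /* Emit the debug message only once every 'retry_interval' resend
       * attempts, so a stuck connection stays visible without flooding
       * the log.  Returns 1 when a message is emitted. */
      static int
      maybe_log_retry(long retries, long retry_interval)
      {
          if (retry_interval > 0 && retries % retry_interval == 0)
          {
              fprintf(stderr, "interconnect: resend attempt %ld\n", retries);
              return 1;
          }
          return 0;
      }
      ```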
      9a9cd48b
    • Remove list_find_* functions. · 0340f543
      Heikki Linnakangas authored
      They don't exist in the upstream. All but one of the callers actually just
      needed list_member_*().
      0340f543
    • Remove unnecessary #includes · 4d573999
      Heikki Linnakangas authored
      4d573999
    • Remove unused function. · 9637bb1d
      Heikki Linnakangas authored
      9637bb1d
    • Replace special "QE details" protocol message with standard ParameterStatus msg. · d85257f7
      Heikki Linnakangas authored
      This gets rid of the GPDB-specific "QE details" message, that was only sent
      once at QE backend startup, to notify the QD about the motion listener port
      of the QE backend. Use a standard ParameterStatus message instead, pretending
      that there is a GUC called "qe_listener_port". This reduces the difference
      between the gp_libpq_fe copy of libpq, and libpq proper. I have a dream that
      one day we will start using the standard libpq also for QD-QE communication,
      and get rid of the special gp_libpq_fe copy altogether, and this is a small
      step in that direction.
      
      In passing, change the type of the Gp_listener_port variable from
      signed to unsigned. Gp_listener_port actually holds two values, the
      TCP and UDP listener ports, and there is bit-shifting code to store
      those two 16-bit port numbers in a single 32-bit integer. But the
      bit-shifting was a bit iffy on a signed integer; making it unsigned
      makes it clearer what's happening.
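      The packing scheme, sketched on an unsigned type where the shifts are
      well defined (a sketch of the idea, not the exact GPDB code):
      ```
      #include <stdint.h>

      /* Pack the TCP and UDP listener ports into one 32-bit value.  On an
       * unsigned integer the shifts are well defined; on a signed one,
       * shifting into the sign bit is murky. */
      static uint32_t
      pack_ports(uint16_t tcp_port, uint16_t udp_port)
      {
          return ((uint32_t) tcp_port << 16) | udp_port;
      }

      static uint16_t
      unpack_tcp(uint32_t packed)
      {
          return (uint16_t) (packed >> 16);
      }

      static uint16_t
      unpack_udp(uint32_t packed)
      {
          return (uint16_t) (packed & 0xFFFF);
      }
      ```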
      d85257f7
  19. 08 Aug 2017, 1 commit
    • Remove unnecessary use of PQExpBuffer. · cc38f526
      Heikki Linnakangas authored
      StringInfo is more appropriate in backend code. (Unless the buffer needs to
      be used in a thread.)
      
      In passing, rename the 'conn' static variable in cdbfilerepconnclient.c.
      It seemed overly generic.
      cc38f526