1. Feb 3, 2018: 1 commit
    • Vacuum fix for ERROR "updated tuple is already HEAP_MOVED_OFF". · aa5798a9
      Committed by Ashwin Agrawal
      `repair_frag()` should consult the distributed snapshot
      (`localXidSatisfiesAnyDistributedSnapshot()`) while following and moving chains
      of updated tuples. Vacuum consults the distributed snapshot to decide which
      tuples can be deleted and which cannot. For RECENTLY_DEAD tuples, however, it
      used to decide based solely on a comparison with OldestXmin, which is not
      sufficient; the distributed snapshot must be checked there as well.
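
      A minimal sketch of the visibility rule being fixed; tuple_is_really_dead()
      is a hypothetical helper for illustration, not the actual patch:

      ```c
      #include "postgres.h"
      #include "access/htup.h"
      #include "access/transam.h"

      /* hypothetical: a RECENTLY_DEAD tuple is truly dead only when no
       * local *or* distributed snapshot can still see its deleter */
      static bool
      tuple_is_really_dead(HeapTupleHeader tuple, TransactionId OldestXmin)
      {
          TransactionId xmax = HeapTupleHeaderGetXmax(tuple);

          /* the old, insufficient test: local comparison only */
          if (!TransactionIdPrecedes(xmax, OldestXmin))
              return false;

          /* the missing test: some distributed snapshot may still see xmax */
          if (localXidSatisfiesAnyDistributedSnapshot(xmax))
              return false;

          return true;
      }
      ```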
      
      Fixes #4298
      
      (cherry picked from commit 313ab24f)
  2. Feb 2, 2018: 2 commits
  3. Jan 31, 2018: 1 commit
    • Fix dispatching of queries with record-type parameters. · f24b9ab5
      Committed by Heikki Linnakangas
      This fixes the "ERROR:  record type has not been registered" error, when
      a record-type variable is used in a query inside a PL/pgSQL function.
      This is essentially the same problem we battled with in Motion nodes
      in GPDB 5, for which we added the whole tuple remapper. Only this
      time, the problem is with record Datums being dispatched from QD to QE,
      as Params, rather than with record Datums being transferred across a
      Motion.
      
      To fix, send the transient record type cache along with the query
      parameters, if any of the parameters are transient record types.
      This is a bit inefficient, as the transient record type cache can be quite
      large. A more fine-grained approach would be to send only those record
      types that are actually used in the parameters, but more code would be
      required to figure that out. This will do for now.
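
      A rough sketch of the detection step, assuming a simple scan over the
      parameter list; params_contain_transient_records() is illustrative, not the
      actual serialization code:

      ```c
      #include "postgres.h"
      #include "catalog/pg_type.h"    /* RECORDOID */
      #include "nodes/params.h"

      /* illustrative: decide whether the transient record type cache
       * must be shipped along with the dispatched parameters */
      static bool
      params_contain_transient_records(ParamListInfo params)
      {
          int i;

          for (i = 0; i < params->numParams; i++)
          {
              /* anonymous record types live in the transient type cache */
              if (params->params[i].ptype == RECORDOID)
                  return true;
          }
          return false;
      }
      ```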
      
      Refactor the serialization and deserialization of the query parameters, to
      leverage the outfast/readfast functions.
      
      Backport to 5X_STABLE. This changes the wire format of query parameters, so
      this requires the QD and QE to be on the same minor version. But this does
      not change the on-disk format, or the numbering of existing Node tags.
      
      Fixes github issue #4444.
  4. Jan 30, 2018: 1 commit
    • Alloc Instrumentation in Shmem · 9a0954e4
      Committed by Wang Hao
      On postmaster start, additional space in Shmem is allocated for Instrumentation
      slots and a header. The number of slots is controlled by a cluster-level GUC;
      the default is 5MB (approximately 30K slots), estimated from 250 concurrent
      queries * 120 nodes per query. If the slots are exhausted, instruments are
      allocated in local memory as a fallback.
      
      These slots are organized as a free list:
        - Header points to the first free slot.
        - Each free slot points to next free slot.
        - The last free slot's next pointer is NULL.
      
      ExecInitNode calls GpInstrAlloc to pick an empty slot from the free list:
        - The free slot pointed to by the header is picked.
        - The picked slot's next pointer is assigned to the header.
        - A spinlock on the header prevents concurrent writes.
        - When the GUC gp_enable_query_metrics is off, Instrumentation is
          allocated in local memory.
      
      Slots are recycled by a resource owner callback function.
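
      A simplified sketch of the slot allocation under the spinlock; InstrHeader,
      InstrSlot, and instr_alloc_from_shmem() are illustrative names, not the
      actual GpInstrAlloc implementation:

      ```c
      #include "postgres.h"
      #include "executor/instrument.h"
      #include "storage/spin.h"

      typedef struct InstrSlot
      {
          struct InstrSlot *next;     /* next free slot, NULL for the last one */
          Instrumentation   instr;
      } InstrSlot;

      typedef struct InstrHeader
      {
          slock_t    lock;            /* protects the free list head */
          InstrSlot *free_head;       /* first free slot */
      } InstrHeader;

      static Instrumentation *
      instr_alloc_from_shmem(InstrHeader *hdr)
      {
          InstrSlot *slot;

          SpinLockAcquire(&hdr->lock);
          slot = hdr->free_head;
          if (slot != NULL)
              hdr->free_head = slot->next;    /* pop the head of the list */
          SpinLockRelease(&hdr->lock);

          /* fall back to backend-local memory when the slots are exhausted */
          if (slot == NULL)
              return palloc0(sizeof(Instrumentation));

          return &slot->instr;
      }
      ```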
      
      A benchmark with TPC-DS shows the performance impact of this commit is less
      than 0.1%. To reduce instrumentation overhead, the following optimizations
      are added:
        - Introduce instrument_option to skip CDB info collection
        - Change tuplecount in Instrumentation from double to uint64
        - Replace the instrument tuple entry/exit functions with macros
        - Add need_timer to Instrumentation, to allow eliminating of timing overhead.
          This ports part of upstream commit:
          This is porting part of upstream commit:
      ------------------------------------------------------------------------
      commit af7914c6
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Tue Feb 7 11:23:04 2012 -0500
      
      Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
      ------------------------------------------------------------------------
      
      Author: Wang Hao <haowang@pivotal.io>
      Author: Zhang Teng <tezhang@pivotal.io>
  5. Jan 22, 2018: 2 commits
  6. Jan 18, 2018: 1 commit
  7. Dec 28, 2017: 1 commit
    • Able to cancel COPY PROGRAM ON SEGMENT if the program hangs · ecd44052
      Committed by Adam Lee
      There are two places where the QD keeps trying to get data, ignores SIGINT,
      and sends no signal to the QEs. If the program on a segment has no
      input/output, the COPY command hangs.
      
      To fix it, this commit:
      
      1, lets the QD wait until connections are readable before calling
      PQgetResult(), and cancels queries if it receives an interrupt signal while
      waiting (see the sketch below)
      2, sets DF_CANCEL_ON_ERROR when dispatching in cdbcopy.c
      3, completes COPY error handling
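
      A sketch of the QD-side wait in point 1, assuming a simple polling loop;
      connection_readable() stands in for the real libpq polling logic:

      ```c
      /* illustrative, not the actual cdbcopy.c code */
      for (;;)
      {
          if (QueryCancelPending)         /* SIGINT arrived while waiting */
          {
              PQrequestCancel(conn);      /* ask the QE to abort the COPY */
              break;
          }
          if (connection_readable(conn))  /* data is ready for PQgetResult() */
              break;
          pg_usleep(1000);                /* back off briefly, then re-check */
      }
      result = PQgetResult(conn);
      ```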
      
      -- prepare
      create table test(t text);
      copy test from program 'yes|head -n 655360';
      
      -- could be canceled
      copy test from program 'sleep 100 && yes test';
      copy test from program 'sleep 100 && yes test<SEGID>' on segment;
      copy test from program 'yes test';
      copy test to '/dev/null';
      copy test to program 'sleep 100 && yes test';
      copy test to program 'sleep 100 && yes test<SEGID>' on segment;
      
      -- should fail
      copy test from program 'yes test<SEGID>' on segment;
      copy test to program 'sleep 0.1 && cat > /dev/nulls';
      copy test to program 'sleep 0.1<SEGID> && cat > /dev/nulls' on segment;
      
      (cherry picked from commit 25c70407dc038a2c56ccb37a3540c9af6a99e6e4)
  8. Dec 7, 2017: 1 commit
    • Resend a cancel/finish signal if the QE didn't respond for a long time. · 58492956
      Committed by Pengzhou Tang
      Previously, the dispatcher sent the cancel/finish signal to QEs only once, so
      if the signal arrives before the query does, or is dropped by secure_read(),
      a QE may never get the chance to quit when it is assigned to execute a MOTION
      node whose peer has already been canceled.
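
      The shape of the fix, sketched with illustrative names (qe_has_quit(),
      resend_cancel_or_finish(), wait_for_qe_response(), and the interval are all
      stand-ins):

      ```c
      /* keep re-sending until the QE actually stops; a single signal
       * can be lost in transit or swallowed by secure_read() */
      while (!qe_has_quit(segdb))
      {
          resend_cancel_or_finish(segdb);

          /* wait a bounded time for a response; on timeout, loop and resend */
          (void) wait_for_qe_response(segdb, RESEND_INTERVAL_MS);
      }
      ```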
      
      This fixes issue #3950
  9. Nov 9, 2017: 1 commit
  10. Nov 8, 2017: 1 commit
    • Do a force flush before checking the result of a connection · 4661b620
      Committed by Pengzhou Tang
      Previously, to speed up dispatching, cdbdisp_dispatchToGang_async
      and cdbdisp_waitDispatchFinish_async were designed to use non-blocking
      flushes to dispatch commands in bulk. However, there is a risk that some
      commands are not fully dispatched in corner error cases, so the QD must
      do a forced flush before handling such connections; otherwise the QD will
      get stuck.
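
      A sketch of the forced flush; PQflush() is real libpq API (0 = done,
      1 = more data queued, -1 = failure), while wait_socket_writable() and
      handle_connection_error() are stand-ins:

      ```c
      /* force out any partially-sent command before reading results */
      int rc;

      while ((rc = PQflush(conn)) == 1)
      {
          /* wait until the socket is writable, then try flushing again */
          if (!wait_socket_writable(PQsocket(conn)))
              break;
      }
      if (rc == -1)
          handle_connection_error(conn);
      ```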
  11. Nov 6, 2017: 1 commit
    • Symlink libpq files for backend and optimize the makefiles · 5bc459c1
      Committed by Adam Lee
      src/backend's makefiles have their own rules; this commit symlinks the libpq
      files for the backend to leverage them, which is canonical and much simpler.
      
      What are the rules?
      
      1, src/backend compiles each SUBDIR, lists the OBJS in the sub-directories'
      objfiles.txt, then links them all into postgres.

      2, mock.mk links all OBJS, but filters out the objects that are mocked by
      test cases.
      
      (cherry picked from commit 1e9cd7d9)
  12. Nov 2, 2017: 1 commit
    • Don't pass around MemTuples as HeapTuples. · 06e7a09f
      Committed by Heikki Linnakangas
      Invent a new pointer type, GenericTuple, for when we might be dealing with
      either a MemTuple or a HeapTuple. The old practice of holding a MemTuple
      in a HeapTuple-typed variable, or passing a MemTuple to a function that's
      declared to take a HeapTuple parameter, seemed dangerous.
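
      A sketch of the pattern; GenericTuple and is_memtuple() are the GPDB names
      this change relies on, while the process_*() callers are hypothetical:

      ```c
      /* a distinct pointer type makes the dual tuple format explicit */
      typedef struct GenericTupleData *GenericTuple;

      static void
      process_tuple(GenericTuple gtup, TupleDesc tupdesc)
      {
          if (is_memtuple(gtup))
              process_memtuple((MemTuple) gtup, tupdesc);
          else
              process_heaptuple((HeapTuple) gtup, tupdesc);
      }
      ```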
  13. Oct 30, 2017: 2 commits
    • Retire gp_libpq_fe part 2, changing the include path · 6f650ed5
      Committed by Adam Lee
      Signed-off-by: Adam Lee <ali@pivotal.io>
    • Retire gp_libpq_fe part 1, libpq itself · 8f9fcd73
      Committed by Adam Lee
          commit b0328d5631088cca5f80acc8dd85b859f062ebb0
          Author: mcdevc <a@b>
          Date:   Fri Mar 6 16:28:45 2009 -0800
      
              Separate our internal libpq front end from the client libpq library
              upgrade libpq to the latest to pick up bug fixes and support for more
              client authentication types (GSSAPI, KRB5, etc)
              Upgrade all files dependent on libpq to handle new version.
      
      Above is the initial commit of gp_libpq_fe; there seems to be no good reason
      to still have it.
      
      Key things this PR does:
      
      1, removes the gp_libpq_fe directory.
      2, builds the libpq source code into two versions, for frontend and backend,
      distinguished by the FRONTEND macro.
      3, libpq for the backend still bypasses local authentication, SSL, and some
      environment variables; these are the only differences.
      
      (backported from 510a20b6, with some
      fixes for SUSE and Windows)
      Signed-off-by: Adam Lee <ali@pivotal.io>
  14. Oct 27, 2017: 1 commit
    • Fix has_external_partition() incompatible pointer type warnings · 15acffab
      Committed by Adam Lee
      cdbpartition.c: In function ‘rel_has_external_partition’:
      cdbpartition.c:489:32: warning: passing argument 1 of ‘has_external_partition’ from incompatible pointer type [-Wincompatible-pointer-types]
        return has_external_partition(n->rules);
                                      ^
      cdbpartition.c:106:13: note: expected ‘PartitionRule * {aka struct PartitionRule *}’ but argument is of type ‘List * {aka struct List *}’
       static bool has_external_partition(PartitionRule *rules);
                   ^~~~~~~~~~~~~~~~~~~~~~
  15. Oct 26, 2017: 1 commit
    • Support exchange sub partition with external table · 5a415e5e
      Committed by Xiaoran Wang
      For example:
      
          alter table tableA alter partition partition1
          exchange partition partition1_subpartition1 with table external_table;
      
      partition1 is the first level partition of tableA.
      partition1_subpartition1 is a sub partition of partition1.
      
      1) The routine ATPExecPartExchange in tablecmds.c locates the target
      partition's table through the parameters following the 'alter' keyword.
      Then, no matter what level the partition is, the routine exchanges the
      partition with the new table in the same way, so exchanging a sub partition
      with an external table now works well.
      
      2) Queries against partitioned tables that are altered to
      use an external table as a leaf child partition fall back
      to the legacy query optimizer. By calling the routine
      rel_has_external_partition in cdbpartition.c, GPORCA knows
      whether the table has an external partition. However, the routine
      only searched the first-level partition children, returning
      false when a sub partition is an external table. This commit
      fixes it by searching every partition in the table.
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
  16. Oct 10, 2017: 1 commit
    • Fix multistage aggregation final target list · 219f226e
      Committed by Bhuvnesh Chaudhary
      If a target list entry is found under a RelabelType node, the newly created
      Var node should be nested inside the RelabelType node if the vartype
      of the Var node differs from the resulttype of the RelabelType node.
      Otherwise, the cast information is lost and the executor complains about a
      type mismatch:
      	```sql
      	CREATE TABLE t1 (a varchar, b character varying) DISTRIBUTED RANDOMLY;
      	SELECT array_agg(f)  FROM (SELECT b::text as f FROM t1 GROUP BY b) q;
      	ERROR:  attribute 1 has wrong type (execQual.c:763)  (seg0 slice2 127.0.0.1:25432 pid=7064)
      	DETAIL:  Table has type character varying, but query expects text.
      	```
  17. Oct 9, 2017: 1 commit
    • Decouple GUC max_resource_groups and max_connections. · 2fe7c8d2
      Committed by Richard Guo
      Previously there was a restriction on the GUC 'max_resource_groups'
      that it could not be larger than 'max_connections'.
      This restriction could cause gpdb to fail to start if the two GUCs
      were not set properly.
      We decided to decouple these two GUCs and set a hard limit
      of 100 for 'max_resource_groups'.
  18. Sep 26, 2017: 1 commit
  19. Sep 25, 2017: 1 commit
    • Report COPY PROGRAM's error output · 8e23d64e
      Committed by Adam Lee
      Replace popen() with popen_with_stderr(), which is also used by external web
      tables, to collect the program's stderr output.

      Since popen_with_stderr() forks a `sh` process, it is almost always
      successful, so this commit catches errors that happen in fwrite(), as
      sketched below.

      It also passes variables the same way external web tables do.
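
      A sketch of the fwrite() check, with illustrative names for the pipe and
      buffer:

      ```c
      /* the fork of `sh` almost always succeeds, so failures surface here */
      size_t written = fwrite(buf, 1, len, pipe_to_program);

      if (written != len)
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not write to COPY program: %m")));
      ```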
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
      (cherry picked from commit 2b51c16b)
  20. Sep 21, 2017: 1 commit
    • Fix multistage aggregation plan targetlists · ad166563
      Committed by Bhuvnesh Chaudhary
      If an aggregation query uses aliases that match the table's actual columns,
      the aliases are propagated up from subqueries, and grouping is applied on
      the column alias, the aggregation plan may end up with inconsistent
      targetlists, causing a crash.
      
      	CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      	SELECT substr(a, 2) as a
      	FROM
      		(SELECT ('-'||a)::varchar as a
      			FROM (SELECT a FROM t1) t2
      		) t3
      	GROUP BY a;
  21. Sep 20, 2017: 1 commit
    • Cherry-pick psprintf() function from upstream, and use it. · 12c7f256
      Committed by Heikki Linnakangas
      This makes constructing strings a lot simpler, and less scary. I changed
      many places in GPDB code to use the new psprintf() function, where it
      seemed to make most sense. A lot of code remains that could use it, but
      there's no urgency.
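
      For reference, psprintf() formats into a freshly palloc'd buffer of exactly
      the right size; a before/after sketch (dir and file are placeholders):

      ```c
      /* old style: guess a buffer size and hope it is large enough */
      char buf[MAXPGPATH];
      snprintf(buf, sizeof(buf), "%s/%s", dir, file);

      /* new style: psprintf() sizes the palloc'd result automatically */
      char *path = psprintf("%s/%s", dir, file);
      /* ... use path, then pfree(path) if the context lives long ... */
      ```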
      
      I avoided changing upstream code to use it yet, even where it would make
      sense, to avoid introducing unnecessary merge conflict.
      
      The biggest changes are in cdbbackup.c, where the code to count the buffer
      sizes was really complex. I also refactored the #ifdef USE_DDBOOST
      blocks so that there is less repetition between the USE_DDBOOST and
      !USE_DDBOOST blocks; that should make it easier to catch, when compiling
      with USE_DDBOOST, bugs that affect the !USE_DDBOOST case, and vice versa.
      I also switched to using pstrdup() instead of strdup() in a few places,
      to avoid memory leaks. (Although the way cdbbackup works, it would only
      get launched once per connection, so it didn't really matter in practice.)
  22. Sep 14, 2017: 1 commit
  23. Sep 12, 2017: 2 commits
    • Fix wrong results for NOT-EXISTS sublinks with aggs & LIMIT · d8c7b947
      Committed by Shreedhar Hardikar
      During NOT EXISTS sublink pullup, we create a one-time false filter when
      the sublink contains aggregates, without checking the limit count. However,
      when the sublink contains an aggregate with LIMIT 0, we should not generate
      such a filter, as it produces incorrect results.
      
      Added regress test.
      
      Also, initialize all the members of IncrementVarSublevelsUp_context
      properly.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Refactor adding explicit distribution motion logic · 75339b9f
      Committed by Bhuvnesh Chaudhary
      nMotionNodes tracks the number of Motion nodes in a plan, and each
      plan node maintains it. Counting the Motion nodes in a plan node by
      traversing the tree and adding up the nMotionNodes found in nested plans
      gives an incorrect count. So instead of using nMotionNodes, use
      a boolean flag to track whether the subtree, excluding the initplans,
      contains a Motion node.
  24. Sep 8, 2017: 1 commit
  25. Sep 7, 2017: 2 commits
    • Force a stand-alone backend to run in utility mode. · abedbc23
      Committed by Heikki Linnakangas
      In a stand-alone backend ("postgres --single"), you cannot realistically
      expect any of the infrastructure needed for MPP processing to be present.
      Let's force a stand-alone backend to run in utility mode, to make sure
      that we don't try to dispatch queries, participate in distributed
      transactions, or anything like that, in a stand-alone backend.
      
      Fixes github issue #3172, which was one such case where we tried to
      dispatch a SET command in single-user mode, and got all confused.
    • Bring in recursive CTE to GPDB · 546fa1f6
      Committed by Haisheng Yuan
      The planner generates plans that don't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a recursive
      CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always put on the outer side of
      the topmost Join (the recursive part of the RecursiveUnion), so that we can
      safely rescan the inner child of the join without worrying about the
      materialization of a potential underlying Motion. This is a heuristic-based
      plan, not a cost-based plan.
      
      Ideally, the WorkTableScan could be placed on either side of the join at any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: hash join is temporarily disabled when generating the plan for the
      recursive part, because if the hash table spills, the batch file is removed
      as it executes. We have a follow-up story to make spilled hash tables
      rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
  26. Sep 6, 2017: 1 commit
    • Fix reuse of cached plans in user-defined functions. · 9fc02221
      Committed by Heikki Linnakangas
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the original
      PlannedStmt's planTree pointer to point to the modified copy of the plan
      tree. That copy has had all the parameters replaced with their current
      values, but on the next execution, we should do that replacement again. I
      think that happened to not be an issue, because we had code elsewhere that
      forced re-planning of all queries anyway. Or maybe it was in fact broken.
      But in any case, stop scribbling on the original PlannedStmt, which might
      live in the plan cache, and make a temporary copy that we can freely
      scribble on in CdbDispatchPlan, that's only used for the dispatch.
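
      A sketch of the corrected shape, assuming a per-query context is at hand
      (estate->es_query_cxt here; the exact context used by the patch may differ):

      ```c
      /* copy the cached plan into short-lived memory before scribbling */
      MemoryContext oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
      PlannedStmt  *dispatchStmt = (PlannedStmt *) copyObject(stmt);

      MemoryContextSwitchTo(oldcxt);

      /* substitute current parameter values into dispatchStmt and dispatch
       * it; the original stmt, possibly owned by the plan cache, stays
       * untouched and the copy dies with the short-lived context */
      ```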
  27. Sep 5, 2017: 1 commit
    • Simplify tuple serialization in Motion nodes. · 11e4aa66
      Committed by Ning Yu
      
      There is a fast path for tuples that contain no toasted attributes,
      which writes the raw tuple almost as is. The slow path, however, is
      significantly more complicated, calling each attribute's binary
      send/receive functions (although there is a fast path for a few
      built-in datatypes). I don't see any need for calling I/O functions
      here; we can just write the raw Datum on the wire. If that works
      for tuples with no toasted attributes, it should work for all tuples,
      if we just detoast any toasted attributes first.
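
      A sketch of the simplified slow path; slot, attno, and att are assumed to
      be in scope, and the actual buffer append is elided:

      ```c
      bool  isnull;
      Datum value = slot_getattr(slot, attno, &isnull);

      /* varlena attributes (attlen == -1) may be toasted: flatten them first */
      if (!isnull && att->attlen == -1)
          value = PointerGetDatum(PG_DETOAST_DATUM(value));

      /* ... then write the raw datum bytes to the Motion send buffer ... */
      ```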
      
      This makes the code a lot simpler, and also fixes a bug with data
      types that don't have binary send/receive routines. We used to
      call the regular (text) I/O functions in that case, but didn't handle
      the resulting cstring correctly.
      
      Diagnosis and test case by Foyzur Rahman.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
  28. Sep 1, 2017: 3 commits
    • Fix Copyright and file headers across the tree · ed7414ee
      Committed by Daniel Gustafsson
      This bumps the copyright years to the appropriate years after not
      having been updated for some time.  Also reformats existing code
      headers to match the upstream style to ensure consistency.
    • Set errcode on AO checksum errors · 60f9ac3d
      Committed by Daniel Gustafsson
      The missing errcode makes the ereport call include the line number
      of the invocation in the .c file, which not only isn't very useful
      but also causes the tests to fail when code is added to or removed
      from the file.
    • Always read returnvalue from stat calls · 742e5415
      Committed by Daniel Gustafsson
      {f}stat() can fail, and reading the stat buffer without checking the
      return status is bad hygiene. Always test the return value and take
      the appropriate error path in case of a stat error.
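
      The pattern being enforced, sketched with PostgreSQL's error-reporting
      idiom (path and the follow-up use are placeholders):

      ```c
      struct stat st;

      if (stat(path, &st) < 0)
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not stat file \"%s\": %m", path)));

      /* st is only inspected once the call is known to have succeeded */
      if (S_ISREG(st.st_mode))
          elog(DEBUG1, "\"%s\" is a regular file", path);
      ```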
  29. Aug 30, 2017: 2 commits
    • Remove misc unused code. · 37d2a5b3
      Committed by Heikki Linnakangas
      'nuff said.
    • Eliminate '#include "utils/resowner.h"' from lock.h · 6b25c0a8
      Committed by Heikki Linnakangas
      It was getting in the way of backporting commit 9b1b9446f5 from PostgreSQL,
      which added an '#include "storage/lock.h"' to resowner.h, forming a cycle.
      
      The include was only needed for the declaration of the awaitedOwner global
      variable. Replace "ResourceOwner" with the equivalent "struct
      ResourceOwnerData *" to avoid it.
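
      The trick, sketched: a pointer to an incomplete struct type needs no full
      definition, so the header no longer has to include resowner.h:

      ```c
      /* before: required '#include "utils/resowner.h"' for the typedef */
      extern ResourceOwner awaitedOwner;

      /* after: only a forward-declared struct pointer, no include needed
       * (ResourceOwner is 'typedef struct ResourceOwnerData *ResourceOwner') */
      extern struct ResourceOwnerData *awaitedOwner;
      ```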
      
      This revealed a bunch of other files that were relying on resowner.h
      being indirectly included through lock.h. Include resowner.h directly
      in those files.
      
      The ResPortalIncrement.owner field was not used for anything, so instead
      of including resowner.h in that file, just remove the field that needed
      it.
  30. Aug 29, 2017: 1 commit
    • Perform resource group operations only when it's initialized · 939208b5
      Committed by Pengzhou Tang
      Resource groups are enabled but not initialized in auxiliary processes
      and special backends like ftsprobe and filerep. Previously, we performed
      resource group operations regardless of whether the resource group was
      initialized, which led to unexpected errors.
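
      The shape of the guard, sketched; IsResGroupEnabled() and
      IsResGroupActivated() stand in for whatever checks the actual patch uses
      for "enabled" and "initialized":

      ```c
      /* illustrative: skip resource group work in processes that never
       * initialized the subsystem (ftsprobe, filerep, auxiliary processes) */
      void
      resgroup_operation_guarded(void)
      {
          if (!IsResGroupEnabled() || !IsResGroupActivated())
              return;

          /* ... the actual resource group operation ... */
      }
      ```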
  31. Aug 28, 2017: 2 commits
    • Avoid side effects in assertions · 288dde95
      Committed by Daniel Gustafsson
      An assertion with a side effect may alter the main codepath when
      the tree is built with --enable-cassert, which in turn may lead
      to subtle differences due to compiler optimizations and/or straight
      bugs in the side effect. Rewrite the assertions without side
      effects to leave the main codepath intact.
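
      An illustrative example of the hazard; with assertions compiled out, the
      first form never performs the append at all:

      ```c
      /* BAD: the side effect disappears in non-cassert builds */
      Assert((lst = lappend(lst, item)) != NULL);

      /* GOOD: do the work unconditionally, assert about the result */
      lst = lappend(lst, item);
      Assert(lst != NULL);
      ```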
    • Optimize `COPY TO ON SEGMENT` result processing · 266355d3
      Committed by Adam Lee
      Don't send nonsense '\n' characters just for counting; let segments
      report how many rows were processed instead.
      Signed-off-by: Ming LI <mli@apache.org>