1. 31 Jul 2017, 2 commits
  2. 25 Jul 2017, 1 commit
    • Fix resgroup ICW failures · 4165a543
      Committed by Ning Yu
      * Fix the resgroup assert failure on CREATE INDEX CONCURRENTLY syntax.
      
      When resgroup is enabled, an assertion failure is encountered with
      the case below:
      
          SET gp_create_index_concurrently TO true;
          DROP TABLE IF EXISTS concur_heap;
          CREATE TABLE concur_heap (f1 text, f2 text, dk text) distributed by (dk);
          CREATE INDEX CONCURRENTLY concur_index1 ON concur_heap(f2,f1);
      
      The root cause is that on the QD we assumed a command is always
      dispatched to the QEs once it is assigned to a resgroup, but this does
      not hold for the CREATE INDEX CONCURRENTLY syntax.
      
      To fix it we add the necessary checks and cleanup on the QEs.
      
      * Do not assign a resource group in SIGUSR1 handler.
      
      When assigning a resource group on the master we might call WaitLatch()
      to wait for a free slot. However, as WaitLatch() expects to be woken by
      the SIGUSR1 signal, it waits endlessly when SIGUSR1 is blocked.
      
      One scenario is the catch up handler, which is triggered and executed
      directly in the SIGUSR1 handler, so SIGUSR1 is blocked during its
      execution. As the catch up handler begins a transaction, it tries to
      assign a resource group and triggers the endless waiting.
      
      To fix this we add a check to not assign a resource group when running
      inside the SIGUSR1 handler. As signal handlers are supposed to be light,
      short and safe, skipping the resource group in such a case is
      reasonable.
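      
      A minimal sketch of the guard, assuming a hypothetical
      in_sigusr1_handler flag (the actual GPDB flag and call sites differ):
      
      ```
      #include <signal.h>
      #include <stdbool.h>
      
      /* Hypothetical flag; set while the SIGUSR1 handler is running. */
      static volatile sig_atomic_t in_sigusr1_handler = 0;
      
      static void
      sigusr1_handler(int sig)
      {
          in_sigusr1_handler = 1;
          /* ... handle catch up events, which may begin a transaction ... */
          in_sigusr1_handler = 0;
      }
      
      /*
       * Skip resource group assignment when called from the handler:
       * WaitLatch() could never be woken there, as SIGUSR1 is blocked.
       */
      static bool
      should_assign_resgroup(void)
      {
          return !in_sigusr1_handler;
      }
      ```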
  3. 24 Jul 2017, 2 commits
    • Use non-blocking recv() in internal_cancel() · 23e5a5ee
      Committed by xiong-gang
      Hangs on recv() in internal_cancel() have been reported several
      times: the socket status shows 'ESTABLISHED' on the master while the
      peer process on the segment has already exited. We are not sure how
      exactly this happens, but we are able to reproduce the hang by
      dropping packets or rebooting the system on the segment.
      
      This patch uses poll() to do a non-blocking recv() in internal_cancel().
      The timeout of poll() is set to the max value of authentication_timeout
      to make sure the process on the segment has already exited before
      attempting another retry; we expect a retry on connect() to detect
      network issues.
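      
      A minimal sketch of the idea, assuming a hypothetical
      recv_with_timeout() helper (the actual internal_cancel() code differs):
      
      ```
      #include <poll.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      
      /*
       * Wait up to timeout_ms for data before calling recv(), so a dead
       * peer cannot leave us blocked forever in the read.
       */
      static ssize_t
      recv_with_timeout(int sock, void *buf, size_t len, int timeout_ms)
      {
          struct pollfd pfd = { .fd = sock, .events = POLLIN };
          int rc = poll(&pfd, 1, timeout_ms);
      
          if (rc <= 0)
              return rc;                   /* 0 on timeout, -1 on error */
          return recv(sock, buf, len, 0);  /* data ready; won't block */
      }
      ```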
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Detect cgroup mount point at runtime. (#2790) · 1b1b3a11
      Committed by Zhenghua Lyu
      In the past we used the hard-coded path "/sys/fs/cgroup" as the cgroup
      mount point; this can be wrong when 1) running on old kernels or 2) the
      customer has special cgroup mount points.
      
      Now we detect the mount point at runtime by checking /proc/self/mounts.
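      
      A minimal sketch of the detection, assuming a hypothetical
      find_cgroup_mount() helper; the real code also has to deal with
      per-controller mount points:
      
      ```
      #include <stdio.h>
      #include <stdbool.h>
      #include <string.h>
      
      /*
       * Find a cgroup mount point by scanning /proc/self/mounts, whose
       * lines look like:
       *   cgroup /sys/fs/cgroup/cpu cgroup rw,nosuid,cpu 0 0
       */
      static bool
      find_cgroup_mount(char *path, size_t pathlen)
      {
          char dev[64], dir[256], fstype[64];
          FILE *fp = fopen("/proc/self/mounts", "r");
      
          if (fp == NULL)
              return false;
          while (fscanf(fp, "%63s %255s %63s %*[^\n]", dev, dir, fstype) == 3)
          {
              if (strcmp(fstype, "cgroup") == 0)
              {
                  snprintf(path, pathlen, "%s", dir);
                  fclose(fp);
                  return true;
              }
          }
          fclose(fp);
          return false;
      }
      ```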
      Signed-off-by: Ning Yu <nyu@pivotal.io>
  4. 22 Jul 2017, 4 commits
    • Revert "Make SQL based fault injection function available to all tests." · 582d0fd4
      Committed by Asim R P
      Loading a C UDF from within the postgres binary is not a good idea: the
      binary cannot be loaded as a shared object on Linux (it works on OSX).
      
      This reverts commit 9361a6dd.
    • Revert "Move gp_inject_fault function to pg_regress/regress.c" · d3c78a1a
      Committed by Asim R P
      regress.c cannot include fmgroids.h because the header file is generated
      during the build process. The ICW jobs in CI check out the gpdb source
      code and run make from within src/test/regress, which fails to find
      fmgroids.h. It seems we need a dedicated contrib module for
      gp_inject_fault.
      
      This reverts commit bd26a268.
    • Move gp_inject_fault function to pg_regress/regress.c · bd26a268
      Committed by Asim R P
      This fixes ICW breakage caused by the "postgres" binary not being
      loadable as a shared library. To run the gp_inject_fault() function
      manually, generate regress.so by running make in src/test/regress.
      Thereafter, the CREATE FUNCTION command can be used to create the
      function, as in create_fault_function.source.
    • Make SQL based fault injection function available to all tests. · 9361a6dd
      Committed by Asim R P
      The function gp_inject_fault() was defined in a test-specific contrib
      module (src/test/dtm). All tests can now make use of it. Two pg_regress
      tests (dispatch and cursor) are modified to demonstrate the usage. The
      function is also made capable of injecting a fault in any segment,
      specified by dbid. No more invoking the gpfaultinjector python script
      from SQL files.
  5. 18 Jul 2017, 1 commit
  6. 13 Jul 2017, 3 commits
    • Remove unreachable and unused code (#2611) · f4e50a64
      Committed by Daniel Gustafsson
      This removes code which is either unreachable due to prior identical
      tests which break the codepath, or which is dead due to always being
      true. Asserting that an unsigned integer is >= 0 will always be true,
      so it's pointless.
      
      Per "logically dead code" gripes by Coverity
    • gpfaultinjector should work with filerep disabled · 41ba1012
      Committed by Abhijit Subramanya
      If we try to inject certain faults when the system is initialized with filerep
      disabled, we get the following error:
      
      ```
      gpfaultinjector error: Injection Failed: Failure: could not insert fault
      injection, segment not in primary or mirror role
      Failure: could not insert fault injection, segment not in primary or mirror
      role
      ```
      
      This patch removes the role check for non-filerep faults so that they
      don't fail on a cluster initialized without filerep.
    • Add GUC to control number of blocks that a resync worker operates on · 2960bd7c
      Committed by Asim R P
      The GUC gp_changetracking_max_rows replaces a compile-time constant. The
      resync worker obtains at most gp_changetracking_max_rows changed blocks
      from the changetracking log at one time. Controlling this with a GUC
      makes it possible to expose bugs in the resync logic around this area.
  7. 10 Jul 2017, 2 commits
  8. 07 Jul 2017, 1 commit
    • Resgroup catalog changes · 4fafebe2
      Committed by Ning Yu
      Change initial contents in pg_resgroupcapability:
      * Remove memory_redzone_limit;
      * Add memory_shared_quota, memory_spill_ratio;
      
      Change resgroup concurrency range to [1, 'max_connections']:
      * Original range is [0, 'max_connections'], and -1 means unlimited.
      * Now the range is [1, 'max_connections'], and -1 is not supported.
      
      Change resgroup limit type from float to int.
      
      Changed the below resgroup resource limit types from float to int percentage values:
      * cpu_rate_limit;
      * memory_limit;
      * memory_shared_quota;
      * memory_spill_ratio;
  9. 06 Jul 2017, 2 commits
    • Remove SAN failover catalog leftovers (#2721) · ad2f4d1c
      Committed by Daniel Gustafsson
      Commit a8f956c6 removed the old SAN
      failover code but left the catalogs in place due to catalog change
      freeze. This removes the no longer used catalogs and the relevant
      doc entries.
    • Support an optional message in backend cancel/terminate (#2729) · fa6c2d43
      Committed by Daniel Gustafsson
      This adds the ability for the caller of pg_terminate_backend() or
      pg_cancel_backend() to include an optional message for the process
      being signalled. The message will be appended to the error message
      returned to the killed process. The new syntax is overloaded as:
      
          SELECT pg_terminate_backend(<pid> [, msg]);
          SELECT pg_cancel_backend(<pid> [, msg]);
  10. 29 Jun 2017, 3 commits
    • Change the way OIDs are preserved during pg_upgrade. · f51f2f57
      Committed by Heikki Linnakangas
      Instead of meticulously recording the OIDs of each object in the pg_dump
      output, dump and load all OIDs as a separate step in pg_upgrade.
      
      We now only preserve OIDs of types, relations and schemas from the old
      cluster. Other objects are assigned new OIDs as part of the restore.
      To ensure the OIDs are consistent between the QD and QEs, we dump the
      (new) OIDs of all objects to a file, after upgrading the QD node, and use
      those OIDs when restoring the QE nodes. We were already using a similar
      mechanism for new array types, but we now do that for all objects.
    • Fix incorrect struct member in backport · 51bd8009
      Committed by Daniel Gustafsson
      The backport of the data checksum catalog changes pulled the relevant
      GUC from a version in which struct config_bool is defined differently
      than in GPDB. The reason an extra NULL in the config_bool array
      initialization wasn't causing a compilation failure is that there is an
      extra bool member at the end, reset_val, which is only set at runtime.
      The extra NULL was "overflowing" into this member and thus only raised
      a warning under -Wint-conversion:
      
          guc.c:1180:15: warning: incompatible pointer to integer
                         conversion initializing 'bool' (aka 'char')
                         with an expression of type 'void *'
      
      Fix by removing the superfluous NULL. Since it was setting reset_val
      to NULL (and for a GUC which is yet to "do something"), this should
      have no effect.
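      
      A simplified illustration of the overflow; the struct below abbreviates
      the real guc.c definition:
      
      ```
      #include <stdbool.h>
      #include <stddef.h>
      
      struct config_bool_like
      {
          const char *name;
          bool       *variable;
          bool        boot_val;
          void       *assign_hook;
          void       *show_hook;
          bool        reset_val;   /* runtime-only; never initialized */
      };
      
      /*
       * The stray trailing NULL does not overflow the struct; it simply
       * initializes the next member, reset_val, with a pointer value,
       * which only warns under -Wint-conversion.
       */
      struct config_bool_like gb =
          { "data_checksums", NULL, false, NULL, NULL, NULL };
      ```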
    • Implement resgroup memory limit (#2669) · b5e1fb0a
      Committed by Ning Yu
      Implement resgroup memory limit.
      
      In a resgroup we divide the memory into several slots; the number
      depends on the concurrency setting of the resgroup. Each slot has a
      reserved quota of memory, and all the slots also share some shared
      memory which can be acquired preemptively.
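      
      A rough illustration of the split; the formula and names below are
      assumed for illustration, not the exact implementation:
      
      ```
      /*
       * Hypothetical example: group_mem = 1000 (MB),
       * memory_shared_quota = 0.2, concurrency = 4:
       *   shared     = 1000 * 0.2       = 200 MB (acquired preemptively)
       *   slot_quota = (1000 - 200) / 4 = 200 MB reserved per slot
       */
      static void
      split_group_memory(int group_mem, double memory_shared_quota,
                         int concurrency, int *shared, int *slot_quota)
      {
          *shared = (int) (group_mem * memory_shared_quota);
          *slot_quota = (group_mem - *shared) / concurrency;
      }
      ```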
      
      Some GUCs and resgroup options are defined to adjust the exact allocation
      policy:
      
      resgroup options:
      - memory_shared_quota
      - memory_spill_ratio
      
      GUCs:
      - gp_resource_group_memory_limit
      Signed-off-by: Ning Yu <nyu@pivotal.io>
  11. 28 Jun 2017, 2 commits
    • Destroy dangling Gang if interrupted during creation. (#2696) · 90f59f88
      Committed by Kenan Yao
      If the QD receives a SIGINT and calls CHECK_FOR_INTERRUPTS after
      finishing Gang creation, but before recording this Gang in global
      variables like primaryWriterGang, the Gang would not be destroyed. The
      next time the QD wants to create a new writer Gang, it would find the
      existing writer Gang on the segments and report a snapshot collision
      error.
    • Set the stage for heap checksum feature by pulling in pg_control change · bc2d9891
      Committed by Asim R P
      This patch pulls in the addition of checksum version information to
      pg_control and a GUC to report the checksum version. The heap data
      checksum feature will be pulled in its entirety in subsequent patches.
      
      Upstream commit that this patch pulls from:
      
      commit 96ef3b8f
      Author: Simon Riggs <simon@2ndQuadrant.com>
      Date:   Fri Mar 22 13:54:07 2013 +0000
      
          Allow I/O reliability checks using 16-bit checksums
      
      commit 44395174
      Author: Simon Riggs <simon@2ndQuadrant.com>
      Date:   Tue Apr 30 12:27:12 2013 +0100
      
          Record data_checksum_version in control file.
      
      commit 5a7e75849cb595943fc605c4532716e9dd69f8a0
      Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
      Date:   Mon Sep 16 14:36:01 2013 +0300
      
          Add a GUC to report whether data page checksums are enabled.
  12. 27 Jun 2017, 1 commit
    • Alter resgroup cpu · 212fb781
      Committed by Ning Yu
      Support ALTER RESOURCE GROUP SET CPU_RATE_LIMIT syntax.
      
      The new cpu rate limit takes effect immediately at the end of the
      transaction.
      
      Example 1:
      
          CREATE RESOURCE GROUP g1
              WITH (cpu_rate_limit=0.1,memory_limit=0.1);
          ALTER RESOURCE GROUP g1 SET CPU_RATE_LIMIT 0.2;
      
      The new cpu rate limit takes effect immediately.
      
      Example 2:
      
          BEGIN;
          ALTER RESOURCE GROUP g1 SET CPU_RATE_LIMIT 0.2;
      
      The new cpu rate limit doesn't take effect unless the transaction is
      committed.
      Signed-off-by: Richard Guo <riguo@pivotal.io>
      Signed-off-by: Gang Xiong <gxiong@pivotal.io>
  13. 24 Jun 2017, 1 commit
    • Enable xlogging for create fs objects on segments. · 9efec6b2
      Committed by Ashwin Agrawal
      In case of --enable-segwalrep, write-ahead logging should not be skipped
      for anything, as the mirror relies on that mechanism to reconstruct
      things. Write-ahead logging for these pieces was previously only enabled
      for the master; with this commit it gets enabled for the segments as
      well.
  14. 22 Jun 2017, 3 commits
    • Manipulate callback functions for resource group related operations. · 4ccf54c4
      Committed by Richard Guo
      A dedicated list is maintained for resource group related callbacks.
      At transaction end, the callback functions are processed in FIFO order
      on COMMIT and in LIFO order on ABORT.
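      
      A minimal sketch of the ordering; the list type and callback signature
      below are illustrative, not the GPDB definitions:
      
      ```
      #include <stdbool.h>
      #include <stddef.h>
      
      typedef struct CallbackItem
      {
          void (*func)(void *arg);
          void  *arg;
          struct CallbackItem *next;   /* kept in registration order */
      } CallbackItem;
      
      static void
      run_resgroup_callbacks(CallbackItem *head, bool isCommit)
      {
          if (head == NULL)
              return;
          if (isCommit)
          {
              /* COMMIT: fire head first, then the rest (FIFO) */
              head->func(head->arg);
              run_resgroup_callbacks(head->next, true);
          }
          else
          {
              /* ABORT: recurse first, fire on unwind (LIFO) */
              run_resgroup_callbacks(head->next, false);
              head->func(head->arg);
          }
      }
      ```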
      Signed-off-by: Pengzhou Tang <ptang@pivotal.io>
    • Eliminating alien nodes before execution (#2588) · 9b8f5c0b
      Committed by foyzur
      In GPDB the dispatcher dispatches the entire plan tree to each query executor (QX). Each QX deserializes the entire plan tree and starts execution from the root of the plan tree. This begins by calling InitPlan on the QueryDesc, which blindly calls ExecInitNode on the root of the plan.
      
      Unfortunately, this is wasteful in terms of memory and CPU. Each QX is
      in charge of a single slice, and there can be many slices. Looking into
      plan nodes that belong to other slices and initializing them (e.g.,
      creating PlanState for such nodes) is clearly wasteful. For large plans,
      particularly planner plans in the presence of partitions, this can add
      up to significant waste.
      
      This PR proposes a fix to solve this problem. The idea is to find the local root for each slice and start ExecInitNode there.
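      
      A sketch of the idea, assuming the slice's Motion node can be located
      by its motion ID (simplified; the special cases below are what make the
      real walker more involved):
      
      ```
      /*
       * Illustrative sketch only: locate the Motion node whose ID matches
       * the executing slice and treat it as the local root, so that plan
       * nodes belonging to other slices are never passed to ExecInitNode.
       */
      static Plan *
      find_local_root(Plan *node, int currentSliceId)
      {
          Plan *found;
      
          if (node == NULL)
              return NULL;
          if (IsA(node, Motion) && ((Motion *) node)->motionID == currentSliceId)
              return node;
          found = find_local_root(node->lefttree, currentSliceId);
          if (found == NULL)
              found = find_local_root(node->righttree, currentSliceId);
          return found;
      }
      ```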
      
      There are a few special cases:
      
      SubPlans are special, as they appear as expressions, but the expression
      holds the root of the sub-plan tree. All the subplans are bundled in
      plannedstmt->subplans, but confusingly as Plan pointers (i.e., we save
      the root of the SubPlan expression's Plan tree). Therefore, to find the
      relevant subplans, we first find the relevant expressions and extract
      their roots, and then iterate over plannedstmt->subplans, calling
      ExecInitNode only on the ones reachable from some expression in the
      current slice.
      
      InitPlans are no better, as they can appear anywhere in the Plan tree.
      Walking from a local motion is not sufficient to find them. Therefore,
      we walk from the root of the plan tree and identify all the SubPlans.
      Note: unlike a regular subplan, an initplan may not appear in an
      expression as a subplan; rather, it appears as a parameter generator in
      some other part of the tree. We find these InitPlans and obtain the
      SubPlan for each one. We can then use the SubPlan's setParam to copy
      precomputed parameter values from estate->es_param_list_info to
      estate->es_param_exec_vals.
      
      We also found that origSliceIdInPlan is highly unreliable and cannot be
      used as an indicator of a plan node's slice information. Therefore, we
      precompute each plan node's slice information to correctly determine
      whether a Plan node is alien. This makes alien node identification more
      accurate. In successive PRs, we plan to use the alien memory account
      balance as a test of whether we successfully eliminated all aliens, and
      to determine memory savings.
    • Assign Rollover as parent account if the parent account is obsolete (#2609) · 2cabdf9d
      Committed by foyzur
      Detect a dead parent account and replace it with the Rollover account
      during the memory accounting array-to-tree conversion.
      
      * Unit test to check that children of dead parents are serialized as children of the Rollover account.
  15. 21 Jun 2017, 1 commit
  16. 20 Jun 2017, 2 commits
    • Remove tmlock test and add an assert instead. · 944306d7
      Committed by Abhijit Subramanya
      The test used to validate that the tmlock is not held after completing
      DTM recovery. The root cause for not releasing the lock was that in case
      of an error during recovery, `elog_demote(WARNING)` was called, which
      would demote the error to a warning. This would cause the abort
      processing code not to be executed, and hence the lock would not be
      released. Adding a simple assert in the code once DTM recovery is
      complete is sufficient to make sure that the lock is released.
    • Readers shouldn't check lock waitMask if writer holds the lock · 61623ce7
      Committed by Asim R P
      Otherwise there is a possibility of distributed deadlock. One such
      deadlock is caused by an ENTRY_DB_SINGLETON reader entering LockAcquire
      when the QD writer of the same MPP session already holds the lock. A
      backend from another MPP session is already waiting on the lock with a
      lockmode that conflicts with the reader's requested lockmode. This
      results in a waitMask conflict and the reader is enqueued in the wait
      queue. But the QD writer is never going to release the lock because it's
      waiting for tuples from the segments (QE writers/readers). And the QE
      writers/readers are in turn waiting for the ENTRY_DB_SINGLETON reader,
      completing the cycle necessary for deadlock.
      
      The fix is to avoid checking waitMask conflicts for a reader if the
      writer of the same MPP session already holds the lock. In such a case
      the reader is granted the lock as long as it does not conflict with
      existing holders of the lock.
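      
      A minimal decision sketch; the names below are assumed, not the actual
      lock.c symbols:
      
      ```
      #include <stdbool.h>
      
      /*
       * A reader skips the waitMask (waiter) check when the writer of its
       * own MPP session already holds the lock; otherwise both holders and
       * waiters are checked as usual.
       */
      static bool
      reader_lock_granted(bool sameSessionWriterHoldsLock,
                          bool conflictsWithHolders,
                          bool conflictsWithWaiters)
      {
          if (sameSessionWriterHoldsLock)
              return !conflictsWithHolders;   /* ignore waitMask */
          return !conflictsWithHolders && !conflictsWithWaiters;
      }
      ```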
      
      Two isolation2 tests are added. One simulates the above mentioned
      deadlock and fails if it occurs. Another ensures that granting locks to
      readers without checking waitMask conflicts does not starve existing
      waiters.
      
      cf. https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/OS1-ODIK0P4/ZIzayBbMBwAJ
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
  17. 19 Jun 2017, 2 commits
  18. 17 Jun 2017, 1 commit
    • Merge 8.4 CTE (sans recursive) · 41c3b698
      This brought in postgres/postgres@44d5be0 pretty much wholesale, except:
      
      1. We leave `WITH RECURSIVE` for a later commit. The code is brought in,
          but kept dormant by us bailing early at the parser whenever there is
          a recursive CTE.
      2. We use `ShareInputScan` instead of `CteScan`. ShareInputScan is
          basically the parallel-capable `CteScan`. (See `set_cte_pathlist`
          and `create_ctescan_plan`)
      3. Consequently we do not put the sub-plan for the CTE in a
          pseudo-initplan: it is directly present in the main plan tree
          instead, hence we disable `SS_process_ctes` inside
          `subquery_planner`
      4. Another corollary is that all new operators (`CteScan`,
          `RecursiveUnion`, and `WorkTableScan`) are dead code right now. But
          they will come to life once we bring in the parallel implementation of
          `WITH RECURSIVE`
      
      In general this commit reduces the divergence between Greenplum and
      upstream.
      
      User-visible changes:
      The parser merge enables a corner case previously treated as an error:
      you can now specify fewer columns in your `WITH` clause than are
      actually projected by the body subquery of the `WITH`.
      
      Original commit message:
      
      > Implement SQL-standard WITH clauses, including WITH RECURSIVE.
      >
      > There are some unimplemented aspects: recursive queries must use UNION ALL
      > (should allow UNION too), and we don't have SEARCH or CYCLE clauses.
      > These might or might not get done for 8.4, but even without them it's a
      > pretty useful feature.
      >
      > There are also a couple of small loose ends and definitional quibbles,
      > which I'll send a memo about to pgsql-hackers shortly.  But let's land
      > the patch now so we can get on with other development.
      >
      > Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
      >
      
      (cherry picked from commit 44d5be0e)
  19. 09 Jun 2017, 1 commit
  20. 08 Jun 2017, 1 commit
  21. 07 Jun 2017, 3 commits
    • restore TCP interconnect · 353a937d
      Committed by Pengzhou Tang
      This commit restores the TCP interconnect and fixes some hang issues.
      
      * restore the TCP interconnect code
      * add a GUC called gp_interconnect_tcp_listener_backlog to control the
        backlog parameter of the listen() call for TCP
      * use memmove() instead of memcpy() because the memory areas do overlap
      * call checkForCancelFromQD() for the TCP interconnect if there is no
        data for a while; this avoids the QD getting stuck
      * revert the cancelUnfinished-related modification in 8d251945,
        otherwise some queries get stuck
      * move and rename the fault injector "cursor_qe_reader_after_snapshot"
        to make test cases pass under the TCP interconnect
    • Misc changes of gp_log_gang · 9d5b10ae
      Committed by Pengzhou Tang
      * Change the default level of gp_log_gang to off.
      * Log the query plan size at level TERSE; it's useful for debugging.
    • gpperfmon: remove iterators_history data and table · c0c1897f
      Committed by Melanie Plageman
      - Remove iteration specific members of qexec packet
      - Remove iterators_history table
      - Remove measures used to populate iterators_history
      - Remove iterator_aggregate flag
      Signed-off-by: Nadeem Ghani <nghani@pivotal.io>
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
  22. 05 Jun 2017, 1 commit
    • Record memory usage for resource group. · 83b58aef
      Committed by Richard Guo
      Record memory usage for resource group.
      
      1. Update total memory usage for a resource group when
         a session belonging to this group allocates/frees memory.
      
      2. Update total memory usage for related resource groups
         when a session enters or leaves a resource group.
      
      3. Dispatch current resource group ID from QD to QEs to
         keep track of current resource group.
      
      4. Show total memory usage of a resource group.
      
      5. Add test case for memory usage recording of resource group.
      Signed-off-by: xiong-gang <gxiong@pivotal.io>
      Signed-off-by: Kenan Yao <kyao@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>