1. 07 Jul 2018, 1 commit
    • Do not automatically create an array type for child partitions · a8f5a045
      Authored by Jimmy Yih
      As part of the Postgres 8.3 merge, all heap tables now automatically
      create an array type. The array type will usually be created with
      typname '_<heap_name>' since the automatically created composite type
      already takes the typname '<heap_name>' first. If typname
      '_<heap_name>' is taken, the logic will continue to prepend
      underscores until no collision (truncating the end if typname gets
      past NAMEDATALEN of 64). This might be an oversight in upstream
      Postgres since certain scenarios involving creating a large number of
      heap tables with similar names could result in a lot of typname
      collisions until no heap tables with similar names can be
      created. This is very noticeable in Greenplum heap partition tables
      because Greenplum automatically names child partitions with similar
      names instead of having the user name each child partition.
      
      To prevent typname collision failures when creating a heap partition
      table with a large number of child partitions, we will now stop
      automatically creating the array type for child partitions.
      
      References:
      https://www.postgresql.org/message-id/flat/20070302234016.GF3665%40fetter.org
      https://github.com/postgres/postgres/commit/bc8036fc666a8f846b1d4b2f935af7edd90eb5aa
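      
      For illustration, a minimal sketch of the upstream behaviour described
      above (table names are hypothetical):
      
      ```sql
      -- Since the 8.3 merge, CREATE TABLE also creates a composite type 't1'
      -- and an array type '_t1' for the new heap table.
      CREATE TABLE t1 (a int);
      SELECT typname FROM pg_type WHERE typname IN ('t1', '_t1');
      
      -- If the typname '_t1' were already taken, the array type's name would
      -- get additional underscores prepended until it no longer collides
      -- (truncated to fit within NAMEDATALEN).
      ```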
  2. 06 Jul 2018, 1 commit
    • Remove deduplication in hyperloglog code · 9c456084
      Authored by Omer Arap
      We had significant duplication between the hyperloglog extension and the
      utility library that we use in the analyze-related code. This commit
      removes the duplication as well as a significant amount of dead code. It
      also fixes some compiler warnings and some Coverity issues.
      
      This commit also puts the hyperloglog functions in a separate schema
      which is not modifiable by non-superusers.
      Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
  3. 30 Jun 2018, 2 commits
    • Remove irrelevant comments from sql test file · a8f6260e
      Authored by Shreedhar Hardikar
    • Fix 'no parameter found for initplan subquery' · f50e5daf
      Authored by Shreedhar Hardikar
      The issue happens because of constant folding in the testexpr of the
      SUBPLAN expression node. The testexpr may be reduced to a const and any
      PARAMs previously used in the testexpr disappear. However, the subplan
      still remains.
      
      This behavior is similar in upstream Postgres 10 and may be a
      performance consideration. Leaving that aside for now, the constant
      folding produces an elog(ERROR) when the plan has subplans and no
      PARAMs are used. The check in `addRemoteExecParamsToParamList()` uses
      `context.params`, which computes the PARAMs used in the plan, and
      `nIntPrm = list_length(root->glob->paramlist)`, which is the number of
      PARAMs declared/created.
      Given the ERROR messages generated, the above check makes no sense,
      especially since it won’t even trip for the InitPlan bug (mentioned in
      the comments) as long as there is at least one PARAM in the query.
      
      This commit removes this check since it doesn't correctly capture the
      intent.
      
      In theory, it could be replaced by a check specifically aimed at
      InitPlans, that is, find all the param ids used by InitPlans and then
      make sure they are used in the plan. But we already do this and
      remove any unused initplans in `remove_unused_initplans()`, so I don’t
      see the point of adding that.
      
      Fixes #2839
  4. 29 Jun 2018, 3 commits
    • Fix incremental analyze for non-matching attnums · ef39e0d0
      Authored by Omer Arap
      To merge stats in incremental analyze for the root partition, we use the
      leaf tables' statistics. In commit b28d0297,
      we fixed an issue where a child's attnum does not match the root table's
      attnum for the same column. After we fixed that issue with a test, that
      test also exposed a bug in the analyze code.
      
      This commit fixes the issue in analyze using a fix similar to
      b28d0297.
    • Fix querying stats for largest child · b28d0297
      Authored by Omer Arap
      Previously, we would use the root table's information to acquire stats
      from the `syscache`, which returns no result. The reason it does not
      return any result is that we query the syscache using the `inh` field,
      which is set to true for the root table and false for the leaf tables.
      
      Another issue, which is not evident, is the possibility of mismatching
      `attnum`s for the root and leaf tables after running specific scenarios.
      When we drop a column and then split a partition, unchanged partitions
      and old partitions preserve the old attnums while newly created
      partitions have increasing attnums with no gaps. If we query the
      syscache using the root's attnum for that column, we would get wrong
      stats for that specific column. Passing the root's `inh` hid the issue
      of having wrong stats.
      
      This commit fixes the issue by getting the attribute name using the
      root's attnum and using it to acquire the correct attnum for the largest
      leaf partition.
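      
      A hypothetical sequence of the kind described above (names and partition
      bounds are illustrative):
      
      ```sql
      CREATE TABLE sales (id int, junk int, region text)
          DISTRIBUTED BY (id)
          PARTITION BY RANGE (id)
          (START (1) END (11) EVERY (5));
      ALTER TABLE sales DROP COLUMN junk;
      -- Pre-existing partitions keep a gap at attnum 2, but the partitions
      -- created by SPLIT get consecutive attnums, so 'region' no longer lines
      -- up with the root's attnum for the same column.
      ALTER TABLE sales SPLIT PARTITION FOR (2) AT (3)
          INTO (PARTITION p_low, PARTITION p_high);
      ```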
    • Perform analyze on specific table in spilltodisk test. · 37c75753
      Authored by Ashwin Agrawal
      No need for a database-wide analyze; only the specific table needs to be
      analyzed for the test.
  5. 20 Jun 2018, 1 commit
  6. 19 Jun 2018, 2 commits
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Authored by Omer Arap
      This commit introduces an end-to-end scalable solution to generate
      statistics of the root partitions. This is done by merging the
      statistics of leaf partition tables to generate the statistics of the
      root partition. Therefore, the ability to merge leaf table statistics
      for the root table makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze still creates a sample for each partition, as in the
      previous version. While analyzing the sample and generating statistics
      for the partition, it also creates a `hyperloglog_counter` data
      structure, adds the values from the sample to it, and records
      information such as the number of multiples and the sample size. Once
      the entire sample is processed, analyze saves the `hyperloglog_counter`
      as a byte array in the `pg_statistic` catalog table. We reserve a slot
      for the `hyperloglog_counter` in the table and signify this with a
      specific statistic kind, `STATISTIC_KIND_HLL`. We only keep the
      `hyperloglog_counter` in the catalog for the leaf partitions. If the
      user chooses to run a FULL scan for HLL, we signify the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. Initially, we check if all the partitions have been analyzed
      properly and have all the statistics available to us in the
      `pg_statistic` catalog table. If there is a partition with no tuples,
      even though it has no entry in `pg_catalog`, we consider it analyzed.
      If for some reason a single partition is not analyzed, we fall back to
      the original analyze algorithm, which acquires a sample for the root
      partition and calculates statistics based on that sample.
      
      Merging the null fraction and average width from leaf partition
      statistics is trivial and poses no significant challenge, so we
      calculate them first. The remaining statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCV), and their frequencies termed as most common
      frequency (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
      Hyperloglog provides the functionality to merge multiple
      `hyperloglog_counter`s into one and to calculate the number of distinct
      values using the aggregated `hyperloglog_counter`. This aggregated
      counter is sufficient on its own only if the user chooses to run a full
      scan for hyperloglog. In the sample-based approach, without the
      hyperloglog algorithm, deriving the number of distinct values is not
      possible. Hyperloglog enables us to merge the `hyperloglog_counter`s
      from each partition and calculate the NDV on the merged counter with an
      acceptable error rate. However, it does not give us the ultimate NDV of
      the root partition; it provides the NDV of the union of the samples
      from each partition.
      
      The rest of the NDV interpolation is based on the formula used in
      Postgres and depends on four metrics: the NDV in the sample, the number
      of multiple values in the sample, the sample size, and the total rows in
      the table. Using these values, the algorithm calculates the approximate
      NDV for the table. While merging the statistics from the leaf
      partitions, hyperloglog lets us accurately derive the NDV of the sample,
      the sample size and the total rows; however, the number of multiples in
      the accumulated sample is unknown, since we do not have access to the
      accumulated sample at this point.
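      
      For reference, the sample-based interpolation in Postgres that this
      feeds into is roughly the following (a sketch of the estimator in
      analyze.c; d is the NDV of the sample, n the sample size, N the total
      row count, and f1 = d - nMultiple the number of values seen exactly
      once):
      
      $$\hat{D} = \frac{n \cdot d}{(n - f_1) + f_1 \cdot n / N}$$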
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the aggregated
      sample for the root (which itself is unavailable) requires the NDV, the
      number of multiples, and the size of each leaf sample. The NDV of each
      sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and the sample size for
      each partition are saved in the partition's `hyperloglog_counter`
      during leaf statistics gathering, to be used later in the merge.
      
      Estimating the number of multiples in the aggregate sample for the root
      partition is a two-step process. First, we accurately estimate the
      number of values that reside in more than one partition's sample. Then,
      we estimate the number of multiples that exist only in a single
      partition's sample. Finally, we add these values to estimate the overall
      number of multiples in the aggregate sample of the root partition.
      
      To count the number of values that exist in only one partition, we
      utilize hyperloglog functionality. We can easily estimate how many
      values appear only in a specific partition _i_. We call the NDV of the
      aggregate of all partitions `NDV_all`, and the NDV of the aggregate of
      all partitions except _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` gives the values that appear in only one
      partition. The rest of the values contribute to the overall number of
      multiples in the root’s aggregated sample, and we call their count
      `nMultiple_inter`, the number of values that appear in more than one
      partition.
      
      However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple copies of it. We need a
      way to account for these values. Recall that we also count the number of
      multiples that exist only in a single partition's sample. We already
      know the number of multiples inside a partition sample, but we need to
      normalize this value by the proportion of the values unique to the
      partition sample relative to the number of distinct values of the
      partition sample. The normalized value is partition sample i’s
      contribution to the overall calculation of the nMultiple.
      
      Finally, `nMultiple_root` would be the sum of the `nMultiple_inter` and
      `normalized_m_i` for each partition sample.
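      
      Written out (the symbols are ours, not identifiers from the code), the
      estimate described above is:
      
      $$\mathrm{normalized\_m}_i = m_i \cdot \frac{\mathrm{NDV\_all} - \mathrm{NDV\_minus\_i}}{\mathrm{NDV}_i}
      \qquad
      \mathrm{nMultiple\_root} = \mathrm{nMultiple\_inter} + \sum_i \mathrm{normalized\_m}_i$$
      
      where m_i is the number of multiples in partition i's sample and NDV_i
      is the NDV of that sample.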
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from the 4.3 version of
      Greenplum DB. The algorithm is straightforward: we convert each MCV’s
      frequency into a count and add the counts up when a value appears in
      more than one partition. After every possible candidate’s count has been
      calculated, we sort the candidate values and pick the top ones, the
      number of which is defined by `default_statistics_target`. 4.3
      previously picked the values with the highest counts blindly; we instead
      incorporated the same logic used in current Greenplum and Postgres and
      test whether a value is a real MCV. Therefore, even after the merge, the
      logic fully aligns with Postgres.
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge the
      histograms from the leaf partitions. In 4.3 we use a priority queue to
      merge the histograms from the leaf partitions. However, that approach is
      naive and loses important statistical information. In Postgres, the
      histogram is calculated over the values that did not qualify as MCVs.
      The merge logic for the histograms in 4.3 did not take this into
      consideration, and significant statistical information was lost while
      merging the MCV values.
      
      We introduce a novel approach to feed the MCVs from the leaf partitions
      that did not qualify as a root MCV into the histogram merge logic. To
      fully utilize the previously implemented priority queue logic, we treat
      non-qualified MCVs as the histograms of so-called `dummy` partitions.
      To be more precise, if an MCV m1 is a non-qualified MCV, we create a
      histogram [m1, m1] that has only one bucket, whose size is the count of
      this non-qualified MCV. When we merge the histograms of the leaf
      partitions and these dummy partitions, the merged histogram does not
      lose any statistical information.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Update .gitignore files · e0e8f475
      Authored by Adam Lee
      To have a clean `git status` output.
  7. 16 Jun 2018, 1 commit
    • Fix incorrect modification of storageAttributes.compress. · 7c82d50f
      Authored by Ashwin Agrawal
      For CO tables, storageAttributes.compress only conveys whether block
      compression should be applied or not. RLE is performed as stream compression
      within the block, and hence storageAttributes.compress being true or false
      doesn't relate to rle at all. So, with rle_type compression,
      storageAttributes.compress is true for compression levels > 1, where block
      compression is performed along with stream compression. For compress level = 1,
      storageAttributes.compress is always false, as no block compression is
      applied. Since rle doesn't relate to storageAttributes.compress, there is no
      reason to touch it based on rle_type compression.
      
      Also, the problem manifests more due to the fact that the datumstream layer
      uses the AppendOnlyStorageAttributes in DatumStreamWrite
      (`acc->ao_attr.compress`) to decide the block type, whereas the cdb storage
      layer functions use the AppendOnlyStorageAttributes from AppendOnlyStorageWrite
      (`idesc->ds[i]->ao_write->storageAttributes.compress`). Due to this
      difference, changing just one of them, and unnecessarily at that, is bound to
      cause issues during insert.
      
      So, removing the unnecessary and incorrect update to
      AppendOnlyStorageAttributes.
      
      The test case showcases the failing scenario without the patch.
  8. 11 Jun 2018, 1 commit
  9. 09 Jun 2018, 2 commits
  10. 08 Jun 2018, 2 commits
    • Reduce runtimes for some qp_* tests. · 94c30fdb
      Authored by Ashwin Agrawal
      Before:
           qp_functions             ... ok (76.24 sec)  (diff:0.06 sec)
           qp_gist_indexes4         ... ok (88.46 sec)  (diff:0.07 sec)
           qp_with_clause           ... ok (130.70 sec)  (diff:0.32 sec)
      
      After:
           qp_functions             ... ok (4.49 sec)  (diff:0.06 sec)
           qp_gist_indexes4         ... ok (16.18 sec)  (diff:0.06 sec)
           qp_with_clause           ... ok (54.41 sec)  (diff:0.30 sec)
    • Add tests to verify that dummy joins are created · fab372cb
      Authored by Bhuvnesh Chaudhary
      For semi join queries, if constraints can eliminate the scanned relations,
      the resulting relation should be marked as a dummy, and the join using it
      should be a dummy join.
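      
      A hypothetical query of the kind these tests cover (assuming constraint
      exclusion is enabled for plain CHECK constraints):
      
      ```sql
      CREATE TABLE facts (a int CHECK (a > 0)) DISTRIBUTED BY (a);
      CREATE TABLE keys (a int) DISTRIBUTED BY (a);
      -- The predicate a < 0 contradicts the CHECK constraint, so the scan of
      -- 'facts' is eliminated and the semi join should become a dummy join.
      EXPLAIN SELECT * FROM keys WHERE a IN (SELECT a FROM facts WHERE a < 0);
      ```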
  11. 07 Jun 2018, 1 commit
    • Fix hang issue due to result node don't squelch outer node explicitly · 2c011ce4
      Authored by Pengzhou Tang
      For a result node with a one-time filter, if its outer plan is not
      empty and contains a motion node, then it needs to squelch the outer
      node explicitly if the one-time filter check is false. This is necessary
      especially for a motion node under it: ExecSquelchNode() forces a stop
      message so the interconnect sender doesn't get stuck resending or
      polling for ACKs.
  12. 06 Jun 2018, 2 commits
    • Improve query_finish_pending test, cut down 100 secs. · 94f716d4
      Authored by Ashwin Agrawal
      Commit 07ee8008 added a test section in
      query_finish_pending.sql to validate that a query can be canceled when the
      cancel signal arrives faster than the query is dispatched. It uses a sleep
      fault for this.
      
      But the test was incorrect due to the usage of "begin": the begin slept for
      50 secs instead of the actual select query. Also, since the fault always
      triggers, the reset fault slept for an additional 50 secs. Instead, remove
      begin and just set endoccurence to 1. Verified that the modified test
      fails/hangs without the fix and passes/completes in a couple of secs with
      the fix.
    • Drop role if exists in bfv_partition test. · 9b13e7df
      Authored by Ashwin Agrawal
      bfv_partition tests fail if ICW is run n times after creating the cluster, as
      the role is not dropped. With this commit, the test can now be run n times
      successfully without re-creating the cluster.
      
      Along the way, also remove the suppression of warnings in role.sql.
  13. 05 Jun 2018, 2 commits
    • SPI 64 bit changes for pl/Python (#4154) · ce22b327
      Authored by Andreas Scherbaum
      SPI 64 bit changes for pl/Python
      
      Includes fault injection tests
    • Implement CPUSET (#5023) · 0c0782fe
      Authored by Jialun
      * Implement CPUSET, a new way of managing cpu resources in resource
      groups, which can reserve the specified cores exclusively for a
      specified resource group. This ensures that there are always cpu
      resources available for a group that has CPUSET set. The most common
      scenario is allocating fixed cores for short queries.
      - One can use it by executing CREATE RESOURCE GROUP xxx WITH (
        cpuset='0-1', xxxx), where 0-1 are the cpu cores reserved for
        this group, or ALTER RESOURCE GROUP SET CPUSET '0,1' to modify
        the value (an example appears at the end of this message).
      - The syntax of CPUSET is a comma-separated list of tuples, where
        each tuple represents either one core number or an interval of
        core numbers, e.g. 0,1,2-3. All cores in CPUSET must be available
        in the system, and the core numbers of different groups cannot
        overlap.
      - CPUSET and CPU_RATE_LIMIT are mutually exclusive. One cannot
        create a resource group with both CPUSET and CPU_RATE_LIMIT,
        but a group can be freely switched between them with ALTER;
        that means if one feature has been set, the other is disabled.
      - The cpu cores are returned to GPDB when the group is dropped,
        the CPUSET value is changed, or CPU_RATE_LIMIT is set.
      - If some cores have been allocated to a resource group, then the
        CPU_RATE_LIMIT of other groups only indicates a percentage of the
        remaining cpu cores.
      - If GPDB is busy and all the other cores, i.e. those not allocated
        exclusively to any resource group through CPUSET, are already
        used up, the cpu cores in CPUSET will still not be allocated.
      - The cpu cores in CPUSET are used exclusively only at the GPDB
        level; other non-GPDB processes in the system may use them.
      - Add test cases for this new feature, and the test environment
        must contain at least two cpu cores, so we upgrade the configuration
        of instance_type in resource_group jobs.
      
      * - Compatible with the case that cgroup directory cpuset/gpdb
        does not exist
      - Implement pg_dump for cpuset & memory_auditor
      - Fix a typo
      - Change the default cpuset value from an empty string to -1,
        because the code in 5X assumes that all default values in
        resource groups are integers; a non-integer value would make
        the system fail to start
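      
      A sketch of the syntax described above (the group name and the other
      parameters are illustrative):
      
      ```sql
      -- Reserve cores 0 and 1 exclusively for a group meant for short queries.
      CREATE RESOURCE GROUP rg_short_query
          WITH (concurrency=5, cpuset='0-1', memory_limit=10);
      
      -- Change the reserved cores; CPUSET and CPU_RATE_LIMIT stay mutually
      -- exclusive, so setting one disables the other.
      ALTER RESOURCE GROUP rg_short_query SET CPUSET '0,1';
      ```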
  14. 04 Jun 2018, 1 commit
    • Make resource queues object addressable · ace7a3e9
      Authored by Daniel Gustafsson
      In order to be able to set comments on resource queues, they must
      be object addressable, so fix this by implementing object addressing.
      Also add a small test for commenting on a resource queue.
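      
      A minimal usage sketch (the queue name and its settings are illustrative):
      
      ```sql
      CREATE RESOURCE QUEUE adhoc_queue WITH (ACTIVE_STATEMENTS=5);
      -- Possible now that resource queues are object addressable:
      COMMENT ON RESOURCE QUEUE adhoc_queue IS 'queue for ad-hoc reporting queries';
      ```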
  15. 30 May 2018, 1 commit
    • Refine the fault injector framework (#5013) · 723e5848
      Authored by Tang Pengzhou
      * Refine the fault injector framework
      
      * Add a counting feature so a fault can be triggered N times.
      * Add a simpler version named gp_inject_fault_infinite.
      * Refine and make the code cleaner, including renaming sleepTimes
        to extraArg so it can be used by other fault types.
      
      Now 3 functions are provided:
      
      1. gp_inject_fault(faultname, type, ddl, database, tablename,
      					start_occurrence, end_occurrence, extra_arg, db_id)
      startOccurrence: nth occurrence that a fault starts triggering
      endOccurrence: nth occurrence that a fault stops triggering,
      -1 means the fault is always triggered until it is reset.
      
      2. gp_inject_fault(faultname, type, db_id)
      a simpler version for a fault that is triggered only once.
      
      3. gp_inject_fault_infinite(faultname, type, db_id)
      a simpler version for a fault that is always triggered until it is reset.
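      
      A usage sketch based on the signatures above (the fault name, argument
      values and dbid are illustrative):
      
      ```sql
      -- Full form: make the 'checkpoint' fault sleep for 10 seconds, but only
      -- on its 2nd through 4th occurrence, on the segment with dbid 2.
      SELECT gp_inject_fault('checkpoint', 'sleep', '', '', '', 2, 4, 10, 2);
      
      -- Simpler forms: trigger once, trigger until reset, then reset.
      SELECT gp_inject_fault('checkpoint', 'sleep', 2);
      SELECT gp_inject_fault_infinite('checkpoint', 'skip', 2);
      SELECT gp_inject_fault('checkpoint', 'reset', 2);
      ```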
      
      * fix bgwriter_checkpoint case
      
      * use gp_inject_fault_infinite here instead of gp_inject_fault so the
        cache of pg_proc that contains gp_inject_fault_infinite is loaded
        before the checkpoint and the following gp_inject_fault_infinite
        doesn't dirty the buffer again.
      * Add a matchsubs to ignore 5 or 6 hits of fsync_counter.
      
      * Fix flaky twophase_tolerance_with_mirror_promotion test
      
      * use different sessions for Scenario 2 and Scenario 3 because
        the gang of session 2 is no longer valid.
      * wait for the wanted fault to be triggered so no unexpected error occurs.
      
      * Add more segment status info to identify error quickly
      
      Some cases run right after FTS test cases. If the segments are not
      in the desired status, those test cases will fail unexpectedly. This
      commit adds more debug info at the beginning of the test cases to help
      identify issues quickly.
      
      * Enhance cases to skip fts probe for sure
      
      * Do FTS probe request twice to guarantee fts error is triggered
  16. 29 May 2018, 3 commits
    • Update RETURNING test cases of replicated tables. · 97fff0e1
      Authored by Ning Yu
      Some error messages were updated during the 9.1 merge; update the
      answers for the RETURNING test cases of replicated tables accordingly.
    • Support RETURNING for replicated tables. · fb7247b9
      Authored by Ning Yu
      * rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering from/to a replicated table had no effect;
      the root cause is that we neither changed gp_distribution_policy nor
      reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with the
      new dist policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; data is received from one explicit sender in this motion
      type.
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      * rpt: add test case with both PRIMARY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to ensure this feature during future
      development.
      
      (cherry picked from commit 72af4af8)
    • Preserve persistence when reorganizing temp tables. · 0ce07109
      Authored by Ning Yu
      When altering a table's distribution policy we might need to reorganize
      the data by creating a __temp__ table, copying the data to it, then
      swapping the underlying relation files.  However, we always created the
      __temp__ table as permanent, so when the original table is temp the
      underlying files cannot be found in later queries.
      
      	CREATE TEMP TABLE t1 (c1 int, c2 int) DISTRIBUTED BY (c1);
      	ALTER TABLE t1 SET DISTRIBUTED BY (c2);
      	SELECT * FROM t1;
  17. 28 May 2018, 2 commits
    • Revert "Support RETURNING for replicated tables." · a74875cd
      Authored by Ning Yu
      This reverts commit 72af4af8.
    • Support RETURNING for replicated tables. · 72af4af8
      Authored by Ning Yu
      * rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering from/to a replicated table had no effect;
      the root cause is that we neither changed gp_distribution_policy nor
      reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with the
      new dist policy and transferring all the data to it.
      
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE * FROM foo RETURNING *;
      
      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; data is received from one explicit sender in this motion
      type.
      
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      
      * rpt: add test case with both PRIMARY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to ensure this feature during future
      development.
  18. 22 May 2018, 1 commit
    • Make text array type GPDB hashable. · e21f5efe
      Authored by Richard Guo
      By design, all array types should be hashable by GPDB.
      Issue 3741 shows that GPDB suffered from a "text array type is not
      hashable" error.
      
      This commit fixes that.
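      
      For illustration, the kind of query that previously hit the error (table
      name is hypothetical):
      
      ```sql
      CREATE TABLE doc (tags text[], body text) DISTRIBUTED RANDOMLY;
      -- Grouping by a text[] column requires redistributing rows by the hash
      -- of the array value, which used to fail with a "not hashable" error.
      SELECT tags, count(*) FROM doc GROUP BY tags;
      ```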
  19. 18 May 2018, 1 commit
    • Fix flakiness in fts_recovery_in_progress · 4eb96801
      Authored by David Kimura
      The issue is that during recovery in progress the mirror can take longer than
      the default timeout to finish promotion. Because the timeout is sometimes too
      short, gprecoverseg would intermittently throw error 'can't start transaction'
      in 'BEGIN'. This commit leverages the GUCs gp_gang_creation_retry_count and
      gp_gang_creation_retry_timer to increase the timeout allotted for retriable
      gang errors to approximately 30 seconds.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
  20. 17 May 2018, 1 commit
    • Create dummy stats for type mismatch · 88b2ab36
      Authored by Omer Arap
      If the column statistics in `pg_statistic` have values with a type
      different from the column type, the metadata accessor should not
      translate the stats and should create dummy stats instead.
      
      This commit also reorders stats collection from `pg_statistic` to
      align with how analyze generates stats. MCV and histogram translation is
      moved to the end, after NDV, null fraction and column width extraction.
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
  21. 16 May 2018, 2 commits
    • Correctly set pg_exttable.logerrors (#4985) · a33b8fc6
      Authored by Jesse Zhang
      Consider the following SQL; we expect error logging to be turned off for
      table `ext_error_logging_off`:
      
      ```sql
      create external table ext_error_logging_off (a int, b int)
          location ('file:///tmp/test.txt') format 'text'
          segment reject limit 100;
      \d+ ext_error_logging_off
      ```
      And then in this next case we expect error logging to be turned on for
      table `ext_t2`:
      
      ```sql
      create external table ext_t2 (a int, b int)
          location ('file:///tmp/test.txt') format 'text'
          log errors segment reject limit 100;
      \d+ ext_t2
      ```
      
      Before this patch, we were making two mistakes in handling this external
      table DDL:
      
      1. We intended to enable error logging *whenever* the user specified the
      `SEGMENT REJECT` clause, completely ignoring whether he or she specified
      `LOG ERRORS`.
      2. Even then, we made the mistake of implicitly coercing
      the OID (an unsigned 32-bit integer) to a bool (which is really just a C
      `char`): that means 255/256 of the time (99.6%) the result is `true`,
      and 0.4% of the time we get `false` instead.
      
      The `OID` to `bool` implicit conversion could have been caught by a
      `-Wconversion` GCC/Clang flag. It's most likely a leftover from commit
      8f6fe2d6.
      
      This bug manifests itself in the `dsp` regression test mysteriously
      failing about once every 200 runs -- with the only diff on a `\d+` of an
      external table that should have error logging turned on, but the
      returned definition has it turned off.
      
      While working on this we discovered that all of our existing external
      tables have both `LOG ERRORS` and `SEGMENT REJECT`, which is why this
      bug wasn't caught in the first place.
      
      This patch fixes the issue by properly setting the catalog column
      `pg_exttable.logerrors` according to the user input.
      
      While we were at it, we also cleaned up a few dead pieces of code and
      made the `dsp` test a bit friendlier to debug.
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: David Kimura <dkimura@pivotal.io>
    • Readers should check abort status of their subxids · b8d95865
      Authored by Asim R P
      QE readers incorrectly return true for TransactionIdIsCurrentTransactionId()
      when passed an xid that is an aborted subtransaction of the current
      transaction.  The end effect is wrong results, because tuples inserted by the
      aborted subtransaction are seen (treated as visible according to MVCC rules)
      by a reader.  The current patch fixes the bug by looking up the abort status
      of an XID from pg_clog.  In a QE writer, just like in upstream PostgreSQL,
      subtransaction information is available in CurrentTransactionState (even when
      the subxip cache has overflowed).  This information is not maintained in
      shared memory, making it unavailable to a reader.  Readers must resort to a
      longer route to get the same information: pg_subtrans and pg_clog.
      
      The patch does not use TransactionIdDidAbort() to check abort status.  That
      interface is designed to work with all transaction IDs.  It walks up the
      transaction hierarchy to look for an aborted parent if the status of the given
      transaction is found to be SUB_COMMITTED.  This is wasted effort when a QE
      reader wants to test whether its own subtransaction has aborted or not.  A new
      interface is introduced to avoid this wasted effort for QE readers.  We choose
      to rely on AbortSubTransaction()'s behavior of marking the entire subtree
      under the aborted subtransaction as aborted in pg_clog.  A SUB_COMMITTED
      status in pg_clog, therefore, allows us to conclude that the subtransaction is
      not aborted without having to walk up the hierarchy, provided the
      subtransaction is a child of our own transaction.
      
      The test case also needed a fix because the SQL query (insert into select *)
      didn't result in a reader gang being created.  The SQL is changed to a join on
      a non-distribution column so as to achieve reader gang creation.
  22. 12 May 2018, 1 commit
    • Remove Gp_segment, replace with GpIdentity.segindex. · 660d009a
      Authored by Ashwin Agrawal
      The code had these two variables (GUCs) serving the same purpose.
      GpIdentity.segindex is set to the content-id, based on a command line argument
      at start-up, and inherited by all processes from the postmaster, whereas
      Gp_segment was a session-level GUC only set for backends, by dispatching it
      from the QD. So essentially Gp_segment was not available, and had an incorrect
      value, in auxiliary processes.
      
      Hence, replaced all usages with GpIdentity.segindex. As a side effect, log
      files now report the correct segment number (content-id) on each and every
      line of the file, irrespective of which process generated the log message.
      
      Discussion:
      https://groups.google.com/a/greenplum.org/forum/#!msg/gpdb-dev/Yr8-LZIiNfA/ob4KLgmkAQAJ
  23. 11 May 2018, 2 commits
    • resgroup: add back guc gp_resgroup_memory_policy. · f0d268e7
      Authored by Ning Yu
      This GUC was removed by accident in
      5cc0ec50.
      
      In case we remove GUCs by accident in the future, we added a testcase
      that checks for the existence of the resgroup GUCs.
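      
      A sketch of such a check (the actual testcase may differ):
      
      ```sql
      -- The GUC should still be registered and visible.
      SELECT count(*) FROM pg_settings WHERE name = 'gp_resgroup_memory_policy';
      ```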
    • Add guards to prevent negative and duplicate partition rule order values · fd9a83b1
      Authored by Jimmy Yih
      There were scenarios where adding a new partition to a partition table would
      cause a negative or duplicate partition rule order (parruleord) value to show
      up in the pg_partition_rule catalog table.
      
      1. Negative parruleord values could show up during parruleord gap closing when
         the new partition is inserted above a parruleord gap.
      2. Negative parruleord values could show up when the max number of partitions
         for that level has been reached (32767), and there is an attempt to add a
         new partition that would have been the highest ranked partition in that
         partition's partition range.
      3. Duplicate parruleord values could show up when the max number of partitions
         for that level has been reached (32767), and there is an attempt to add a
         new partition that would have been inserted between the partition table's
         sequence of parruleord values.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
  24. 08 May 2018, 1 commit
  25. 03 May 2018, 1 commit
    • Add Global Deadlock Detector. · 03915d65
      Authored by Zhenghua Lyu
      To prevent distributed deadlocks, Greenplum DB holds an exclusive table lock
      for UPDATE and DELETE commands, so concurrent updates to the same table are
      effectively disabled.
      
      We add a backend process to do global deadlock detection so that we do not
      lock the whole table while doing UPDATE/DELETE; this helps improve the
      concurrency of Greenplum DB.
      
      The core idea of the algorithm is to divide locks into two types:
      
      - Persistent: the lock can only be released after the transaction is over (abort/commit)
      - Non-persistent: all other cases
      
      This PR’s implementation adds a persistent flag to the LOCK, and the rule for
      setting it is:
      
      - Xid locks are always persistent
      - Tuple locks are never persistent
      - Relation locks are persistent if the relation has been closed with the
        NoLock parameter, otherwise they are not persistent
      - Other types of locks are not persistent
      
      For more details, please refer to the code and README.
      
      There are several known issues to pay attention to:
      
      - This PR’s implementation only cares about the locks that can be shown
        in the view pg_locks.
      - This PR’s implementation does not support AO tables. We keep upgrading
        the locks for AO tables.
      - This PR’s implementation does not take networking waits into account.
        Thus we cannot detect the deadlock of GitHub issue #2837.
      - SELECT FOR UPDATE still locks the whole table.
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
  26. 02 May 2018, 2 commits
    • Re-enable MIN/MAX optimization. · 362fc756
      Authored by Heikki Linnakangas
      I'm not sure why it's been disabled. It's not very hard to make it work, so
      let's do it. Might not be a very common query type, but if you happen to
      have a query where it helps, it helps a lot.
      
      This adds a GUC, gp_enable_minmax_optimization, to enable/disable the
      optimization. There's no such GUC in upstream, but we need at least a flag
      in PlannerConfig for it, so that we can disable the optimization for
      correlated subqueries, along with some other optimizer tricks. Seems best
      to also have a GUC for it, for consistency with other flags in
      PlannerConfig.
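      
      A sketch of the kind of query that benefits (table and column names are
      illustrative; assumes an index exists on measurements(ts)):
      
      ```sql
      SET gp_enable_minmax_optimization = on;
      -- With the optimization, MIN/MAX over an indexed column can be planned
      -- as cheap scans of the ends of the index instead of a full table scan.
      EXPLAIN SELECT min(ts), max(ts) FROM measurements;
      ```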
    • FTS detects when primary is in recovery avoiding config change · d453a4aa
      Authored by Ashwin Agrawal
      Previous behavior: when the primary is in crash recovery, the FTS probe fails
      and hence the primary is marked down. This change provides a recovery progress
      metric so that FTS can detect progress. We added the last replayed LSN to the
      error message to determine recovery progress. This allows FTS to distinguish
      between recovery in progress and a recovery hang or rolling panics. Only when
      FTS detects that recovery is not making progress does it mark the primary
      down.
      
      For testing, a new fault injector is added to allow simulation of a recovery
      hang and of recovery in progress.
      
      Just FYI... this restores the previously reverted commit 7b7219a4.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: David Kimura <dkimura@pivotal.io>