1. 14 Mar 2019, 1 commit
  2. 12 Mar 2019, 1 commit
  3. 11 Mar 2019, 1 commit
    • Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
      Authored by Ning Yu
This method was introduced to improve data redistribution
      performance during gpexpand phase 2; however, benchmark results show
      that it does not meet our expectations.  For example, when expanding
      a table from 7 segments to 8 the reshuffle method is only 30% faster
      than the traditional CTAS method, and when expanding from 4 to 8
      segments reshuffle is even 10% slower than CTAS.  When there are
      indexes on the table the reshuffle performance can be worse still,
      and an extra VACUUM is needed to actually free the disk space.
      According to our experiments, the bottleneck of the reshuffle method
      is the tuple deletion operation, which is much slower than the
      insertion operation used by CTAS.
      
The reshuffle method does have some benefits: it requires less extra
      disk space, and it also requires less network bandwidth (similar to
      the CTAS method with the new JCH reduce method, but less than CTAS +
      MOD).  It can also be faster in some cases, but as we cannot
      automatically determine when it is faster, it is hard to benefit
      from it in practice.
      
On the other hand, the reshuffle method is less tested and may have
      bugs in corner cases, so it is not production ready yet.
      
Given all this, we decided to retire it entirely for now; we might
      add it back in the future if we can get rid of the slow deletion or
      find a reliable way to choose automatically between the reshuffle
      and CTAS methods.
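
      Conceptually, the retained CTAS method rebuilds the table rather
      than moving tuples in place; a simplified sketch (table and column
      names are illustrative, not the exact gpexpand internals):

      ```
      -- After new segments are added, rebuild the table so that rows
      -- hash across all segments:
      CREATE TABLE t_expanded AS SELECT * FROM t DISTRIBUTED BY (id);
      DROP TABLE t;
      ALTER TABLE t_expanded RENAME TO t;
      ```

      The reshuffle method instead deleted the moved tuples from their old
      segments and re-inserted them on the new ones, and that deletion is
      the slow path described above.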
      
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  4. 07 Dec 2018, 1 commit
    • Suppress a warning generated on inherited tables. · 8a8b7e5a
      Authored by Ning Yu
The following WARNING is generated by ANALYZE when some sample tuples
      come from segments outside the [0, numsegments-1] range; however,
      this does not indicate that the data distribution is wrong.  Take
      inherited tables for example: when a child table has a greater
      numsegments than its parent, this WARNING is raised, and it is
      expected.  This can happen routinely in the random_numsegments
      pipeline job, so ignore this WARNING.
      
          WARNING:  table "patest0" contains rows in segment 2,
                    which is outside the # of segments for the table's policy
                    (2 segments)
      
      Added this pattern to init_file to ignore it.
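
      A sketch of such an init_file entry (the exact regex is assumed, not
      copied from the commit):

          -- start_matchignore
          m/^WARNING:  table ".*" contains rows in segment \d+, which is outside the # of segments/
          -- end_matchignore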
  5. 30 Nov 2018, 1 commit
  6. 29 Nov 2018, 1 commit
    • Provide test hook to randomize default numsegments. · 968dfc41
      Authored by Ning Yu
With this hook loaded, CREATE TABLE creates tables with a random
      numsegments, using the gp_debug_numsegments extension.
      
It can be enabled via a make command like this:
      
          make installcheck EXTRA_REGRESS_OPTS=--prehook=randomize_create_table_default_numsegments
      
However, as plans can differ with random numsegments, it is
      recommended to also ignore the plan diffs, so the make command
      becomes:
      
          make installcheck EXTRA_REGRESS_OPTS="--prehook=randomize_create_table_default_numsegments --ignore-plans"
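
      Under the hood the prehook relies on the gp_debug_numsegments
      extension mentioned above; roughly like this (the function name and
      the 'random' argument are assumptions, not verified against the
      extension):

      ```
      CREATE EXTENSION gp_debug_numsegments;
      SELECT gp_debug_set_create_table_default_numsegments('random');
      CREATE TABLE t1 (c1 int);  -- numsegments is now picked at random
      ```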
  7. 23 Nov 2018, 1 commit
    • Reduce differences between reshuffle tests · 2eef2ba2
      Authored by Ning Yu
There are 3 reshuffle tests: the ao one, the co one, and the heap
      one.  They share almost the same cases but differ in table names and
      CREATE TABLE options.  There are also some differences introduced
      when adding regression tests, which were added to one file but not
      the others.
      
      We want to keep the differences between these tests minimal, so that
      a regression test for ao also covers the similar case for heap, and
      once we understand one of the test files we have almost the same
      knowledge of the others.
      
Here is a list of changes to these tests:
      - reduce differences in table names by using schemas;
      - reduce differences in CREATE TABLE options by setting default
        storage options (see the sketch below);
      - simplify the creation of partially distributed tables by using the
        gp_debug_numsegments extension;
      - copy some regression tests to all the tests;
      - retire the no-longer-used helper function;
      - move the tests into an existing parallel test group.
      
The pg_regress test framework provides some @@ tokens for ao/co
      tests; however, we still cannot merge the ao and co tests into one
      file, as WITH (OIDS) is supported by ao but not by co.
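
      For the default storage options item, the idea is roughly this (a
      sketch; gp_default_storage_options is the GPDB GUC, the schema and
      table names are illustrative):

      ```
      SET gp_default_storage_options = 'appendonly=true, orientation=column';
      -- a plain CREATE TABLE now yields a CO table, with no per-test WITH clause:
      CREATE TABLE reshuffle_co.t1 (a int, b int);
      ```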
  8. 22 Sep 2018, 1 commit
    • Change pretty-printing of expressions in EXPLAIN to match upstream. · 4c54c894
      Authored by Heikki Linnakangas
We had changed this in GPDB, to print fewer parens. That's fine and
      dandy, but it hardly seems worth it to carry a diff vs upstream for
      this. Which format is better is a matter of taste. The extra parens
      make some expressions clearer, but OTOH, it's unnecessarily verbose
      for simple expressions. Let's follow the upstream on this.
      
      These changes were made to GPDB back in 2006, as part of backporting
      two EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any
      explanation for this particular change in output in that commit
      message.
      
      It's nice to match upstream, to make merging easier. However, this won't
      make much difference to that: almost all EXPLAIN plans in regression
      tests are different from upstream anyway, because GPDB needs Motion nodes
      for most queries. But every little helps.
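
      For illustration (a made-up filter, not from the commit), the
      upstream format we now follow prints

          Filter: ((a > 1) AND (b < 10))

      where the old GPDB format printed

          Filter: a > 1 AND b < 10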
  9. 11 Aug 2018, 1 commit
    • Adding GiST support for GPORCA · ec3693e6
      Authored by Ashuka Xue
Prior to this commit, there was no support for GiST indexes in
      GPORCA.  For queries involving GiST indexes, ORCA was selecting Table
      Scan paths as the optimal plan.  These plans could take 300+ times
      longer than Planner, which generated an index scan plan using the
      GiST index.
      
      Example:
      ```
      CREATE TABLE gist_tbl (a int, p polygon);
      CREATE TABLE gist_tbl2 (b int, p polygon);
      CREATE INDEX poly_index ON gist_tbl USING gist(p);
      
      INSERT INTO gist_tbl SELECT i, polygon(box(point(i, i+2),point(i+4,
      i+6))) FROM generate_series(1,50000)i;
      INSERT INTO gist_tbl2 SELECT i, polygon(box(point(i+1, i+3),point(i+5,
      i+7))) FROM generate_series(1,50000)i;
      
      ANALYZE;
      ```
      With the query `SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE
      gist_tbl.p <@ gist_tbl2.p;`, we see a performance increase with the
      support of GiST.
      
      Before:
      ```
      EXPLAIN SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
                                                           QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=0.00..171401912.12 rows=1 width=8)
         ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..171401912.12 rows=1 width=8)
               ->  Aggregate  (cost=0.00..171401912.12 rows=1 width=8)
                     ->  Nested Loop  (cost=0.00..171401912.12 rows=335499869 width=1)
                           Join Filter: gist_tbl.p <@ gist_tbl2.p
                           ->  Table Scan on gist_tbl2  (cost=0.00..432.25 rows=16776 width=101)
                           ->  Materialize  (cost=0.00..530.81 rows=49997 width=101)
                                 ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..525.76 rows=49997 width=101)
                                       ->  Table Scan on gist_tbl  (cost=0.00..432.24 rows=16666 width=101)
       Optimizer status: PQO version 2.65.1
      (10 rows)
      
      Time: 170.172 ms
      SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
       count
      -------
       49999
      (1 row)
      
      Time: 546028.227 ms
      ```
      
      After:
      ```
      EXPLAIN SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
                                                        QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=0.00..21749053.24 rows=1 width=8)
         ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..21749053.24 rows=1 width=8)
               ->  Aggregate  (cost=0.00..21749053.24 rows=1 width=8)
                     ->  Nested Loop  (cost=0.00..21749053.24 rows=335499869 width=1)
                           Join Filter: true
                           ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..526.39 rows=50328 width=101)
                                 ->  Table Scan on gist_tbl2  (cost=0.00..432.25 rows=16776 width=101)
                           ->  Bitmap Table Scan on gist_tbl  (cost=0.00..21746725.48 rows=6667 width=1)
                                 Recheck Cond: gist_tbl.p <@ gist_tbl2.p
                                 ->  Bitmap Index Scan on poly_index  (cost=0.00..0.00 rows=0 width=0)
                                       Index Cond: gist_tbl.p <@ gist_tbl2.p
       Optimizer status: PQO version 2.65.1
      (12 rows)
      
      Time: 617.489 ms
      
      SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
       count
      -------
       49999
      (1 row)
      
      Time: 7779.198 ms
      ```
      
      GiST support was implemented by sending over GiST index information to
      GPORCA in the metadata using a new index enum specifically for GiST.
Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  10. 03 Aug 2018, 1 commit
  11. 02 Aug 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Authored by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
* Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap
        access (see the example after this list).
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
* Checkpoints are now performed by a dedicated background process.
        Formerly the background writer did both dirty-page writing and
        checkpointing. Separating this into two processes allows each goal
        to be accomplished more predictably.
      
* Custom plans are now supported for specific parameter values even
        when using prepared statements.
      
* The FDW API was improved so that FDWs can provide multiple access
        "paths" for their tables, allowing more flexibility in join
        planning.
      
* The security_barrier option was added for views, to prevent
        optimizations that might allow view-protected data to be exposed
        to users.
      
      * The range data type was added to store a lower and upper bound
        belonging to its base data type.
      
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility
        statement. The SELECT query is planned during the execution of the
        utility. To conform to this change, GPDB executes the utility
        statement only on the QD and dispatches the plan of the SELECT
        query to the QEs.
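
      As a small example of the first item above, an index-only scan can
      now serve a query entirely from the index (object names assumed):

      ```
      CREATE INDEX t_a_idx ON t (a);
      VACUUM t;  -- keeps the visibility map current
      EXPLAIN SELECT a FROM t WHERE a < 100;  -- may show an Index Only Scan
      ```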
Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  12. 19 Jun 2018, 1 commit
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Authored by Omer Arap
This commit introduces an end-to-end, scalable solution for
      generating statistics of root partitions, by merging the statistics
      of the leaf partition tables.  The ability to merge leaf table
      statistics for the root table makes analyze very incremental and
      stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
Incremental analyze creates a sample for each partition, as in the
      previous version.  While analyzing the sample and generating
      statistics for the partition, it also creates a `hyperloglog_counter`
      data structure, adds the sample values to it, and records metadata
      such as the number of multiples and the sample size.  Once the
      entire sample is processed, analyze saves the `hyperloglog_counter`
      as a byte array in the `pg_statistic` catalog table.  We reserve a
      slot for the `hyperloglog_counter` in the table and mark it with a
      specific statistic kind, `STATISTIC_KIND_HLL`.  We keep the
      `hyperloglog_counter` in the catalog only for the leaf partitions.
      If the user chooses to run a FULL scan for HLL, we mark the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
Once all the leaf partitions are analyzed, we analyze the root
      partition.  First, we check that all the partitions have been
      analyzed properly and that all their statistics are available in the
      `pg_statistic` catalog table.  A partition with no tuples is
      considered analyzed even though it has no catalog entry.  If for
      some reason a single partition is not analyzed, we fall back to the
      original analyze algorithm, which acquires a sample of the root
      partition and calculates statistics based on that sample.
      
Merging the null fraction and average width from leaf partition
      statistics is trivial and poses no significant challenge, so we
      calculate them first.  The remaining statistics are:
      
      - Number of distinct values (NDV)
      
- Most common values (MCV) and their frequencies, termed most common
      frequencies (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
Hyperloglog provides functionality to merge multiple
      `hyperloglog_counter`s into one and to calculate the number of
      distinct values from the aggregated `hyperloglog_counter`.  This
      aggregated counter alone is sufficient only if the user chooses to
      run a full scan for hyperloglog.  In the sample-based approach,
      deriving the number of distinct values without the hyperloglog
      algorithm is not possible.  Hyperloglog enables us to merge the
      `hyperloglog_counter`s from each partition and calculate the NDV on
      the merged counter with an acceptable error rate.  However, it does
      not give us the ultimate NDV of the root partition; it provides the
      NDV of the union of the samples from each partition.
      
The rest of the NDV interpolation is based on the formula used in
      Postgres, which depends on four metrics: the NDV in the sample, the
      number of multiple values in the sample, the sample size, and the
      total rows in the table.  Using these values the algorithm
      calculates the approximate NDV for the table.  While merging the
      statistics from the leaf partitions, hyperloglog lets us accurately
      derive the NDV of the sample, the sample size and the total rows;
      however, the number of multiples in the accumulated sample is
      unknown, since we do not have access to the accumulated sample at
      this point.
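
      For reference, the Postgres estimator in question (the Duj1
      estimator in analyze.c; notation mine) is

          D ≈ (n * d) / (n - f1 + f1 * n / N)

      where n is the sample size, N the total row count, d the NDV
      observed in the sample, and f1 = d - nMultiple the number of values
      seen exactly once in the sample.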
      
      _Number of Multiples_
      
Our approach to estimating the number of multiples in the (itself
      unavailable) aggregated sample for the root requires the NDV, the
      number of multiples, and the size of each leaf sample.  The NDV of
      each sample is trivial to calculate from the partition's
      `hyperloglog_counter`.  The number of multiples and the sample size
      of each partition are saved in the partition's `hyperloglog_counter`
      during leaf statistics gathering, to be used in the merge.
      
Estimating the number of multiples in the aggregate sample for the
      root partition is a two-step process.  First, we accurately estimate
      the number of values that reside in more than one partition's
      sample.  Then, we estimate the number of multiples that exist
      uniquely in a single partition.  Finally, we add these values to
      estimate the overall number of multiples in the aggregate sample of
      the root partition.
      
To count the number of values that exist in one single partition
      only, we utilize hyperloglog functionality: we can easily estimate
      how many values appear only in a specific partition _i_.  We call
      the NDV of the aggregate over all partitions `NDV_all`, and the NDV
      of the aggregate over all partitions except _i_ `NDV_minus_i`.  The
      difference between `NDV_all` and `NDV_minus_i` gives the number of
      values that appear in partition _i_ alone.  The remaining values all
      contribute to the overall number of multiples in the root's
      aggregated sample; we call their count `nMultiple_inter`, the number
      of values that appear in more than one partition.
      
However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple instances of it,
      and we need a way to account for these values as well.  Recall that
      we also track the number of multiples that exist uniquely in each
      partition sample.  We already know the number of multiples inside a
      partition sample; we only need to normalize this value by the
      proportion of the number of values unique to the partition sample to
      the number of distinct values of the partition sample.  The
      normalized value is partition sample _i_'s contribution to the
      overall calculation of the nMultiple.
      
Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
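
      In symbols, the steps above amount to the following (notation mine,
      summarizing the description rather than quoting the implementation):

          nUnique_i       = NDV_all - NDV_minus_i        -- values found only in sample i
          nMultiple_inter = NDV_all - sum_i(nUnique_i)   -- values in more than one sample
          normalized_m_i  = m_i * (nUnique_i / NDV_i)    -- partition i's own multiples, scaled
          nMultiple_root  = nMultiple_inter + sum_i(normalized_m_i)

      where m_i and NDV_i are the number of multiples and the NDV of
      partition i's sample.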
      
      **Merging MCVs:**
      
We utilize the merge functionality imported from Greenplum DB 4.3.
      The algorithm is simple: we convert each MCV's frequency into a
      count, and add the counts up for values that appear in more than one
      partition.  After every candidate's count has been calculated, we
      sort the candidate values and pick the top ones, up to the number
      defined by `default_statistics_target`.  4.3 blindly picked the
      values with the highest counts; we instead incorporate the same
      logic used in current Greenplum and Postgres and test whether a
      value is a real MCV.  Therefore, even after the merge, the logic
      fully aligns with Postgres.
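
      In other words (notation mine):

          count_i(v) = mcv_freq_i(v) * reltuples_i   -- frequency back to a count
          count(v)   = sum_i(count_i(v))             -- total across partitions

      The candidates are then sorted by count(v) and at most
      default_statistics_target values are kept, each still subject to the
      usual Postgres test of whether it is common enough to be an MCV.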
      
      **Merging Histograms:**
      
One of the main novel contributions of this commit is how we merge
      the histograms from the leaf partitions.  In 4.3 a priority queue is
      used to merge the leaf partition histograms.  However, that approach
      is naive and loses important statistical information: in Postgres,
      the histogram is calculated over the values that did not qualify as
      MCVs, and the 4.3 merge logic did not take this into consideration,
      so significant statistical information was lost when merging the MCV
      values.
      
We introduce a novel approach that feeds the leaf-partition MCVs that
      did not qualify as root MCVs into the histogram merge logic.  To
      fully reuse the previously implemented priority queue logic, we
      treat non-qualified MCVs as the histograms of so-called `dummy`
      partitions.  To be more precise, for a non-qualified MCV m1 we
      create a histogram [m1, m1] with a single bucket whose size is the
      count of that MCV.  When we merge the histograms of the leaf
      partitions together with these dummy partitions, the merged
      histogram does not lose any statistical information.
Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  13. 09 Nov 2017, 1 commit
    • Fix cases which are unpredictable (#3797) · 195aaf54
      Authored by Adam Lee
      * Several small fixes of the tests
      
1, ignore two generated test files.
      2, remove the string containing unpredictable segment numbers.
      3, drop tables in the external_table case, so it can be run multiple
      times in a row.
      
      * Fix cases which are unpredictable
      
      > commit 3bbedbe9
      > Author: Heikki Linnakangas <hlinnakangas@pivotal.io>
      > Date:   Thu Nov 2 10:04:58 2017 +0200
      >
      >     Wake up faster, if a segment returns an error.
      >     Previously, if a segment reported an error after starting up the
      >     interconnect, it would take up to 250 ms for the main thread in the QD
      >     process to wake up and poll the dispatcher connections, and to see that
      >     there was an error. Shorten that time, by waking up immediately if the
      >     QD->QE libpq socket becomes readable while we're waiting for data to
      >     arrive in a Motion node.
      >     This isn't a complete solution, because this will only wake up if one
      >     arbitrarily chosen connection becomes readable, and we still rely on
      >     polling for the others. But this greatly speeds up many common scenarios.
      >     In particular, the "qp_functions_in_select" test now runs in under 5 s
      >     on my laptop, when it took about 60 seconds before.
      
      > Before this commit, the master would only check every 250 ms if one of the
      > segments had reported an error. Now it wakes up and cancels the whole query as
      > soon as it receives an error from the first segment. That makes it more likely
      > that the other segments have not yet reached the same number of errors as what
      > is memorized in the expected output.
      
These two cases check:
      
      1, when selecting from a CTE fails because one of the CTE's external
      tables reached the error limit, how many errors happened in the
      CTE's other external table, which would not reach the limit.
      
      2, when selecting from an external table with two locations mapped
      to two segments each, whether the other segment also reaches the
      reject limit once one segment has reached it.
      
We could not predict these two results without special test files,
      even before that commit, actually.  This commit removes the CTE case
      and checks that at least one segment failed in case
      readable_query26.
  14. 18 Oct 2017, 1 commit
  15. 14 Aug 2017, 1 commit
    • Make ICW pass when resgroup is enabled. · e1eed831
      Authored by Ning Yu
      * resgroup: increase max slots for isolation tests.
      * ICW: ignore resgroup related warnings.
      * ICW: try to load resgroup variant of answers when resgroup enabled.
      * ICW: provide resgroup variant of answers.
      * ICW: check whether resqueue is enabled in UDF.
* ICW: substitute username in gpconfig output.
      * ICW: explicitly set max_connections.
      * isolation2: increase resgroup concurrency for max_concurrency tests.
  16. 09 Aug 2017, 2 commits
    • Replace gpfaultinject binary with gp_inject_fault extension in tests. · 5104ca08
      Authored by Heikki Linnakangas
This replaces all places in regression tests where the gpfaultinject
      binary was used with the SQL-callable function in the new
      gp_inject_fault extension. The SQL function is more forgiving about
      the dev environment, and doesn't need gpfaultinject to be in $PATH,
      for starters. Also, it's just good to harmonize and have just one
      way of injecting faults.
      
      More uses of gpfaultinject remain in the TINC tests, so we cannot get rid
      of it any time soon, but this is a step in that direction, anyway.
    • Enhance gp_inject_fault. · f52fbe57
      Authored by Heikki Linnakangas
      * Turn it into an extension, for easier installation.
      
* Add a simpler variant of the gp_inject_fault function, with fewer
        options. This is applicable to almost all the calls in the
        regression suite, so it's nice to make them less verbose.
      
      * Change the dbid argument from smallint to int4. For convenience, so that
        you don't need a cast when calling the function.
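
      A sketch of the simpler variant in use (the fault name and the exact
      three-argument signature are assumptions based on the description
      above):

      ```
      CREATE EXTENSION gp_inject_fault;
      -- inject a fault on the primary segment with content id 0:
      SELECT gp_inject_fault('checkpoint', 'skip', dbid)
        FROM gp_segment_configuration
       WHERE role = 'p' AND content = 0;
      ```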
  17. 04 Aug 2017, 1 commit
    • Set correct errcode for COPY .. ON SEGMENT ereport · f160a47d
      Authored by Daniel Gustafsson
Using errcode 0 causes ereport() to treat the error as an internal
      error and print the filename/line. Since this is a user-facing
      error it should have a proper errcode to avoid this. This also
      allows the gpdiff rule to be removed.
  18. 10 May 2017, 1 commit
  19. 03 May 2017, 1 commit
    • Support COPY ON SEGMENT command · 49b12f18
      Authored by Adam Lee
Support a COPY statement that exports the table directly from each
      segment to a local file, in parallel.
      
      This commit adds the keyword "on segment" to save the copied file on
      the segments instead of on the master.
      
      Two placeholders are used, "<SEG_DATA_DIR>" and "<SEGID>", which are
      replaced with the segment data directory and the segment id.
      
      E.g.
      
      ```
      COPY tbl TO '/tmp/<SEG_DATA_DIR>filename<SEGID>.txt' ON SEGMENT;
      ```
Signed-off-by: Yuan Zhao <yuzhao@pivotal.io>
      Signed-off-by: Haozhou Wang <hawang@pivotal.io>
      Signed-off-by: Adam Lee <ali@pivotal.io>
  20. 28 Feb 2017, 1 commit
  21. 27 Jan 2017, 1 commit
  22. 24 Jan 2017, 2 commits
    • Move patterns used only by a particular test out of global init_file. · ae5bccd1
      Authored by Heikki Linnakangas
      This reduces the risk of accidentally masking out messages in a test that's
      not supposed to produce such messages in the first place, and is just
      nicer in general, IMHO.
      
      While we're at it, add a brief comment to init_file to explain what it's
      for. Also, remove a few more matchsubs from atmsort.pm that seem to be
      unused.
    • Rewrite table redistribution with dropped types in ALTER TABLE · 62d66c06
      Authored by Heikki Linnakangas
When a table that has an attribute whose type has been dropped goes
      through the ALTER TABLE command queue, a "hidden" type is created,
      and immediately dropped, during the ALTER TABLE processing for table
      redistribution. This emits several NOTICEs, which can be confusing
      to the user as the name is autogenerated and the DROP TYPE may have
      happened at an earlier time. Below is an example of the output:
      
        create table <tablename> (a integer, b <typename>);
        drop type <typename>;
        ...
        alter table <tablename> set with(reorganize = true) distributed randomly;
        NOTICE:  return type pg_atsdb_<oid>_2_3 is only a shell
        NOTICE:  argument type pg_atsdb_<oid>_2_3 is only a shell
        NOTICE:  drop cascades to function pg_atsdb_<oid>_2_3_out(pg_atsdb_<oid>_2_3)
        NOTICE:  drop cascades to function pg_atsdb_<oid>_2_3_in(cstring)
      
      The reason for adding the hidden types is that the redistribution is
      performed with a CTAS doing SELECT *. To fix, change the way the CTAS is
      done, to not create hidden types.
      
      The temp table that we create still needs to include dropped columns at the
      same positions as the old one. Otherwise, when we swap the relation files,
      a tuple's representation on-disk won't match the catalogs. However, we
      cannot easily re-construct a dropped column with the same attlen, attalign,
      etc. as the original dropped column. Instead, create it as if it was an
      INT4 column, and just before swapping the relation files, update the
      attlen, attalign fields in pg_attribute entries of the dropped columns to
      match that of INT4. That way, the original table's catalog entries match
      that of the temp table.
      
      Alternatively, we could build the temp table without the dropped columns,
      and remove them from pg_attribute altogether. However, we'd need to update
      the attnum field of all following columns, and cascade that change to at
      least pg_attrdef and pg_depend. That seems more complicated.
      
      Also remove output from expected testfiles and perform minor cleanups.
      
      Original patch by Daniel Gustafsson, with the int4-placeholder mechanism
      added by me.
  23. 19 Dec 2016, 2 commits
    • Downgrade buffer capacity WARNING to LOG · d355e78e
      Authored by Daniel Gustafsson
While it should be rare (and the original ticket referred to
      indicates that it is), it's perfectly legal for a UDP buffer to
      fill up. Set the message level to LOG rather than WARNING.
    • Make NOTICE for table distribution consistent · bbed4116
      Authored by Daniel Gustafsson
      The different kinds of NOTICE messages regarding table distribution
      were using a mix of upper and lower case for 'DISTRIBUTED BY'. Make
      them consistent by using upper case for all messages and update the
      test files, and atmsort regexes, to match.
  24. 18 Nov 2016, 2 commits
    • Use proper error code for errors. · 0bf31cd6
      Authored by Heikki Linnakangas
Attach a suitable error code to many errors that were previously
      reported as "internal errors". GPDB's elog.c prints a source file
      name and line number for any internal error, which is a bit ugly for
      errors that are not in fact unexpected internal errors but
      user-facing errors that happen as a result of e.g. an invalid query.
      To make sure we don't accumulate more of these, adjust the regression tests
      to not ignore the source file and line number in error messages. There are
      a few exceptions, which are listed explicitly.
    • Remove unnecessary ignore-directive for COptTasks.cpp. · 459db892
      Authored by Heikki Linnakangas
      With commit 61972775, we use a proper SQLSTATE for the errors that
      needed this before.
  25. 19 Sep 2016, 1 commit
    • Make dispatch testcases stable and independent of compile configuration · 1c5c12a6
      Authored by Pengzhou Tang
When the process_startup_packets fault is triggered, gp_debug_linger
      is not yet set to 0, which causes an annoying "HINT: process xxxx"
      message to show up in the output file and makes the tests unstable.
      This commit changes the fault injection location to
      send_qe_details_init_backend, where gp_debug_linger has already been
      set to 0, so no hint message is generated in the output file.
  26. 13 Sep 2016, 1 commit
  27. 06 Sep 2016, 1 commit
  28. 05 Sep 2016, 1 commit
  29. 20 Aug 2016, 1 commit
  30. 17 Aug 2016, 1 commit
  31. 16 Jul 2016, 1 commit
    • Clean up gpdiff's init_file and built-in ignore- and subs- patterns · c425899d
      Authored by Heikki Linnakangas
      * Anchor all the ERROR, WARNING etc. messages to beginning of line, with
        "/^..."
      
* Remove obsolete substitutions, for error messages that don't
        appear anywhere in the code anymore.
      
      * Remove redundant replacements of source line numbers in error messages,
        like "(xact.c:%d)". There is a special rule that replaces all of those
        with (SOMEFILE:SOMEFUNC).
      
      * Replace case-insensitive rules with case-sensitive ones.
      
* Replace sloppy use of "\s+" with the actual amount of whitespace
        in the error messages.
      
      * Remove unnecessary "s/.../" lines from the matchignore block in init_file.
        You don't need those with "matchignore", only with "matchsubs".
      
Aside from being tidier, these changes make the diffing
      significantly faster. There are fewer regular expressions to parse,
      and the remaining ones are faster to evaluate.
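
      For example, the anchoring change turns a pattern like the first
      line below into the second (patterns illustrative):

          m/ERROR:  could not connect/     -- matches anywhere in a line
          m/^ERROR:  could not connect/    -- matches only at line start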
  32. 22 Jun 2016, 1 commit
    • Avoid throwing an error in bfz_close() while aborting a transaction · 5f5ef9e0
      Authored by Daniel Gustafsson
When bfz_close() is called in the codepath of a transaction abort,
      we must avoid throwing even more errors unless the situation calls
      for it. For bfz_close() it's fine to lower the ereport level to
      WARNING in this case. Longer term we should move this, and other,
      codepaths away from calling unlink() directly and instead use the
      provided API, but this closes a current issue in ICG, so it is
      better to fix it immediately and refactor all call sites once we
      have a clean ICG.
  33. 12 Mar 2016, 1 commit
    • Add more installcheck test coverage for AO/CO tables. · 7d0325e4
      Authored by Jimmy Yih
Most of these test additions are inspired by Pivotal's internal
      testing and needed to be added to the open source installcheck to
      give the community more test coverage of AO/CO tables.  This commit
      mostly adds extra coverage for indexes and partition tables.
  34. 11 Mar 2016, 1 commit
  35. 08 Mar 2016, 1 commit
  36. 05 Dec 2015, 1 commit
    • Fix the case where VACUUM FULL on an appendonly table would cause its · 65400193
      Authored by Abhijit Subramanya
      auxiliary tables to not get shrunk and generate a notice to the user.
      
The AppendOnlyCompaction_IsRelationEmpty() function incorrectly
      assumed that the column number of the tupcount column was the same
      in the pg_aoseg and pg_aocsseg tables. This caused it to incorrectly
      return true even when the CO relation was not empty. This method is
      used in vacuum to determine whether the auxiliary relations need to
      be vacuumed. Due to the bug, vacuum would update the pg_aocsseg
      relation and vacuum it within the same transaction, and hence
      generate the NOTICE that it can't shrink the relation because a
      transaction is already in progress, and it would not shrink the
      relation.
      
Also make sure that we vacuum the auxiliary relations in only two
      cases:
      1. Vacuum cleanup phase
      2. Relation is empty and we are in the prepare phase
      Otherwise we will end up with the same issue as above if some of the
      segments have zero rows.