1. 29 11月, 2018 7 次提交
    • N
      New pg_regress argument to hook the tests. · d1031682
      Ning Yu 提交于
      This is a preparation of an upcoming random ICG pipeline job.  In that
      pipeline job we will have tables created on different segments, so far
      the only way to hack this behavior is by using the gp_debug_numsegments
      extension.  We want to load and execute that extension for all the tests
      without modifying them directly, so a hook mechanism is needed.
      
      Added a pg_regress argument --prehook to set a hook.  A hook script
      should be put under src/test/regress/{sql,input}/hooks/ directory,
      depending on whether it need pg_regress to substitute the @@ tokens.  At
      most one hook can be specied.  It can be set via make command like this:
      
          # suppose there is src/test/regress/sql/hooks/hookname.sql
          make installcheck EXTRA_REGRESS_OPTS=--prehook=hookname
      d1031682
    • N
      New pg_regress argument to ignore plan diffs. · aea6d06d
      Ning Yu 提交于
      This is a preparation of an upcoming random ICG pipeline job.  In that
      pipeline job we will have tables created on different segments, so the
      plans might be different expectation.  In atmsort.pm there is already an
      argument -gpd_ignore_plans to ignore the plan diffs, but there was no
      equivalence in pg_regress.
      
      Added a pg_regress argument --ignore-plans to ignore plan diffs.  It can
      be enabled via make command like this:
      
          make installcheck EXTRA_REGRESS_OPTS=--ignore-plans
      aea6d06d
    • R
      Update test case for ORCA corresponding to 95f9d9ca · 799abec1
      Rahul Iyer 提交于
      799abec1
    • H
      Remove 'gpxlogloc' datatype. · 909d47d7
      Heikki Linnakangas 提交于
      The PostgreSQL 9.4 brought us the 'pg_lsn' datatype, which is functionally
      equivalent. And we weren't actually using 'gpxlogloc' for anything,
      anyway.
      909d47d7
    • A
      Remove gp_num_contents_in_cluster GUC. · 5f10a924
      Ashwin Agrawal 提交于
      Given the online gpexpand work, the gp_num_contents_in_cluster GUC is
      unused. So, delete the same from code to avoid confusions and eliminate this
      long argument required to start a postgres instance in gpdb.
      5f10a924
    • E
      Add tests for unnamed indexes on partitioned tables. · 83260744
      Ekta Khanna 提交于
      Co-authored-by: NShoaib Lari <slari@pivotal.io>
      83260744
    • E
      Avoid lookup for relationOid for QE if already exists · 7562e1ed
      Ekta Khanna 提交于
      Prior to this commit, we always did a lookup for the relationOid in QE.
      The original intent of the code was to avoid lookup for relationOid by QE if it
      has already been stashed in the IndexStmt.  This commit updates code for it
      along with `partition_locking` test expected file.
      Co-authored-by: NShoaib Lari <slari@pivotal.io>
      7562e1ed
  2. 28 11月, 2018 2 次提交
    • B
      Bump ORCA to v3.10.0 and update tests · cf0a74d5
      Bhuvnesh Chaudhary 提交于
      cf0a74d5
    • P
      Fix intermittent failure for select_into case · 9c21fd76
      Pengzhou Tang 提交于
      Recently, ICW case 'select_into' failed intermittently, the error looks like
      
      "CREATE TABLE selinto_schema.tmp3 (a,b,c)
       	   AS SELECT oid,relname,relacl FROM pg_class
       	   WHERE relname like '%c%';	-- OK
      ERROR:  role "30973" does not exist"
      
      The RCA is, case 'lock' should not run concurrently with 'select_into',
      a minirepo is:
      
      1. in 'lock' case
      CREATE TABLE lock_tbl1 (a BIGINT);
      CREATE ROLE regress_rol_lock1;
      GRANT UPDATE ON TABLE lock_tbl1 TO regress_rol_lock1;
      select oid, relname, relacl from pg_class where relname like '%c%';
        oid   |  relname  |                        relacl
      --------+-----------+-------------------------------------------------------
       180367 | lock_tbl1 | {gpadmin=arwdDxt/gpadmin,regress_rol_lock1=w/gpadmin}
      
      2. in 'select_into' test case
      CREATE TABLE selinto_schema.tmp3 (a,b,c)
             AS SELECT oid,relname,relacl FROM pg_class
             WHERE relname like '%c%';
      
      3. in 'lock' test case
      DROP TABLE lock_tbl1;
      DROP ROLE regress_rol_lock1;
      
      4. in 'select_into' test case
      select * from tmp3;
         a    |     b     |                     c
      --------+-----------+--------------------------------------------
       180367 | lock_tbl1 | {gpadmin=arwdDxt/gpadmin,180370=w/gpadmin}
      
      create table tmp4 as select * from tmp3;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
      ERROR:  role "180370" does not exist
      
      In Step4, role 'regress_rol_lock1' has been dropped, so when it tries to parse
      acl, it finds role "180370" does not exist, the error seems to be reasonable.
      
      In upstream, test 'lock' doesn't run concurrently with 'select_into', I also notice
      that we have two 'lock' test within parallel_schedule, so remove the first one
      to match upstream and avoid the intermittent failure of 'select_into'
      9c21fd76
  3. 27 11月, 2018 2 次提交
  4. 26 11月, 2018 1 次提交
    • N
      Always create fully distributed tables · e90a3bdd
      Ning Yu 提交于
      In CREATE TABLE we used to decide numsegments from LIKE, INHERITS and
      DISTRIBUTED BY clauses.  However we do not want partially distributed
      tables to be created by end users, so change the logic to always create
      tables with DEFAULT as numsegments.  We still allow developers to hack
      the DEFAULT numsegments with the gp_debug_numsegments extension.
      e90a3bdd
  5. 25 11月, 2018 1 次提交
    • D
      Remove /static/ from PostgreSQL doc links · ac753aed
      Daniel Gustafsson 提交于
      Commit 17f9b7f070dbe17b2844a8b4dd428 in the pgweb repository removed
      the /static/ portion on all doc URLs, leaving a redirect in place. To
      avoid incurring a needless redirect, remove the /static/ part in the
      links to the PostgreSQL documentation.
      
      The /static/ URLs stem from a time when there were interactive docs
      that had comment functionality. These docs were removed a very long
      time ago, but the static differentiator was left in place until now.
      
      Reviewed-by: Mel Kiyama
      ac753aed
  6. 24 11月, 2018 1 次提交
  7. 23 11月, 2018 5 次提交
    • B
      Revert `Remove Master/Standby SyncRepWait Greenplum hack` and two other relevent commit (#6307) · b8f98fca
      BaiShaoqi 提交于
      Here revert four commits
      
      1. Remove Master/Standby SyncRepWait Greenplum hack: 7f6066ea
      2. Add alter system synchronous_standby_names to * when gpinitstandby -n 1136f2fb
      3. Hot fix in gpinitstandby behave test failure: b6c77b2f
      4. Remove unused variables, to silence compiler warnings 88a185a5
      
      The first commit should revert, because gpinitstandby did not change correctly comform to the this commit.
      It will cause If the standby is down, and the synchronous_standby_names is *, the cluster will not start and hang on
      
      The second, third, fourth should revert, because it is relavent to the first commit.
      b8f98fca
    • P
      Fix a bug of replicated table · 0e461e16
      Pengzhou Tang 提交于
      Previously, when creating join path between CdbLocusType_SingleQE path
      and CdbLocusType_SegmentGeneral path, we always add a motion on top
      of CdbLocusType_SegmentGeneral path so even the join path is promoted
      to executed on QD, the CdbLocusType_SegmentGeneral path can still be
      executed to segments.
                           join (CdbLocusType_SingleQE)
      					/    \
                         /      \
      CdbLocusType_SingleQE     Gather Motion
                                  \
                                CdbLocusType_SegmentGeneral
      
      For example:
      (select * from partitioned_table limit 1) as t1
      Nested Loop
          ->  Gather Motion 1:1
      	     ->  Seq Scan on replicated_table
          ->  Materialize
      		 ->  Subquery Scan on t1
      		    ->  Limit
      			   ->  Gather Motion 3:1
      	               ->  Limit
      		               ->  Seq Scan on partitioned_table
      replicated_table only store tuples on segments, so without
      the gather motion, seq scan of replicated_table doesn't
      provide tuples.
      
      There is another problem, if join path is not promoted to
      QD, the gather motion might be redundant, For example:
      
        (select * from replicated_table, (select * from
        partitioned_table limit 1) t1) sub1;
      
      Gather Motion 3:1
        -> Nested Loop
            ->  Seq Scan on partitioned_table_2
            ->  Materialize
                ->  Broadcast Motion 1:3
                    -> Nested Loop
                       ->  Gather Motion 1:1 (redundant motion)
      	                 ->  Seq Scan on replicated_table
                    ->  Materialize
      		         ->  Subquery Scan on t1
      		            ->  Limit
      			             ->  Gather Motion 3:1
      	                       ->  Limit
      		                      ->  Seq Scan on partitioned_table
      
      So in apply_motion_mutator(), we omit such redundant motion if
      it's not gathered to top slice (QD). sliceDepth == 0 means it
      is top slice, however, sliceDepth now is shared by both init
      plans and main plan, so if main plan increased the sliceDepth,
      init plan may omit the gather motion unexpectedly which create
      a wrong results.
      
      The fix is simple to reset sliceDepth for init plans
      0e461e16
    • P
      update test with new syntax · 64cdd47c
      Pengzhou Tang 提交于
      64cdd47c
    • N
      Fix numsegments when appending multiple SingleQEs · fa86f160
      Ning Yu 提交于
      When Append node contains SingleQE subpath we used to put Append on ALL
      the segments, however if the SingleQE is partially distributed then
      apparently we could not put the SingleQE on ALL the segments, this
      conflict could results in runtime or incorrect results.
      
      To fix this we should put Append on SingleQE's segments.
      
      In the other hand when there are multiple SingleQE subpaths we should
      put Append on the common segments of SingleQEs.
      Reviewed-by: NHeikki Linnakangas <hlinnakangas@pivotal.io>
      fa86f160
    • N
      Reduce differences between reshuffle tests · 2eef2ba2
      Ning Yu 提交于
      There are 3 reshuffle tests, the ao one, the co one, and the heap one.
      They share almost the same cases, but different on table names and
      create table options.  There are also some differences caused when
      adding regression tests, they are only added in one file but not others.
      
      We want to keep minimal differences between these tests, so we ensure
      that a regression test for ao also covers similar case for heap.  And
      once we understand one of the test file we have almost the same
      knowledge on the others.
      
      Here is a list of changes to these tests:
      - reduce differences on table names by using schema;
      - reduce differences on CREATE TABLE options by setting default storage
        options;
      - simplify the creation of partially distributed tables by using the
        gp_debug_numsegments extension;
      - copy some regression tests to all the tests;
      - retire the no longer used helper function;
      - move the tests into an existing parallel test group;
      
      pg_regress test framework provides some @@ tokens for ao/co tests,
      however we still can not merge the ao and co tests into one file as
      WITH (OIDS) is only supported by ao but not co.
      2eef2ba2
  8. 22 11月, 2018 7 次提交
    • H
      Pick a smarter Hashed locus for LEFT and RIGHT JOINs. · 3d6c78c9
      Heikki Linnakangas 提交于
      When determining the locus for a LEFT or RIGHT JOIN, we can use the outer
      side's distribution key as is. The EquivalenceClasses from the nullable
      side are not of interest above the join, and the outer side's distribution
      key can lead to better plans, because it can be made a Hashed locus,
      rather than HashedOJ. A Hashed locus can be used for grouping, for
      example, unlike a HashedOJ.
      
      This buys back better plans for some INSERT and CTAS queries, that started
      to need Redistribute Motions after the previous commit.
      Reviewed-by: NMelanie Plageman <mplageman@pivotal.io>
      3d6c78c9
    • H
      Fix confusion with distribution keys of queries with FULL JOINs. · a25e2cd6
      Heikki Linnakangas 提交于
      There was some confusion on how NULLs are distributed, when CdbPathLocus
      is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested
      that NULLs can be on any segment. But the rest of the code assumed that
      that's true only for HashedOJ, and that for Hashed, all NULLs are stored
      on a particular segment. There was a comment in cdbgroup.c that said "Or
      would HashedOJ ok, too?"; the answer to that is "No!". Given the comment
      in cdbpathlocus.h, I'm not suprised that the author was not very sure
      about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that.
      
      There were a few cases where we got that actively wrong. repartitionPlan()
      function is used to inject a Redistribute Motion into queries used for
      CREATE TABLE AS and INSERT, if the "current" locus didn't match the target
      table's policy. It did not check for HashedOJ. Because of that, if the
      query contained FULL JOINs, NULL values might end up on all segments. Code
      elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table
      are stored on a single segment, identified by the cdbhash value of a NULL
      datum. Fix that, by adding a check for HashedOJ in repartitionPlan(), and
      forcing a Redistribute Motion.
      
      CREATE TABLE AS had a similar problem, in the code to decide which
      distribution key to use, if the user didn't specify DISTRIBUTED BY
      explicitly. The default behaviour is to choose a distribution key that
      matches the distribution of the query, so that we can avoid adding an
      extra Redistribute Motion. After fixing repartitionPlan, there was no
      correctness problem, but if we chose the key based on a HashedOJ locus,
      there is no performance benefit because we'd need a Redistribute Motion
      anyway. So modify the code that chooses the CTAS distribution key to
      ignore HashedOJ.
      
      While we're at it, refactor the code to choose the CTAS distribution key,
      by moving it to a separate function. It had become ridiculously deeply
      indented.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.
      Reviewed-by: NMelanie Plageman <mplageman@pivotal.io>
      a25e2cd6
    • H
      Add tests for deriving distribution keys from query in CREATE TABLE AS. · 9457fe71
      Heikki Linnakangas 提交于
      The case where some, but not all, of the query's distribution keys were
      present in the result set, was not covered by any existing tests.
      
      Per Paul Guo's observation.
      9457fe71
    • Z
      Remove dead code in ATExecSetDistributedBy · 50f2e3bb
      Zhenghua Lyu 提交于
      This commit is the first step to refactor ATExecSetDistributedBy. Its
      main purpose is to remove some dead code in this function and during
      the process we find some helper functions can also be simplified so
      the simplification is also in this commit.
      
      According to MPP-7770, we should disable changing storage options for now.
      It is ugly to just throw an error when encounter `appendonly` option but
      without removing the code. In this commit remove all related logic.
      
      Because of with clause can only contain reshuffle|reorganize, we only
      new_rel_opts if the table itself is ao|aoco. No need to deduce it from with
      clause.
      
      We also remove the unnecessary checks at the start of this function. Because
      These checks have been already done in the function `ATPrepCmd`.
      
      Co-authored-by: Shuejie Zhang <shzhang@pivotal.io >
      50f2e3bb
    • J
      Remove Master/Standby SyncRepWait Greenplum hack · 7f6066ea
      Jimmy Yih 提交于
      When the standby master is unavailable, the master will not block on commits
      even though we enable synchronous replication. This is because we have a
      Greenplum hack which checks if the WAL stream with the standby master is
      valid. If the stream is invalid, the master will quickly skip the SyncRepWait
      and continue on its commit.
      
      Remove this hack in order to make Master/Standby and Primary/Mirror WAL
      replication more similar.
      Co-authored-by: NShaoqi Bai <sbai@pivotal.io>
      7f6066ea
    • N
      Use max numsegments of subpaths for Append node · 1b2f7bcd
      Ning Yu 提交于
      Suppose t1 has numsegments=1 and t2 has numsegments=2, then below query
      will have incorrect plan:
      
          explain (costs off) select * from t2 a join t2 b using(c2)
                    union all select * from t1 c join t1 d using(c2);
                                         QUERY PLAN
          ------------------------------------------------------------------------
           Gather Motion 1:1  (slice3; segments: 1)
             ->  Append
                   ->  Hash Join
                         Hash Cond: (a.c2 = b.c2)
                         ->  Redistribute Motion 2:2  (slice1; segments: 2)
                               Hash Key: a.c2
                               ->  Seq Scan on t2 a
                         ->  Hash
                               ->  Redistribute Motion 2:2  (slice2; segments: 2)
                                     Hash Key: b.c2
                                     ->  Seq Scan on t2 b
                   ->  Hash Join
                         Hash Cond: (c.c2 = d.c2)
                         ->  Seq Scan on t1 c
                         ->  Hash
                               ->  Seq Scan on t1 d
           Optimizer: legacy query optimizer
          (17 rows)
      
      slice2 has a 2:2 redistribute motion to slice3, however slice3 only has
      1 segment, this is due to Append's numsegments is decided from the last
      subpath.
      
      To fix the issue we should use max numsegments of subpaths for Append.
      
      The issue was already fixed in 39856768,
      we are only adding tests for it now.
      1b2f7bcd
    • N
      New extension to debug partially distributed tables · 3119009a
      Ning Yu 提交于
      Introduced a new debugging extension gp_debug_numsegments to get / set
      the default numsegments when creating tables.
      
      gp_debug_get_create_table_default_numsegments() gets the default
      numsegments.
      
      gp_debug_set_create_table_default_numsegments(text) sets the default
      numsegments in text format, valid values are:
      - 'FULL': all the segments;
      - 'RANDOM': pick a random set of segments each time;
      - 'MINIMAL': the minimal set of segments;
      
      gp_debug_set_create_table_default_numsegments(integer) sets the default
      numsegments directly, valid range is [1, gp_num_contents_in_cluster].
      
      gp_debug_reset_create_table_default_numsegments(text) or
      gp_debug_reset_create_table_default_numsegments(integer) reset the
      default numsegments to the specified value, and the value can be reused
      later.
      
      gp_debug_reset_create_table_default_numsegments() resets the default
      numsegments to the value passed last time, if there is no previous call
      to it the value is 'FULL'.
      
      Refactored ICG test partial_table.sql to create partial tables with this
      extension.
      3119009a
  9. 20 11月, 2018 4 次提交
    • D
      Remove limitation on duplicate col aliases in CTE · 2c5116dc
      Daniel Gustafsson 提交于
      There is no restriction in the SQL specification regarding duplicate
      column aliases in a CTE. Remove the specific check and align us with
      how queries work in upstream. Also fix test fallout to match the new
      behavior with slightly tweaked queries to keep output volume down.
      2c5116dc
    • N
      Fix data lost during data reorganization · 1c0707b9
      Ning Yu 提交于
      A temp table is created during data reorganization, its numsegments must
      be the same with the original table, otherwise some data will be lost
      after the reorganization.
      Reviewed-by: NDaniel Gustafsson <dgustafsson@pivotal.io>
      Reviewed-by: NZhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: NShujie Zhang <shzhang@pivotal.io>
      1c0707b9
    • B
      Update alternate expectfile for select_view (#6259) · eb097d27
      BaiShaoqi 提交于
      Commit 4c54c894 eliminates our divergence in deparsing filter
      expressions mostly by adding parentheses. However only two out of four
      expectfiles were updated, mostly likely because the author didn't run
      alternate systems (i.e. macOS), . This leads to one of those
      duh-it-fails-on-my-mac failures that can be easily fixed.
      
      This commit brings the two expectfiles up-to-date.
      Co-authored-by: NJesse Zhang <sbjesse@gmail.com>
      
      Reviewed by Paul Guo, Heikki  Linnakangas 
      eb097d27
    • H
      Fix calculation of AO table's reltuples in new ANALYZE. · 287ac29b
      Heikki Linnakangas 提交于
      The total number of rows in table, stored in pg_class.reltuples, was
      calculated incorrectly. It was set to the number of rows in the sample,
      not the (estimated?) total number of rows in the table. We didn't notice,
      because most of the existing tests use small tables, so that the whole
      table fit in the sample. Put back code like we used to have, for
      collecting the totals.
      
      This papers over the assertion failure we're seeing in the 'aocs' test in
      the concourse pipeline, with ORCA, by changing the plan to one that doesn't
      crash. It needs a proper fix, of course, but we're tracking that separately
      as https://github.com/greenplum-db/gpdb/issues/6274.
      287ac29b
  10. 19 11月, 2018 5 次提交
    • H
      Use block-level sampling for ANALYZE on heap tables. · baa76536
      Heikki Linnakangas 提交于
      The user-visible feature here is to speed up ANALYZE on heap tables, by
      using the block-level sampling method from PostgreSQL, instead of the
      scanning the whole table. This refactors the ANALYZE code, to make it
      possible to call the upstream function to do the block-level sampling
      again.
      
      A secondary feature is that this enables the ANALYZE callback for Foreign
      Data Wrappers, introduced in PostgreSQL 9.2. It was disabled in the 9.2
      merge, because the function signature had been changed in GPDB, making
      it cumbersome to call it. This patch reverts acquire-function's signature
      to the same as in upstream, so the FDW callback now works again.
      
      This also moves the logic to decide which rels to analyze, based on
      optimizer_analyze_root_partition or optimizer_analyze_midlevel_partition
      GUCs and the ROOTPARTITION option, wholly to get_rel_oids() in vacuum.c.
      It was mostly done there already, but there were some warts in analyze.c.
      None of the code in analyze.c should now treat partitions differently
      from normal inherited tables, which helps to minimize the diff vs
      upstream. (Except that the HLL code still heavily uses partition-related
      functions. That's a TODO.)
      baa76536
    • H
      Fix test case for changed data distribution with Jump Consistent hash. · da0f7a74
      Heikki Linnakangas 提交于
      Commit 4a174240 changed the hash method, which changed the data
      distribution of data. It adjusted many test cases that were dependent on
      the data distribution, but missed this one. This one isn't failing at the
      moment, but it was clearly mean to look at the relfrozenxid of the segment
      that it inserts all the data to. Change the value of the distribution key
      so that all the data falls into segment 0 again.
      da0f7a74
    • H
    • Z
      Correct behavior for reshuffling partition tables · 8bf413d6
      ZhangJackey 提交于
      The previous code makes UPDATE statement for root
      and its children partitions when we reshuffle a partition
      table. It not only involves redundant work but also will
      lead to an error while reshuffling a two-level partition
      table(because the mid-level partitions have no data).
      
      The commit does the following work:
      
      * Only make UPDATE statement for leaf partition or
         non-partition table.
      * Refactor the reshuffle test cases. We remove the
         python udf code and use `gp_execute_on_server`
         and `gp_dist_random` to test replicated table.
      
      Co-authored-by: Shujie Zhang shzhang@pivotal.io
      Co-authored-by: Zhenghua Lyu zlv@pivotal.io
      8bf413d6
    • A
      Add support for executing foreign tables on master, any or all segments · 3c6c6ab2
      Adam Lee 提交于
      This commit adds the support and option of `mpp_execute 'MASTER | ANY |
      ALL SEGMENTS'` for foreign tables.
      
      MASTER is the default, FDW requests for data from master.
      
      ANY, FDW requests for data from master or one any segment, depends on
      which path costs less.
      
      ALL SEGMENTS, FDW requests for data from all segments, wrappers need to
      have a policy matching the segments to data.
      
      For instance, file_fdw probes the mpp_execute vaule, then load different
      files based on the segment number. But something like gpfdist on the
      foreign side doesn't need this, which hands out a different slice of the
      data to each request, all segments could request the same location.
      3c6c6ab2
  11. 18 11月, 2018 1 次提交
  12. 17 11月, 2018 1 次提交
  13. 16 11月, 2018 3 次提交