1. 24 September 2019, 3 commits
    • H
      Omit slice information for SubPlans that are not dispatched separately. · 96c6d318
      Committed by Heikki Linnakangas
      Printing the slice information makes sense for Init Plans, which are
      dispatched separately, before the main query. But not so much for other
      Sub Plans, which are just part of the plan tree; there is no dispatching
      or motion involved at such SubPlans. The SubPlan might *contain* Motions,
      but we print the slice information for those Motions separately. The slice
      information was always just the same as the parent node's, which adds no
      information, and can be misleading if it makes the reader think that there
      is inter-node communication involved in such SubPlans.
      96c6d318
    • A
      Avoid gp_tablespace_with_faults test failure by pg_switch_xlog() · efd76c4c
      Committed by Ashwin Agrawal
      The gp_tablespace_with_faults test writes a no-op record and waits for
      the mirror to replay it before deleting the tablespace
      directories. This step sometimes fails in CI and causes flaky
      behavior. The cause is existing behavior in the startup and
      walreceiver processes: if the primary writes a big xlog record (one
      spanning multiple pages), flushes only part of it via
      XLogBackgroundFlush(), and then restarts before committing the
      transaction, the mirror receives only the partial record and waits
      for the complete one. Meanwhile, after recovery, a no-op record is
      written in place of that big record, and the startup process on the
      mirror keeps waiting to receive xlog beyond the previously received
      point before it can proceed.
      
      Hence, as a temporary workaround until the actual code problem is
      resolved, and to avoid failures for this test, switch xlog before
      emitting the no-op record so that the no-op record lands far away
      from the previously emitted xlog record.
      efd76c4c
    • J
      Fix CTAS with gp_use_legacy_hashops GUC · 9040f296
      Committed by Jimmy Yih
      When the gp_use_legacy_hashops GUC was set, CTAS would not assign the
      legacy hash operator class to the new table. This is because CTAS goes
      through a different code path and uses the first operator class of the
      SELECT's result when no distribution key is provided (see the sketch
      below).
      9040f296
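      A minimal sketch of the intended behavior, assuming hypothetical helper
      names for the operator class lookups (the real fix lives in the CTAS
      path that derives the distribution key from the SELECT result):
      
      ```
      /*
       * Sketch only: when CTAS picks a hash operator class for the derived
       * distribution key, it should honor gp_use_legacy_hashops just like a
       * plain CREATE TABLE does.  Both lookup helpers are hypothetical.
       */
      static Oid
      ctas_distribution_opclass(Oid keytype, bool use_legacy_hashops)
      {
          if (use_legacy_hashops)
              return lookup_legacy_cdbhash_opclass(keytype);  /* hypothetical */
          return lookup_default_hash_opclass(keytype);        /* hypothetical */
      }
      ```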
  2. 23 September 2019, 2 commits
    • Z
      Make estimate_hash_bucketsize MPP-correct · d6a567b4
      Committed by Zhenghua Lyu
      In Greenplum, when estimating costs, most of the time we are
      in a global view, but sometimes we should shift to a local
      view. Postgres does not suffer from this issue because everything
      is in one single segment.
      
      The function `estimate_hash_bucketsize` comes from Postgres and
      plays a very important role in the cost model of hash join.
      It should output a result based on a local view. However, the
      input parameters, such as the number of rows in a table and the
      ndistinct of the relation, are all taken from a global view (across
      all segments). So we have to compensate for that. The logic is
      (see the sketch below):
        1. for a broadcast-like locus, the global ndistinct is the same
           as the local one; we compensate with `ndistinct *= numsegments`.
        2. when the hash key is collocated with the locus, each segment
           holds `ndistinct/numsegments` distinct groups, so no
           compensation is needed.
        3. otherwise, the locus is partitioned and not collocated with the
           hash keys; in these cases we first estimate the local distinct
           group number and then apply the compensation.
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      d6a567b4
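      A rough sketch of that compensation, with the locus tests and the
      local-ndistinct helper as illustrative stand-ins for the actual GPDB
      code:
      
      ```
      /*
       * Sketch of the ndistinct compensation described above (not the actual
       * estimate_hash_bucketsize() caller).  "global" is the cluster-wide
       * estimate; the cost model wants a per-segment ("local") figure.
       */
      static double
      compensate_ndistinct(double global_ndistinct, int numsegments,
                           bool broadcast_like, bool collocated_with_hashkey)
      {
          if (broadcast_like)
          {
              /* case 1: every segment sees all groups, local == global,
               * so compensate with ndistinct *= numsegments */
              return global_ndistinct * numsegments;
          }
          if (collocated_with_hashkey)
          {
              /* case 2: each segment already holds ndistinct/numsegments
               * groups, no compensation needed */
              return global_ndistinct;
          }
          /* case 3: partitioned but not collocated with the hash keys:
           * estimate the local distinct-group count first, then compensate */
          return estimate_local_ndistinct(global_ndistinct, numsegments)   /* hypothetical */
                 * numsegments;
      }
      ```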
  3. 21 September 2019, 1 commit
    • H
      Enable Init Plans in queries executed locally in QEs. · 98c8b550
      Committed by Heikki Linnakangas
      I've been wondering for some time why we have disabled constructing Init
      Plans in queries that are planned in QEs, like in SPI queries that run in
      user-defined functions. So I removed the diff vs upstream in
      build_subplan() to see what happens. It turns out it was because we always
      ran the ExtractParamsFromInitPlans() function in QEs, to get the InitPlan
      values that the QD sent with the plan, even for queries that were not
      dispatched from the QD but planned locally. Fix the call in InitPlan to
      only call ExtractParamsFromInitPlans() for queries that were actually
      dispatched from the QD, and allow QE-local queries to build Init Plans.
      
      Include a new test case, for clarity, even though there were some existing
      ones that incidentally covered this case.
      98c8b550
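      The gist of the change in InitPlan(), reduced to the condition described
      above (a sketch; the exact check and arguments in the tree may differ):
      
      ```
      /* In InitPlan(), simplified: */
      if (Gp_role == GP_ROLE_EXECUTE && queryDesc->ddesc != NULL)
      {
          /* The plan was dispatched from the QD: pick up the InitPlan
           * parameter values the QD evaluated and shipped with the plan. */
          ExtractParamsFromInitPlans( /* arguments omitted in this sketch */ );
      }
      /* QE-local queries (e.g. SPI inside a UDF) skip this and may now build
       * their own Init Plans in build_subplan(), as upstream does. */
      ```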
  4. 20 September 2019, 3 commits
  5. 19 September 2019, 5 commits
  6. 17 September 2019, 1 commit
    • W
      Single set command failed, rollback guc value · 73cab2cf
      Committed by Weinan WANG
      * If a single SET command fails, roll back the GUC value
      
      In GPDB, the GUC set flow is:
      	1. the QD sets the GUC
      	2. the QD dispatches the job to all QEs
      	3. the QEs set the GUC
      
      A single SET command is not 2PC safe in GPDB: if the SET command
      fails on a QE, only the QD's GUC value can be rolled back. For a GUC
      that has the GUC_GPDB_NEED_SYNC flag, the value must be the same
      across the whole session.
      
      To deal with this, record the rolled-back GUCs in AbortTransaction
      into a restore list, and re-set them when the next query arrives (see
      the sketch below). If that fails again, destroy all QEs, since we
      cannot keep the same value in that session; the GUC values can then
      hopefully synchronize successfully in the later create-gang stage via
      the '-c' options.
      73cab2cf
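      A simplified sketch of that recovery flow; apart from AbortTransaction
      and the '-c' gang options mentioned above, the names below are
      hypothetical:
      
      ```
      static List *guc_restore_list = NIL;   /* GUCs whose SET failed on a QE */
      
      /* called from AbortTransaction() when a dispatched SET fails on a QE */
      static void
      record_guc_for_resync(const char *guc_name)
      {
          guc_restore_list = lappend(guc_restore_list, pstrdup(guc_name));
      }
      
      /* called before dispatching the next query */
      static void
      resync_gucs_if_needed(void)
      {
          if (guc_restore_list == NIL)
              return;
          if (!dispatch_set_commands(guc_restore_list))   /* hypothetical re-SET on all QEs */
          {
              /* still out of sync: destroy all QEs so the next gang is
               * created with the correct values passed via '-c' options */
              destroy_all_qe_gangs();                     /* hypothetical */
          }
          guc_restore_list = NIL;
      }
      ```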
  7. 13 September 2019, 1 commit
    • A
      Replace wait_for_trigger_fault() with gp_wait_until_triggered_fault() · 854233bb
      Committed by Ashwin Agrawal
      The old wait_for_trigger_fault() in setup.sql is no longer
      needed. gp_wait_until_triggered_fault() now provides the same
      functionality in much better shape. Hence, delete
      wait_for_trigger_fault().
      
      Only the existing usages of wait_for_trigger_fault() are replaced
      with gp_wait_until_triggered_fault().
      854233bb
  8. 12 September 2019, 8 commits
    • A
      Avoid flakiness for vacuum_drop_phase_ao test · 84461b97
      Committed by Ashwin Agrawal
      The "0U: End" step should only be executed after VACUUM has started
      waiting for the lock. If it is executed before that, VACUUM will not
      wait for the lock, which invalidates the test and makes it flaky,
      failing with the following diff:
      
      --- /tmp/build/e18b2f02/gpdb_src/src/test/isolation2/expected/vacuum_drop_phase_ao.out    2019-09-05 00:14:41.580197372 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/isolation2/results/vacuum_drop_phase_ao.out    2019-09-05 00:14:41.580197372 +0000
      @@ -32,11 +32,12 @@
       DELETE 4
       -- We should see that VACUUM blocks while the QE holds the access shared lock
       1&: VACUUM ao_test_drop_phase;  <waiting ...>
      +FAILED:  Forked command is not blocking; got output: VACUUM
      
       0U: END;
       END
       1<:  <... completed>
      -VACUUM
      +FAILED:  Execution failed
      
      This happens because "&:" in the isolation2 framework instructs that
      step to run in the background, which means the next step, in a
      separate session, executes in parallel with it.
      
      To resolve the situation, add a helper
      wait_until_waiting_for_required_lock() that checks whether the query
      has reached the intended blocking point, i.e. it is waiting for the
      lock to be granted. Only after this state is reached do we execute
      the next command to unblock it.
      
      Currently, the isolation2 framework provides no direct support for
      such blocking behavior. Hence, for the short term I feel adding the
      common helper function is a good step. In the long term we can, as in
      the isolation framework, add something simple and generic for this,
      though I do like the explicit checking for the exact lock type,
      relation, etc. that the current helper function provides.
      84461b97
    • J
      Optimize the cost model of multi-stage AGG · 9936ca3b
      Committed by Jinbao Chen
      There are some problems with the old multi-stage agg cost model:
      1. We used the global group number and global work memory to estimate
      the number of spilled tuples. But the situation in the first-stage
      agg and the second-stage agg is completely different. Tuples are
      randomly distributed on the group key in the first-stage agg, so the
      number of groups on each segment is almost equal to the number of
      global groups. But in the second-stage agg the distribution key is a
      subset of the group key, so the number of groups on each segment is
      roughly (number of global groups / segment number). The old code
      could therefore cause huge cost deviations.
      2. Using ((group number + input rows) / 2) as the number of spilled
      tuples is too rough.
      3. Using global group number * 1.3 as the output rows of the
      streaming agg node is very wrong. The output rows of the streaming
      agg node should be group number * segment number * param.
      4. We used numGroups to estimate the initial size of the hash table
      in the exec node, but numGroups is the global group number.
      
      So we made the following changes (see the sketch below):
      1. Use a function 'groupNumberPerSegment' to estimate the group
      number per segment in the first-stage agg, and numGroups / segment
      number as the group number per segment in the second-stage agg.
      2. Use a function 'spilledGroupNumber' to estimate the number of
      spilled tuples.
      3. Use spilled tuple number * segment number as the output tuple
      number of the streaming agg node.
      4. Use numGroups as the group number per segment.
      
      Also, we have information on the number of tuples in the top N
      groups, so we can predict the maximum number of tuples on the biggest
      segment when skew occurs. When we can predict skew, enable the
      1-phase agg.
      Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
      9936ca3b
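      Roughly, the new per-segment estimates amount to the following
      (an illustrative sketch only; the real functions also take work_mem
      and the group-size statistics into account):
      
      ```
      /* First-stage agg: tuples are randomly distributed on the group key,
       * so each segment tends to see close to every group, bounded by its
       * share of the rows. */
      static double
      group_number_per_segment_first_stage(double global_groups,
                                           double global_rows, int numsegments)
      {
          double rows_per_segment = global_rows / numsegments;
      
          return global_groups < rows_per_segment ? global_groups : rows_per_segment;
      }
      
      /* Second-stage agg: the distribution key is a subset of the group key,
       * so the groups spread evenly across segments. */
      static double
      group_number_per_segment_second_stage(double global_groups, int numsegments)
      {
          return global_groups / numsegments;
      }
      
      /* Streaming agg output rows: spilled tuples per segment times the
       * number of segments, replacing the old "global groups * 1.3" guess. */
      static double
      streaming_agg_output_rows(double spilled_tuples_per_segment, int numsegments)
      {
          return spilled_tuples_per_segment * numsegments;
      }
      ```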
    • N
      Do not hard code distrib values in partition deadlock tests · a53169ce
      Committed by Ning Yu
      The partition deadlock tests use hard-coded distribution values to form
      waiting relations on different segments; we can't easily tell whether
      two rows are on the same segment or not.  Even worse, hard-coded values
      are only correct when the cluster has the default size (count of
      primaries) and uses the default hash & reduce methods.
      
      In GDD tests we should use a helper function, segid(segid, nth), which
      returns the nth value on segment segid.  It's easier to design and
      understand the tests with it.
      
      Also put more rows into the testing tables, so segid() can always
      return a valid row.
      a53169ce
    • N
      Fix flaky partition deadlock tests · 09d82f2e
      Committed by Ning Yu
      To trigger a deadlock we need to construct several waiting relations;
      once the last waiting relation is formed the deadlock is detectable by
      the deadlock detector.  In update-deadlock-root-leaf-concurrent-op and
      delete-deadlock-root-leaf-concurrent-op we used to use `2&:` for the
      last waiting relation, where the isolation2 framework checks that the
      query blocks, that is, it does not return a result within 0.5 seconds.
      However, it's possible that the deadlock detector is triggered within
      that 0.5 seconds, so the isolation2 framework reports a failure, which
      makes the tests flaky.  To make these tests deterministic we should use
      `2>:` for the last waiting query, which puts the query in the
      background without checking.
      09d82f2e
    • A
      Revert "Avoid flakiness for vacuum_drop_phase_ao test" · 19182958
      Committed by Ashwin Agrawal
      This reverts commit 265bc393. We need to push the version of the
      commit that moves the function into setup and avoids including
      server_helpers.sql in these two tests. A fresh commit with that
      change will follow.
      19182958
    • A
      27ba026f
    • A
      Replace "@gpcurusername@" with "@curusername@" · 8bfa2d6f
      Committed by Ashwin Agrawal
      8bfa2d6f
    • A
      Avoid flakiness for vacuum_drop_phase_ao test · 265bc393
      Committed by Ashwin Agrawal
      Add a helper wait_until_waiting_for_required_lock() to check whether
      the query has reached the intended blocking point, i.e. it is waiting
      for the lock to be granted. Only after this state is reached do we
      execute the next command to unblock it.
      
      Currently, the isolation2 framework provides no direct support for
      such blocking behavior. Hence, for the short term I feel adding the
      common helper function is a good step. In the long term we can, as in
      the isolation framework, add something simple and generic for this,
      though I do like the explicit checking for the exact lock type,
      relation, etc. that the current helper function provides.
      265bc393
  9. 11 September 2019, 2 commits
  10. 10 September 2019, 1 commit
    • A
      Avoid extra argument to pg_regress for row and column file generation · 3ff02a8f
      Committed by Ashwin Agrawal
      If "./pg_regress --init-file init_file uao_ddl/compresstype_row" is
      executed, it fails, because the extra argument `--ao-dir=uao` must be
      passed to pg_regress to tell it to convert these .source files into
      row and column .sql/.out files instead of applying the regular
      standard logic to them. This approach has the following shortfalls:
      
      - additional developer overhead to remember to add that option
      - without that option the .sql/.out files are still generated but are
        not usable / are incorrect
      - row and column test file directories will always need the same prefix
      - the ao-dir prefix check was performed for each and every file in the
        input/output directories, which is unnecessary
      
      To improve the situation, modify the logic in pg_regress. No extra
      argument to pg_regress is needed anymore; instead, the presence of a
      special file named "GENERATE_ROW_AND_COLUMN_FILES" tells pg_regress to
      apply the special conversion rule to that directory (see the sketch
      below). This way the correct conversion rule is always applied and
      developers don't need to specify any extra option.
      3ff02a8f
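      The directory check reduces to something like this (a sketch, not the
      actual pg_regress code; the marker-file name is the one introduced
      above):
      
      ```
      #include <stdbool.h>
      #include <stdio.h>
      #include <sys/stat.h>
      
      /*
       * Sketch: instead of a --ao-dir=uao command-line option, the presence
       * of a marker file named GENERATE_ROW_AND_COLUMN_FILES inside a test
       * directory tells pg_regress to generate the row- and column-oriented
       * .sql/.out variants from the .source files in that directory.
       */
      static bool
      directory_wants_row_and_column_files(const char *dir)
      {
          char        path[1024];
          struct stat st;
      
          snprintf(path, sizeof(path), "%s/GENERATE_ROW_AND_COLUMN_FILES", dir);
          return stat(path, &st) == 0;
      }
      ```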
  11. 06 September 2019, 1 commit
    • A
      Speedup test string substitutions from .source to .sql/.out files · c8907e82
      Committed by Ashwin Agrawal
      In GPDB, gpstringsubs.pl is used to replace tokens in .source test
      files when creating .sql/.out files for pg_regress. pg_regress itself
      also performs substitutions; the order is that pg_regress performs its
      substitutions first and any remaining ones are then performed by
      gpstringsubs.pl if required.
      
      The logic to check whether tokens still remained to be replaced after
      pg_regress had done its pass was based on whether the '@' character
      was present in the file; if '@' was present, gpstringsubs.pl was
      invoked for that file. gpstringsubs.pl is very slow. Many .source
      files contain keywords like `@Description` or `@db_name` (especially
      the UAO-related tests) which don't need any substitution, but because
      of these '@' keywords gpstringsubs.pl was invoked for such files
      unnecessarily, causing the slowdown.
      
      So, modify the check to invoke gpstringsubs.pl only when an "@gp"
      token is present (see the sketch below). Any token replacement handled
      by gpstringsubs.pl should now start with "@gp"; hence `@syslocale@` is
      renamed to "@gp_syslocale@".
      
      Also, move the logic that replaces `@hostname@` and `@gpcurusername@`
      into pg_regress itself, as it's much faster and easier to do the same
      in C.
      
      The whole test suite takes a very long time in GPDB, and we routinely
      need to iterate on a single test. These substitutions run on every
      invocation of pg_regress (including isolation2), so this helps reduce
      the iteration time for running a single test during development, along
      with cutting the pg_regress file conversion time in CI: for the
      regress directory from 8 seconds, and for isolation2 from 25 seconds,
      down to a few milliseconds.
      
      In the long run we should move all the logic from gpstringsubs.pl into
      pg_regress and kill gpstringsubs.pl.
      c8907e82
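      The new trigger condition amounts to searching the converted file for
      an "@gp" prefix rather than a bare '@' (a sketch, not the actual
      pg_regress code):
      
      ```
      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>
      
      /*
       * Sketch: gpstringsubs.pl is only worth invoking when the file still
       * contains a token starting with "@gp" after pg_regress has done its
       * own substitutions.  Tokens like @Description or @db_name no longer
       * trigger the slow Perl pass.
       */
      static bool
      needs_gpstringsubs(const char *filename)
      {
          char    line[4096];
          bool    found = false;
          FILE   *fp = fopen(filename, "r");
      
          if (fp == NULL)
              return false;
          while (fgets(line, sizeof(line), fp) != NULL)
          {
              if (strstr(line, "@gp") != NULL)
              {
                  found = true;
                  break;
              }
          }
          fclose(fp);
          return found;
      }
      ```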
  12. 05 September 2019, 4 commits
    • P
      Fix answer file of test sql/resgroup/resgroup_unassign_entrydb · 89c55b08
      Committed by Paul Guo
      The CI pipeline uses two segment nodes for resgroup testing, which is
      different from the usual three-segment test configuration. Modify the
      test to make it independent of the number of segments.
      89c55b08
    • P
      Fix potential shared memory corruption in resource group slot management when... · 8913827b
      Committed by Paul Guo
      Fix potential shared memory corruption in resource group slot management when a query involves entrydb.
      
      We've observed that when we terminate a query that involves entrydb,
      if the QD detaches from the resource group before the entrydb backend
      does so in UnassignResGroup(), data corruption in the shared slot pool
      can happen.  In a cassert-enabled gpdb build, you can see the FATAL
      message below when terminating the query with pg_terminate_backend().
      FATAL:  Unexpected internal error (resgroup.c:1545)
      DETAIL:  FailedAssertion("!(slot->nProcs == 0)", File: "resgroup.c", Line: 1545)
      HINT:  Process 9903 will wait for gp_debug_linger=120 seconds before termination.
      Note that its locks and other resources will not be released until then.
      
      On a production release without cassert enabled, some subsequent
      queries could lead to a panic.
      
      We fix this by letting the final process (either the QD or the entrydb
      process) do the cleanup when the nProcs reference count is zero (see
      the sketch below). This is more robust and less bug-prone across
      future code changes and upstream merges.
      
      This patch refactors the related resource group code a bit and also
      moves the fault-injection related negative tests ahead so that we can
      capture some potential errors which happen later (typically the case
      in this patch).
      
      Besides, it fixes another panic caused by a potential MySessionState
      NULL reset (see the code comment for details). That was revealed when
      running tests on my dev machine, so it was a potentially flaky part of
      testing.
      Co-authored-by: Paul Guo <pguo@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      8913827b
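      The fix boils down to reference-counting the shared slot and letting
      whichever process drops nProcs to zero do the cleanup; a simplified
      sketch (the struct and helper below are illustrative, not the actual
      resgroup.c code):
      
      ```
      typedef struct ResGroupSlotSketch
      {
          int nProcs;     /* processes currently attached to this slot */
      } ResGroupSlotSketch;
      
      /* Both the QD and the entrydb backend call this on UnassignResGroup();
       * only the last one to detach releases the slot back to the pool. */
      static void
      unassign_from_slot(ResGroupSlotSketch *slot)
      {
          /* caller is assumed to hold the resource group lock */
          slot->nProcs--;
          if (slot->nProcs == 0)
              release_slot_to_pool(slot);   /* hypothetical: return to shared slot pool */
      }
      ```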
    • S
      Port: Test to prevent out-of-sync AO segfile state between QD & QE · 795b146d
      Committed by Soumyadeep Chakraborty
      This test was added to 5X_STABLE as a part of aee8cac8.
      795b146d
  13. 30 August 2019, 4 commits
    • D
      Extend Travis pipeline to a build matrix · de256d12
      Committed by Daniel Gustafsson
      This extends our coverage by defining a build matrix in the Travis CI
      pipeline rather than just having a single job. The idea is to build
      PRs against combinations of common platforms and compilers in order
      to catch compiler errors and warnings early. The gist of the changes
      to the previous CI configuration is:
      
        - static configuration is replaced by per build config in a matrix
        - ccache is removed to avoid caching issues
        - macOS is again built on
        - the version printing is more selective to only print versions
          for binaries built in every job
        - notifications are removed to minimize spamming
        - builds with various un-orthodox flags are supported
        - build silently to make output more manageable
        - parts of the regression tests are executed via new installcheck
          target in src/test/regress
      
      Future tasks are to extend this to more combinations and also to build
      the client-side tooling on Windows, as well as extending the test
      coverage to include more parts of ICW.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      de256d12
    • J
      Fix trailing space diff in gporca.out · c8857fe9
      Committed by Jesse Zhang
      This must have been a typo committed in commit 6b79a578. The
      original commit missed a trailing space in the output file after a
      newly added comment (which causes a regress diff when optimizer=off).
      I was going to fix it by adding the trailing space back, but on second
      thought, I decided to gut the comment from all 3 files (one .sql and
      two .out files) to prevent future self-loathing.
      
      While we were at it, also remove the ever-expanding list of DROP
      statements that are ignored from the end of gporca.out. Honestly the
      `DROP` itself kinda offends me, but that's for another day. Let's get
      the build green for now.
      c8857fe9
    • P
      Fix flaky test pg_views_concurrent_drop. (#8526) · dd44426d
      Committed by Paul Guo
      The test should proceed only after the 'forked' connection is really
      blocked on the lock. '&' means blocking, but the test framework has no
      way to know that the SQL is actually blocked waiting for the lock, so
      the test case has to take that responsibility.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      dd44426d
    • B
      Consider predicate of type <cast(ident)> array cmp <const array> · 6b79a578
      Committed by Bhuvnesh Chaudhary
      While processing the constraint interval, also consider predicates of
      the form <cast(ident)> array cmp <const array>; otherwise we lose the
      opportunity to generate implied quals. This corresponds to the ORCA
      changes.
      6b79a578
  14. 29 August 2019, 4 commits
    • A
      Cleanup .gitignore files for regress (sub)directory · 37c1ce5b
      Committed by Ashwin Agrawal
      - arrange them alphabetically
      - separate upstream entries from GPDB-specific ones
      - add a few missing ones
      - also make the patterns ignore only within the current directory or
        its sub-directories
      37c1ce5b
    • C
      Fix crash when the qual under DynamicTableScan has a subplan · 81a671f6
      Committed by Chris Hajas
      Consider the query below:
      
      ```
      test=# explain select * from foo, jazz where foo.c = jazz.e and jazz.f = 10 and a in (select b+1 from bar);
      						 QUERY PLAN
      --------------------------------------------------------------------------------------------------------------------
      Gather Motion 3:1  (slice4; segments: 3)  (cost=0.00..1324469.30 rows=1 width=20)
      ->  Hash Join  (cost=0.00..1324469.30 rows=1 width=20)
           Hash Cond: foo.c = jazz.e
           ->  Dynamic Table Scan on foo (dynamic scan id: 1)  (cost=0.00..1324038.29 rows=34 width=12)
      	   Filter: a = (b + 1) AND ((subplan))
      	   SubPlan 1
      	     ->  Materialize  (cost=0.00..431.00 rows=1 width=4)
      		   ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..431.00 rows=3 width=4)
      			 ->  Limit  (cost=0.00..431.00 rows=1 width=4)
      			       ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
      				     ->  Limit  (cost=0.00..431.00 rows=1 width=4)
      					   ->  Table Scan on bar  (cost=0.00..431.00 rows=34 width=4)
           ->  Hash  (cost=100.00..100.00 rows=34 width=4)
      	   ->  Partition Selector for foo (dynamic scan id: 1)  (cost=10.00..100.00 rows=34 width=4)
      		 ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
      		       ->  Table Scan on jazz  (cost=0.00..431.00 rows=1 width=8)
      			     Filter: f = 10
      Optimizer status: PQO version 3.65.0
      (18 rows)
      ```
      
      Previously, since the subplan was in a qual, we did not populate the
      qual properly when executing a dynamic table scan node. Thus the
      subplan attribute in PlanState of the dynamic table scan was incorrectly
      set to NULL, causing a later crash.
      
      We now populate this similarly to how we do it for dynamic index/bitmap
      scans (see the sketch below).
      Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Co-authored-by: Ashuka Xue <axue@pivotal.io>
      81a671f6
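      The per-partition initialization now runs the scan's qual through
      ExecInitExpr(), mirroring the dynamic index/bitmap scans; roughly (the
      field names follow the 9.x-era executor conventions and are
      illustrative):
      
      ```
      /* Sketch: initialize the qual (and thereby any SubPlans it contains)
       * when setting up the Dynamic Table Scan's per-partition scan state,
       * instead of leaving the subplan state NULL. */
      scanstate->ss.ps.qual = (List *)
          ExecInitExpr((Expr *) node->scan.plan.qual, (PlanState *) scanstate);
      ```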
    • A
      Avoid full cluster restarts in GDD tests and other cleanup. · a4b2fea3
      Committed by Ashwin Agrawal
      To enable or disable the GUC gp_enable_global_deadlock_detector, a
      restart is required. But this GUC is only used on the master, so just
      restart the master instead of the full cluster. This helps cut the
      test time by a minute. Also, in the process, remove the pg_sleep(2)
      calls, as the GUCs gp_enable_global_deadlock_detector and
      gp_global_deadlock_detector_period can be set at the same time and
      hence don't need separate time to reload the config.
      
      Also, remove prepare-for-local, as only one test exists for local
      locks, which is local-deadlock-03; hence prepare for it directly
      inside that sql file.
      a4b2fea3
    • A
      Rearrange GDD tests to cut down runtime. · 261d4d64
      Committed by Ashwin Agrawal
      The prepare-for-local test resets the gp_global_deadlock_detector_period
      GUC to its default of 2 minutes. Because of that, any GDD test after it
      takes at least 2 minutes. Instead, flip the order.
      
      Ideally the prepare-for-local test doesn't need to reset the GUC;
      instead, a fault injector could be used in local-deadlock-03 to keep
      GDD from kicking in, but that's for a separate commit.
      
      This shaves 6 minutes off the isolation2 test time on my laptop.
      261d4d64