1. 13 Jul, 2020 4 commits
    • D
      Docs - remove HCI warning · 9eb9c2ac
      David Yozie committed
      9eb9c2ac
    • T
      Update linux installation guide · ba5792fa
      Tyler Ramer committed
      Issue #10069 noted some problems with the Linux documentation.
      
      Update this documentation to be more accurate and to direct configuration
      steps to the appropriate documentation.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      ba5792fa
    • Z
      Remove unused function pathnode_walk_node. · 7339a178
      Zhenghua Lyu committed
      Previously, `cdbpath_dedup_fixup` was the only function that
      invoked `pathnode_walk_node`, and it was removed by
      commit 9628a332.
      
      This commit removes the now-unused functions.
      7339a178
    • (
      Fix flaky test for replication_keeps_crash. (#10423) · db60b003
      (Jerome)Junfeng Yang committed
      Remove the setting of `gp_fts_probe_retries` to 1, which may cause the FTS probe to fail.
      It was first added to reduce the test time, but a lower retry
      value may cause the test to fail to probe FTS and update the segment
      configuration. Since reducing `gp_fts_replication_attempt_count` also
      saves test time, skip altering `gp_fts_probe_retries`.
      
      Also found that an assertion may fail when marking the mirror down happens before
      the walsender exits: the replication status is freed before the walsender
      exits and tries to record disconnect info, which leads the segment to crash
      and start recovery.
      db60b003
  2. 10 Jul, 2020 11 commits
    • N
      ic-proxy: enable ic-proxy with --enable-ic-proxy · 81810a20
      Ning Yu committed
      We used to use the option --with-libuv to enable ic-proxy, but the
      purpose of that option is not straightforward to understand. So we
      renamed it to --enable-ic-proxy, and the default setting is changed to
      "disable".
      
      Suggested by Kris Macoskey <kmacoskey@pivotal.io>
      81810a20
    • N
      ic-proxy: let backends connect to the proxy bgworker · 94c9d996
      Ning Yu committed
      Only in proxy mode, of course. Currently the ic-proxy mode shares most
      of the backend logic with ic-tcp mode, so instead of copying the code we
      embed the ic-proxy specific logic in ic_tcp.c.
      94c9d996
    • N
      ic-proxy: launch as a bgworker · 5b60069c
      Ning Yu committed
      5b60069c
    • N
      ic-proxy: new value "proxy" in GUC gp_interconnect_type · 245ca266
      Ning Yu committed
      It is for the ic-proxy mode.
      245ca266
    • N
      ic-proxy: make gp_interconnect_proxy_addresses a GUC · 3140a44f
      Ning Yu committed
      3140a44f
    • N
      ic-proxy: implement the core logic · 6188fb1f
      Ning Yu committed
      The interconnect proxy mode, a.k.a. ic-proxy, is a new interconnect
      mode in which all the backends communicate via a proxy bgworker. All the
      backends on the same segment share the same proxy bgworker, so every two
      segments need only one network connection between them, which reduces
      the number of network flows as well as the number of ports.
      
      To enable the proxy mode we first need to configure the GUC
      gp_interconnect_proxy_addresses, for example:
      
          gpconfig \
            -c gp_interconnect_proxy_addresses \
            -v "'1:-1:10.0.0.1:2000,2:0:10.0.0.2:2001,3:1:10.0.0.3:2002'" \
            --skipvalidation
      
      Then restart the cluster for the setting to take effect.
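
To make the shape of the value in the example above concrete, here is a minimal Python sketch that splits it into entries. It assumes each comma-separated entry is `dbid:content:address:port`; that field order is inferred from the sample value (where `-1` looks like the coordinator's content id) and is not stated in the commit message. `ProxyAddress` and `parse_proxy_addresses` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class ProxyAddress(NamedTuple):
    # Field meanings are an assumption inferred from the sample value.
    dbid: int
    content: int
    host: str
    port: int


def parse_proxy_addresses(value: str) -> List[ProxyAddress]:
    """Split a comma-separated list of 'dbid:content:host:port' entries."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Take the first two fields from the left and the port from the
        # right, so a host that itself contains ':' stays in one piece.
        dbid, content, rest = item.split(":", 2)
        host, port = rest.rsplit(":", 1)
        entries.append(ProxyAddress(int(dbid), int(content), host, int(port)))
    return entries


addresses = parse_proxy_addresses(
    "1:-1:10.0.0.1:2000,2:0:10.0.0.2:2001,3:1:10.0.0.3:2002"
)
for entry in addresses:
    print(entry)
```

Under that assumption, the sample value describes three proxy endpoints, one per dbid, each listening on its own port.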
      6188fb1f
    • N
      Store dbid in CdbProcess · 8804bf39
      Ning Yu committed
      This is preparation for the ic-proxy mode; we need this information to
      distinguish a primary segment from its mirror.
      8804bf39
    • P
      Fix pyyaml windows build (#10451) · 3daafd2f
      Peifeng Qiu committed
      The local fork at gpMgmt/bin/ext/yaml was removed by 8d6c3059. Unpack
      it from gpMgmt/bin/pythonSrc/ext, just like pygresql.
      3daafd2f
    • A
      [Refactor] Pull out KHeap into CKHeap.h · 9e8f261d
      Ashuka Xue committed
      Pull out the implementation of the binary heap into its own templated
      header file.
      9e8f261d
    • A
      Make histograms commutative when merging · 9b427611
      Ashuka Xue committed
      Prior to this commit, merging two histograms was not commutative.
      For example, histogram1->Union(histogram2) could result in a row estimate of
      1500 rows, while histogram2->Union(histogram1) could result in a row
      estimate of 600 rows.
      
      Now, MakeBucketMerged has been renamed to SplitAndMergeBuckets. This
      function, which calculates the statistics for the merged bucket, now
      consistently returns the same histogram buckets regardless of the order
      of input. This, in turn, makes MakeUnionHistogramNormalize and
      MakeUnionAllHistogramNormalize commutative.
      
      Once we have successfully split the buckets and merged them as
      necessary, we may have generated up to 3X the number of buckets that
      were originally present. Thus we cap the number of buckets to be either
      the max size of the two incoming buckets, or, 100 buckets.
      
      CombineBuckets will then reduce the size of the histogram by combining
      consecutive buckets that have similar information. It does this by using
      a combination of two ratios: freq/ndv and freq/bucket_width. These two
      ratios were decided based off the following examples:
      
      Assuming that we calculate row counts for selections like the following:
      - For a predicate col = const: rows * freq / NDVs
      - For a predicate col < const: rows * (sum of full or fractional frequencies)
      
      Example 1 (rows = 100), freq/width, ndvs/width and ndvs/freq are all the same:
      ```
      Bucket 1: [0, 4)   freq .2  NDVs 2  width 4  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Combined: [0, 12)  freq .6  NDVs 6  width 12
      ```
      
      This should give the same estimates for various predicates, with separate or combined buckets:
      ```
      pred          separate buckets         combined bucket   result
      -------       ---------------------    ---------------   -----------
      col = 3  ==>  100 * .2 / 2           = 100 * .6 / 6    = 10 rows
      col = 5  ==>  100 * .4 / 4           = 100 * .6 / 6    = 10 rows
      col < 6  ==>  100 * (.2 + .25 * .4)  = 100 * .5 * .6   = 30 rows
      ```
      
      Example 2 (rows = 100), freq and ndvs are the same, but width is different:
      ```
      Bucket 1: [0, 4)   freq .4  NDVs 4  width 4  freq/width = .1 ndv/width = 1 freq/ndv = .1
      Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Combined: [0, 12)  freq .8  NDVs 8  width 12
      ```
      
      This will give different estimates with the combined bucket, but only for non-equal preds:
      ```
      pred          separate buckets         combined bucket   results
      -------       ---------------------    ---------------   --------------
      col = 3  ==>  100 * .4 / 4           = 100 * .8 / 8    = 10 rows
      col = 5  ==>  100 * .4 / 4           = 100 * .8 / 8    = 10 rows
      col < 6  ==>  100 * (.4 + .25 * .4) != 100 * .5 * .8     50 vs. 40 rows
      ```
      
      Example 3 (rows = 100), now NDVs / freq is different:
      ```
      Bucket 1: [0, 4)   freq .2  NDVs 4  width 4  freq/width = .05 ndv/width = 1 freq/ndv = .05
      Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Combined: [0, 12)  freq .6  NDVs 8  width 12
      ```
      
      This will give different estimates with the combined bucket, but only for equal preds:
      ```
      pred          separate buckets         combined bucket   results
      -------       ---------------------    ---------------   ---------------
      col = 3  ==>  100 * .2 / 4          != 100 * .6 / 8      5 vs. 7.5 rows
      col = 5  ==>  100 * .4 / 4          != 100 * .6 / 8      10 vs. 7.5 rows
      col < 6  ==>  100 * (.2 + .25 * .4)  = 100 * .5 * .6   = 30 rows
      ```
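
The arithmetic in the three examples above can be reproduced with a short Python sketch. It uses only the two estimation formulas stated earlier (equality: rows * freq / NDVs; less-than: rows * sum of full or fractional frequencies). `Bucket`, `combine`, `est_eq` and `est_lt` are hypothetical helpers for illustration, not the Orca implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    lo: float    # inclusive lower bound
    hi: float    # exclusive upper bound
    freq: float  # fraction of rows that fall in the bucket
    ndv: float   # number of distinct values in the bucket


def combine(a: Bucket, b: Bucket) -> Bucket:
    """Merge two adjacent buckets by summing frequency and NDVs."""
    return Bucket(a.lo, b.hi, a.freq + b.freq, a.ndv + b.ndv)


def est_eq(rows: float, buckets: List[Bucket], const: float) -> float:
    """Estimate rows for col = const: rows * freq / NDVs of the bucket hit."""
    for b in buckets:
        if b.lo <= const < b.hi:
            return rows * b.freq / b.ndv
    return 0.0


def est_lt(rows: float, buckets: List[Bucket], const: float) -> float:
    """Estimate rows for col < const: full buckets plus a linear fraction."""
    total = 0.0
    for b in buckets:
        if const >= b.hi:
            total += b.freq
        elif const > b.lo:
            total += b.freq * (const - b.lo) / (b.hi - b.lo)
    return rows * total


examples = {
    "Example 1": [Bucket(0, 4, 0.2, 2), Bucket(4, 12, 0.4, 4)],
    "Example 2": [Bucket(0, 4, 0.4, 4), Bucket(4, 12, 0.4, 4)],
    "Example 3": [Bucket(0, 4, 0.2, 4), Bucket(4, 12, 0.4, 4)],
}

for name, separate in examples.items():
    merged = [combine(separate[0], separate[1])]
    print(name)
    print("  col = 3:", est_eq(100, separate, 3), "vs", est_eq(100, merged, 3))
    print("  col = 5:", est_eq(100, separate, 5), "vs", est_eq(100, merged, 5))
    print("  col < 6:", est_lt(100, separate, 6), "vs", est_lt(100, merged, 6))
```

Combining is lossless in Example 1, changes only the range estimate in Example 2, and changes only the equality estimates in Example 3, which is why both freq/ndv and freq/bucket_width are compared before buckets are combined.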
      
      This commit also adds an attribute to the statsconfig for MaxStatsBuckets
      and changes the scaling method when creating singleton buckets.
      9b427611
    • A
      [Refactor] Update MakeStatsFilter, Rename CreateHistMashMapAfterMergingDisjPreds -> · c14fbb92
      Ashuka Xue committed
      MergeHistogramMapsforDisjPreds
      
      This commit refactors MakeStatsFilter to use
      MakeHistHashMapConjOrDisjFilter instead of individually calling
      MakeHistHashMapConj and MakeHistHashMapDisj.
      
      This commit also modifies MergeHistogramMapsForDisjPreds to avoid copying
      and creating unnecessary histogram buckets.
      c14fbb92
  3. 09 Jul, 2020 4 commits
  4. 08 Jul, 2020 8 commits
  5. 07 Jul, 2020 5 commits
    • X
      Alter table add column on AOCS table inherits the default storage settings · 9a574915
      xiong-gang committed
      When ALTER TABLE adds a column to an AOCS table, the storage settings (compresstype,
      compresslevel and blocksize) of the new column can be specified in the ENCODING
      clause. If ENCODING is not specified, the column inherits the settings from the table;
      if the table doesn't have a compression configuration, it uses the value from the GUC
      'gp_default_storage_options'.
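
The precedence described above can be sketched in Python as a three-step fallback. This models only the order stated in the commit message. The function name and the dictionary representation are hypothetical, and the server may resolve individual options differently (for example, per option rather than as a whole).

```python
from typing import Dict, Optional

StorageOptions = Dict[str, object]


def resolve_new_column_storage(
    encoding_clause: Optional[StorageOptions],
    table_options: Optional[StorageOptions],
    guc_default_options: StorageOptions,
) -> StorageOptions:
    """Pick the storage settings for a column added by ALTER TABLE."""
    if encoding_clause:
        # 1. An explicit ENCODING clause wins.
        return dict(encoding_clause)
    if table_options:
        # 2. Otherwise inherit the table-level settings.
        return dict(table_options)
    # 3. Otherwise fall back to gp_default_storage_options.
    return dict(guc_default_options)


guc = {"compresstype": "none", "compresslevel": 0, "blocksize": 32768}
table = {"compresstype": "zlib", "compresslevel": 5, "blocksize": 65536}
print(resolve_new_column_storage(None, table, guc))
```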
      9a574915
    • X
      Fix flaky test gp_replica_check · a1a0af55
      xiong-gang committed
      When there is a big lag between primary and mirror replay, gp_replica_check
      will fail if the checkpoint is not replayed within about 60 seconds. Extend the
      timeout to 600 seconds to reduce flakiness.
      a1a0af55
    • H
      Disallow the replicated table inherit or to be inherited (#10344) · dc4b839e
      Hao Wu committed
      Currently, replicated tables are not allowed to inherit a parent
      table, but ALTER TABLE .. INHERIT can bypass the restriction.
      
      On the other hand, a replicated table is allowed to be inherited
      by a hash distributed table, which makes things much more complicated.
      When a replicated parent table is inherited by
      a hash distributed table, the data on the parent is replicated
      but the data on the child is hash distributed. When running
      `select * from parent;`, the generated plan is:
      ```
      gpadmin=# explain select * from parent;
                                       QUERY PLAN
      -----------------------------------------------------------------------------
       Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..4.42 rows=14 width=6)
         ->  Append  (cost=0.00..4.14 rows=5 width=6)
               ->  Result  (cost=0.00..1.20 rows=4 width=7)
                     One-Time Filter: (gp_execution_segment() = 1)
                     ->  Seq Scan on parent  (cost=0.00..1.10 rows=4 width=7)
               ->  Seq Scan on child  (cost=0.00..3.04 rows=2 width=4)
       Optimizer: Postgres query optimizer
      (7 rows)
      ```
      It's not particularly useful for the parent table to be replicated,
      so we disallow replicated tables from being inherited.
      Reported-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      dc4b839e
    • C
      Move slack command trigger repo · 8e6a46f7
      Chris Hajas committed
      We've moved the repo that holds trigger commits to a private repo since
      there wasn't anything interesting there.
      8e6a46f7
    • A
      Fix vacuum on temporary AO table · 327abdb5
      Ashwin Agrawal committed
      The path constructed in OpenAOSegmentFile() didn't take into account the
      "t_" semantics of the filename. Ideally, the correct filename is passed to the
      function, so there is no need to construct it again.
      
      It would be better to move MakeAOSegmentFileName() inside
      OpenAOSegmentFile(), as all callers call it except
      truncate_ao_perFile(), which doesn't fit that model.
      327abdb5
  6. 06 Jul, 2020 1 commit
    • (
      Fix bitmap scan crash issue for AO/AOCS table. (#10407) · cb5d18d1
      (Jerome)Junfeng Yang committed
      When ExecReScanBitmapHeapScan is executed, the bitmap state (tbmiterator
      and tbmres) is freed in freeBitmapState. So tbmres is NULL, and we
      need to reinitialize the bitmap state to start the scan from the beginning and reset the AO/AOCS
      bitmap page flags (baos_gotpage, baos_lossy, baos_cindex and baos_ntuples).
      
      This matters especially when ExecReScan happens on a bitmap append-only scan and
      not all the matched tuples in the bitmap have been consumed, for example, a Bitmap
      Heap Scan as the inner plan of a Nest Loop Semi Join. If tbmres is not initialized
      and not all tuples in the last bitmap were read, BitmapAppendOnlyNext assumes the
      current bitmap page still has data to return, but the bitmap state has already been freed.
      
      In the code, for a Nest Loop Semi Join, when a match is found, a new outer slot is
      requested, and then `ExecReScanBitmapHeapScan` is called; `node->tbmres` and
      `node->tbmiterator` are set to NULL, while `node->baos_gotpage` is still true.
      When `BitmapAppendOnlyNext` executes, it skips creating a new `node->tbmres`
      and jumps to accessing `tbmres->recheck`.
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
      Reviewed-by: Asim R P <pasim@vmware.com>
      cb5d18d1
  7. 03 Jul, 2020 4 commits
  8. 02 Jul, 2020 2 commits
    • M
      docs - add GUC -write_to_gpfdist_timeout (#10391) · 86a53828
      Mel Kiyama committed
      * docs - add GUC -write_to_gpfdist_timeout
      
      Add GUC and add link to GUC in gpfdist reference
      
      * docs - correct default value 600 --> 300. Fix xref.
      86a53828
    • H
      Fix Orca optimizer search stage couldn't measure elapsed time correctly · db25c3c8
      Haisheng Yuan committed
      Previously, CTimerUser didn't initialize its timer, so the elapsed time reported by
      Orca was not meaningful and was sometimes confusing.
      
      When traceflag T101012 is turned on, we can see the following trace message:
      
      [OPT]: Memo (stage 0): [20 groups, 0 duplicate groups, 44 group expressions, 4 activated xforms]
      [OPT]: stage 0 completed in 860087 msec,  plan with cost 1028.470667 was found
      [OPT]: <Begin Xforms - stage 0>
      ......
      [OPT]: <End Xforms - stage 0>
      [OPT]: Search terminated at stage 1/1
      [OPT]: Total Optimization Time: 67ms
      
      As shown above, the stage 0 elapsed time is much greater than the total
      optimization time, which is obviously incorrect.
      db25c3c8
  9. 01 Jul, 2020 1 commit
    • (
      Let FTS mark mirror down if replication keeps crash. (#10327) · 252ba888
      (Jerome)Junfeng Yang committed
      For GPDB FTS, if the primary-mirror replication keeps crashing
      and attempts to create the replication connection too many times,
      FTS should mark the mirror down. Otherwise, it may block other
      processes.
      If the WAL starts streaming, reset the attempt count to 0, because the blocked
      transaction can only be released once the WAL is in streaming state.
      
      The solution for this is:
      
      1. Use `FTSReplicationStatus`, which lives in `gp_replication.c`, to track the current primary-mirror
      replication status. This includes:
          - A continuous failure counter. The counter is reset once the replication
          starts streaming, or the replication is restarted.
          - A record of the last disconnect timestamp, which is moved out of the
          `WalSnd` slot.
          The reason for moving it: when an FTS probe happens, the `WalSnd`
          slot may already have been freed, and the `WalSnd` slot is designed to be reusable.
          It's hacky to read a value from a freed slot in shared memory.
      
      2. When handling each probe query, `GetMirrorStatus` checks the current
      mirror status and the failure count from the `FTSReplicationStatus` for the walsender's application.
      If the count exceeds the limit, the retry test ignores the last replication
      disconnect time, since it is refreshed whenever a new walsender starts (and
      in this case the walsender keeps restarting).
      
      3. In the FTS bgworker, if the mirror is down and retry is set to false, mark the mirror
      down.
      
      A `gp_fts_replication_attempt_count` GUC is added. When the replication failure count
      exceeds this GUC, the last replication disconnect time is ignored when checking for mirror
      probe retry.
      
      The life cycle of an `FTSReplicationStatus`:
      1. It is created when replication is first enabled during the replication
      start phase. Each replication's sender should have a unique
      `application_name`, which is also used to specify the replication priority
      in a multi-mirror environment. So `FTSReplicationStatus` uses the `application_name` to identify
      itself.
      
      2. The `FTSReplicationStatus` for a replication exists until FTS detects
      a failure and stops the replication between primary and mirror. Then the
      `FTSReplicationStatus` for that `application_name` is dropped.
      
      Now the `FTSReplicationStatus` is used only for GPDB primary-mirror replication.
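
The counter and retry rule described above can be modeled with a small Python sketch. It is a simplified illustration, not the server code: `ReplicationStatus`, `should_retry_probe`, and the `grace_period_s` parameter are hypothetical, and treating "no disconnect recorded" as "no retry" is an assumption. Only the reset-on-streaming behavior and the "ignore the last disconnect time once the attempt limit is exceeded" rule come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationStatus:
    """Per-application_name replication state (illustrative model only)."""
    application_name: str
    attempt_count: int = 0
    last_disconnect: Optional[float] = None

    def on_disconnect(self, now: float) -> None:
        # Each failed replication attempt bumps the continuous failure counter.
        self.attempt_count += 1
        self.last_disconnect = now

    def on_streaming(self) -> None:
        # Once WAL is streaming, the continuous failure counter is cleared.
        self.attempt_count = 0


def should_retry_probe(
    status: ReplicationStatus,
    now: float,
    attempt_limit: int,     # stands in for gp_fts_replication_attempt_count
    grace_period_s: float,  # hypothetical grace window after a disconnect
) -> bool:
    """Return True if the prober should retry instead of marking the mirror down."""
    if status.attempt_count > attempt_limit:
        # Too many consecutive failures: ignore the last disconnect time,
        # which keeps being refreshed while the walsender restarts.
        return False
    if status.last_disconnect is None:
        return False  # assumption: nothing recent to wait for
    return now - status.last_disconnect < grace_period_s


status = ReplicationStatus("gp_walreceiver")
for t in range(4):
    status.on_disconnect(float(t))
print(should_retry_probe(status, now=4.0, attempt_limit=3, grace_period_s=30.0))
```

Without the attempt limit, a walsender that restarts every few seconds would keep refreshing the disconnect time, and the prober would keep retrying indefinitely.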
      252ba888