1. 17 Nov 2020, 1 commit
    • Avoid checking distributed snapshot for visibility checks on QD · 48b13271
      Ashwin Agrawal committed
      This is a partial cherry-pick of commit
      b3f300b9.  In the QD, distributed
      transactions become visible at the same time as the corresponding
      local ones, so we can rely on the local XIDs alone. This holds because
      the modifications of the local procarray and globalXactArray are
      protected by a lock during transaction commit and are therefore
      effectively atomic.
      
      We have seen many situations where catalog queries run very slowly on
      the QD, and a likely reason is the distributed log checks. The
      process-local distributed log cache falls short for this use case
      because most XIDs are unique, so it takes frequent cache misses. The
      shared-memory cache falls short because it holds only 8 pages, while
      many more pages often need to be cached for it to be effective.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Gang Xiong <gangx@vmware.com>
  2. 16 Nov 2020, 1 commit
  3. 06 Nov 2020, 1 commit
    • Faithfully parse inheritance-recursion for external tables · 0387161b
      Jesse Zhang committed
      When SQL standard table inheritance was added in upstream (by commit
      2fb6cc90 in Postgres 7.1), mentioning a table in the FROM clause of a
      query would necessarily mean traversing through the inheritance
      hierarchy. The need to distinguish between the (legacy, less common, but
      legitimate nonetheless) intent of not recursing into child tables gave
      rise to two things: the guc `sql_inheritance` which toggles the default
      semantics of parent tables, and the `ONLY` keyword used in front of
      parent table names to explicitly skip descendant tables.
      
      ORCA doesn't like queries that skip descendant tables: it falls back to
      the legacy planner as soon as it detects that intent.
      
      Way way back in Greenplum-land, when external tables were given a
      separate designation in relstorage (RELSTORAGE_EXTERNAL), we seem to
      have added code in the parser (parse analysis) so that queries on external
      tables *never* recurse into their child tables, regardless of what the
      user specifies -- either via `ONLY` or `*` in the query, or via guc
      `sql_inheritance`. Technically, that process scrubs the range table
      entries to hard-code "do not recurse".
      
      The combination of those two things -- hard coding "do not recurse" in
      the RTE for the analyzed parse tree and ORCA detecting intent of `ONLY`
      through RTE -- led ORCA to *always* fall back to planner when an
      external table is mentioned in the FROM clause. Commit 013a6e9d tried
      fixing this by *detecting harder* whether there's an external table.
      
      The behavior of the parse-analyzer hard coding a "do not recurse" in the
      RTE for an external table seems wrong for several reasons:
      
        1. It seems unnecessarily defensive
      
        2. It doesn't seem to belong in the parser.
      
           a. While code that flips "recurse" back to "do not recurse" abounds,
           all other occurrences happen in the planner as an optimization for
           childless tables.
      
           b. It deprives an optimizer of the actual intent expressed by the
           user: because of this hardcoding, neither ORCA nor planner would
           have a way of knowing whether the user specified `ONLY` in the
           query.
      
           c. It deprives the user of the ability to use child tables with an
           external table, either deliberately or coincidentally.
      
           d. A corollary is that any old views created as `SELECT a,b FROM
           ext_table` will be perpetuated as `SELECT a,b FROM ONLY ext_table`.
      
      This commit removes this defensive setting in the parse analyzer. As a
      consequence, we're able to reinstate the simpler RTE check before commit
      013a6e9d. Queries and new views will include child tables as expected.
      
      Note that this commit introduces a behavior change
      (taken from https://github.com/greenplum-db/gpdb/pull/5455#issuecomment-412247709):
      
      1. A (non-external) table by default means "me and my descendants", even if it's childless.
      2. An external table with child tables previously would never recurse into the child tables.
      3. After this patch, you need to use ONLY to exclude descendant tables (see the sketch below).
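      For illustration, a minimal sketch of the new semantics; the table names
      are made up, and attaching a child table to an external table via
      INHERITS is assumed to be possible once the hard-coding is gone:
      
      ```
      CREATE EXTERNAL TABLE ext_events (id int, payload text)
          LOCATION ('gpfdist://etlhost:8081/events.txt') FORMAT 'TEXT';
      CREATE TABLE events_extra (id int, payload text) INHERITS (ext_events);
      
      SELECT id FROM ext_events;        -- now recurses into events_extra as well
      SELECT id FROM ONLY ext_events;   -- scans only the external table itself
      ```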
      
      (cherry picked from commit 2371cb3b)
  4. 03 Nov 2020, 1 commit
    • Fix resgroup left unusable when dropping it fails · b7c42625
      xiong-gang committed
      In function DropResourceGroup(), group->lockedForDrop is set to true by
      calling ResGroupCheckForDrop; however, it can only be set back to false
      inside dropResgroupCallback. That callback is registered at the end of
      DropResourceGroup, so if an error occurred between the two,
      group->lockedForDrop would stay true forever.
      
      Fix it by registering the callback ahead of the locking call. To avoid
      tripping Assert(group->nRunning* > 0) when ResGroupCheckForDrop throws
      an error, the callback returns directly if group->lockedForDrop was
      never changed.
      
      See:
      
      ```
      gpconfig -c gp_resource_manager -v group
      gpstop -r -a
      
      psql
      > CREATE RESOURCE GROUP rg_test WITH (  -- statement opening reconstructed; group name assumed
                      CPU_RATE_LIMIT=20,
                      MEMORY_LIMIT=20,
                      CONCURRENCY=50,
                      MEMORY_SHARED_QUOTA=80,
                      MEMORY_SPILL_RATIO=20,
                      MEMORY_AUDITOR=vmtracker
              );
      
      psql -U user_test
      > \d -- hang
      ```
      Co-authored-by: dh-cloud <60729713+dh-cloud@users.noreply.github.com>
  5. 26 Oct 2020, 1 commit
    • Skip setting PT information in WAL during recovery · b8ac48ad
      xiong-gang committed
      If the system crashes while splitting a btree page and the WAL record for
      the downlink to the parent is not flushed, crash recovery will complete
      this step in 'btree_xlog_cleanup', which may in turn split the parent and
      write a new WAL record in '_bt_split'. We skip populating the PT
      information during recovery in 'RelationNeedToFetchGpRelationNodeForXLog',
      which causes a PANIC in this case. This is fixed in 6X and later versions
      by commit '40dae7ec'. This commit only sets invalid PT information in the
      new WAL record.
      Co-authored-by: Gang Xiong <gangx@vmware.com>
  6. 21 Oct 2020, 1 commit
    • The inner relation of LASJ_NOTIN should not have partitioned locus · 343f8826
      Jinbao Chen committed
      The result of NULL NOT IN a non-empty set is false, while the result of
      NULL NOT IN an empty set is true. If the inner set has partitioned locus,
      it will be divided into several subsets, and some of those subsets may be
      empty. Because NULL NOT IN an empty set evaluates to true, tuples that
      shouldn't exist can appear in the result set.
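      A hypothetical repro sketch (table names and data are illustrative):
      
      ```
      CREATE TABLE t_out (a int) DISTRIBUTED BY (a);
      CREATE TABLE t_in  (b int) DISTRIBUTED BY (b);
      INSERT INTO t_out VALUES (NULL), (1);
      INSERT INTO t_in  VALUES (2);
      
      SELECT * FROM t_out WHERE a NOT IN (SELECT b FROM t_in);
      -- Correct result: only the row with a = 1.  If the inner side keeps a
      -- partitioned locus, the segment that evaluates the NULL outer row may
      -- see an empty slice of t_in, and NULL NOT IN (empty set) is true, so
      -- the NULL row can wrongly show up in the output.
      ```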
      
      The patch disables the partitioned locus of the inner table by removing
      the join clause from the redistribution_clauses.
      
      This commit is cherry-picked from 6X_STABLE commit 8c93db54f3d93a890493f6a6d532f841779a9188.
      Co-authored-by: Hubert Zhang <hubertzhang@apache.org>
      Co-authored-by: Richard Guo <riguo@pivotal.io>
  7. 16 Oct 2020, 1 commit
    • Increment ExternalScan::scancounter across queries in ORCA · 78024fbc
      Shreedhar Hardikar committed
      gpfdist uses the global xid & timestamp to distinguish whether each
      connection belongs to the same external scan or not.
      
      ORCA generates a unique scan number for each ExternalScan within the
      same plan, but not across plans. So, within a transaction, we may issue
      multiple external scans that do not get differentiated properly,
      producing different results.
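      For instance (illustrative only, assuming an external table ext_t served
      by gpfdist), both scans below run under the same global xid and
      timestamp, so only the scan number can tell them apart:
      
      ```
      BEGIN;
      SELECT count(*) FROM ext_t;   -- first external scan (first plan)
      SELECT count(*) FROM ext_t;   -- second plan in the same transaction
      COMMIT;
      ```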
      
      This commit patches that by using a different scan number across plans,
      just as the planner does. Ideally gpfdist should also take into
      account the command-id of the query to prevent this problem for other
      cases such as prepared statements.
  8. 29 Sep 2020, 4 commits
    • Format ORCA and GPOPT. · 219fe0c4
      Jesse Zhang committed
      The canonical config file is in src/backend/gpopt/.clang-format (instead
      of under the non-existent src/backend/gporca), and I've created one
      (instead of two) symlink, for the GPOPT headers. Care has been taken to
      repoint the symlink to the canonical config under gpopt, instead of
      gporca as it is under HEAD.
      
      This is spiritually a cherry-pick of commit 2f7dd76c.
      (cherry picked from commit 2f7dd76c)
    • Adds a script to format and check formatting. · 310c3674
      Jesse Zhang committed
      This is intended for both local developer use and for CI.
      
      This depends on GNU parallel. One-time install:
      
      macOS: brew install parallel clang-format
      Debian: apt install parallel clang-format-10
      
      To format all ORCA / GPOPT code:
      
      $ src/tools/fmt fmt
      
      To check for formatting conformance:
      
      $ src/tools/fmt chk
      
      To modify the configuration, you'll need two steps:
      1. Edit clang-format.intent.yml
      2. Generate the expanded configuration file:
      
      $ src/tools/fmt gen
      
      This commit also adds a formatting README to document some of the
      rationale behind the tooling choice. The new `README.format.md` is also
      mentioned from both the style guide and ORCA's main README.
      
      (cherry picked from commit 57b744c1)
    • Initial .clang-format. · 4390927b
      Jesse Zhang committed
      Generated using clang-format-10
      
      This is spiritually a cherry-pick of commit 16b48d24, but I have
      to tweak things a bit by moving the .clang-format file from
      src/backend/gporca to under src/backend/gpopt
      
      (cherry picked from commit 16b48d24)
    • Project certain outer refs in the targetlist of subqueries · 82fa6ef7
      Shreedhar Hardikar committed
      In a previous ORCA version (3.311) we added code to fall back gracefully
      when a subquery select list contains a single outer ref that is not part
      of an expression, such as in
      
      select * from foo where a is null or a = (select foo.b from bar)
      
      This commit adds a fix that allows us to handle such queries in ORCA
      by adding a project in the translator that will echo the outer ref
      from within the subquery, and using that projected value in the
      select list of the subquery. This ensures that we use a NULL value for
      the scalar subquery in the expression for the outer ref when the
      subquery returns no rows.
      
      Also note that this is still skipped for grouping cols in the target
      list. This was done to avoid regression for certain queries, such as:
      
      select *
      from A
      where not exists (select sum(C.i)
                        from C
                        where C.i = A.i
                        group by a.i);
      
      ORCA is currently unable to decorrelate sub-queries that contain project
      nodes, so a `SELECT 1` in the subquery would also cause this
      regression. In the above query, the parser adds `a.i` to the target list
      of the subquery, which would get an echo projection (as described above)
      and thus would prevent decorrelation by ORCA. For this reason, we decided
      to maintain the existing behavior until ORCA is able to handle projections
      in subqueries better.
      
      Also add ICG tests.
      Co-authored-by: Hans Zeller <hzeller@pivotal.io>
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
  9. 26 Sep 2020, 1 commit
  10. 21 Sep 2020, 1 commit
    • Fix interconnect hang issue (#10757) · 3bef5530
      Jinbao Chen committed
      We have hit the interconnect hang issue many times in many cases, and
      they all share the same pattern: the downstream interconnect motion
      senders keep sending tuples, blind to the fact that the upstream nodes
      have already finished and quit execution; the QD has received enough
      tuples and waits for all QEs to quit, which causes a deadlock.
      
      Many nodes may quit execution early, e.g. LIMIT, HashJoin, Nest Loop.
      To resolve the hang, they need to stop the interconnect stream
      explicitly by calling ExecSquelchNode(); however, we cannot do that for
      rescan cases, in which data might be lost, e.g. commit 2c011ce4. For
      rescan cases, we tried using QueryFinishPending to stop the senders in
      commit 02213a73, letting the senders check this flag and quit. That
      commit has its own problems: firstly, QueryFinishPending can only be
      set by the QD, so it doesn't work for INSERT or UPDATE cases; secondly,
      that commit only lets the senders detect the flag and quit the loop in
      a rude way (without sending the EOS to its receiver), so the receiver
      may still be stuck receiving tuples.
      
      This commit first reverts the QueryFinishPending approach.
      
      To resolve the hang, we move TeardownInterconnect ahead of
      cdbdisp_checkDispatchResult, so the interconnect stream is guaranteed
      to be stopped before we wait for and check the status of the QEs.
      
      For UDPIFC, TeardownInterconnect() removes the IC entries; any packets
      for this interconnect context will be treated as 'past' packets and
      acked with the STOP flag.
      
      For TCP, TeardownInterconnect() closes all connections with its
      children; the children will treat any readable data in the connection,
      including the closure itself, as a STOP message.
      
      This commit is backported from master commit ec1d9a70.
  11. 19 Sep 2020, 2 commits
    • Refactor query string truncation on top of 889ba39e · e393c88b
      Asim R P committed
      Commit 889ba39e fixed the query string truncation in dispatcher to
      make it locale-aware.  This patch refactors that change so as to avoid
      accessing a string beyond its length.
      
      Reviewed by: Heikki, Ning Yu and Polina Bungina
      
      (cherry picked from commit abf6b330)
    • Fix query string truncation while dispatching to QE · b76d049b
      Polina Bungina committed
      Execution of a sufficiently long query containing multi-byte characters
      can cause incorrect truncation of the query string. Incorrect truncation
      implies an occasional cut through a multi-byte character and (with
      log_min_duration_statement set to 0) a subsequent write of an invalid
      symbol to the segment logs. Such broken characters in the logs cause
      problems when fetching log info from the gp_toolkit.__gp_log_segment_ext
      table: queries fail with "ERROR: invalid byte sequence for encoding ...".
      This is caused by the buildGpQueryString function in `cdbdisp_query.c`,
      which prepares the query text for dispatch to the QEs. It does not take
      character length into account when truncation is necessary (i.e. when
      the text is longer than QUERY_STRING_TRUNCATE_SIZE).
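      The symptom looks roughly like this (illustrative query against the
      external log table named above):
      
      ```
      -- after a truncated multi-byte query has been written to the segment logs:
      SELECT logtime, logmessage
      FROM gp_toolkit.__gp_log_segment_ext
      LIMIT 10;
      -- ERROR:  invalid byte sequence for encoding ...
      ```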
      
      (cherry picked from commit f31600e9)
  12. 18 Sep 2020, 2 commits
    • Don't dispatch client_encoding to QE · 9a6cd1ee
      xiong-gang committed
      When client_encoding is dispatched to the QEs, error messages generated
      in the QEs are converted to client_encoding, but the QD assumes they are
      in the server encoding, which leads to corruption.
      
      This is fixed in 6X in a6c9b4, but this backport skips the gpcopy changes
      since 5X doesn't support the 'COPY ... ENCODING' syntax.
      
      Fixes issue: https://github.com/greenplum-db/gpdb/issues/10815
    • Align Orca relhasindex behavior with Planner (#10788) · 8083a046
      David Kimura committed
      Function `RelationGetIndexList()` does not filter out invalid indexes.
      That responsibility is left to the caller (e.g. `get_relation_info()`).
      The issue is that Orca was not checking index validity.
      
      This commit also introduces an optimization to Orca that is already used
      in the planner, whereby we first check relhasindex before checking pg_index.
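      In catalog terms, the two checks look roughly like this (illustrative
      queries against a hypothetical table foo, not the actual code path):
      
      ```
      -- cheap pre-check the planner already does, now mirrored for Orca:
      SELECT relhasindex FROM pg_class WHERE oid = 'foo'::regclass;
      
      -- only if relhasindex is true, consult pg_index and skip invalid entries:
      SELECT indexrelid::regclass
      FROM pg_index
      WHERE indrelid = 'foo'::regclass AND indisvalid;
      ```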
      
      (cherry picked from commit b011c351)
  13. 17 Sep 2020, 2 commits
    • Do not read a persistent tuple after it is freed · 5f765a8e
      Asim R P committed
      This bug was found in a production environment where vacuum on
      gp_persistent_relation was concurrently running with a backend
      performing end-of-xact filesystem operations.  And the GUC
      debug_persistent_print was enabled.
      
      The *_ReadTuple() function was called on a persistent TID after the
      corresponding tuple was deleted with frozen transaction ID.  The
      concurrent vacuum recycled the tuple and it led to a SIGSEGV when the
      backend tried to access values from the tuple.
      
      Fix it by avoiding the debug log message in case when the persistent
      tuple is freed (transitioning to FREE state).  All other state
      transitions are logged.
      
      In absence of concurrent vacuum, things worked just fine because the
      *_ReadTuple() interface reads tuples from persistent tables directly
      using TID.
    • Skip FK check when truncating a relation · b50c134b
      Weinan WANG committed
      GPDB does not support foreign keys, but keeps the FK grammar in DDL,
      since that reduces the manual workload of migrating from other databases.
      Hence, we do not need the FK check for the TRUNCATE command; get rid of it.
  14. 10 Sep 2020, 1 commit
    • Allow direct dispatch in Orca if predicate on column gp_segment_id (#10679) (#10785) · b52d5b9e
      David Kimura committed
      This approach special cases gp_segment_id enough to include the column
      as a distributed column constraint. It also updates direct dispatch info
      to be aware of gp_segment_id, which represents the raw value of the
      segment where the data resides. This is different from other columns,
      which hash the datum value to decide where the data resides.
      
      After this change, the following example shows a Gather Motion from 2
      segments on a 3-segment demo cluster.
      
      ```
      CREATE TABLE t(a int, b int) DISTRIBUTED BY (a);
      EXPLAIN SELECT gp_segment_id, * FROM t WHERE gp_segment_id=1 or gp_segment_id=2;
                                        QUERY PLAN
      -------------------------------------------------------------------------------
       Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=12)
         ->  Seq Scan on t  (cost=0.00..431.00 rows=1 width=12)
               Filter: ((gp_segment_id = 1) OR (gp_segment_id = 2))
       Optimizer: Pivotal Optimizer (GPORCA)
      (4 rows)
      
      ```
      
      (cherry picked from commit 10e2b2d9)
      
      * Bump ORCA version to 3.110.0
  15. 04 Sep 2020, 1 commit
  16. 03 Sep 2020, 3 commits
    • Use LWLock to protect the resgroup slot in SessionState · 1e24b618
      Hubert Zhang committed
      Resource groups used to access resGroupSlot in SessionState without a
      lock. That is correct as long as a session only accesses its own
      resGroupSlot. But since we introduced the runaway feature, we need to
      traverse the session array to find the top consumer session when the
      red zone is reached. This requires that:
      1. the runaway detector holds the shared resgroup lock, so that a
      resGroupSlot cannot be detached from a session concurrently while the
      red zone is being handled;
      2. a normal session holds the exclusive lock when modifying resGroupSlot
      in SessionState.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      
      (cherry picked from commit a4cb06b4)
    • Fix resource group runaway rounding issue · e9223710
      Hubert Zhang committed
      When calculating the runaway safeChunksThreshold in resource groups, we
      used to divide by 100 to get the number of safe chunks. This can cause
      small chunk counts to be rounded down to zero. Fix it by storing
      safeChunksThreshold100 (100 times the real safe chunk count) and doing
      the division on the fly.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      (cherry picked from commit 757184f9)
    • Correctly use atomic variable in ResGroupControl.freeChunks. (#8434) · 1557fd13
      Paul Guo committed
      This variable was accessed with a mix of atomic API functions and direct
      access. That is usually not wrong in real scenarios, but it is not a good
      implementation, since 1) it depends on the compiler and hardware to
      ensure the correctness of the direct accesses, and 2) the code is not
      graceful.
      
      Change it to use the atomic API functions everywhere.
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
      (cherry picked from commit f59307f5)
  17. 29 Aug 2020, 2 commits
    • Fix double deduction of FREEABLE_BATCHFILE_METADATA · 567025bd
      Jesse Zhang committed
      Earlier, we always deducted FREEABLE_BATCHFILE_METADATA inside
      closeSpillFile() regardless of whether the spill file was already
      suspended. This deduction is already performed inside
      suspendSpillFiles(). This double accounting leads to
      hashtable->mem_for_metadata becoming negative, and we get:
      
      FailedAssertion("!(hashtable->mem_for_metadata > 0)", File: "execHHashagg.c", Line: 2019)
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
    • Fix assert condition in spill_hash_table() · 679ed508
      Jesse Zhang committed
      This commit fixes the following assertion failure, reported in
      issue #9902: https://github.com/greenplum-db/gpdb/issues/9902
      
      FailedAssertion("!(hashtable->nbuckets > spill_set->num_spill_files)", File: "execHHashagg.c", Line: 1355)
      
      hashtable->nbuckets can actually end up being equal to
      spill_set->num_spill_files, which causes the failure. This is because:
      
      hashtable->nbuckets is set from HashAggTableSizes->nbuckets, which can
      end up being equal to gp_hashagg_default_nbatches; see:
      nbuckets = Max(nbuckets, gp_hashagg_default_nbatches);
      
      Also, spill_set->num_spill_files is set from
      HashAggTableSizes->nbatches, which is in turn set to
      gp_hashagg_default_nbatches.
      
      Thus, these two entities can be equal.
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
      (cherry picked from commit 067bb350)
  18. 27 Aug 2020, 3 commits
    • Error out when changing datatype of column with constraint. (#10712) · 9ebc0423
      (Jerome)Junfeng Yang committed
      Raise a meaningful error message for this case.
      GPDB doesn't support ALTER TYPE on a primary key or unique constraint
      column, because that requires drop-and-recreate logic. The drop is
      currently performed only on the master, which leads to an error when
      recreating the index (since the recreate is dispatched to the segments,
      where the old constraint index still exists). See the example below.
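      An illustrative example of the case that now errors out (names made up;
      error text approximate):
      
      ```
      CREATE TABLE t_pk (a int PRIMARY KEY, b int) DISTRIBUTED BY (a);
      ALTER TABLE t_pk ALTER COLUMN a TYPE bigint;
      -- ERROR:  cannot alter type of a column used by a primary key or unique constraint
      ```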
      
      This fixes the issue https://github.com/greenplum-db/gpdb/issues/10561.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      (cherry picked from commit 32446a32)
    • Fix assertion failures in BackoffSweeper · c9f2a816
      ggbq committed
      The previous commits ab74e1c6 and c7befb1d did not completely solve the
      race condition: they did not test for the last iteration of the while/for
      loop. This could result in a failed assertion in the following loop. The
      patch moves the check to the end of the for loop, which is safe because
      the first iteration will never trigger Assert(activeWeight > 0.0).
      
      Another race condition can trigger the assertion
      Assert(gl->numFollowersActive > 0). Consider this situation:
      
          Backends A and B belong to the same statement.
      
          Timestamp1: backend A's leader is A, backend B's leader is B.
      
          Timestamp2: backend A's numFollowersActive remains zero due to timeout.
      
          Timestamp3: Sweeper calculates leader B's numFollowersActive to 1.
      
          Timestamp4: backend B changes its leader to A even though A is inactive.
      
      We stop sweeping for this race condition just like commit ab74e1c6 did.
      
      Both Assert(activeWeight > 0.0) and Assert(gl->numFollowersActive > 0)
      are removed.
      
      (cherry picked from commit b1c19196)
    • Minimize the race condition in BackoffSweeper() · a3233b6b
      Pengzhou Tang committed
      There is a long-standing race condition in BackoffSweeper() which
      triggers an error and then a subsequent assertion failure because
      sweeperInProgress is not reset to false.
      
      This commit doesn't resolve the race condition fundamentally with a lock
      or another mechanism, because the whole backoff machinery does not
      require accurate control, so skipping some sweeps should be fine for
      now. We also downgrade the log level to DEBUG because a restart of the
      sweeper backend is unnecessary.
      
      (cherry picked from commit ab74e1c6)
  19. 26 Aug 2020, 1 commit
    • PANIC when the shared memory is corrupted · 4f5a2c23
      xiong-gang committed
      shmNumGxacts and shmGxactArray are accessed under the protection of
      shmControlLock. This commit adds some defensive code to PANIC as early
      as possible when the shared memory is corrupted.
  20. 25 Aug 2020, 1 commit
    • Fix unexpected corruption of the persistent filespace table (#10623) · 424e382a
      Tang Pengzhou committed
      With a segment whose primary is down and whose mirror has been promoted
      to primary, running gp_remove_segment_mirror to remove the segment's
      mirror cleans up the mirror-related fields in
      gp_persistent_filespace_node. But when we run gp_remove_segment_mirror
      for the same segment again, the primary-related fields are also cleaned
      up, which is wrong and unexpected.
      
      Such a case was observed in production when gprecoverseg -F was
      interrupted in the middle of __updateSystemConfigRemoveAddMirror() and
      run again.
      Reviewed-by: Asim R P <pasim@vmware.com>
  21. 13 Aug 2020, 1 commit
    • Modify error context callback functions to not assume that they can fetch · dc572635
      (Jerome)Junfeng Yang committed
      catalog entries via SearchSysCache and related operations.  Although, at the
      time that these callbacks are called by elog.c, we have not officially aborted
      the current transaction, it still seems rather risky to initiate any new
      catalog fetches.  In all these cases the needed information is readily
      available in the caller and so it's just a matter of a bit of extra notation
      to pass it to the callback.
      
      Per crash report from Dennis Koegel.  I've concluded that the real fix for
      his problem is to clear the error context stack at entry to proc_exit, but
      it still seems like a good idea to make the callbacks a bit less fragile
      for other cases.
      
      Backpatch to 8.4.  We could go further back, but the patch doesn't apply
      cleanly.  In the absence of proof that this fixes something and isn't just
      paranoia, I'm not going to expend the effort.
      
      (cherry picked from commit a836abe9)
      Note that the changes from the above commit in `inline_set_returning_function`
      are not included, because the function does not exist in 5X right now.
      Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
  22. 12 Aug 2020, 1 commit
    • Print the CTID when we detect wrongly distributed data for UPDATE|DELETE. · 324b7834
      Zhenghua Lyu committed
      When an UPDATE or DELETE statement errors out because the CTID does not
      belong to the local segment, we should also print the CTID of the tuple,
      so that it is much easier to locate the wrongly distributed data via:
        `select * from t where gp_segment_id = xxx and ctid='(aaa,bbb)'`.
  23. 03 Aug 2020, 1 commit
    • Resolve high `CacheMemoryContext` usage for `ANALYZE` on large partition tables (#10555) · 3d41c361
      (Jerome)Junfeng Yang committed
      In some cases, the merge-stats logic for a root partition table can
      consume a large amount of memory in CacheMemoryContext.
      This may lead to `Canceling query because of high VMEM usage` when
      partition tables are ANALYZEd concurrently.
      
      For example, suppose there are several root partition tables, each with
      thousands of leaf tables, and these are all wide tables with hundreds of
      columns. When analyze()/auto_stats() runs on leaf tables concurrently,
      `leaf_parts_analyzed` consumes lots of memory (catalog cache entries for
      pg_statistic and pg_attribute) under CacheMemoryContext in each backend,
      which may hit the protective VMEM limit.
      In `leaf_parts_analyzed`, a single backend's leaf-table analysis for a
      root partition table may add cache entries for up to
      number_of_leaf_tables * number_of_columns tuples from pg_statistic and
      number_of_leaf_tables * number_of_columns tuples from pg_attribute.
      Setting the GUC `optimizer_analyze_root_partition` or
      `optimizer_analyze_enable_merge_of_leaf_stats` to false skips the merge
      of stats for the root table, and `leaf_parts_analyzed` will not execute
      (see the example below).
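      For reference, either of the GUCs mentioned above can be used to bypass
      the merge path entirely (table name illustrative):
      
      ```
      SET optimizer_analyze_root_partition = off;
      -- or
      SET optimizer_analyze_enable_merge_of_leaf_stats = off;
      ANALYZE my_leaf_partition;
      ```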
      
      To resolve this issue:
      1. When checking whether merged stats are available for a root table in
      `leaf_parts_analyzed`, first check whether all leaf tables are ANALYZEd;
      if un-ANALYZEd leaf tables still exist, return quickly to avoid touching
      each leaf table's pg_attribute and pg_statistic entries (this saves a
      lot of time). Also, don't rely on the system catalog cache; use the
      index to fetch the stats tuple, to avoid one-off cache usage (in common
      cases).
      
      2. When merging stats in `merge_leaf_stats`, don't rely on the system
      catalog cache and use the index to fetch the stats tuple.
      
      There are side effects of not relying on the system catalog cache (all
      of them **rare** situations):
      1. Insert/update/copy into several leaf tables under the **same root
      partition** table in the **same session**, when all leaf tables are
      already **analyzed**, will be much slower, since auto_stats calls
      `leaf_parts_analyzed` every time a leaf table gets updated and we no
      longer rely on the system catalog cache.
      (`set optimizer_analyze_enable_merge_of_leaf_stats=false` avoids this.)
      
      2. ANALYZEing the same root table several times in the same session is
      much slower than before, since we don't rely on the system catalog cache.
      
      This solution seems to improve ANALYZE performance, and it also ensures
      ANALYZE no longer hits the memory issue.
      
      (cherry picked from commit 533a47dd)
  24. 23 Jul 2020, 1 commit
    • Allow merging of statistics for domain types · 33f71c07
      Ashuka Xue committed
      Prior to this commit, incremental analyze would error out when merging
      statistics of partition tables containing domain types with a message
      saying that the domain type is not hashable.
      
      We should not be trying to hash the domain type; instead we should be
      hashing the underlying base type of the domain. We noticed that domains
      of array types are not merged in any circumstance due to logic in
      `isGreenplumDbHashable` inside analyze.c. (For example, a `CREATE DOMAIN
      int32_arr int[]` domain will not be merged; instead we will compute
      scalar stats.) We did not want to enable this functionality as it would
      have other ramifications in GPDB5.
      
      Example:
      ```
      CREATE DOMAIN int32 int;
      
      CREATE TABLE foo (x int32) PARTITION BY RANGE (x) (START(1) END(5) EVERY(1));
      INSERT INTO foo SELECT i % 4 + 1 FROM generate_series(1,20) i;
      ANALYZE foo;
      ```
      will no longer error out during ANALYZE, but will generate proper
      statistics for table foo.
      Co-authored-by: Ashuka Xue <axue@vmware.com>
      Co-authored-by: Chris Hajas <chajas@vmware.com>
  25. 17 Jul 2020, 2 commits
  26. 15 Jul 2020, 1 commit
    • Fix pulling up EXPR sublinks · a6ee98bf
      Richard Guo committed
      Currently GPDB tries to pull up EXPR sublinks to inner joins. For query
      
      select * from foo where foo.a >
          (select avg(bar.a) from bar where foo.b = bar.b);
      
      GPDB would transform it to:
      
      select * from foo inner join
          (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
      on foo.b = sub.b and foo.a > sub.avg;
      
      To do that, GPDB needs to recurse through the quals in sub-select and
      extract quals of form 'outervar = innervar' and then build new
      SortGroupClause items and TargetEntry items based on these quals for
      sub-select.
      
      But for quals of the form 'function(outervar, innervar1) = innervar2',
      GPDB handles them incorrectly, which causes wrong results, as described
      in issue #9615.
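      An illustrative query of that shape (bar.c stands in for some other
      inner column):
      
      ```
      SELECT * FROM foo
      WHERE foo.a > (SELECT avg(bar.a) FROM bar
                     WHERE foo.b + bar.b = bar.c);
      ```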
      
      This patch fixes the issue by treating these kinds of quals as not
      compatibly correlated, so that the sub-select is not converted to a
      join.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      (cherry picked from commit dcdc6c0b)
  27. 13 Jul 2020, 1 commit
    • Fix the assert failure on pullup flow in within group · 7246f370
      Jinbao Chen committed
      The Flow in the AggNode has the wrong TargetList. An AggNode has a
      different TargetList from its child nodes, so copying the flow directly
      from the child node to the AggNode is completely wrong. We need to pull
      up the flow to generate this TargetList when creating the within-group
      plan with a single QE.
  28. 02 Jul 2020, 1 commit
    • Bump Orca version to v3.106.0 · 3a36b539
      Ashuka Xue committed
      This commit renames the following functions for clarity on the GPORCA side:
      - CHistogram::Buckets -> CHistogram::GetNumBuckets
      - CHistogram::ParseDXLToBucketsArray -> CHistogram::GetBuckets