1. 17 Nov 2020, 1 commit
    • Avoid checking distributed snapshot for visibility checks on QD · 48b13271
      Committed by Ashwin Agrawal
      This is a partial cherry-pick from commit
      b3f300b9.  On the QD, distributed
      transactions become visible at the same time as the corresponding
      local ones, so we can rely on the local XIDs. This holds because the
      modifications of the local procarray and globalXactArray are protected
      by a lock and hence form an atomic operation during transaction commit.
      
      We have seen many situations where catalog queries run very slowly on
      the QD, and a potential reason is checking the distributed logs. The
      process-local distributed log cache falls short for this use case, as
      most XIDs are unique and hence cause frequent cache misses. The shared
      memory cache falls short as well, since it caches only 8 pages while
      many more pages often need to be cached to be effective.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Gang Xiong <gangx@vmware.com>
  2. 21 Oct 2020, 1 commit
    • The inner relation of LASJ_NOTIN should not have partitioned locus · 343f8826
      Committed by Jinbao Chen
      The result of NULL NOT IN a non-empty set is false, while the result
      of NULL NOT IN an empty set is true. But if a non-empty set has
      partitioned locus, the set will be divided into several subsets, and
      some of those subsets may be empty. Because NULL NOT IN an empty set
      evaluates to true, some tuples that shouldn't exist in the result set
      will appear.
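      
      A minimal SQL illustration of the semantics described above (plain SQL
      behavior, independent of this patch):
      
      ```
      -- NULL NOT IN an empty set is true, so the outer row qualifies:
      SELECT 1 WHERE NULL NOT IN (SELECT 1 WHERE false);
      -- NULL NOT IN a non-empty set is not true (NULL), so the row is filtered:
      SELECT 1 WHERE NULL NOT IN (SELECT 1);
      ```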
      
      The patch disables the partitioned locus of the inner table by removing
      the join clause from the redistribution_clauses.
      
      This commit is cherry-picked from 6X_STABLE commit 8c93db54f3d93a890493f6a6d532f841779a9188.
      Co-authored-by: Hubert Zhang <hubertzhang@apache.org>
      Co-authored-by: Richard Guo <riguo@pivotal.io>
  3. 19 Sep 2020, 2 commits
    • Refactor query string truncation on top of 889ba39e · e393c88b
      Committed by Asim R P
      Commit 889ba39e fixed the query string truncation in the dispatcher to
      make it locale-aware.  This patch refactors that change so as to avoid
      accessing a string beyond its length.
      
      Reviewed by: Heikki, Ning Yu and Polina Bungina
      
      (cherry picked from commit abf6b330)
    • Fix query string truncation while dispatching to QE · b76d049b
      Committed by Polina Bungina
      Execution of a long enough query containing multi-byte characters can
      cause incorrect truncation of the query string. Incorrect truncation
      implies an occasional cut through a multi-byte character and (with
      log_min_duration_statement set to 0) a subsequent write of an invalid
      symbol to the segment logs. Such a broken character in the logs causes
      problems when fetching log info from the gp_toolkit.__gp_log_segment_ext
      table - queries fail with the following error: "ERROR: invalid byte
      sequence for encoding...".
      This is caused by the buildGpQueryString function in `cdbdisp_query.c`,
      which prepares the query text for dispatch to the QEs. It does not take
      character length into account when truncation is necessary (i.e. the
      text is longer than QUERY_STRING_TRUNCATE_SIZE).
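      
      As an illustration of where the broken character surfaces (only the
      table name and the error text come from this message), reading the
      segment logs afterwards fails:
      
      ```
      -- With log_min_duration_statement = 0, the truncated query string is
      -- logged on the segments; fetching the logs then errors out:
      SELECT count(*) FROM gp_toolkit.__gp_log_segment_ext;
      -- ERROR: invalid byte sequence for encoding...
      ```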
      
      (cherry picked from commit f31600e9)
  4. 17 Sep 2020, 1 commit
    • Do not read a persistent tuple after it is freed · 5f765a8e
      Committed by Asim R P
      This bug was found in a production environment where a vacuum on
      gp_persistent_relation was running concurrently with a backend
      performing end-of-xact filesystem operations, and the GUC
      debug_persistent_print was enabled.
      
      The *_ReadTuple() function was called on a persistent TID after the
      corresponding tuple had been deleted with a frozen transaction ID.  The
      concurrent vacuum recycled the tuple, which led to a SIGSEGV when the
      backend tried to access values from the tuple.
      
      Fix it by avoiding the debug log message in the case when the persistent
      tuple is freed (transitioning to the FREE state).  All other state
      transitions are logged.
      
      In the absence of a concurrent vacuum, things worked just fine because
      the *_ReadTuple() interface reads tuples from persistent tables directly
      using the TID.
  5. 04 Sep 2020, 1 commit
  6. 26 Aug 2020, 1 commit
    • PANIC when the shared memory is corrupted · 4f5a2c23
      Committed by xiong-gang
      shmNumGxacts and shmGxactArray are accessed under the protection of
      shmControlLock. This commit adds some defensive code and PANICs as early
      as possible when the shared memory is corrupted.
  7. 25 Aug 2020, 1 commit
    • Fix unexpected corruption of persistent filespace table (#10623) · 424e382a
      Committed by Tang Pengzhou
      On a segment whose primary is down and whose mirror has been promoted
      to primary, running gp_remove_segment_mirror to remove the mirror of
      the segment correctly cleans up the mirror-related fields in
      gp_persistent_filespace_node. But when we run gp_remove_segment_mirror
      for the same segment again, the primary-related fields are also cleaned
      up, which is wrong and not expected.
      
      Such a case was observed in production when gprecoverseg -F was
      interrupted in the middle of __updateSystemConfigRemoveAddMirror() and
      run again.
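      
      A hedged sketch of the sequence described above, assuming the
      single-argument form of gp_remove_segment_mirror() that takes the
      affected segment's content id (the exact signature is not shown in
      this message):
      
      ```
      -- First call: mirror-related fields in gp_persistent_filespace_node
      -- are cleaned up, as expected.
      SELECT gp_remove_segment_mirror(0::int2);
      -- Second call for the same segment: before this fix, the primary-related
      -- fields were also cleaned up, corrupting the persistent filespace table.
      SELECT gp_remove_segment_mirror(0::int2);
      ```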
      Reviewed-by: Asim R P <pasim@vmware.com>
  8. 15 Jul 2020, 1 commit
    • Fix pulling up EXPR sublinks · a6ee98bf
      Committed by Richard Guo
      Currently GPDB tries to pull up EXPR sublinks into inner joins. For the query
      
      select * from foo where foo.a >
          (select avg(bar.a) from bar where foo.b = bar.b);
      
      GPDB would transform it to:
      
      select * from foo inner join
          (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
      on foo.b = sub.b and foo.a > sub.avg;
      
      To do that, GPDB needs to recurse through the quals in the sub-select,
      extract quals of the form 'outervar = innervar', and then build new
      SortGroupClause items and TargetEntry items for the sub-select based on
      these quals.
      
      But for quals of the form 'function(outervar, innervar1) = innervar2',
      GPDB handles them incorrectly, which causes the wrong results described
      in issue #9615.
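      
      A hypothetical query of the problematic shape (illustrative only,
      assuming bar also has a column c): the correlated qual has the form
      'function(outervar, innervar1) = innervar2'.
      
      ```
      select * from foo where foo.a >
          (select avg(bar.a) from bar where foo.b + bar.b = bar.c);
      ```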
      
      This patch fixes the issue by treating these kinds of quals as not
      compatible for correlation, so that the sub-select is not converted to
      a join.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      (cherry picked from commit dcdc6c0b)
  9. 13 Jul 2020, 1 commit
    • Fix the assert failure on pullup flow in within group · 7246f370
      Committed by Jinbao Chen
      The Flow in the AggNode had the wrong TargetList. An AggNode has a
      different TargetList from its child nodes, so copying the flow directly
      from the child node to the AggNode is completely wrong. We need to use
      pullupflow to generate this TargetList when creating the within-group
      plan with a single QE.
  10. 24 Jun 2020, 1 commit
    • Fix a recursive AbortTransaction issue · 1a2454ab
      Committed by xiong-gang
      When an error happens after ProcArrayEndTransaction, execution recurses
      back into AbortTransaction; we need to make sure it does not generate
      extra WAL records and does not fail the assertions.
  11. 17 Apr 2020, 1 commit
    • Fix memory leak in checkpointer process. (#9730) · 7dee0229
      Committed by Hao Wu
      The checkpointer is a long-lived process, and the current memory context
      for its FOR loop is the process's own memory context. So any memory leak
      will make the checkpointer process hold more and more memory until the
      memory context is reset.
      The `rdata` used to build the xlog record has 5 pointers allocated by
      palloc/palloc0, and only one of them had its memory freed. It's better to
      free the memory here rather than resetting the memory context in the for
      loop of the checkpointer process.
  12. 21 Mar 2020, 1 commit
    • (5X) Enable external table's error log to be persistent for ETL. (#9759) · efe41b23
      Committed by (Jerome) Junfeng Yang
      For ETL user scenarios, there are cases that frequently create and drop
      the same external table. Once the external table gets dropped, all
      errors stored in its error log are lost.
      
      To make the error log persistent for an external table with the same
      "dbname"."namespace"."table", bring in the `LOG ERRORS PERSISTENTLY`
      clause. It is parsed into `error_log_persistent=true` in the table
      options to avoid a catalog change.
      If a user creates the external table with this clause, the external
      table's error log is named "dbid_namespaceid_tablename" under the
      "errlogpersistent" directory.
      
      Dropping the external table skips deleting this error log.
      
      A separate `gp_read_persistent_error_log` function is created to read
      the persistent error log.
      If the external table has been deleted, only the namespace owner has
      permission to delete the error log.
      
      A separate `gp_truncate_persistent_error_log` function is created to
      delete the persistent error log.
      If the external table has been deleted, only the namespace owner has
      permission to delete the error log.
      It also supports wildcard input to delete error logs belonging to a
      database or the whole cluster.
      
      If you drop an external table created with `error_log_persistent` and
      then create the same "dbname"."namespace"."table" external table without
      persistent error logging, errors are written to the normal error log.
      The persistent error log still exists.
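      
      A hedged usage sketch of the feature described above; the external table
      definition and the function argument forms are illustrative, only the
      clause and function names come from this message:
      
      ```
      CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
          LOCATION ('gpfdist://etl-host:8080/sales.txt')
          FORMAT 'TEXT'
          LOG ERRORS PERSISTENTLY SEGMENT REJECT LIMIT 100;
      
      -- The error log survives DROP EXTERNAL TABLE ext_sales; read it and
      -- clean it up afterwards with the new functions:
      SELECT * FROM gp_read_persistent_error_log('ext_sales');
      SELECT gp_truncate_persistent_error_log('ext_sales');
      ```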
      Reviewed-by: Haozhou Wang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
  13. 03 Feb 2020, 1 commit
    • error out if FileWrite() fails in MirroredAppendOnly_Append(). · b1276803
      Committed by Paul Guo
      This is safer coding. Some callers of MirroredAppendOnly_Append() seem
      to use Assert() to imply that FileWrite() could not fail on their code
      paths. I did not carefully check those call paths, but just
      conservatively error out in MirroredAppendOnly_Append() directly when
      FileWrite() goes wrong, considering that low-level IO function errors
      are dangerous and hard to debug.
  14. 16 Jan 2020, 2 commits
    • retryAbortPrepared only if previous attempt failed. · af9e0c83
      Committed by Ashwin Agrawal
      This was spotted by Pengzhou Tang during code inspection, so fixing it.
      I don't think it had any ill effect, but it is definitely unnecessary.
      
      Cherry-picked from bfb925fa
    • Reduce chances of master PANIC due to failure of phase 2 of 2PC. · 0fc8033d
      Committed by Ashwin Agrawal
      This commit increases the retry count and adds a small delay between
      retries for 2PC.
      
      Commit-Prepared or Abort-Prepared (phase 2) of 2PC performs retries if
      the first attempt fails to complete the transaction. By default only 2
      retries were performed, and with zero delay. Once the retries are
      exhausted, the master PANICs and has to continue retrying. Most of the
      time phase 2 fails on the first attempt because a segment is undergoing
      recovery or a failover to the mirror happens. In such instances, just 2
      retries attempted within milliseconds seem to defeat the purpose of the
      retries.
      
      Hence, modify the default number of retries to 10 and add a 100 msec
      delay between retries to provide a reasonable opportunity to succeed.
      This should help avoid master PANICs caused by being unable to complete
      phase 2. I gave it a lot of thought but couldn't think of any downsides
      to increasing the number of retries.
      
      Also, the maximum value allowed to be configured was only 15, which
      seems too restrictive, mainly for tests where a higher number of retries
      sometimes helps to avoid flakiness and master panics. So the maximum
      allowed value of the GUC `dtx_phase2_retry_count` is changed to INT_MAX.
      We don't practically expect it to be set to anything higher than a few
      thousand, but we don't have to be so restrictive about the maximum.
      
      Cherry-picked from f66054cd, with an additional, needed test change.
  15. 14 Jan 2020, 1 commit
    • Consider different cdbhash value in EC. · 652f8f7b
      Committed by Hubert Zhang
      This continues the fix for issue 8918 on GPDB 5X.
      
      In GPDB5, items in an EC may have different cdbhash values, which means
      their distribution keys may be different as well. For example, in
      T.D = <constant expr>, T.D may be float4 while <constant expr> is
      float8. Even if both are 1.0, we cannot assume they would be distributed
      to the same segment after motion. So cdbhash_type_compatibility_table is
      added to fix this issue.
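      
      An illustrative example of the case described above (hypothetical table;
      the point is that 1.0::float4 and 1.0::float8 need not hash to the same
      segment):
      
      ```
      create table t_f4 (d float4) distributed by (d);
      -- T.D is float4 while the constant is float8; even though both are 1.0,
      -- their cdbhash values may differ, so equality here does not imply they
      -- land on the same segment after motion.
      select * from t_f4 where d = 1.0::float8;
      ```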
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
  16. 09 Jan 2020, 1 commit
    • Print warning message in checkNetworkTimeout. · 86ca712a
      Committed by Hubert Zhang
      Packets may be dropped repeatedly on some specific ports, and we need a
      way to quickly identify this issue. But when the network is bad, packets
      are also dropped. In the past, checkNetworkTimeout would only error out
      when a packet had failed to receive an ACK for more than one hour (see
      the GUC gp_interconnect_transmit_timeout), which is too strict.
      
      This commit introduces a warning message to report this possible
      problem, so that a DBA can examine the port further.
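      
      For reference, the GUC mentioned above can be inspected as shown below
      (illustrative only):
      
      ```
      -- Defaults to one hour; checkNetworkTimeout errors out only after a
      -- packet has gone unacknowledged for this long.
      SHOW gp_interconnect_transmit_timeout;
      ```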
  17. 07 Jan 2020, 1 commit
  18. 24 Dec 2019, 1 commit
  19. 23 Dec 2019, 2 commits
    • Remove Motion codepath to detoast HeapTuples, convert to MemTuple instead. · e2b63204
      Committed by Heikki Linnakangas
      The Motion sender code has four different codepaths for serializing a
      tuple from the input slot:
      
      1. Fetch MemTuple from slot, copy it out as it is.
      
      2. Fetch MemTuple from slot, re-format it into a new MemTuple by fetching
         and inlining any toasted datums. Copy out the re-formatted MemTuple.
      
      3. Fetch HeapTuple from slot, copy it out as it is.
      
      4. Fetch HeapTuple from slot, copy out each attribute separately, fetching
         and inlining any toasted datums.
      
      In addition to the above, there are "direct" versions of codepaths 1 and 3,
      used when the tuple fits in the caller-provided output buffer.
      
      As discussed in https://github.com/greenplum-db/gpdb/issues/9253, the
      fourth codepath is very inefficient, if the input tuple contains datums
      that are compressed inline, but not toasted. We decompress such tuples
      before serializing, and in the worst case, might need to recompress them
      again in the receiver if it's written out to a table. I tried to fix that
      in commit 4c7f6cf7, but it was broken and was reverted in commit
      774613a8.
      
      This is a new attempt at fixing the issue. This commit removes codepath
      4 altogether, so that if the input tuple is a HeapTuple with any toasted
      attributes, it is first converted to a MemTuple and codepath 2 is used
      to serialize it. That way, we have less code to test, and materializing a
      MemTuple is roughly as fast as the old code to write out the attributes
      of a HeapTuple one by one, except that the MemTuple codepath avoids the
      decompression of already-compressed datums.
      
      While we're at it, add some tests for the various codepaths through
      SerializeTuple().
      
      To test the performance of the affected case, where the input tuple is
      a HeapTuple with toasted datums, I used this:
      
      ---
      CREATE temporary TABLE foo (a text, b text, c text, d text, e text, f text,
        g text, h text, i text, j text, k text, l text, m text, n text, o text,
        p text, q text, r text, s text, t text, u text, v text, w text, x text,
        y text, z text, large text);
      ALTER TABLE foo ALTER COLUMN large SET STORAGE external;
      INSERT INTO foo
        SELECT 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
               'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
               repeat('1234567890', 1000)
        FROM generate_series(1, 10000);
      
      -- verify that the data is uncompressed, should be about 110 MB.
      SELECT pg_total_relation_size('foo');
      
      \o /dev/null
      \timing on
      SELECT * FROM foo; -- repeat a few times
      ---
      
      The last select took about 380 ms on my laptop, with or without this patch.
      So the new codepath where the input HeapTuple is converted to a MemTuple
      first, is about as fast as the old method. There might be small differences
      in the serialized size of the tuple, too, but I didn't explicitly measure
      that. If you have a toasted but not compressed datum, the input must be
      quite large, so small differences in the datum header sizes shouldn't
      matter much.
      
      If the input HeapTuple contains any compressed datums, this avoids the
      recompression, so even if converting to a MemTuple was somewhat slower in
      that case, it should still be much better than before. I kept the
      HeapTuple codepath for the case that there are no toasted datums. I'm not
      sure it's significantly faster than converting to a MemTuple either; the
      caller has to slot_deform_tuple() the received tuple before it can do
      much with it, and that is slower with HeapTuples than MemTuples. But that
      codepath is straightforward enough that getting rid of it wouldn't save
      much code, and I don't feel like doing the performance testing to justify
      it right now.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
    • Remove unnecessary checks for NULL return from getChunkFromCache. · 7ffb084f
      Committed by Heikki Linnakangas
      It cannot return NULL. It will either return a valid pointer, or the
      palloc() will ERROR out.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  20. 18 Dec 2019, 1 commit
    • Revert "When serializing a tuple for Motion, don't decompress compressed datums." · 6fbe87ad
      Committed by Asim R P
      This reverts commit 788e3e7e.
      
      Thank you Ekta for finding this simple repro that demonstrates the
      problem with this patch and Jesse for initial analysis:
      
         CREATE TABLE foo(a text, b text);
         INSERT INTO foo SELECT repeat('123456789', 100000)::text as a,
                                repeat('123456789', 10)::text as b;
         SELECT * FROM foo;
      
      The motion receiver has no idea whether a datum it received is
      compressed or not, because the varlena header is stripped off before
      sending the data.  Heikki and I discussed two options to fix this:
      
      1. Include the varlena header when sending.  This incurs at most an
      8-byte overhead per variable-length datum in a heap tuple.
      
      2. Always send tuples as MemTuples.  This is more desirable because it
      simplifies the code, but it also comes with a performance cost.
      
      Let's evaluate the two options based on performance and then commit the
      best one.
  21. 17 Dec 2019, 1 commit
  22. 03 Dec 2019, 1 commit
  23. 26 Nov 2019, 1 commit
    • Use MVCC snapshot for gp_segment_configuration scan · 5c6f0e24
      Committed by Ashwin Agrawal
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      version of a row. As a result, getCdbComponentInfo(), using SnapshotNow
      and AccessShareLock, may see duplicate entries for a dbid if concurrent
      updates are performed, mostly by FTS. Hence, instead use an MVCC
      snapshot, similar to what's done in createdb() for pg_tablespace scans.
      
      With this change an MVCC snapshot is used whenever getCdbComponentInfo()
      is called inside a transaction. The only exception is when
      getCdbComponentInfo() is called outside of a transaction, which happens
      in phase 2 of 2PC: if COMMIT_PREPARED or ABORT_PREPARED fails, the
      dispatcher disconnects and destroys all gangs, and then performs
      RETRY_COMMIT_PREPARED or RETRY_ABORT_PREPARED. Since in phase 2 we have
      already marked the current transaction state as TRANS_COMMIT/ABORT and
      MyProc->xmin is set to 0, we cannot acquire a transaction snapshot. In
      6X_STABLE and higher versions this situation was avoided via commit
      eb036ac1, but it currently seems complicated to backport that change,
      so we continue to use SnapshotNow for this special case. For
      gp_segment_configuration we perform sanity checks after scanning and
      can detect the undesirable result. If this continues to be a problem,
      we can in the future add logic for FTS to write the contents and use
      the same for 5X_STABLE too.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
  24. 25 Nov 2019, 1 commit
  25. 20 Nov 2019, 1 commit
  26. 15 Nov 2019, 3 commits
    • Revert "Use MVCC snapshot for gp_segment_configuration scan" · cb2b74fa
      Committed by Ashwin Agrawal
      This reverts commit 8de85665. The ICW jobs sometimes fail with the error
      "must be called before any query". We probably need a different way to
      grab the snapshot.
    • Use MVCC snapshot for gp_segment_configuration scan · 8de85665
      Committed by Ashwin Agrawal
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      version of a row. As a result, getCdbComponentInfo(), using SnapshotNow
      and AccessShareLock, may see duplicate entries for a dbid if concurrent
      updates are performed, mostly by FTS. Hence, instead use an MVCC
      snapshot, similar to what's done in createdb() for pg_tablespace scans.
      
      With this change an MVCC snapshot is used whenever getCdbComponentInfo()
      is called inside a transaction. The only exception is when
      getCdbComponentInfo() is called outside of a transaction, which happens
      in phase 2 of 2PC: if COMMIT_PREPARED or ABORT_PREPARED fails, the
      dispatcher disconnects and destroys all gangs, and then performs
      RETRY_COMMIT_PREPARED or RETRY_ABORT_PREPARED. Since in phase 2 we have
      already marked the current transaction state as TRANS_COMMIT/ABORT and
      MyProc->xmin is set to 0, we cannot acquire a transaction snapshot. In
      6X_STABLE and higher versions this situation was avoided via commit
      eb036ac1, but it currently seems complicated to backport that change,
      so we continue to use SnapshotNow for this special case. For
      gp_segment_configuration we perform sanity checks after scanning and
      can detect the undesirable result. If this continues to be a problem,
      we can in the future add logic for FTS to write the contents and use
      the same for 5X_STABLE too.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
    • Avoid memory-corruption during buildGangDefinition · 24ccc0c5
      Committed by Ashwin Agrawal
      If the code found more primaries in cdb_component_dbs than the size of
      the gang to be created, it would end up writing to incorrect memory
      addresses beyond the allocated memory. Hence, add protection for this
      and ERROR out sooner rather than later.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
  27. 05 Nov 2019, 1 commit
  28. 24 Oct 2019, 1 commit
    • Add missing break and fallthrough comment within switch-case (#8824) · 9dae1d96
      Committed by Hao Wu
      GCC 7+ has a compile option, -Wimplicit-fallthrough, which generates a
      warning/error if the code falls through cases (or default) implicitly.
      Such implicit fall-through may cause bugs that are hard to catch.
      
      1. Append a comment line such as /* fallthrough */ at the end of a case
         block.
      2. Add a break clause at the end of a case block if the last statement
         is ereport(ERROR) or the like.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  29. 16 Oct 2019, 1 commit
    • ic: tcp: init incoming conns before outgoing conns · 5b76f9b5
      Committed by Ning Yu
      In SetupTCPInterconnect() we initialize both incoming and outgoing
      connections. A state pointer, sendingChunkTransportState, is created to
      track the status of the outgoing connections; it is an entry of the
      states array, and we expect the pointer to stay valid for the duration
      of the function.
      
      However, after we get this pointer we initialize the incoming
      connections, which may resize the states array with repalloc(), so
      sendingChunkTransportState can end up pointing to invalid memory and
      crash at runtime.
      
      To fix that, initialize the incoming connections before the outgoing
      ones, so that the sendingChunkTransportState pointer stays valid during
      its lifecycle.
      
      Tests are not added, as the issue has a chance of being triggered by
      existing tests.
      
      (cherry picked from commit 296dba82)
  30. 05 Oct 2019, 1 commit
    • Bump ORCA version to 3.74.0, Introduce PallocMemoryPool for use in GPORCA (#8747) · a3266308
      Committed by Chris Hajas
      We introduce a new type of memory pool and memory pool manager:
      CMemoryPoolPalloc and CMemoryPoolPallocManager
      
      The motivation for this PR is to improve memory allocation/deallocation
      performance when using GPDB allocators. Additionally, we would like to
      use the GPDB memory allocators by default (change the default for
      optimizer_use_gpdb_allocators to on), to prevent ORCA from crashing when
      we run out of memory (OOM). However, with the current way of doing
      things, doing so would add around 10 % performance overhead to ORCA.
      
      CMemoryPoolPallocManager overrides the default CMemoryPoolManager in
      ORCA, and instead creates a CMemoryPoolPalloc memory pool instead of a
      CMemoryPoolTracker. In CMemoryPoolPalloc, we now call MemoryContextAlloc
      and pfree instead of gp_malloc and gp_free, and we don’t do any memory
      accounting.
      
      So where does the performance improvement come from? Previously, we
      would (essentially) pass in gp_malloc and gp_free to an underlying
      allocation structure (which has been removed on the ORCA side). However,
      we would add additional headers and overhead to maintain a list of all
      of these allocations. When tearing down the memory pool, we would
      iterate through the list of allocations and explicitly free each one. So
      we would end up doing overhead on the ORCA side, AND the GPDB side.
      However, the overhead on both sides was quite expensive!
      
      If you want to compare against the previous implementation, see the
      Allocate and Teardown functions in CMemoryPoolTracker.
      
      With this PR, we improve optimization time by ~15% on average and up to
      30-40% on some queries which are memory intensive.
      
      This PR does remove memory accounting in ORCA. This was only enabled
      when the optimizer_use_gpdb_allocators GUC was set. By setting
      `optimizer_use_gpdb_allocators`, we still capture the memory used when
      optimizing a query in ORCA, without the overhead of the memory
      accounting framework.
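      
      For reference, the GUC discussed above can be enabled per session
      (illustrative only):
      
      ```
      -- Route ORCA allocations through the GPDB (palloc-based) memory pools.
      SET optimizer_use_gpdb_allocators = on;
      ```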
      
      Additionally, add a top-level ORCA context where new contexts are created.
      
      The OptimizerMemoryContext is initialized in InitPostgres(). For each
      memory pool in ORCA, a new memory context is created in
      OptimizerMemoryContext.
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
  31. 27 Sep 2019, 1 commit
    • Fix crash in COPY FROM if error happens · 1bbbcc09
      Committed by Ashwin Agrawal
      If an error happens in CopyFrom() while cdbCopy is still NULL, then when
      PG_CATCH() calls COPY_HANDLE_ERROR it triggers a PANIC. Hence, check for
      NULL in cdbCopyEndAndFetchRejectNum().
      
      The crash was exposed by following SQL commands:
      
          CREATE TABLE public.heap01 (a int, b int) distributed by (a);
          INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
          ANALYZE public.heap01;
      
          COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
          DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
          COPY pg_statistic from '/tmp/heap01.stat';
      
      Important note: yes, it's known and strongly recommended not to touch
      `pg_statistic` or any other catalog table this way. But it's no good to
      panic either. The COPY into `pg_statistic` is going to ERROR out
      "correctly" and not crash after this change, with `cannot accept a
      value of type anyarray`, as there just isn't any way at the SQL level
      to insert data into pg_statistic's anyarray columns. Refer:
      https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us
  32. 19 Sep 2019, 1 commit
    • Fix wrong results caused by NOT_EXISTS sublink elimination. · ff2582bb
      Committed by Richard Guo
      In GPDB 5X, we try to eliminate a NOT_EXISTS sublink if there is
      'limit 0' in the subquery. To do that, we delete the subquery's LIMIT
      and build an (n <= 0) expr to be ANDed into the parent qual, so that
      the limit can be evaluated at run time. However, in the case when n is
      a positive number (or NULL), the expr (n <= 0) evaluates to false,
      which causes the whole parent qual to become false. As a result, we
      get wrong results for the query below:
      
      ```
      create table foo(a int);
      insert into foo values (1);
      
      create table bar(b int);
      
      select * from foo where not exists
      		(select 1 from bar where bar.b = foo.a limit 1);
      ```
      
      This patch fixes the wrong results by evaluating the limit at plan time
      and returning true to be ANDed into the parent qual if n <= 0. If n is
      a positive value (or NULL), however, the LIMIT doesn't affect the
      semantics of EXISTS, so this patch just ignores it.
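      
      A hypothetical variant of the query above with LIMIT 0, the case this
      elimination targets: the subquery can never return rows, so NOT EXISTS
      is always true and the row from foo must be returned.
      
      ```
      select * from foo where not exists
      		(select 1 from bar where bar.b = foo.a limit 0);
      -- must return the single row of foo, since the subquery is always empty
      ```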
      
      This patch fixes github issue #8369.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  33. 28 Aug 2019, 1 commit
    • Do additional cleanup when setting udp interconnect fails to avoid potential panic. (#8430) · e8391480
      Committed by Paul Guo
      We've seen occasional test failures of icudp/icudp_full due to an
      unexpected panic of the QE process.  That happens when a QE main process
      calls elog(ERROR) in SetupUDPIFCInterconnect_Internal() while its rx
      pthread is handling rx packets.  The memory (in memory context
      InterconnectContext) which is used to handle rx packets is soon reset in
      the resource owner ReleaseCallback function destroy_interconnect_handle().
      
      Fix this by removing the connection entries from the hash table when
      SetupUDPIFCInterconnect_Internal() errors out.
      
      Cherry-picked from 10dba408
      
      In addition, enable the DPICFullTestCases.test_icudp_full test.
      According to the error message, it is really unreasonable to keep
      the test case disabled.
  34. 21 Aug 2019, 1 commit
    • Fix interconnect retransmission period · fd5af3ee
      Committed by xiong-gang
      For a query like:
      SELECT ... ORDER BY key LIMIT n;
      
      When the query running time is longer than UNACK_QUEUE_RING_LENGTH *
      TIMER_SPAN (10 seconds by default) and the first tuple sent from the
      OrderBy node to the QD is not acknowledged, the function
      putIntoUnackQueueRing calculates an inappropriate retransmission period
      of UNACK_QUEUE_RING_LENGTH * TIMER_SPAN.
  35. 16 Aug 2019, 1 commit
    • Fix parser algorithm for setting distribution key column numbers · 8e6af8a7
      Committed by Denis Smirnov
      in the gp_distribution_policy relation for inherited tables in GP5X
      (GP6X is ok).
      Query plans built with GPORCA caused a segmentation fault because of
      out-of-range column numbers, while the Postgres optimizer simply
      returned an error before this patch. For example:
      
      create table ta (a int) distributed randomly;
      create table tb (a int, b int) inherits (ta) distributed by (b);
      
      set optimizer=on;
      insert into tb values(0, 0);
      -- Segmentation fault
      
      set optimizer=off;
      insert into tb values(0, 0);
      -- ERROR: no tlist entry for key 3 (cdbmutate.c:1484)
      
      select attrnums from gp_distribution_policy where localoid = 'tb'::regclass;
       attrnums
      ----------
      {3}
      (1 row)
      
      Also, a check against setting a non-hashable distribution key from a
      parent table in an inherited one didn't work.
      
      create table tc (a point) distributed randomly;
      create table td (b int) inherits (tc) distributed by (a);
      
      select * from td;
      ERROR:  could not find mergejoinable = operator for type 600 (pathkeys.c:1174)
      Co-authored-by: Vasiliy Ivanov <7vasiliy@gmail.com>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>