1. 30 July 2020, 1 commit
    • Improve cardinality for joins using distribution columns in ORCA (#10479) · b0c1d810
      Authored by Ashuka Xue
      This commit only affects cardinality estimation in ORCA when the user
      sets `optimizer_damping_factor_join = 0`. It improves the square root
      algorithm first introduced by commit ce453cf2.
      
      In the original square root  algorithm, we assumed that distribution
      column predicates would have some correlation with other predicates in
      the join and therefore would be accordingly damped when calculating join
      cardinality.
      
      However, distribution columns are ideally unique in order to get the
      best performance from GPDB. Under this assumption, distribution columns
      should not be correlated and thus need to be treated as independent
      when calculating join cardinality. This is a best guess, since we do not
      have a way to support correlated columns at this time.
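      
      As a rough sketch (illustrative notation only; the exact damping
      exponents used by ORCA may differ), with join predicate selectivities
      ordered as s1 <= s2 <= s3 <= ..., the square root algorithm damps each
      successive predicate, while a distribution-column predicate with
      selectivity sd is now applied at full strength:
      
          damped estimate:    |R JOIN S| ~= |R| * |S| * s1 * s2^(1/2) * s3^(1/4) * ...
          with this commit:   |R JOIN S| ~= |R| * |S| * sd * (s1 * s2^(1/2) * s3^(1/4) * ...)
      
      i.e. the distribution-column predicate is treated as independent rather
      than correlated with the other predicates.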
      Co-authored-by: Ashuka Xue <axue@vmware.com>
      Co-authored-by: Chris Hajas <chajas@vmware.com>
      b0c1d810
  2. 29 July 2020, 4 commits
    • ic-proxy: include postmaster pid in the domain socket path · 5c5a358a
      Authored by Ning Yu
      We used to store the domain socket files under /tmp/ and include the
      postmaster port number in the file name, in the hope that two clusters
      would not conflict with each other on this file.
      
      However, the conflict still happens in the test src/bin/pg_basebackup,
      and it can also happen if a second cluster is misconfigured by
      accident.  So to make things safe we also include the postmaster pid in
      the domain socket path; there is no chance for two postmasters to share
      the same pid.
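      
      A minimal sketch of this kind of naming scheme (illustrative only; the
      actual path template and directory used by ic-proxy may differ):
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <unistd.h>
      
      /* Combining the postmaster port and the postmaster pid makes the path
       * unique per running cluster: two live postmasters never share a pid. */
      static void
      build_icproxy_socket_path(char *buf, size_t len, int postmaster_port, pid_t postmaster_pid)
      {
          snprintf(buf, len, "/tmp/.s.ic_proxy.%d.%d", postmaster_port, (int) postmaster_pid);
      }
      
      int
      main(void)
      {
          char path[128];
      
          build_icproxy_socket_path(path, sizeof(path), 7000, getpid());
          printf("%s\n", path);
          return 0;
      }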
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      5c5a358a
    • d6362126
    • Add Orca support for index only scan · 3b72df18
      Authored by David Kimura
      This commit allows Orca to select plans that leverage the IndexOnlyScan
      node. A new GUC 'optimizer_enable_indexonlyscan' is used to enable or
      disable this feature. Index only scan is disabled by default until the
      following issues are addressed:
      
        1) Implement a cost comparison model for index only scans. Currently,
           the cost is hard coded for testing purposes.
        2) Support index only scans using GiST and SP-GiST where allowed.
           Currently, the code only supports index only scans on B-tree indexes.
      Co-authored-by: Chris Hajas <chajas@vmware.com>
      3b72df18
    • docs - update ALTER TABLE ... ADD COLUMN for AOCO tables (#10518) · a4b5a80d
      Authored by Mel Kiyama
      Add information about the compression parameters used when
      a column is added to an AOCO table.
      a4b5a80d
  3. 28 July 2020, 5 commits
    • Partial revert ic-proxy: enable ic-proxy in gpdb packages · 88d7d969
      Authored by Tyler Ramer
      The libuv source is in the build container, rather than in a GCP
      bucket. This commit is a partial revert: instead of getting the source
      from a GCP bucket, it uses the libuv available in the build container
      image.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Kris Macoskey <kmacoskey@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      88d7d969
    • Fix flaky test isolation2:pg_basebackup_with_tablespaces (#10509) · c8b00ac7
      Authored by Paul Guo
      Here is the diff output of the test result.
      
       drop database some_database_without_tablespace;
       -DROP
       +ERROR:  database "some_database_without_tablespace" is being accessed by other users
       +DETAIL:  There is 1 other session using the database.
       drop tablespace some_basebackup_tablespace;
       -DROP
       +ERROR:  tablespace "some_basebackup_tablespace" is not empty
      
      The reason is that after the client connection to the database exits, the server
      needs some time (the process might be scheduled out soon, and the operation
      needs to contend for the ProcArrayLock lock) to release the PGPROC in
      proc_exit()->ProcArrayRemove(). During dropdb() (for a database drop), postgres
      calls CountOtherDBBackends() to see whether there are still sessions
      using the database by checking proc->databaseId, and it retries for at most 5 seconds.
      This test quits the db connection to some_database_without_tablespace and then
      drops the database immediately. This should mostly be fine, but if the system is
      slow or under heavy load, it can still lead to test flakiness.
      
      This issue can be simulated using gdb. Let's poll until the database drop
      command succeeds for the affected database.  Since the DROP DATABASE SQL
      command cannot run inside a transaction block, this could not be implemented
      in plpgsql; instead the dropdb utility and a bash command are used.
      Reviewed-by: Asim R P <pasim@vmware.com>
      c8b00ac7
    • Don't mock GPORCA functions unnecessarily. · e5fa0d7a
      Authored by Heikki Linnakangas
      None of the mock tests in the backend call into GPORCA functions, so
      no need to generate mock objects for them. Saves some time and space when
      running mock tests.
      Reviewed-by: Ashuka Xue <axue@pivotal.io>
      e5fa0d7a
    • 86bcab98
    • Remove gphostcache · f61d35cd
      Authored by Tyler Ramer
      Gphostcache has numerous issues, and has been a pain point for some
      time. For this reason, we are removing it.
      
      This commit moves the useful functionality of gphostcache - the hostname
      deduplication - to the gparray class, where a list of deduped hostnames is
      returned from gpArray.get_hostlist().
      
      There is a FIXME for correctly adding the hostname to a newly added or
      recovered mirror. The hostname resolution from an address was incorrect and
      faulty in its logic: an IP address never requires an associated hostname.
      However, the hostname field in gp_segment_configuration should
      be populated somehow; we recommend adding a "hostname" field to any
      configuration files that require it. For now, we simply set the
      "hostname" to the "address", which ultimately delivers the same functionality
      as the gphostcache implementation.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      f61d35cd
  4. 25 July 2020, 1 commit
  5. 24 July 2020, 3 commits
    • Fix and enable upstream pg_dump regression tests. · 91c8b403
      Authored by Heikki Linnakangas
      We skipped these during the PostgreSQL 9.6 merge. Time to pay down this
      little technical debt.
      
      We had copied the latest version of the t/002_pg_dump.pl test file from
      upstream REL9_6_STABLE branch. That included the test change from upstream
      commit e961341cc, but we didn't have the corresponding code changes yet.
      Revert the test change, so that it passes. We'll get the bug fix, along
      with the test change again, when we continue with the upstream merge, but
      until then let's just keep the test in sync with the code.
      Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
      91c8b403
    • Resolve high `CacheMemoryContext` usage for `ANALYZE` on large partition table. (#10499) · 533a47dd
      Authored by (Jerome) Junfeng Yang
      In some cases, the merge-stats logic for a root partition table may consume
      a very large amount of memory in CacheMemoryContext.
      This may lead to `Canceling query because of high VMEM usage` when
      partition tables are ANALYZEd concurrently.
      
      For example, suppose there are several root partition tables, each with
      thousands of leaf tables, and these are all wide tables that may
      contain hundreds of columns.
      When analyze()/auto_stats() runs on the leaf tables concurrently,
      `leaf_parts_analyzed` consumes lots of memory (catalog cache entries for
      pg_statistic and pg_attribute) under
      CacheMemoryContext in each backend, which may hit the protective VMEM
      limit.
      In `leaf_parts_analyzed`, a single backend analyzing the leaf tables of
      one root partition table may cache up to
      number_of_leaf_tables * number_of_columns tuples from pg_statistic and
      another number_of_leaf_tables * number_of_columns tuples from pg_attribute.
      Setting the GUC `optimizer_analyze_root_partition` or
      `optimizer_analyze_enable_merge_of_leaf_stats` to false skips the merge of
      stats for the root table, so `leaf_parts_analyzed` is not executed.
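      
      For a rough sense of scale (illustrative numbers, not taken from the
      commit): with 1,000 leaf tables of 200 columns each, a single backend
      could cache up to
      
          1,000 * 200 = 200,000 pg_statistic tuples
                      + 200,000 pg_attribute tuples
      
      in CacheMemoryContext for just one root table.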
      
      To resolve this issue:
      1. When checking in `leaf_parts_analyzed` whether merged stats are available
      for a root table, first check whether all leaf tables have been ANALYZEd;
      if an un-ANALYZEd leaf table still exists, return quickly to avoid touching
      the pg_attribute and pg_statistic entries of every column of every leaf table
      (this saves a lot of time). Also, don't rely on the system catalog cache;
      use an index to fetch the stats tuple, to avoid one-off cache usage (in common cases).
      
      2. When merging stats in `merge_leaf_stats`, don't rely on the system
      catalog cache; use an index to fetch the stats tuple.
      
      There are side effects of not relying on the system catalog cache (all of them **rare** situations):
      1. INSERT/UPDATE/COPY on several leaf tables under the **same root
      partition** table in the **same session**, when all leaf tables are already
      **analyzed**, is much slower, since auto_stats calls `leaf_parts_analyzed`
      whenever a leaf table gets updated and we no longer rely on the system catalog cache.
      (`set optimizer_analyze_enable_merge_of_leaf_stats=false` avoids
      this.)
      
      2. ANALYZE of the same root table several times in the same session is much
      slower than before, since we no longer rely on the system catalog cache.
      
      Overall, this solution improves the performance of ANALYZE, and
      it also means ANALYZE no longer hits the memory issue.
      533a47dd
    • ic-proxy: get libuv on pr pipeline · ef887cfe
      Authored by Ning Yu
      We must install libuv on the PR pipeline to compile with ic-proxy
      enabled.  ICW tests are still run in ic-udpifc mode.
      ef887cfe
  6. 23 July 2020, 6 commits
  7. 21 July 2020, 6 commits
    • Fix log_errors related issue while creating pxf_fdw extension · e6ec0e0a
      Authored by Amit Khandekar
      On an ARM64 machine, CREATE EXTENSION pxf_fdw fails with :
      ERROR:  the log_errors option cannot be set without reject_limit
      
      In pxf_fdw_validator(), the variable log_errors is declared bool, but
      it is initialized to -1. Since bool now seems to be a built-in type,
      its definition is implementation dependent, and it is possible that on
      ARM it is defined as an unsigned char: through the debugger it can be
      seen that log_errors's value is 255 after it is assigned -1.
      And because the (log_errors != -1) condition then returns true, we get the
      error "log_errors option cannot be set without reject_limit" even when
      the log_errors option was not specified while creating the extension.
      Due to this, all pxf_fdw tests were failing on ARM64.
      
      If log_errors is specified, set it to true rather than to the
      defGetBoolean(def) value. And rename log_errors to log_errors_set to
      reflect that its purpose is not to store the log_errors value, but
      rather to denote whether the log_errors option was specified.
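      
      A small standalone illustration of the failure mode and the fix (a sketch
      only, not the actual pxf_fdw code; my_bool stands in for the platform's
      bool definition):
      
      #include <stdio.h>
      
      typedef unsigned char my_bool;       /* stand-in: on ARM64, plain char is unsigned */
      
      int
      main(void)
      {
          my_bool log_errors = -1;         /* intended "unset" sentinel, actually stores 255 */
      
          if (log_errors != -1)            /* 255 != -1 is always true */
              printf("ERROR:  the log_errors option cannot be set without reject_limit\n");
      
          /* The fix: record "was the option specified?" in a separate flag
           * instead of overloading the bool with a -1 sentinel. */
          my_bool log_errors_set = 0;
          if (log_errors_set)
              printf("log_errors was explicitly specified\n");
      
          return 0;
      }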
      e6ec0e0a
    • Fix flaky for test instr_in_shmem_terminate. (#10469) · bbccf20c
      Authored by (Jerome) Junfeng Yang
      Enlarge the sleep time for a query that will be canceled later,
      so that slow execution does not fail the test.
      
      Normally the test's running time is not affected, since the
      sleep query is terminated immediately.
      bbccf20c
    • Use postgres database for pg_rewind cleanly shutdown execution to avoid potential pg_rewind hang. · 288908f3
      Authored by Paul Guo
      During testing, I encountered an incremental gprecoverseg hang issue.
      Incremental gprecoverseg is based on pg_rewind.  pg_rewind launches a
      single-mode postgres process and quits after crash recovery if the postgres
      instance was not cleanly shut down; this is used to ensure that postgres is
      in a consistent state before doing incremental recovery. I found that the
      single-mode postgres hangs with the stack below.
      
      #1  0x00000000008cf2d6 in PGSemaphoreLock (sema=0x7f238274a4b0, interruptOK=1 '\001') at pg_sema.c:422
      #2  0x00000000009614ed in ProcSleep (locallock=0x2c783c0, lockMethodTable=0xddb140 <default_lockmethod>) at proc.c:1347
      #3  0x000000000095a0c1 in WaitOnLock (locallock=0x2c783c0, owner=0x2cbf950) at lock.c:1853
      #4  0x0000000000958e3a in LockAcquireExtended (locktag=0x7ffde826aa60, lockmode=3, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', locallockp=0x0) at lock.c:1155
      #5  0x0000000000957e64 in LockAcquire (locktag=0x7ffde826aa60, lockmode=3, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:700
      #6  0x000000000095728c in LockSharedObject (classid=1262, objid=1, objsubid=0, lockmode=3) at lmgr.c:939
      #7  0x0000000000b0152b in InitPostgres (in_dbname=0x2c769f0 "template1", dboid=0, username=0x2c59340 "gpadmin", out_dbname=0x0) at postinit.c:1019
      #8  0x000000000097b970 in PostgresMain (argc=5, argv=0x2c51990, dbname=0x2c769f0 "template1", username=0x2c59340 "gpadmin") at postgres.c:4820
      #9  0x00000000007dc432 in main (argc=5, argv=0x2c51990) at main.c:241
      
      It tries to acquire the lock on pg_database for template1 with lockmode 3, but
      that conflicts with a lockmode 5 lock held by a recovered dtx
      transaction from RecoverPreparedTransactions() during startup. Typically the dtx
      transaction comes from "create database" (by default the template database is
      template1).
      
      Fix this by using the postgres database for the single-mode postgres execution.
      The postgres database is commonly used by many background worker backends like
      dtx recovery, gdd and ftsprobe. With this change, we do not need to worry
      about "create database" with template postgres, etc., since such commands won't
      succeed, and thus we avoid the lock conflict.
      
      We might be able to fix this in InitPostgres() by bypassing the locking code in
      single mode, but the current fix seems safer.  Note that InitPostgres()
      locks/unlocks some other catalog tables as well, but almost all of them use
      lock mode 1 (except mode 3 on pg_resqueuecapability, per the debugging output).  It
      seems unusual in real scenarios to have a dtx transaction that
      locks a catalog table with mode 8, which would conflict with mode 1.  If we
      encounter that later we will need to work out a better (possibly non-trivial)
      solution. For now, let's fix the issue we encountered first.
      
      Note that the code fixes in buildMirrorSegments.py and twophase.c are
      not related to the main change in this patch. They do not seem to be strict
      bugs, but we'd better fix them to avoid potential issues in the future.
      Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
      Reviewed-by: Asim R P <pasim@vmware.com>
      288908f3
    • Update pre-allocated shared snapshot slot number. · f6c59503
      Authored by Paul Guo
      Previously the shared snapshot slot number was based on max_prepared_xacts. The
      reason it did not use MaxBackends, per the comment, is that ideally on a QE we
      want to use the QD's MaxBackends for the slot number, and usually the QE's
      MaxBackends should be greater than the QD's due to potentially multiple gangs
      per query. However, the code ended up using max_prepared_xacts for the shared
      snapshot slot number calculation. That is not correct given that we have
      read-only queries, and we now also have one-phase commit.  Let's use MaxBackends
      for the shared snapshot slot number calculation to be safe, though this might
      waste some memory.
      Reviewed-by: xiong-gang <gxiong@pivotal.io>
      f6c59503
    • Limit gxact number on master with MaxBackends. · 2a961e65
      Authored by Paul Guo
      Previously we set it to max_prepared_xacts. It is used to initialize some
      2PC-related shared memory. For example, the array shmCommittedGxactArray is
      created with this length; that array is used to collect the not-yet-"forgotten"
      distributed transactions during master/standby recovery. The array length
      might be problematic because:
      
      1. If the master max_prepared_xacts is equal to the segment max_prepared_xacts,
      as is usual, it is possible that some distributed transactions use only a
      partial gang, so the total number of distributed transactions might be larger
      (even much larger) than max_prepared_xacts. The documentation says
      max_prepared_xacts should be greater than max_connections, but there is no
      code to enforce that.
      
      2. It is also possible that the master max_prepared_xacts differs from the
      segment max_prepared_xacts (although the documentation does not suggest this,
      there is no code to enforce it).
      
      To fix that, we use MaxBackends for the gxact number on the master. We could
      just use the GUC max_connections (MaxBackends additionally includes the
      autovacuum workers and background workers on top of max_connections), but I'm
      conservatively using MaxBackends, since this issue is annoying: the standby
      cannot recover, even after a postgres reboot, because of the FATAL message
      below, unless we temporarily increase the guc max_prepared_transactions value.
      
      2020-07-17 16:48:19.178667
      CST,,,p33652,th1972721600,,,,0,,,seg-1,,,,,"FATAL","XX000","the limit of 3
      distributed transactions has been reached","It should not happen. Temporarily
      increase max_connections (need postmaster reboot) on the postgres (master or
      standby) to work around this issue and then report a bug",,,,"xlog redo at
      0/C339BA0 for Transaction/DISTRIBUTED_COMMIT: distributed commit 2020-07-17
      16:48:19.101832+08 gid = 1594975696-0000000009, gxid =
      9",,0,,"cdbdtxrecovery.c",571,"Stack trace:
      
      1    0xb3a30f postgres errstart (elog.c:558)
      2    0xc3da4d postgres redoDistributedCommitRecord (cdbdtxrecovery.c:565)
      3    0x564227 postgres <symbol not found> (xact.c:6942)
      4    0x564671 postgres xact_redo (xact.c:7080)
      5    0x56fee5 postgres StartupXLOG (xlog.c:7207)
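      
      For reference, the standard PostgreSQL sizing relationship (assumed to
      carry over to GPDB here; the exact composition may differ slightly) is:
      
          MaxBackends = max_connections + autovacuum_max_workers + 1 + max_worker_processes
      
      so sizing the gxact array with MaxBackends always leaves at least as much
      headroom as max_connections alone.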
      Reviewed-by: xiong-gang <gxiong@pivotal.io>
      2a961e65
    • Make test function wait_for_replication_replay() a common UDF. · af942980
      Authored by Paul Guo
      We need that in more than one test.
      Reviewed-by: xiong-gang <gxiong@pivotal.io>
      af942980
  8. 17 July 2020, 6 commits
    • Change log level in ExecChooseHashTableSize · 6b4d93c5
      Authored by Hubert Zhang
      ExecChooseHashTableSize() is a hot function that is called not only by the
      executor but also by the planner. The planner calls this function when
      calculating the cost of each join path, and the number of join paths grows
      exponentially with the number of tables. As a result, do not use elog(LOG)
      here, to avoid generating too many log entries.
      6b4d93c5
    • Add test concurrent_drop_truncate_tablespace to isolation2_schedule · 448d6aae
      Authored by Alexandra Wang
      Commit 362c48b6 added test
      concurrent_drop_truncate_tablespace but it never made it to the
      schedule.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      448d6aae
    • Do not allocate MemoryPoolManager from a memory pool · dd3e7ff9
      Authored by Jesse Zhang
      Our implementations of memory pools have a hidden dependency on _the_
      global memory pool manager: typically GPOS_NEW and GPOS_DELETE will
      reach for the memory pool manager singleton. This makes GPOS_DELETE on a
      memory pool manager undefined behavior because we call member functions
      on an object after its destructor finishes.
      
      On the Postgres 12 merge branch, this manifests itself in a crash during
      initdb. More concerning is that it only crashed when we set max
      connections and shared buffers to a specific number.
      dd3e7ff9
    • gporca: Use portable way to get frame address. · 7c1891fb
      Authored by Amit Khandekar
      GPOS_ASMFP() used x86_64 assembly instructions to get the current frame
      address. This obviously doesn't compile on other architectures like
      ARM64. So instead use __builtin_frame_address(), which is available
      in gcc and presumably clang. Since gcc and clang are the two most
      common compilers, and since we don't want to support GPORCA on exotic
      architectures and compilers, don't bother to use any other way to get
      the frame address.
      
      Let configure fail if __builtin_frame_address() is not found, but
      don't do this check if gporca is disabled.
      
      GPORCA's CStackDescriptor::Backtrace() uses the frame address. But there
      is also gp_backtrace() in the backend code that has similar
      functionality. This commit does not merge these two places. But it
      prepares the infrastructure to do the merge, including a new macro
      HAVE__BUILTIN_FRAME_ADDRESS defined in pg_config.h.
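      
      A minimal sketch of the portable builtin in use (not the GPORCA code
      itself; GCC and Clang both provide __builtin_frame_address):
      
      #include <stdio.h>
      
      static void
      show_frame(void)
      {
          /* Frame address of the current function, replacing the x86_64-only
           * assembly previously used by GPOS_ASMFP(). */
          void *fp = __builtin_frame_address(0);
      
          printf("current frame address: %p\n", fp);
      }
      
      int
      main(void)
      {
          show_frame();
          return 0;
      }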
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/FgaR_4sGYrk
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      7c1891fb
    • docs - update utility docs with IP/hostname information. (#10379) · 54dbd926
      Authored by Mel Kiyama
      * docs - update utility docs with IP/hostname information.
      
      Add information to gpinitsystem, gpaddmirrors, and gpexpand ref. docs
      --Information about using hostnames vs. IP addresses
      --Information about configuring hosts that have multiple NICs
      
      Also updated some examples in gpinitsystem
      
      * docs - review comment updates. Add more information from dev.
      
      * docs - change examples to show valid configurations that support failover.
      Also fix typos and minor edits.
      
      * docs - updates based on review comments.
      54dbd926
    • docs - greenplumr input.signature (#10477) · 1c294e95
      Authored by Lisa Owen
      1c294e95
  9. 16 July 2020, 3 commits
  10. 15 July 2020, 5 commits
    • Remove deadcode contain_ctid_var_reference. · d229288a
      Authored by Zhenghua Lyu
      It was used to implement the dedup plan, which has been
      refactored by commit 9628a332.
      
      So in this commit we remove these unused functions.
      d229288a
    • Fix flaky test case 'gpcopy' · 9480d631
      Authored by Pengzhou Tang
      The failing test case tests that the command "copy lineitem to '/tmp/abort.csv'"
      can be cancelled after the COPY is dispatched to the QEs. To verify this, it
      checks that /tmp/abort.csv has fewer rows than lineitem.
      
      The cancel logic in the code is:
      
      The QD dispatches the COPY command to the QEs; then, if the QD gets a cancel
      interrupt, it sends a cancel request to the QEs. However, the QD keeps
      receiving data from the QEs even after it has gotten the cancel interrupt; the
      QD relies on the QEs to receive the cancel request and explicitly stop copying
      data to the QD.
      
      Obviously, the QEs may already have copied out all their data to the QD before
      they get the cancel request, so the test case cannot guarantee that
      /tmp/abort.csv has fewer rows than lineitem.
      
      To fix this, we just verify that the COPY command can be aborted with the
      message 'ERROR:  canceling statement due to user request'; the row count
      verification is pointless here.
      9480d631
    • Cleanup idle reader gang after utility statements · d1ba4da5
      Authored by Hubert Zhang
      Reader gangs use a local snapshot to access the catalog; as a result, they do
      not synchronize with the sharedSnapshot from the writer gang, which leads
      to inconsistent visibility of catalog tables on idle reader gangs.
      Consider the case:
      
      select * from t, t t1; -- create a reader gang.
      begin;
      create role r1;
      set role r1;  -- the set command is also dispatched to idle reader gangs
      
      When the set role command is dispatched to an idle reader gang, the reader
      gang cannot see the new tuple for role r1 in the catalog table pg_authid.
      To fix this issue, we should drop the idle reader gangs after each
      utility statement that may modify the catalog.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      d1ba4da5
    • Correct plan of general & segmentGeneral paths with volatile functions. · d1f9b96b
      Authored by Zhenghua Lyu
      General and segmentGeneral locus imply that if the corresponding slice
      is executed on many different segments, each execution should provide the
      same result data set. Thus, in some cases, General and segmentGeneral can be
      treated like broadcast.
      
      But what if a general or segmentGeneral locus path contains volatile
      functions? Volatile functions, by definition, do not guarantee the same
      results across invocations. So in such cases the path loses this property and
      cannot be treated as *general. Previously, the Greenplum planner
      did not handle these cases correctly. A Limit over a general or segmentGeneral
      path has the same issue.
      
      The idea of the fix in this commit is: when we find the pattern (a general or
      segmentGeneral locus path containing volatile functions), we create a
      motion path above it to turn its locus into singleQE and then create a
      projection path. The core job then becomes choosing the places to
      check:
      
        1. For a single base rel, we only need to check its restrictions; this is at
           the bottom of the planner, in the function set_rel_pathlist.
        2. When creating a join path, if the join locus is general or segmentGeneral,
           check its joinqual to see if it contains volatile functions.
        3. When handling a subquery, we invoke the set_subquery_pathlist function;
           at the end of that function, check the targetlist and havingQual.
        4. When creating a limit path, the same check-and-change algorithm is applied.
        5. Correctly handle make_subplan.
      
      The OrderBy clause and Group clause are included in the targetlist and are
      thus handled by Step 3 above.
      
      This commit also fixes DML on replicated tables. UPDATE and DELETE statements
      on a replicated table are special: they have to be dispatched to each
      segment to execute. So if they contain volatile functions in their targetList
      or WHERE clause, we should reject such statements:
      
        1. For the targetList, we check it in the function create_motion_path_for_upddel.
        2. The WHERE clause is handled in the query planner: when we find the pattern
           and want to fix it, we additionally check whether we are updating
           or deleting a replicated table, and if so, we reject the statement.
        3. The upsert case is handled in the transform stage.
      d1f9b96b
    • Fix uninitialized variable in pgrowlocks · 75283bc7
      Authored by Japin
      Because the variable rel is only used in the if (SRF_IS_FIRSTCALL()) branch,
      we should move its declaration into that branch (suggested by Hubert Zhang).
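      
      A simplified sketch of the pattern (not the actual pgrowlocks code): a
      variable used only in the first-call branch of a set-returning function is
      declared inside that branch, so no later call can read it uninitialized.
      
      #include <stdio.h>
      
      static int first_call = 1;
      
      static void
      srf_style_call(void)
      {
          if (first_call)
          {
              int rel = 42;            /* stands in for the Relation used only here */
      
              printf("first call: set up scan on relation %d\n", rel);
              first_call = 0;
              return;
          }
      
          printf("subsequent call: no reference to rel at all\n");
      }
      
      int
      main(void)
      {
          srf_style_call();
          srf_style_call();
          return 0;
      }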
      75283bc7