1. 28 September 2018, 1 commit
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      Authored by ZhangJackey
      GPDB used to assume that a table's data is always distributed on all
      segments, however this is not always true: for example, when a cluster
      is expanded from M segments to N (N > M), all the tables are still on
      only M segments. To work around the problem we used to have to alter
      all the hash-distributed tables to randomly distributed to get correct
      query results, at the cost of bad performance.
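
      That old workaround looked roughly like the following sketch (the table
      name is hypothetical):

      ```sql
      -- Force the table onto random distribution so queries give correct
      -- results after expansion, trading away co-located hash joins.
      ALTER TABLE sales SET DISTRIBUTED RANDOMLY;
      ```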
      
      Now table data can be distributed on a subset of segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  This allows DML on tables that live on M segments, and
      joins between M-segment and N-segment tables are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
  2. 27 September 2018, 8 commits
    • Remove built-in stub functions for QuickLZ compressor. · 589533be
      Authored by Heikki Linnakangas
      The proprietary build can install them as normal C language functions,
      with CREATE FUNCTION, instead.
      
      In passing, remove some unused QuickLZ debugging GUCs.
      
      This doesn't yet get rid of all references to QuickLZ, unfortunately. The
      GUC and reloption validation code still needs to know about it, so that
      it can validate the options read from postgresql.conf when starting up
      the postmaster. For the same reason, you cannot yet add custom compression
      algorithms, besides quicklz, as an extension. But this is another step in
      the right direction, anyway.
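
      As a hedged sketch of what a proprietary build could do instead; the
      library path, function name, and signature below are illustrative
      assumptions, not taken from this commit:

      ```sql
      -- Register a QuickLZ entry point as an ordinary C-language function.
      CREATE FUNCTION quicklz_compress(internal, int4, internal, int4, internal, internal)
          RETURNS void
          AS '$libdir/quicklz', 'quicklz_compress'
          LANGUAGE C STRICT;
      ```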
      Co-authored-by: Jimmy Yih <jyih@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
    • Rename variable to avoid risk of type collision · 5da162c4
      Authored by Daniel Gustafsson
      The comment states that "small" might be defined by socket.h, and while
      that's not true for all versions of sys/socket.h, it's still not a good
      name to use as it's common in Windows headers (should we ever revive a
      Windows port). Renaming to a non-colliding name is a small price to pay
      to avoid subtle bugs, so rename and remove the preprocessor dance.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Refactor qp_functions test suite · 8f770d6d
      Authored by Daniel Gustafsson
      The test suite, which was ported over from TINC, was ignoring so much of
      the memorized output that it more or less didn't test anything (and the
      ignored blocks were as full of outdated output as one would imagine). The
      code was also formatted in weird ways and had needless NOTICEs thrown
      during execution.
      
      This refactors the test suite to remove all ignore blocks, removes some
      utterly pointless tests (there are many more of them left), formats the
      code to be readable, fixes the output so that it works, and removes some
      duplicate tests.
      
      The remaining bits of the suite are by no means terribly interesting,
      but they run fast enough that it's worth keeping the leftovers for now.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Dispatcher can create flexible size gang (#5701) · a3ddac06
      Authored by Tang Pengzhou
      * Change the type of db_descriptors to SegmentDatabaseDescriptor **
      
      A new gang definition may consist of cached segdbDesc and newly created
      segdbDesc, so there is no need to palloc all segdbDesc structs as new.
      
      * Remove unnecessary allocate gang unit test
      
      * Manage idle segment dbs using CdbComponentDatabases instead of available* lists.
      
      To support variable-size gangs, we now need to manage segment dbs at a
      finer granularity. Previously, idle QEs were managed by a bunch of lists
      such as availablePrimaryWriterGang and availableReaderGangsN, which
      restricted the dispatcher to creating only N-size (N = number of
      segments) or 1-size gangs.
      
      CdbComponentDatabases is a snapshot of the segment components within the
      current cluster; it now maintains a freelist for each segment component.
      When creating a gang, the dispatcher assembles it from the segment
      components (taking a segment db from the freelist or creating a new
      one). When cleaning up a gang, the dispatcher returns idle segment dbs
      to their segment components.
      
      CdbComponentDatabases provides a few functions to manipulate segment dbs
      (SegmentDatabaseDescriptor *):
      * cdbcomponent_getCdbComponents
      * cdbcomponent_destroyCdbComponents
      * cdbcomponent_allocateIdleSegdb
      * cdbcomponent_recycleIdleSegdb
      * cdbcomponent_cleanupIdleSegdbs
      
      CdbComponentDatabases is also FTS-version sensitive: once the FTS
      version changes, CdbComponentDatabases destroys all idle segment dbs and
      allocates QEs on the newly promoted segments. This makes mirror failover
      transparent to users.
      
      Since segment dbs (SegmentDatabaseDescriptor *) are managed by
      CdbComponentDatabases now, we can simplify the memory context
      management by replacing GangContext & perGangContext with
      DispatcherContext & CdbComponentsContext.
      
      * Postpone error handling when creating a gang
      
      Now that we have AtAbort_DispatcherState, one advantage is that we can
      postpone gang error handling to this function and make the code cleaner.
      
      * Handle FTS version change correctly
      
      In some cases, when the FTS version changes, we can't update the current
      snapshot of segment components; more specifically, we can't destroy the
      current writer segment dbs and create new segment dbs.
      
      These cases include:
      * the session has created temp tables.
      * the query needs two-phase commit and the gxid has already been
        dispatched to segments.
      
      * Replace <gangId, sliceId> map with <qeIdentifier, sliceId> map
      
      We used to dispatch a <gangId, sliceId> map along with the query to
      segment dbs so that they know which slice they should execute.
      
      Now gangId is useless to a segment db because a segment db can be reused
      by different gangs, so we need a new way to convey this information. To
      resolve this, CdbComponentDatabases assigns a unique identifier to each
      segment db and builds, for each slice, a bitmap set consisting of
      segment identifiers; segment dbs can then walk the slice table and find
      the right slice to execute.
      
      * Allow dispatcher to create variable-size gangs and refine AssignGangs()
      
      Previously, the dispatcher could only create an N-size gang for
      GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the
      dispatcher in many ways. One example is direct dispatch: it always
      created an N-size gang even when it dispatched the command to only one
      segment. Another example is that some operations may be able to use an
      N+ size gang, like hash join: if both the inner and outer plans are
      redistributed, the hash join node can be associated with an N+ size
      gang. This commit changes the API of createGang() so the caller can
      specify a list of segments (partial or even duplicate segments);
      CdbComponentDatabases guarantees that each segment has only one writer
      in a session. This also resolves another pain point of AssignGangs():
      the caller no longer needs to promote a GANGTYPE_PRIMARY_READER to
      GANGTYPE_PRIMARY_WRITER, or a GANGTYPE_SINGLETON_READER to
      GANGTYPE_PRIMARY_WRITER for replicated tables (see FinalizeSliceTree()).

      With this commit, AssignGangs() is very clear now.
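
      As a hedged illustration of the direct-dispatch case above (table and
      values are hypothetical): with equality predicates on every distribution
      key column, the dispatcher can now build a single-segment gang instead
      of an N-size one.

      ```sql
      -- Only the segment owning (c1, c2) = (42, 7) has to run this query,
      -- so a 1-size gang suffices.
      EXPLAIN SELECT * FROM t1 WHERE c1 = 42 AND c2 = 7;
      ```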
    • Remove remove_subquery_in_RTEs() call in standard_planner() (#5863) · 69cd1ec5
      Authored by Paul Guo
      As the comment said, this was useful; however, now that we have the
      upstream add_rte_to_flat_rtable() to handle it, let's remove this call.
    • Remove unused variable · cc853420
      Authored by Daniel Gustafsson
      Fixes a clang (and probably gcc) compiler warning about an unused variable.
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
    • Increase default value of wal_keep_segments GUC. · dd18c4a0
      Authored by David Kimura
      Until we have replication slots, this keeps enough xlog segments around
      so that a mirror has an opportunity to reconnect when a checkpoint
      removes a segment while the mirror is not streaming.
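
      A small sketch of inspecting the setting (cluster-wide changes go
      through postgresql.conf or gpconfig rather than a session-level SET; the
      statement below only shows the current value, not the new default from
      this commit):

      ```sql
      -- Inspect the configured number of retained xlog segments.
      SHOW wal_keep_segments;
      ```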
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
    • Mark various objects as internal, for purposes of object access hooks. · a673ddaa
      Authored by Heikki Linnakangas
      As far as I can see, the 'is_internal' flag is passed through to a
      possible object access hook, but it has no other effect. Mark the LOV
      index and heap created for bitmap indexes, as well as the constraints
      created for exchanged partitions, as 'internal'.
  3. 26 September 2018, 13 commits
  4. 25 September 2018, 9 commits
    • Disable 'emergency mode' autovacuum worker. · 5ce7f06d
      Authored by Adam Berlin
      In GPDB, we only want an autovacuum worker to start once we know
      there is a database to vacuum.
      
      When we changed the default value of `autovacuum_start_daemon` from
      `true` to `false` for GPDB, we made AutoVacuumLauncherMain() immediately
      start an autovacuum worker from the launcher and exit, which is called
      'emergency mode'.  While 'emergency mode' is running, it is possible to
      continuously start autovacuum workers: within the worker, the
      PMSIGNAL_START_AUTOVAC_LAUNCHER signal is sent when a database is found
      that is old enough to be vacuumed, but in GPDB we only autovacuum
      non-connectable databases (template0), and we do not have logic to
      filter out connectable databases in the autovacuum worker.
      
      This change allows the autovacuum launcher to do more up-front decision
      making about whether it should start an autovacuum worker, including
      GPDB-specific rules.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Allow to add motion to unique-ify the path in create_unique_path(). (#5589) · e9fe4224
      Authored by Paul Guo
      create_unique_path() could be used to convert a semi join to an inner
      join. Previously, during the semi-join refactor in commit d4ce0921,
      creating a unique path was disabled for the case where duplicates might
      be on different QEs.

      In this patch we enable adding a motion to unique-ify the path, but only
      if the unique method is not UNIQUE_PATH_NOOP. We don't create a unique
      path for that case because, later during plan creation, a motion could
      be created above this unique path whose subpath is already a motion; the
      unique path node would then be ignored and we would get a motion plan
      node above a motion plan node, and that is bad. We could further improve
      that, but not in this patch.
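
      For context, a hedged sketch (table names hypothetical) of the query
      shape this affects, a semi-join that the planner may now unique-ify by
      adding a Motion:

      ```sql
      -- The IN subquery forms a semi-join; unique-ifying its inner side lets
      -- the planner turn it into a plain inner join, now possible even when
      -- duplicates may live on different QEs.
      EXPLAIN SELECT *
        FROM orders o
       WHERE o.customer_id IN (SELECT c.id FROM customers c);
      ```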
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
    • Remove bkuprestore test · d6409042
      Authored by Daniel Gustafsson
      The bkuprestore test was imported along with the source code during the
      initial open sourcing, but has never been used and hasn't worked in a
      long time. Rather than trying to save this broken mess, let's remove it
      and start fresh with a pg_dump TAP test, which is a much better way to
      test backup/restore.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
    • Update ORCA output file for update_gp ICG test · 6d61b3fe
      Authored by Dhanashree Kashid
    • Delete SIGUSR2 based fault injection logic in walreceiver. · fc008690
      Authored by Ashwin Agrawal
      Regular fault injection doesn't work for mirrors, so a fault injection
      mechanism using the SIGUSR2 signal coupled with an on-disk file was
      coded just for testing. This is very hacky and intrusive, hence the plan
      is to get rid of it. Most of the tests using this framework were found
      not to be useful, as the majority of the code is upstream. If anything
      still needs testing, a better alternative will be explored.
    • Remove remaining unused pieces of wal_consistency_checking. · c9dee15b
      Authored by Ashwin Agrawal
      Most of the backup-block-related modifications for providing
      wal_consistency_checking were removed as part of the 9.3 merge, mainly
      to avoid merge conflicts. The masking functions are still used by the
      gp_replica_check tool to perform checking between primaries and mirrors,
      but the online checking during replay of each record was let go. So this
      commit cleans up the remaining pieces which are not used. We will bring
      this back in properly working condition when we catch up to upstream.
    • Remove some unused and not implemented fault types. · c2bbca41
      Authored by Ashwin Agrawal
      Remove the fault types which have no implementation, or have an
      implementation that doesn't seem usable. This helps keep only the
      working subset of faults. For example, the data corruption fault seems
      pretty useless; even if needed, it can easily be coded for a specific
      use case using the skip fault instead of having a special one defined
      for it.

      The fault type "fault" is redundant with "error", hence it is removed
      as well.
    • Add gpdb specific files to .gitignore · 36d33485
      Authored by Ashwin Agrawal
    • Fix volatile functions handling by ORCA · e17c6f9a
      Authored by Dhanashree Kashid
      The following commits have been cherry-picked again:

      * b1f543f3
      * b0359e69
      * a341621d

      The contrib/dblink tests were failing with ORCA after the above commits.
      The issue has now been fixed in ORCA v3.1.0, hence we re-enable these
      commits and bump the ORCA version.
  5. 24 September 2018, 3 commits
    • Remove FIXME, accept that we won't have this assertion anymore. · 1d254cf1
      Authored by Heikki Linnakangas
      I couldn't find an easy way to make this assertion work, with the
      "flattened" range table in 9.3. The information needed for this is zapped
      away in add_rte_to_flat_rtable(). I think we can live without this
      assertion.
    • Fix UPDATE RETURNING on distribution key columns. · 306b114b
      Authored by Heikki Linnakangas
      Updating a distribution key column is performed as a "split update",
      i.e. separate DELETE and INSERT operations, which may happen on
      different nodes. In the case of RETURNING, the DELETE operation was also
      returning a row, and that row was incorrectly counted in the row count
      returned to the client in the command tag (e.g. "UPDATE 2"). Fix, and
      add a regression test.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5839
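
      A minimal reproduction sketch (the table is hypothetical) of the fixed
      behavior:

      ```sql
      CREATE TABLE upd_dk (id int, val int) DISTRIBUTED BY (id);
      INSERT INTO upd_dk VALUES (1, 10);
      -- Updating the distribution key runs as a split update (DELETE + INSERT);
      -- this should return exactly one row and report "UPDATE 1", not "UPDATE 2".
      UPDATE upd_dk SET id = 2 WHERE id = 1 RETURNING id, val;
      ```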
    • Refactor code in ProcessRepliesIfAny() to match upstream. · 9e5b20e8
      Authored by Heikki Linnakangas
      The reason we needed the pq_getmessage() call marked with the FIXME
      comment was that we were missing the pq_getmessage() call from
      ProcessStandbyMessage() that the corresponding upstream version, at the
      point we're caught up to in the merge, had. I believe it was missing
      from ProcessStandbyMessage() because we had earlier backported upstream
      commit cd19848bd55, which removed the pq_getmessage() call from
      ProcessStandbyMessage() and added one in ProcessRepliesIfAny() instead.
      
      Clarify this by changing the code to match upstream commit cd19848bd55.
      (Except that we don't have pq_startmsgread() yet; that will arrive when
      we merge the rest of commit cd19848bd55.)
  6. 23 September 2018, 2 commits
  7. 22 September 2018, 4 commits
    • Revert "Add DEBUG mode to the explain_memory_verbosity GUC" · 984cd3b9
      Authored by Jesse Zhang
      Commit 825ca1e3 didn't seem to work well when we hooked up ORCA's memory
      system to memory accounting. We are tripping multiple asserts in
      regression tests. The regression test failures seem to suggest we are
      double-freeing somewhere (or accounting incorrectly). Reverting for now
      to get master back to green.
      
      This reverts commit 825ca1e3.
    • Add DEBUG mode to the explain_memory_verbosity GUC · 825ca1e3
      Authored by Taylor Vesely
      The memory accounting system generates a new memory account for every
      execution node initialized in ExecInitNode. The addresses of these
      memory accounts are stored in the shortLivingMemoryAccountArray. If the
      memory allocated for shortLivingMemoryAccountArray is full, we repalloc
      the array with double the number of available entries.

      After creating approximately 67000000 memory accounts, it needs to
      allocate more than 1GB of memory to increase the array size, which
      throws an ERROR and cancels the running query.
      
      PL/pgSQL and SQL functions will create new executors/plan nodes that
      must be tracked by the memory accounting system. This level of detail is
      not necessary for tracking memory leaks, and creating a separate memory
      account for every executor would use a large amount of memory just to
      track these memory accounts.
      
      Instead of tracking millions of individual memory accounts, we
      consolidate any child executor account into a special 'X_NestedExecutor'
      account. If explain_memory_verbosity is set to 'detail' or below, all
      child executors are consolidated into this account.
      
      If more detail is needed for debugging, set explain_memory_verbosity to
      'debug', where, as was the previous behavior, every executor will be
      assigned its own MemoryAccountId.
      
      Originally we tried to remove nested execution accounts after they
      finish executing, but rolling over those accounts into an
      'X_NestedExecutor' account was impractical to accomplish without the
      possibility of a future regression.
      
      If any accounts created between nested executors are not rolled over to
      an 'X_NestedExecutor' account, the bookkeeping of which accounts were
      rolled over could grow in the same way that the
      shortLivingMemoryAccountArray grows today, and would also become too
      large to reasonably fit in memory.
      
      If we were to iterate through the SharedHeaders every time we finish a
      nested executor, it would not likely be very performant.
      
      While we were at it, convert some of the convenience macros dealing with
      memory accounting for executor/planner nodes into functions, and move
      them out of the memory accounting header files into their sole callers'
      compilation units.
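
      A hedged sketch of how the two levels described above would be used
      ('detail' is the existing setting name used elsewhere in this log,
      'debug' is the new level; the query is a placeholder):

      ```sql
      -- Nested executors are consolidated into an X_NestedExecutor account.
      SET explain_memory_verbosity = 'detail';
      EXPLAIN ANALYZE SELECT count(*) FROM my_table;

      -- One memory account per executor, as before this change.
      SET explain_memory_verbosity = 'debug';
      EXPLAIN ANALYZE SELECT count(*) FROM my_table;
      ```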
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
    • Move memoryAccountId out of PlannedStmt/Plan Nodes · 7c9cc053
      Authored by Taylor Vesely
      Functions using SQL and PL/pgSQL will plan and execute arbitrary SQL
      inside a running query. The first time we initialize a plan for an SQL
      block, the memory accounting system creates a new memory account for
      each Executor/Node.  In the case that we are executing a cached plan
      (i.e. plancache.c), the memory accounts will have already been assigned
      in a previous execution of the plan.
      
      As a result, when explain_memory_verbosity is set to 'detail', it is not
      clear what memory account corresponds to which executor. Instead, move
      the memoryAccountId into PlanState/QueryDesc, which will ensure that
      every time we initialize an executor, it will be assigned a unique
      memoryAccountId.
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
    • Remove FIXME in RemoveLocalLock, it's alright. · 9e57124b
      Authored by Heikki Linnakangas
      The FIXME was added to GPDB in commit f86622d9, which backported the
      local cache of resource owners attached to LOCALLOCK. I think the
      comment was added because, in the upstream commit that added the cache,
      upstream didn't have the check guarding the pfree() yet. It was added
      later in upstream too, in commit 7e6e3bdd3c, and that had already been
      backported to GPDB. So it's alright: the guard on the pfree is a good
      thing to have, and there's nothing further to do here.