1. 01 Apr 2019, 2 commits
  2. 15 Mar 2019, 1 commit
      Remove ICW job icw_planner_centos7_online_expand (#7172) · de2d530d
      Committed by Jialun
      This job tests online expansion: it creates a cluster with two
      segments, expands it to three, and runs the full ICW suite to check
      whether the cluster is OK after expansion.
      Because restarts are forbidden in the online expand test, we exclude
      every case that contains a restart operation. But if someone adds a
      new test with a restart, the job may fail, and manual intervention is
      needed to exclude that test.
      So we move this job to our own dev pipeline to reduce the impact
      on the prod pipeline.
  3. 14 Mar 2019, 1 commit
  4. 12 Mar 2019, 1 commit
  5. 11 Mar 2019, 1 commit
  6. 07 Mar 2019, 1 commit
  7. 27 Feb 2019, 1 commit
      We improve the check in pg_upgrade for gphdfs in the following ways. · 4809a46c
      Committed by David Krieger
      All of these checks are done only when the old cluster version has
      support for gphdfs.
      
      1). Skip checking for gphdfs tables on cluster versions where gphdfs
      support is absent.
      2). Fail if the check finds any gphdfs roles (see the sketch below).
      
      NOTE: we do not special-case the existence of the gphdfs.so library,
      even in the absence of gphdfs tables or roles. In other words, the
      existing library checks force the user to drop all gphdfs-dependent
      functions before upgrade.
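
      As a rough sketch (not the literal pg_upgrade query), the role check
      amounts to something like the following, where the pg_authid column
      names are assumptions based on the GPDB5 catalog:

          -- fail the upgrade check if any role still has gphdfs privileges
          SELECT rolname
          FROM pg_catalog.pg_authid
          WHERE rolcreaterexthdfs OR rolcreatewexthdfs;
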
      Co-authored-by: Francisco Guerrero <aguerrero@pivotal.io>
  8. 26 Feb 2019, 1 commit
      pg_upgrade changes to upgrade from GPDB5 to GPDB6 · ecd3400f
      Committed by David Krieger
      Minor changes are made to pg_upgrade to allow GPDB5 to be upgraded
      to GPDB6:
      
      1). pg_ctl arguments need to explicitly contain the dbid, contentid and
      numcontents in order to start the GPDB5 cluster
      
      2). the type of the attnum field of gp_distribution_policy has changed
      (see the sketch after this list)
      
      3). gp_toolkit is a view, and hence datatypes it contains that were
      modified in GPDB6 (such as name) need not be flagged as errors during
      pg_upgrade's check of the old cluster
      
      4). index access method type 'bitmap' needs to be added to the exclusion
      list to select only bpchar_pattern_ops index access methods
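
      For illustration only, the field change in 2) looks roughly like the
      following, where the column names are assumptions based on the two
      catalog layouts:

          -- GPDB5: distribution keys stored in a smallint array
          SELECT localoid::regclass, attrnums FROM gp_distribution_policy;
          -- GPDB6: stored in an int2vector column named distkey
          SELECT localoid::regclass, distkey FROM gp_distribution_policy;
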
      Co-authored-by: Jesse Zhang <jzhang@pivotal.io>
  9. 21 Feb 2019, 1 commit
  10. 20 Feb 2019, 1 commit
  11. 13 Feb 2019, 1 commit
  12. 11 Feb 2019, 1 commit
      Hide WITH RECURSIVE under off by default GUC · 2de46a20
      Committed by Daniel Gustafsson
      In Greenplum 5.X the recursive CTE feature was hidden behind a GUC, as
      it wasn't deemed of production quality just yet. Commit 20152cbf
      removed that GUC in order to make stabilization work easier, but there
      are still enough rough edges that recursive CTEs should not be a
      feature which is on by default. This brings back the GUC under the
      same name, in order to be backwards compatible, even though
      "prototype" is a bit misleading at this point.
      
      In order for the cluster, and the associated tools, to work, this also
      turns the GUC on/off as required when there are recursive queries in
      the toolchain.
      
      Also adds a test and tidies up a few comments in surrounding code.
      
      This is another attempt at this; the previous coding was reverted
      in ea57a7aa.
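
      As a minimal sketch, assuming the GUC kept its 5.X name
      gp_recursive_cte_prototype (the commit only says "the same name"):

          SET gp_recursive_cte_prototype = on;
          WITH RECURSIVE t(n) AS (
              SELECT 1
              UNION ALL
              SELECT n + 1 FROM t WHERE n < 5
          )
          SELECT n FROM t;
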
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
  13. 05 Feb 2019, 2 commits
  14. 01 Feb 2019, 1 commit
      Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
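
      For illustration, the recorded opclasses could be inspected roughly
      like this, where distkey and distclass are assumed names for the key
      and opclass columns of gp_distribution_policy:

          SELECT localoid::regclass, distkey, distclass
          FROM gp_distribution_policy;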
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
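
      A minimal sketch of the two forms (table and column names are
      illustrative; cdbhash_int4_ops is the legacy opclass described below):

          -- default hash opclass for int4, chosen implicitly
          CREATE TABLE t1 (id int4) DISTRIBUTED BY (id);
          -- explicit operator class, as in CREATE INDEX
          CREATE TABLE t2 (id int4) DISTRIBUTED BY (id cdbhash_int4_ops);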
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, the legacy integer operator classes cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops are all part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables works, for example. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet, should do that.)
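
      A minimal sketch of the GUC in action (the database name is
      illustrative):

          -- force legacy opclasses for tables that don't name one explicitly
          SET gp_use_legacy_hashops = on;
          CREATE TABLE t3 (id int4) DISTRIBUTED BY (id);  -- gets cdbhash_int4_ops
          -- the per-database setting pg_upgrade applies to upgraded databases
          ALTER DATABASE mydb SET gp_use_legacy_hashops = on;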
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely in PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
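
      A minimal sketch of what the tightened ALTER TABLE check now rejects,
      assuming money lacks a default hash opclass as described above:

          CREATE TABLE t4 (k int4) DISTRIBUTED BY (k);
          -- changing the distribution key column to a non-hashable type
          -- is now rejected instead of leaving a broken distribution key
          ALTER TABLE t4 ALTER COLUMN k TYPE money;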
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  15. 23 Jan 2019, 1 commit
      Store gp_dbid and gp_contentid in conf files. · 4eaeb7bc
      Committed by Ashwin Agrawal
      Currently, gp_dbid and gp_contentid are passed as command line
      arguments when starting a QD or QE. Since the values are stored in the
      master's catalog table, the master must be started first to get the
      right values. Hence, a hard-coded dbid=1 was always used for starting
      the master in admin mode. This worked fine as long as dbid was not
      used for anything on-disk. But given that dbid is used in the
      tablespace path in GPDB 6, starting the instance with the wrong dbid
      invites recovery-time failures, data corruption or data loss. Dbid=1
      also goes wrong after failover to the standby master, as the standby
      has dbid != 1. This commit hence eliminates the need to pass gp_dbid
      and gp_contentid on the command line; instead, the values are stored
      in the instance's conf files when the instance is created.
      
      This also helps to avoid passing gp_dbid as an argument to pg_rewind,
      which needs to start the target instance in single-user mode to
      complete recovery before performing the rewind operation.
      
      Plus, this makes development easier: one can just use pg_ctl start
      without having to pass these values correctly.
      
       - gp_contentid is stored in the postgresql.conf file.
      
       - gp_dbid is stored in internal.auto.conf (see the sketch after
         this list).
      
       - Introduce the internal.auto.conf file, created during initdb;
         internal.auto.conf is included from the postgresql.conf file.
      
       - A separate file is chosen for gp_dbid to ease handling during
         pg_rewind and pg_basebackup, since this file can be excluded when
         copying from primary to mirror, instead of trying to edit its
         contents after the copy. gp_contentid remains the same for a
         primary and its mirror, hence having it in the postgresql.conf file
         makes sense. If gp_contentid were also stored in the new
         internal.auto.conf file, pg_basebackup would need to be passed the
         contentid as well in order to write it to this file.
      
       - pg_basebackup: write the gp_dbid after backup. Since gp_dbid is
         unique to each primary and mirror, pg_basebackup excludes copying
         the internal.auto.conf file storing the gp_dbid, and explicitly
         (over)writes the file with the value passed as --target-gp-dbid.
         --target-gp-dbid is therefore now a mandatory argument to
         pg_basebackup.
      
       - gpexpand: update gp_dbid and gp_contentid post directory copy.
      
       - pg_upgrade: retain all configuration files for the
         segment. postgresql.auto.conf and internal.auto.conf are also
         internal configuration files which should be restored after the
         directory copy. A similar change is required in the gp_upgrade
         repo in restoreSegmentFiles() after copyMasterDirOverSegment().
      
       - Update tests to avoid passing gp_dbid and gp_contentid.
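
      As a minimal sketch, once stored in the conf files both identity GUCs
      can be inspected on a running instance:

          SHOW gp_contentid;  -- from postgresql.conf; shared by primary and mirror
          SHOW gp_dbid;       -- from internal.auto.conf; unique per instance
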
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
  16. 17 Jan 2019, 1 commit
      Check cluster expansion status by numsegments · 1c061fbf
      Committed by ZhangJackey
      If the cluster is in expansion mode, there must be some partial
      tables whose numsegments does not equal the cluster size.
      We now check the expansion status via each table's numsegments when
      gpexpand runs; gpexpand raises an error if there are partial tables.
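
      A minimal sketch of such a partial-table check, assuming the GPDB6
      catalog layout (gp_distribution_policy.numsegments and
      gp_segment_configuration):

          SELECT localoid::regclass
          FROM gp_distribution_policy
          WHERE numsegments < (SELECT count(*)
                               FROM gp_segment_configuration
                               WHERE content >= 0 AND role = 'p');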
  17. 12 Jan 2019, 1 commit
  18. 10 Jan 2019, 1 commit
      Don't use TIDs with high offset numbers in AO tables. · c249ac7a
      Committed by Heikki Linnakangas
      Change the mapping of AO segfilenum+rownum to an ItemPointer, so that we
      avoid using ItemPointer.ip_posid values higher than 32768. Such offsets
      are impossible on heap tables, because you can't fit that many tuples on
      a page. In GiST, since PostgreSQL 9.1, we have taken advantage of that by
      using 0xfffe (65534) to mark special "invalid" GiST tuples. We can tolerate
      that, because those invalid tuples can only appear on internal pages, so
      they cannot be confused with AO TIDs, which only appear on leaf pages. But
      later versions of PostgreSQL will also use those high values for other
      similar magic values, so it seems better to keep clear of them, even if
      we could make it work.
      
      To allow binary upgrades of indexes that already contain AO tids with high
      offsets, we still allow and handle those, too, in the code to fetch AO
      tuples. Also relax the sanity check in GiST code, to not confuse those high
      values with invalid tuples.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6227
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  19. 08 Jan 2019, 2 commits
  20. 03 Jan 2019, 1 commit
  21. 20 Dec 2018, 1 commit
      Remove FIXME, we don't need to do anything here. · 23a8c544
      Committed by Heikki Linnakangas
      Our window aggregates now work the same as in upstream, and this contrib
      module's code is 100% identical to upstream too, except for this FIXME.
      I also tried running the tsearch2 regression tests. They failed because of
      some trivial-looking issues, but all the tests for the rewrite() aggregate
      worked fine.
  22. 18 Dec 2018, 2 commits
  23. 14 Dec 2018, 1 commit
      Add a --socketdir option to pg_upgrade · 0efdbb0f
      Committed by Daniel Gustafsson
      This is a backport of the below commit from upstream PostgreSQL,
      which was originally written for Greenplum and submitted as an
      upstream-first feature. The commit didn't cherry-pick cleanly, as
      upstream has moved pg_upgrade to src/bin and Greenplum has yet to
      merge that change.
      Reviewed-by: Jacob Champion <pchampion@pivotal.io>
      
        commit 2d34ad84
        Author: Tom Lane <tgl@sss.pgh.pa.us>
        Date:   Sat Dec 1 15:45:11 2018 -0500
      
          Add a --socketdir option to pg_upgrade.
      
          This allows control of the directory in which the postmaster sockets
          are created for the temporary postmasters started by pg_upgrade.
          The default location remains the current working directory, which is
          typically fine, but if it is deeply nested then its pathname might
          be too long to be a socket name.
      
          In passing, clean up some messiness in pg_upgrade's option handling,
          particularly the confusing and undocumented way that configuration-only
          datadirs were handled.  And fix check_required_directory's substantially
          under-baked cleanup of directory pathnames.
      
          Daniel Gustafsson, reviewed by Hironobu Suzuki, some code cleanup by me
      
          Discussion: https://postgr.es/m/E72DD5C3-2268-48A5-A907-ED4B34BEC223@yesql.se
  24. 13 Dec 2018, 1 commit
      Reporting cleanup for GPDB specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum-specific error handling via ereport()/elog() calls was
      in need of a unification effort, as some parts of the code were using
      a different messaging style from others (and from upstream). This aims
      at bringing many of the GPDB error calls in line with the upstream
      error message writing guidelines, and thus makes the user experience
      of Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter, and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with too detailed
        errmsg() content broken up into details and hints where possible.
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages due to long lines since it makes grepping
        for error messages harder when debugging. This is also the de facto
        standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low hanging fruit has
        been dealt with. Also convert a few elog() calls which are user
        facing to ereport().
      
      * Update the testfiles to match the new messages.
      
      Spelling and wording is mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  25. 07 Dec 2018, 2 commits
  26. 06 Dec 2018, 1 commit
  27. 29 Nov 2018, 1 commit
      Remove gp_num_contents_in_cluster GUC. · 5f10a924
      Committed by Ashwin Agrawal
      Given the online gpexpand work, the gp_num_contents_in_cluster GUC is
      unused. So, delete it from the code to avoid confusion and eliminate
      this long argument previously required to start a postgres instance
      in GPDB.
  28. 27 Nov 2018, 1 commit
  29. 19 Nov 2018, 2 commits
  30. 17 Nov 2018, 2 commits
  31. 16 Nov 2018, 3 commits
      Ensure preassigned Oids aren't reused during upgrade · 73434db2
      Committed by Daniel Gustafsson
      During an upgrade, numerous new objects are created in the new
      cluster which either don't have Oids preallocated from the old
      cluster, or never existed in the old cluster to begin with. These
      objects are assigned Oids in the new cluster which may collide with
      Oids preassigned from the old cluster but not yet used in the restore
      process. This is because the restore process does the preallocation
      just before the object creation.
      
      To avoid collisions, the Oids which will be preassigned are tracked
      during schema dump, and are injected into the Oid dispatch machinery
      before any object is restored such that the list can be queried. This
      requires a new mode of dumping in pg_dump where an object is recorded
      first in the dependency chain but is dumped last. To support this, a
      new API for amending ArchiveEntry objects has been added. Currently
      it only supports changing the definition but it can easily be extended
      to cover future use cases.
      
      The performance of this patch is a TODO, passing all the Oids in an
      array in a single call is unlikely to scale to real-world scenarios
      but it's better to subject this to a wider audience sooner rather
      than later.
      
      Also fix up long-outdated documentation comments in oid_dispatch.c
      and some minor style nits while in there.
      Reviewed-by: Jacob Champion <pchampion@pivotal.io>
      99cccf76