1. 05 Feb 2019, 12 commits
  2. 03 Feb 2019, 1 commit
    • H
      Change formatting of AO checksums in error messages. · 2a4cc4f0
      Heikki Linnakangas committed
      The gpdiff rule in the 'ao_checksum_corruption' test assumed that the
      checksums were 8 characters wide. That was not always true, however,
      because the checksums were not padded with zeros. Padding them with zeros
      seems nicer, so change the error messages to do that.
      
      This should fix these buildfarm failures we've been seeing recently:
      
      -ERROR:  header checksum does not match, expected 0xXXXXXXXX and found 0xXXXXXXXX
      +ERROR:  header checksum does not match, expected 0x21B733 and found 0x44C333F8
      
      I'm not sure why we started seeing this now, and I didn't see those errors
      on my laptop. But it's pure chance whether a checksum happens to begin
      with a 0 or not, so it's not that surprising that some completely
      unrelated change changed the physical contents of the table. This commit
      should make the failure go away.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      2a4cc4f0
  3. 02 Feb 2019, 11 commits
  4. 01 Feb 2019, 16 commits
    • H
      Fix gpcheckcat test case, distkey cannot be NULL anymore. · 6cacc636
      Heikki Linnakangas committed
      A randomly distributed table is now represented by an empty int2vector,
      as sketched below.
      6cacc636
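      A minimal illustration of the new representation. This is a sketch only:
      the catalog column names (localoid, distkey) follow the rename described
      in a later commit in this list, and the table name is made up.

          CREATE TABLE rand_dist_t (a int) DISTRIBUTED RANDOMLY;

          -- distkey is now an empty int2vector rather than NULL:
          SELECT distkey FROM gp_distribution_policy
           WHERE localoid = 'rand_dist_t'::regclass;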
    • D
      Error out on multiple writers in CTE · bfcb7882
      Daniel Gustafsson committed
      While Greenplum can plan a CTE query with multiple writable expressions,
      it cannot execute it, because execution is limited to a single writer
      gang. Until we can support multiple writer gangs, error out with a
      graceful message rather than failing during execution with a more
      cryptic internal error. The kind of query affected is sketched below.
      
      Ideally this will be reverted in GPDB 7.X, but right now it is much too
      close to release to tackle this.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      bfcb7882
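      A hypothetical example of a query with multiple writable CTEs that now
      fails up front with a clear error (table names are made up for
      illustration):

          WITH ins_a AS (INSERT INTO t_a VALUES (1) RETURNING *),
               ins_b AS (INSERT INTO t_b VALUES (2) RETURNING *)
          SELECT * FROM ins_a, ins_b;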
    • D
      Fix leftover merge conflict in xmlmap test · cfad092b
      Daniel Gustafsson committed
      The 9.4.20 merge mistakenly left a merge conflict in the alternative
      output for the xmlmap test. Fix verified against a backend without
      XML support.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      cfad092b
    • H
      Use normal hash operator classes for data distribution. · 242783ae
      Heikki Linnakangas committed
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)"; see the sketch after this
      message. This is similar to how an operator class can be specified for
      each column in CREATE INDEX.
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, the legacy integer operator classes cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops are all part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses instead of the default hash opclasses when the opclass is not
      specified explicitly (also sketched after this message). pg_upgrade will
      set the new GUC, to force the use of legacy hashops, when restoring the
      schema dump. It will also set the GUC on all upgraded databases, as a
      per-database option, so any new tables created after upgrade will also
      use the legacy opclasses. It seems better to be consistent after upgrade,
      so that, for example, co-location between old and new tables works. The
      idea is that some time after the upgrade, the admin can reorganize all
      tables to use the default opclasses instead. At that point, the admin
      should also clear the GUC on the converted databases. (Or rather, an
      automated tool, which hasn't been written yet, should do that.)
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype in the same
      query. There are checks in the Query->DXL translation to detect that
      case and fall back to the planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case where all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on them, so it
      will treat them as "strewn" distribution and won't co-locate joins.
      
      The abstime, reltime and tinterval datatypes don't have default hash
      opclasses. They are being removed completely in PostgreSQL v12, and users
      shouldn't be using them in the first place, so instead of adding hash
      opclasses for them now, we accept that they can't be used as distribution
      key columns anymore. Add a check to pg_upgrade, to refuse the upgrade if
      they are used as distribution keys in the old cluster. Do the same for
      the 'money' datatype as well, although that's not being removed upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE
      UNIQUE INDEX, so that you can no longer create a situation where a
      non-hashable column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
      242783ae
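      A short sketch of the syntax and GUC described above. The opclass and GUC
      names come from this message; the table, column and database names are
      made up for illustration:

          -- Explicit hash operator class per distribution key column:
          CREATE TABLE t_legacy (intcol int4)
              DISTRIBUTED BY (intcol cdbhash_int4_ops);

          -- With the GUC set, CREATE TABLE picks the legacy opclasses when no
          -- opclass is given explicitly (this is what pg_upgrade relies on):
          SET gp_use_legacy_hashops = on;
          CREATE TABLE t_after_upgrade (intcol int4) DISTRIBUTED BY (intcol);

          -- Once all tables have been reorganized onto the default opclasses,
          -- the admin can clear the per-database setting left by pg_upgrade:
          ALTER DATABASE upgraded_db RESET gp_use_legacy_hashops;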
    • H
      Rename gp_distribution_policy.attrnums to distkey, and make it int2vector. · 69ec6926
      Heikki Linnakangas committed
      This is in preparation for adding operator classes as a new column
      (distclass) to gp_distribution_policy. This naming is consistent with
      pg_index.indkey/indclass. Change the datatype to int2vector, also for
      consistency with pg_index, and some other catalogs that store attribute
      numbers, and because int2vector is slightly more convenient to work with
      in the backend. Move the column to the end of the table, so that all the
      variable-length and nullable columns are at the end, which makes it
      possible to reference the other columns directly in Form_gp_policy.
      
      Add a backend function, pg_get_table_distributedby(), to deparse the
      DISTRIBUTED BY definition of a table into a string, similar to the
      pg_get_indexdef_columns(), pg_get_functiondef() etc. functions that we
      already have (usage sketched below). Use the new function in psql and
      pg_dump, when connected to a GPDB6 server.
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Peifeng Qiu <pqiu@pivotal.io>
      Co-authored-by: Adam Lee <ali@pivotal.io>
      69ec6926
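      Hypothetical usage of the new deparsing function. The message does not
      spell out the signature; by analogy with the other pg_get_*def functions
      it presumably takes the table's OID, and the output shown is indicative
      only:

          SELECT pg_get_table_distributedby('my_table'::regclass);
          -- e.g. DISTRIBUTED BY (id)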
    • P
      Make GDD tests deterministic · a0b9fde8
      Pengzhou Tang committed
      The GDD test framework now acquires the desired lock by updating the nth
      tuple on a segment instead of a row with a specific value, so even if the
      hash algorithm changes, the tests are not affected (roughly as sketched
      below). This works fine except when a segment does not have enough tuples
      to provide the nth tuple. The fix is simple: enlarge the test tables from
      20 rows to 100 rows.
      
      Authored-by: Ning Yu <nyu@pivotal.io>
      a0b9fde8
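      A rough sketch of the idea only; the actual isolation-test helpers
      differ, and the table and column names here are made up. Lock the 2nd
      tuple on segment 0, whatever its value, so the choice of row no longer
      depends on the distribution hash:

          UPDATE gdd_t SET val = val
           WHERE gp_segment_id = 0
             AND ctid = (SELECT ctid FROM gdd_t WHERE gp_segment_id = 0
                         ORDER BY ctid OFFSET 1 LIMIT 1);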
    • Z
      Update serially when GDD is disabled · 29e7f102
      Zhang Shujie committed
      If the Global Deadlock Detector is enabled, the table lock may be
      downgraded to RowExclusiveLock, which can lead to two problems:
      
      1. When distribution keys are updated concurrently, the SplitUpdate
         node may generate extra tuples in the table.
      2. When updating concurrently, the EvalPlanQual function may be
         triggered, and it cannot execute correctly when the SubPlan has a
         Motion node.
      
      Now we add a GUC for GDD: if it is disabled, these UPDATE statements are
      executed serially; if it is enabled, we raise an error when updating
      concurrently. The two behaviors are sketched below.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      29e7f102
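      A schematic two-session illustration of the behavior described above.
      The table and column names are made up, and the exact error text is not
      part of this message:

          -- session 1:
          BEGIN;
          UPDATE gdd_demo SET distkey_col = distkey_col + 1 WHERE id = 1;

          -- session 2, before session 1 commits:
          UPDATE gdd_demo SET distkey_col = distkey_col + 1 WHERE id = 2;
          -- GDD disabled: waits for session 1, so the updates run serially
          -- GDD enabled:  raises an error rather than risking the SplitUpdate
          --               and EvalPlanQual problems listed above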
    • P
      Remove ext submodule folder · 1e43b584
      Peifeng Qiu committed
      We removed the submodule address but didn't remove the actual
      folder. Cloning the submodule will fail due to the missing URL. Remove
      the folder to avoid that.
      1e43b584
    • J
      Fix OOM after cluster reset when gp_vmem_protect_limit > 16GB (#6862) · 9e0e7c27
      Jialun committed
      The function VmemTracker_ShmemInit initializes chunkSizeInBits, the unit
      of chunk size, according to gp_vmem_protect_limit. The base value of
      chunkSizeInBits is 20 (1MB). If gp_vmem_protect_limit is larger than
      16GB, the value is increased to adapt to the large-memory environment.
      This value should not change after initialization, but if the function
      is called multiple times, chunkSizeInBits accumulates.
      
      Consider the scenario where the QD crashes: the postmaster reaps the QD
      process and resets shared memory, which causes VmemTracker_ShmemInit to
      be called again. So chunkSizeInBits increases after every crash when
      gp_vmem_protect_limit is larger than 16GB. Eventually the chunk size
      becomes so large that a new reservation amounts to zero chunks or a very
      small number of chunks. The memory limit mechanism then has no effect,
      and we run out of memory once new memory can no longer actually be
      allocated.
      
      So we set chunkSizeInBits to BITS_IN_MB in VmemTracker_ShmemInit every
      time, instead of merely asserting it.
      
      Why is there no new test case in this commit?
      - We just change an Assert to an assignment; there are no logic changes.
      - It is very difficult to add a crash case in the current isolation test
        framework, because the connection is lost when the process crashes.
      
      We verified the case manually in our dev environment by setting
      gp_vmem_protect_limit to 65535 and killing the QD process with kill -9.
      chunkSizeInBits then increased every time, and eventually we got the
      error message "ERROR:  Canceling query because of high VMEM usage."
      9e0e7c27
    • P
      Remove unused gpfdist dependency submodule and WIN32 Readme (#6861) · 7281a162
      Peifeng Qiu committed
      We no longer use the ext submodule for gpfdist dependencies. Remove
      it to avoid confusion. The WIN32 build process has changed to a native
      build. We will add a README for it when it's ready.
      7281a162
    • P
      Remove a FIXME related to recoveryTargetIsLatest. (#6863) · 406fa028
      Paul Guo committed
      The recoveryTargetIsLatest setting code had somehow gone missing and was
      later added back in commit 55808e18. Remove the FIXME comment.
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
      406fa028
    • A
      Set needToPromoteCatalog before updating ControlFile->state. · 434bd5b9
      Ashwin Agrawal committed
      Commit 6d80ce31 moved the control file state update earlier, which
      caused a CI failure in the gpactivatestandby test: the catalog update
      was skipped because needToPromoteCatalog remained false. Hence, set
      needToPromoteCatalog before updating ControlFile->state.
      434bd5b9
    • A
      Avoid FinishPreparedTransaction() calling readRecoveryCommandFile() · cb256d04
      Ashwin Agrawal committed
      It is not clear why FinishPreparedTransaction() was calling
      readRecoveryCommandFile(); it seems to serve no purpose. The call has
      existed for ages and I was unable to find the rationale for it,
      certainly not with the current code. Reading and parsing the file was an
      unnecessary performance hit on every commit.
      cb256d04
    • A
      Align transaction log manager (xlog.c and xlog.h) to upstream. · 6d80ce31
      Ashwin Agrawal committed
      A lot of differences compared to upstream had collected over the years,
      along with some confusing or redundant code, so it is better to make
      these files match upstream.
      6d80ce31
    • A
      concourse: Remove unused dev_generate_installer.yml · b55a0b71
      Amil Khanzada committed
      - We're not sure when this file was abandoned, but it doesn't seem to
        be used anywhere.
      - Also remove the task file and bash scripts that were only referenced
        by this pipeline.
      Co-authored-by: Bradford D. Boyle <bboyle@pivotal.io>
      Co-authored-by: Amil Khanzada <akhanzada@pivotal.io>
      b55a0b71
    • G
      Remove disabled code in set_plan_references_input_asserts() · e8238cc1
      Georgios Kokolatos committed
      This commit removes a GPDB_93_MERGE_FIXME introduced while merging
      46c508fb from upstream. The intention of the upstream commit is
      to keep planner params separated so that they don't get reused
      incorrectly. In doing so, it removed the need for a global list of
      PlannerParamItems.
      
      The assertion removed in this commit verified that each Param in the
      tree was included in a global list of PlannerParamItems, and that the
      datatype of each Param matched the one in the global list.
      
      At the point of the assertion, we simply don't have the necessary
      information to verify this properly. An argument could be made for
      re-introducing such a global list of PlannerParamItems. However, the
      assertion would not verify that a parameter is anchored in the
      right place, and it would introduce additional code to maintain.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      e8238cc1