1. 13 Jul 2018 (1 commit)
  2. 12 Jul 2018 (1 commit)
    • Fix invalid reference to relcache entries. · 234150b7
      Committed by Richard Guo
      After a relation is closed, its relcache entry might be freed once the
      refcount drops to zero, so further references to it must be avoided.
      
      This fixes a pre-existing bug, and the existing tests already cover the
      code changes here, so there is no need to add a new test case.
  3. 11 Jul 2018 (2 commits)
    • Improve handling of rd_cdbpolicy. · 0bfc7251
      Committed by Ashwin Agrawal
      Pointers obtained from a Relation object need to be handled with
      special care: holding a refcount on the object does not mean the object
      is never modified.  While handling a cache invalidation message, the
      Relation object gets *rebuilt*.  The only guarantee the rebuild
      maintains is that the address of the Relation object itself does not
      change; the memory referenced from inside the Relation object is freed,
      freshly allocated, and repopulated with the latest data from the
      catalog.
      
      For example, the following code sequence is dangerous:
      
          rel->rd_cdbpolicy = original_policy;
          GpPolicyReplace(RelationGetRelid(rel), original_policy);
      
      If a relcache invalidation message is served after the assignment to
      rd_cdbpolicy, the rebuild frees the memory behind rd_cdbpolicy (that is,
      original_policy) and replaces it with the current contents of
      gp_distribution_policy.  The subsequent GpPolicyReplace() call with
      original_policy then accesses freed memory, and rd_cdbpolicy is left
      holding the stale cached value instead of the intended refreshed one.
      This issue was hit in CI a few times and reproduces with higher
      frequency with `-DRELCACHE_FORCE_RELEASE`.
      
      Hence this patch changes all uses of rd_cdbpolicy to read the pointer
      directly from the Relation object, and to update the catalog first
      before assigning the new value to rd_cdbpolicy.
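      A minimal sketch of the corrected ordering, reusing the identifiers from
      the snippet above; GpPolicyCopy() and the use of CacheMemoryContext here
      are illustrative assumptions, not the literal code of this commit:
      
          /* Update the authoritative catalog copy first, while original_policy
           * still points at memory we own. */
          GpPolicyReplace(RelationGetRelid(rel), original_policy);
      
          /*
           * Only then refresh the relcache entry's own copy.  If a relcache
           * invalidation rebuilds the entry after this point, the rebuild
           * re-reads gp_distribution_policy, which already holds the new
           * value, so rd_cdbpolicy can neither dangle nor go stale.
           */
          rel->rd_cdbpolicy = GpPolicyCopy(CacheMemoryContext, original_policy);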
    • Fix duplicate distributed keys for CTAS · 7680b762
      Committed by Pengzhou Tang
      To keep it consistent with the CREATE TABLE syntax, CTAS should also
      disallow duplicate distribution keys; otherwise backup and restore get
      messed up.
  4. 10 Jul 2018 (1 commit)
  5. 06 Jul 2018 (1 commit)
    • Fix create gang failure on dns lookup error on down mirrors. · dd861e72
      Committed by Jialun
      If a segment exists in gp_segment_configuration but its IP address
      cannot be resolved, we run into a runtime error on gang creation:
      
          ERROR:  could not translate host name "segment-0a", port "40000" to
          address: Name or service not known (cdbutil.c:675)
      
      This happens even if segment-0a is a mirror and is marked as down.
      With this error, queries cannot be executed, and gpstart and gpstop also
      fail.
      
      One way to trigger the issue:
      
      - create a multiple segments cluster;
      - remove sdw1's dns entry from /etc/hosts on mdw;
      - kill postgres primary process on sdw1;
      
      FTS can detect the failure and automatically switch to the mirror, but
      queries still cannot be executed.
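      The failure is easy to reproduce outside the server with plain
      getaddrinfo(); the standalone sketch below (the hostnames and the tiny
      segment struct are made up) prints the same kind of error and shows the
      idea of the fix: skip address resolution for mirrors already marked
      down instead of failing gang creation.
      
          #include <netdb.h>
          #include <stdio.h>
      
          struct seg { const char *host; const char *port; char role; char status; };
      
          int main(void)
          {
              /* 'm'/'d' mimics a mirror marked down in gp_segment_configuration */
              struct seg segs[] = {
                  { "localhost",  "40000", 'p', 'u' },
                  { "segment-0a", "40000", 'm', 'd' },
              };
      
              for (int i = 0; i < 2; i++)
              {
                  struct addrinfo *res;
                  int rc;
      
                  if (segs[i].role == 'm' && segs[i].status == 'd')
                  {
                      printf("skipping down mirror %s: no DNS lookup needed\n",
                             segs[i].host);
                      continue;
                  }
      
                  rc = getaddrinfo(segs[i].host, segs[i].port, NULL, &res);
                  if (rc != 0)
                  {
                      /* Without the skip above, this is fatal for gang creation:
                       * "could not translate host name ...: Name or service not known" */
                      fprintf(stderr, "%s: %s\n", segs[i].host, gai_strerror(rc));
                      continue;
                  }
                  freeaddrinfo(res);
              }
              return 0;
          }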
  6. 04 Jul 2018 (1 commit)
  7. 03 Jul 2018 (1 commit)
    • Remove FIXME from ao_insert_replay(). · e1aaa67c
      Committed by Ashwin Agrawal
      The AO implementation aligns with the heap implementation in 8.4 and
      later: write the data during recovery and do not fail.
      
      Also note that, for AO, the way the seek is performed during replay will
      not fail if the file does not yet contain that much data.  The seek
      simply moves to the requested offset regardless of the file's current
      length, and the data is written there (the file will contain a hole in
      that case), so no seek failure occurs.  We write the data, and if a
      truncation had happened it will happen again during recovery.
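      The claim about seeking is easy to verify standalone: lseek() past the
      current end of file succeeds, and a following write() simply leaves a
      hole rather than failing (the file name below is arbitrary).
      
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>
      
          int main(void)
          {
              int fd = open("ao_replay_demo.dat", O_CREAT | O_RDWR | O_TRUNC, 0600);
              if (fd < 0)
                  return 1;
      
              /* The file is empty, yet seeking to a 1MB offset does not fail... */
              off_t pos = lseek(fd, 1024 * 1024, SEEK_SET);
              printf("lseek returned %lld\n", (long long) pos);
      
              /* ...and writing there just leaves a hole before the data. */
              if (write(fd, "x", 1) != 1)
                  perror("write");
      
              close(fd);
              return 0;
          }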
  8. 30 Jun 2018 (1 commit)
    • Fix 'no parameter found for initplan subquery' · f50e5daf
      Committed by Shreedhar Hardikar
      The issue happens because of constant folding in the testexpr of the
      SUBPLAN expression node.  The testexpr may be reduced to a const, and
      any PARAMs previously used in the testexpr disappear.  However, the
      subplan still remains.
      
      This behavior is similar in upstream Postgres 10 and may be a
      performance consideration.  Leaving that aside for now, the constant
      folding produces an elog(ERROR) when the plan has subplans but no PARAMs
      are used.  The check in `addRemoteExecParamsToParamList()` compares
      `context.params`, which counts the PARAMs actually used in the plan,
      against `nIntPrm = list_length(root->glob->paramlist)`, the number of
      PARAMs declared/created.
      
      Given the ERROR messages generated, the check makes no sense, especially
      since it won't even trip for the InitPlan bug (mentioned in the
      comments) as long as there is at least one PARAM in the query.
      
      This commit removes the check since it doesn't correctly capture the
      intent.
      
      In theory, it could be replaced by one specifically aimed at InitPlans,
      that is, find all the param ids used by InitPlans and then make sure
      they are used in the plan.  But we already do this and remove any unused
      initplans in `remove_unused_initplans()`, so I don't see the point of
      adding that.
      
      Fixes #2839
  9. 27 Jun 2018 (1 commit)
  10. 21 Jun 2018 (1 commit)
  11. 19 Jun 2018 (1 commit)
  12. 13 Jun 2018 (1 commit)
  13. 07 Jun 2018 (1 commit)
    • Fix teardown issue of TCP interconnect · 1c1f1644
      Committed by Pengzhou Tang
      Previously, for an interconnect connection, when no more data was
      available on the sender side, the sender sent a customized EOS packet to
      the receiver and disabled further send operations using
      shutdown(SHUT_WR).  It then immediately closed the connection entirely
      with close(), counting on the kernel and the TCP stack to guarantee that
      the data was delivered to the receiver.  The problem is that, on some
      platforms, once the connection is closed on one side the TCP behavior is
      undetermined; packets may be lost and the receiver may report an
      unexpected error.
      
      The correct way is for the sender to block on the connection until the
      receiver has received the EOS packet and closed its end; only then can
      the sender close the connection safely.
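      A sketch of the safe teardown sequence on the sender side, using only
      standard socket calls (the EOS packet itself is part of GPDB's own
      interconnect protocol and is assumed to have been sent already):
      
          #include <sys/socket.h>
          #include <unistd.h>
      
          /* Called by the sender after the EOS packet has been sent on 'fd'. */
          void sender_teardown(int fd)
          {
              char buf[256];
      
              /* Stop sending, but keep the socket open for reading. */
              shutdown(fd, SHUT_WR);
      
              /*
               * Block until the receiver has seen EOS and closed its end:
               * recv() returning 0 (or an error) means the peer is gone.
               */
              while (recv(fd, buf, sizeof(buf), 0) > 0)
                  ;
      
              /* Only now is it safe to close without risking lost packets. */
              close(fd);
          }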
  14. 29 May 2018 (1 commit)
    • Support RETURNING for replicated tables. · fb7247b9
      Committed by Ning Yu
      * rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering a table from/to replicated had no effect;
      the root cause is that we neither changed gp_distribution_policy nor
      reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with the
      new distribution policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type data is received from one explicit sender.
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      * rpt: add test case with both PRIMARY KEY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to ensure this keeps working during
      future development.
      
      (cherry picked from commit 72af4af8)
  15. 28 May 2018 (2 commits)
    • Revert "Support RETURNING for replicated tables." · a74875cd
      Committed by Ning Yu
      This reverts commit 72af4af8.
    • Support RETURNING for replicated tables. · 72af4af8
      Committed by Ning Yu
      * rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering a table from/to replicated had no effect;
      the root cause is that we neither changed gp_distribution_policy nor
      reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with the
      new distribution policy and transferring all the data to it.
      
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type data is received from one explicit sender.
      
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      
      * rpt: add test case with both PRIMARY KEY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to ensure this keeps working during
      future development.
  16. 19 May 2018 (1 commit)
    • Introduce RelationIsAppendOptimized() macro. · 958a672a
      Committed by Ashwin Agrawal
      Many places in the code need to check whether a table uses row- or
      column-oriented storage, which basically means whether it is an
      append-optimized table.  Currently this is done with a combination of
      two macros, RelationIsAoRows() and RelationIsAoCols().  Simplify this
      with the new macro RelationIsAppendOptimized().
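      Presumably the new macro is little more than the disjunction of the two
      existing checks, along these lines (a sketch, not the verbatim header):
      
          #define RelationIsAppendOptimized(relation) \
              (RelationIsAoRows(relation) || RelationIsAoCols(relation))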
  17. 17 May 2018 (1 commit)
    • COPY: expand the type of numcompleted to 64 bits · 8d40268b
      Committed by Adam Lee
      Without this, integer overflow occurs when more than 2^31 rows are
      copied under `COPY ON SEGMENT` mode.
      
      Errors happen when the value is cast to uint64, the type of `processed`
      in `CopyStateData`: a third-party Postgres driver, which takes it as an
      int64, fails with an out-of-range error.
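      The arithmetic is easy to see in isolation: 2^31 rows no longer fit in a
      signed 32-bit counter, while a 64-bit counter keeps counting (standalone
      demo, not GPDB code; the wrap-around shown is what happens on common
      two's-complement platforms):
      
          #include <stdint.h>
          #include <stdio.h>
      
          int main(void)
          {
              int64_t rows = INT64_C(1) << 31;   /* 2^31 rows copied on segments */
              int32_t narrow = (int32_t) rows;   /* what a 32-bit counter would hold */
      
              /* On common platforms this prints "rows = 2147483648,
               * 32-bit view = -2147483648"; a driver expecting a non-negative
               * 64-bit row count rejects the latter as out of range. */
              printf("rows = %lld, 32-bit view = %d\n", (long long) rows, narrow);
              return 0;
          }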
  18. 16 May 2018 (2 commits)
    • Avoid generating XLOG_APPENDONLY_INSERT for temp AO/CO tables. · a17f8028
      Committed by Ashwin Agrawal
      Temp tables need to be neither replicated nor crash safe, so avoid
      generating xlog records for them.  Heap already avoids this; this patch
      skips it for AO/CO tables as well.  A new field `isTempRel` is added to
      `BufferedAppend` to help detect temp tables and skip generating xlog
      records.
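      In spirit the change is a guard at the point where the AO insert xlog
      record would be emitted; the field name comes from the message above,
      while the variable and the emitting function below are stand-ins, not
      real symbols:
      
          /* Temp AO/CO relations are neither replicated nor crash safe,
           * so skip WAL for them entirely. */
          if (!bufferedAppend->isTempRel)
              emit_ao_insert_xlog_record(bufferedAppend);   /* stand-in name */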
    • Fix coverity CID 185522 and CID 185520. · 7a149323
      Committed by Ashwin Agrawal
      *** CID 185522:  Security best practices violations  (STRING_OVERFLOW)
      /tmp/build/0e1b53a0/gpdb_src/src/backend/cdb/cdbtm.c: 2486 in gatherRMInDoubtTransactions()
      
      and
      
      *** CID 185520:  Null pointer dereferences  (FORWARD_NULL)
      /tmp/build/0e1b53a0/gpdb_src/src/backend/storage/ipc/procarray.c: 2251 in GetSnapshotData()
      
      This condition cannot happen, as `GetDistributedSnapshotMaxCount()`
      doesn't return 0 for DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE, and hence
      `inProgressXidArray` will always be initialized.  It is therefore marked
      as ignored in Coverity, but it is still worth adding an Assert for it.
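      The assertion amounts to a one-liner at the point where the array is
      consumed (sketch; the variable name is taken from the Coverity report
      quoted above):
      
          /* GetDistributedSnapshotMaxCount() never returns 0 for
           * DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE, so the array must be allocated. */
          Assert(inProgressXidArray != NULL);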
  19. 12 May 2018 (2 commits)
  20. 11 May 2018 (1 commit)
    • Add guards to prevent negative and duplicate partition rule order values · fd9a83b1
      Committed by Jimmy Yih
      There were scenarios where adding a new partition to a partition table would
      cause a negative or duplicate partition rule order (parruleord) value to show
      up in the pg_partition_rule catalog table.
      
      1. Negative parruleord values could show up during parruleord gap closing when
         the new partition is inserted above a parruleord gap.
      2. Negative parruleord values could show up when the max number of partitions
         for that level has been reached (32767), and there is an attempt to add a
         new partition that would have been the highest ranked partition in that
         partition's partition range.
      3. Duplicate parruleord values could show up when the max number of partitions
         for that level has been reached (32767), and there is an attempt to add a
         new partition that would have been inserted between the partition table's
         sequence of parruleord values.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
  21. 09 May 2018 (2 commits)
    • Refactor interconnect and dispatcher resource cleanup · b0353e0a
      Committed by xiong-gang
      Use the resource owner to do the cleanup of the dispatcher and
      interconnect (#4761).
    • Eliminate distributed transaction log creation and maintenance on QD. · df19119c
      Committed by Ashwin Agrawal
      Commit b3f300b9 eliminated consulting the distributed log in
      XidInMVCCSnapshot().  This was based on the fact that, on the QD,
      distributed transactions become visible at the same time as the
      corresponding local ones, so we can rely on the local XIDs only.  So,
      given that the QD never consults the distributed transaction log, let's
      completely eliminate its creation and maintenance on the QD.
      
      Note that during initdb the identity (QD or QE) is not yet known, so the
      initial 32k zero-filled distributed log file still gets created on the
      QD.
  22. 26 Apr 2018 (1 commit)
  23. 24 Apr 2018 (1 commit)
    • Remove fixmes · c66bf1f9
      Committed by Venkatesh Raghavan
      Three-stage aggregation is an optimization to parallelize DISTINCT
      aggregates.  Except for trivial cases (e.g. there is only one Aggref, or
      every Aggref has exactly the same aggfilter), it seems impossible to
      apply this class of optimizations to a query with
      SELECT aggfn(DISTINCT a) FILTER (WHERE ...).
  24. 19 Apr 2018 (2 commits)
    • Speed up dispatcher detection of segment state changes · 85101317
      Committed by David Kimura
      The dispatcher uses DISPATCH_WAIT_TIMEOUT_MSEC (currently 2000) as its
      poll timeout.  It used to wait for 30 poll timeouts before checking the
      segment status, and then initiated an FTS probe before the check.  As a
      result it took about a minute for a query to fail in case of segment
      failures.
      
      This commit updates the dispatcher to check segment status on every poll
      timeout.  It also leverages the FTS version to decide whether the
      segments need checking at all: instead of performing an FTS probe
      itself, it relies on FTS being run at regular intervals and providing
      cached results.
      
      With this change, the test time for
      twophase_tolerance_with_mirror_promotion was cut down by about 2
      minutes.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
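      A sketch of the tighter loop described above, built on standard poll();
      the result-handling and FTS helpers are stand-in names for the
      dispatcher's own routines, not real symbols:
      
          #include <poll.h>
      
          #define DISPATCH_WAIT_TIMEOUT_MSEC 2000
      
          extern void handle_results(struct pollfd *fds, int nfds);  /* stand-in */
          extern int  segment_down_per_cached_fts(void);             /* stand-in */
      
          void dispatch_wait(struct pollfd *fds, int nfds)
          {
              for (;;)
              {
                  int n = poll(fds, nfds, DISPATCH_WAIT_TIMEOUT_MSEC);
      
                  if (n < 0)
                      break;          /* poll error: let the caller handle it */
      
                  if (n > 0)
                  {
                      handle_results(fds, nfds);
                      continue;
                  }
      
                  /*
                   * Timeout: previously this path waited 30 timeouts and then
                   * forced an FTS probe; now it checks segment status on every
                   * timeout, using FTS's cached view of the cluster.
                   */
                  if (segment_down_per_cached_fts())
                      break;
              }
          }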
    • resgroup: backward compatibility for memory auditor · f2f86174
      Committed by Ning Yu
      Memory auditor is a new feature introduced to allow external components
      (e.g. pl/container) to be managed by resource groups.  This feature
      requires a new gpdb dir to be created in the cgroup memory controller;
      however, on the 5X branch, unless users created this new dir manually,
      the upgrade from a previous version would fail.
      
      In this commit we provide backward compatibility by checking the release
      version:
      
      - on the 6X and master branches the memory auditor feature is always
        enabled, so the new gpdb dir is mandatory;
      - on the 5X branch the memory auditor feature can be enabled only if the
        new gpdb dir is created with proper permissions; when it is disabled,
        `CREATE RESOURCE GROUP WITH (memory_auditor='cgroup')` fails with
        guidance on how to enable it.
      
      Binary swap tests are also provided to verify backward compatibility in
      future releases.  As cgroup needs to be configured to enable resgroup,
      we split the resgroup binary swap tests into two parts:
      
      - resqueue-mode-only tests, which can be triggered in the
        icw_gporca_centos6 pipeline job after the ICW tests; these have no
        requirements on cgroup;
      - complete resqueue & resgroup mode tests, which can be triggered in the
        mpp_resource_group_centos{6,7} pipeline jobs after the resgroup tests;
        these need cgroup to be properly configured.
  25. 14 Apr 2018 (1 commit)
    • Remove AppendOnlyStorage_GetUsableBlockSize(). · 0a119de3
      Committed by Ashwin Agrawal
      When the block size is 2MB, the function
      AppendOnlyStorage_GetUsableBlockSize would give the wrong usable block
      size.  The expected result is 2MB, but the function returned (2M - 4).
      This is because the macro AOSmallContentHeader_MaxLength is defined as
      (2M - 1), and after rounding down to 4-byte alignment the result is
      (2M - 4).
      
      Without the fix one can encounter errors such as: "ERROR: Used length
      2097152 greater than bufferLen 2097148 at position 8388592 in table
      'xxxx'".
      
      Also removed some related but unused macros, just to clean up the AO
      storage code.
      Co-authored-by: Lirong Jian <jian@hashdata.cn>
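      The off-by-four is plain arithmetic: rounding (2M - 1) down to a 4-byte
      boundary yields 2M - 4 instead of the intended 2M (standalone check):
      
          #include <stdio.h>
      
          int main(void)
          {
              int blocksize = 2 * 1024 * 1024;   /* 2MB block size */
              int maxlen    = blocksize - 1;     /* the (2M - 1) style cap */
              int usable    = maxlen & ~3;       /* round down to 4-byte alignment */
      
              /* Prints usable = 2097148, expected = 2097152: exactly the
               * "bufferLen 2097148" vs "Used length 2097152" from the error. */
              printf("usable = %d, expected = %d\n", usable, blocksize);
              return 0;
          }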
  26. 11 Apr 2018 (1 commit)
  27. 09 Apr 2018 (1 commit)
    • Don't pass superuser flag of SessionUserId/OuterUserId to segments · 9698d51f
      Committed by Pengzhou Tang
      The GUC "is_superuser" only provides a value for SHOW to display; it is
      useless on segments.  The two flags are both designed to determine the
      value of is_superuser, so it's unnecessary to pass them to the segments.
      
      This commit also resolves another problem: GPDB used to dispatch a
      command within an empty transaction and resource owner to define an
      index concurrently.  However, a syscache access in superuser_arg() may
      hit a SIGSEGV because CurrentResourceOwner is NULL, so this commit uses
      SessionUserIsSuperuser instead of superuser_arg() to avoid that error.
  28. 07 Apr 2018 (2 commits)
    • Make use of transaction options dispatched by QD on QE · a6eea210
      Committed by Asim R P
      To enforce consistent isolation level within a distributed transaction, local
      transactions on QEs should assume the same isolation level as the transaction
      on QD.  This was previously achieved by dispatching isolation level and
      read-only property from QD to QE as command line options.  In case of explicit
      BEGIN, the isolation level was dispatched as flags in DtxContextInfo.  This
      patch makes it consistent such that QEs (readers as well as writers) read the
      transaction options from DtxContextInfo.
      
      The problem with setting the transaction isolation level as a command
      line option is that command line options are processed during process
      initialization, when a transaction is already open and a snapshot has
      been taken.  Changing the isolation level after taking a snapshot is not
      correct.
      
      This patch allows merging with the check for transaction_isolation GUC as it
      stands in 9.1, without any Greenplum-specific changes.
      Co-authored-by: Jacob Champion <pchampion@pivotal.io>
    • Use ereportif() where logging is predicated by a debug GUC. · 4351a1e1
      Committed by Asim R P
      No change in functionality here.  The ereportif() macro avoids the
      ereport() invocation if the predicate is not true.
      
      Fixed indentation in a couple of places on the way.
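      The macro is essentially a guarded ereport(); a sketch of the shape (the
      real definition lives in the GPDB headers):
      
          #define ereportif(p, elevel, ...) \
              do { \
                  if (p) \
                      ereport(elevel, __VA_ARGS__); \
              } while (0)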
  29. 03 Apr 2018 (1 commit)
  30. 02 Apr 2018 (1 commit)
  31. 29 Mar 2018 (1 commit)
    • Support replicated table in GPDB · 7efe3204
      Committed by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash or
      randomly.  There is a requirement to introduce a new table type, called
      a replicated table, in which every segment holds a full copy of the
      table data.
      
      To implement it, we added a new distribution policy named
      POLICYTYPE_REPLICATED to mark a replicated table, and a new locus type
      named CdbLocusType_SegmentGeneral to describe the distribution of a
      replicated table's tuples.  CdbLocusType_SegmentGeneral implies the data
      is generally available on all segments but not on the qDisp, so a plan
      node with this locus type can be flexibly planned to execute on either a
      single QE or all QEs.  It is similar to CdbLocusType_General; the only
      difference is that a CdbLocusType_SegmentGeneral node can't be executed
      on the qDisp.  To guarantee this, we try our best to add a gather motion
      on top of a CdbLocusType_SegmentGeneral node when planning motion for a
      join, even if the other rel has a bottleneck locus type.  A problem is
      that such a motion may be redundant if the single QE is not promoted to
      execute on the qDisp in the end, so we need to detect that case and omit
      the redundant motion at the end of apply_motion().  We don't reuse
      CdbLocusType_Replicated since it always implies a broadcast motion below
      it, and it's not easy to plan such a node as direct dispatch to avoid
      getting duplicate data.
      
      We don't support replicated tables with an inherit/partition-by clause
      yet; the main problem is that update/delete on multiple result relations
      can't work correctly yet, and we can fix this later.
      
      * Allow spi_* to access replicated tables on the QE
      
      Previously, GPDB didn't allow a QE to access non-catalog tables because
      the data there is incomplete; we can remove this limitation now when
      only replicated tables are accessed.
      
      One problem is that the QE needs to know whether a table is replicated.
      Previously, the QE didn't maintain the gp_distribution_policy catalog,
      so we need to pass the policy info to the QE for replicated tables.
      
      * Change the schema of gp_distribution_policy to identify replicated
        tables
      
      Previously, we used the magic number -128 in the gp_distribution_policy
      table to identify a replicated table, which is quite a hack, so we add a
      new column in gp_distribution_policy to identify replicated tables and
      partitioned tables.
      
      This commit also abandons the old scheme that used a 1-length NULL list
      and a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and
      DISTRIBUTED FULLY clauses.
      
      Besides that, this commit refactors the code to make the decision-making
      for distribution policies clearer.
      
      * support COPY for replicated table
      
      * Disable the row-ctid Unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to handle queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t3), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
                  ->  Hash Join
                        Hash Cond: t2.c2 = t1.c2
                      ->  Seq Scan on t2
                      ->  Hash
                          ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because
        ctid + gp_segment_id cannot identify a tuple: in a replicated table, a
        logical row may have different ctid and gp_segment_id values on
        different segments. So we disable such plans for replicated tables
        temporarily. This is not the best approach, because the rowid-unique
        plan may be cheaper than a normal hash semi join, so we left a FIXME
        for later optimization.
      
      * ORCA-related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated
        tables
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog, so do the
        same checks as for other catalogs.
      
      * Support gpexpand on replicated tables and altering the distribution
        policy of replicated tables
  32. 28 Mar 2018 (2 commits)
    • Remove redundant distributed transaction state. · ecc217e8
      Committed by Asim R P
      The DTX_STATE_FORCED_COMMITTED was identical to
      DTX_STATE_INSERTED_COMMITTED.
    • Avoid infinite loop in processCopyEndResults · 25bc3855
      Committed by Asim R P
      The command "COPY enumtest FROM stdin;" hit an infinite loop on the
      merge branch.  The code indicates that the issue can happen on master as
      well.  The QD backend went into an infinite loop when the connection had
      already been closed from the QE end.  The TCP connection was in
      CLOSE_WAIT state; the libpq connection status was CONNECTION_BAD and
      asyncStatus was PGASYNC_BUSY.
      
      Fix the infinite loop by checking libpq connection status in each
      iteration.
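      A sketch of the loop guard using public libpq calls (PQstatus() and
      PQgetResult()); the surrounding COPY-end handling is GPDB's own and is
      omitted here:
      
          #include <libpq-fe.h>
      
          /* Drain any remaining results after COPY without spinning forever
           * when the QE has already dropped the connection. */
          void drain_copy_end(PGconn *conn)
          {
              for (;;)
              {
                  PGresult *res;
      
                  /* Bail out if libpq already considers the connection dead. */
                  if (PQstatus(conn) == CONNECTION_BAD)
                      break;
      
                  res = PQgetResult(conn);
                  if (res == NULL)
                      break;              /* no more results: command complete */
                  PQclear(res);
              }
          }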