1. 15 Jul 2020 (1 commit)
    • H
      Cleanup idle reader gang after utility statements · d1ba4da5
      Committed by Hubert Zhang
      Reader gangs use a local snapshot to access the catalog; as a result, they do
      not synchronize with the sharedSnapshot of the writer gang, which leads to
      inconsistent visibility of catalog tables on an idle reader gang.
      Consider the following case:
      
      select * from t, t t1; -- create a reader gang.
      begin;
      create role r1;
      set role r1;  -- the SET command is also dispatched to the idle reader gang
      
      When the SET ROLE command is dispatched to the idle reader gang, the reader
      gang cannot see the new tuple for role r1 in the catalog table pg_authid.
      To fix this issue, drop the idle reader gangs after each utility statement
      that may modify the catalog.
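      A standalone sketch of the idea (function names like finish_utility and
      cleanup_idle_reader_gangs are illustrative, not GPDB's actual code): after a
      utility statement that may have modified the catalog, drop the idle reader
      gangs so the next query builds fresh ones that see the new catalog contents.
      
      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>
      
      /* Hypothetical stand-ins for GPDB's dispatcher state, for illustration only. */
      static int idle_reader_gangs = 2;
      
      static bool
      utility_may_modify_catalog(const char *tag)
      {
          /* Crude illustration: treat CREATE/SET/GRANT-style commands as catalog-changing. */
          return strncmp(tag, "CREATE", 6) == 0 ||
                 strncmp(tag, "SET", 3) == 0 ||
                 strncmp(tag, "GRANT", 5) == 0;
      }
      
      static void
      cleanup_idle_reader_gangs(void)
      {
          printf("dropping %d idle reader gang(s)\n", idle_reader_gangs);
          idle_reader_gangs = 0;      /* the next query builds fresh reader gangs */
      }
      
      static void
      finish_utility(const char *tag)
      {
          if (idle_reader_gangs > 0 && utility_may_modify_catalog(tag))
              cleanup_idle_reader_gangs();
      }
      
      int
      main(void)
      {
          finish_utility("CREATE ROLE");   /* catalog changed: idle reader gangs dropped */
          finish_utility("SELECT");        /* no catalog change: nothing to do */
          return 0;
      }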
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      d1ba4da5
  2. 02 Jan 2020 (1 commit)
    • G
      Avoid multiple 'abort' WAL records · 9cad3967
      Committed by Gang Xiong
      Things can go wrong between 'RecordTransactionAbort' and clearing
      'CurrentTransactionState.transactionId', which leads to multiple 'abort' WAL
      records. Putting 'rollbackDtxTransaction' in between increases the chance of
      this. This patch adds a check in 'RecordTransactionAbort': skip writing
      another 'abort' WAL record if we are already in the middle of a rollback.
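      A standalone sketch of the guard (illustrative names, not the actual xact.c
      code): remember whether the 'abort' record has already been written for the
      current transaction and skip writing a second one.
      
      #include <stdbool.h>
      #include <stdio.h>
      
      /* Illustrative stand-in for the backend's per-transaction state. */
      static bool abort_record_written = false;
      
      static void
      record_transaction_abort(void)
      {
          /* Guard analogous to the check described above: if this transaction has
           * already written its 'abort' WAL record, do not write another one. */
          if (abort_record_written)
          {
              printf("skip: abort record already written for this transaction\n");
              return;
          }
          printf("write 'abort' WAL record\n");
          abort_record_written = true;
      }
      
      int
      main(void)
      {
          record_transaction_abort();   /* first call writes the record */
          record_transaction_abort();   /* reached again (e.g. via rollbackDtxTransaction): skipped */
          return 0;
      }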
      9cad3967
  3. 13 Nov 2019 (1 commit)
    • P
      Flush error state before rethrowing error in dispatcher code. (#9027) · 143bb7c6
      Committed by Paul Guo
      This prevents a panic due to "ERRORDATA_STACK_SIZE exceeded" which was found
      during testing. The scenario happens when we test a transaction with multiple
      cursors. When a QE postmaster is dead, those cursor gangs can error out in
      SendChunkUDPIFC() -> checkExceptions(), causing the QD to loop in PostgresMain() ->
      AbortCurrentTransaction() -> ... -> mppExecutorFinishup() -> ReThrowError() ->
      PostgresMain() -> AbortTransaction(). Note that ReThrowError() increases
      errordata_stack_depth by 1, so it can lead to an error-stack overflow (finally
      the database would PANIC to prevent a real overflow).
      
      A typical stack for one item in the errordata array is shown below:
      
      	stacktracearray = {[0] = 0xadb39d <errstart+1086>, [1] = 0xb93362 <cdbdisp_get_PQerror+289>, [2] = 0xb93197 <cdbdisp_dumpDispatchResult+87>,
      	[3] = 0xb935f3 <cdbdisp_dumpDispatchResults+226>, [4] = 0xb8ffef <cdbdisp_getDispatchResults+169>, [5] = 0x7545cf <mppExecutorFinishup+235>,
      	[6] = 0x736c77 <standard_ExecutorEnd+706>, [7] = 0x7369b2 <ExecutorEnd+54>, [8] = 0x6c4e28 <PortalCleanup+345>, [9] = 0xb2168b <AtAbort_Portals+214>,
      	[10] = 0x53b996 <AbortTransaction+356>, [11] = 0x53c467 <AbortCurrentTransaction+214>, [12] = 0x9714d9 <PostgresMain+1728>, [13] = 0x8dfdf4 <ExitPostmaster>,
      	[14] = 0x8df485 <BackendStartup+371>, [15] = 0x8db324 <ServerLoop+825>, [16] = 0x8da870 <PostmasterMain+4908>, [17] = 0x7d5467 <startup_hacks>,
      	[18] = 0x7f5311c51c05 <__libc_start_main+245>, [19] = 0x48e039 <_start+41>, [20] = 0x0, [21] = 0x0, [22] = 0x0, [23] = 0x0, [24] = 0x0, [25] = 0x0, [26] = 0x0,
      	[27] = 0x0, [28] = 0x0, [29] = 0x0},
      
      In ICG testing, running tests with portals.sql while killing one QE postmaster sometimes triggers this issue.
      
      Fix this by adding FlushErrorState() before ReThrowError(). Calling
      FlushErrorState() before ReThrowError() is the usual pattern when the error
      has been copied in advance (e.g. via CopyErrorData()).
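      For reference, the usual PostgreSQL idiom that the fix relies on looks
      roughly like this (a sketch for backend code in general, not the exact
      dispatcher change):
      
      #include "postgres.h"
      
      #include "utils/elog.h"
      #include "utils/memutils.h"
      
      void run_and_rethrow(void (*work)(void));
      
      /*
       * Run some work; if it errors out, re-throw the original error after
       * flushing the error stack, so repeated re-throws cannot overflow
       * errordata[] (ERRORDATA_STACK_SIZE).
       */
      void
      run_and_rethrow(void (*work)(void))
      {
          MemoryContext oldcontext = CurrentMemoryContext;
      
          PG_TRY();
          {
              work();
          }
          PG_CATCH();
          {
              ErrorData  *edata;
      
              /* Copy the error data out of ErrorContext before flushing it. */
              MemoryContextSwitchTo(oldcontext);
              edata = CopyErrorData();
      
              /*
               * Pop the entry from the error stack; without this, every
               * re-throw would leave one more slot occupied.
               */
              FlushErrorState();
      
              /* ... any dispatcher-state cleanup would go here ... */
      
              ReThrowError(edata);    /* re-raise with the original fields intact */
          }
          PG_END_TRY();
      }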
      
      Also tweak error logging code a bit.
      
      - errfinish_and_return() did not pfree some memory, although so far no callers
      set those variables.
      
      - errfinish_and_return() should put context callback function calls in the
      memory context ErrorContext.
      
      - Some callers of errstart() did not check its return value. This is wrong;
      although the callers touched in this patch do not currently seem to suffer
      from the issue, checking is a good habit.
      
      Reviewed-by: Georgios Kokolatos
      Reviewed-by: Asim R P
      143bb7c6
  4. 28 Oct 2019 (1 commit)
    • P
      Fix various issues in Gang management (#8893) · 797065c5
      Committed by Paul Guo
      1. Do not call elog(FATAL) in cleanupQE(), since it can be called from
      cdbdisp_destroyDispatcherState() to destroy a CdbDispatcherState; that leads
      to re-entering cdbdisp_destroyDispatcherState(), which is not supported.
      Change the code to return false instead and sanity-check the re-entrance.
      Returning false should be OK since it causes the gang to be destroyed, and
      the QE resources are destroyed along with it. A typical re-entrance stack is
      shown below; a sketch of the idea follows the stack.
      
      0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
      0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
      0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
      0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
      0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
      0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
      0x000000000053b8ab in AbortTransaction () at xact.c:3319
      0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
      0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
      0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
      0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
      0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
      0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
      0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
      0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
      0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
      0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
      0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
      0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442
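      A standalone model of the two changes in point 1 (illustrative names only):
      the cleanup helper reports failure through its return value instead of
      elog(FATAL), and the destroy routine carries a simple re-entrance sanity
      check.
      
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      
      /* Illustrative stand-ins for the dispatcher structures. */
      static bool segworker_busy = true;
      static bool in_destroy = false;
      
      static bool
      cleanup_qe(void)
      {
          /* Instead of elog(FATAL), which would re-enter the destroy path via
           * proc_exit/abort callbacks, just report failure to the caller. */
          if (segworker_busy)
              return false;
          return true;
      }
      
      static void
      destroy_dispatcher_state(void)
      {
          /* Sanity check: destroying the dispatcher state must not re-enter. */
          assert(!in_destroy);
          in_destroy = true;
      
          if (!cleanup_qe())
              printf("QE still busy: destroy the gang, QE resources go with it\n");
          else
              printf("QE recycled for reuse\n");
      
          in_destroy = false;
      }
      
      int
      main(void)
      {
          destroy_dispatcher_state();
          return 0;
      }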
      
      2. Force dropping the reader gang of a named portal if a SET command was run
      previously, since that setting was not dispatched to that gang and we
      therefore should not reuse it.
      
      3. Now that we have the mechanism of destroying the DispatcherState in the
      resource owner callback when aborting a transaction, it is no longer
      necessary to destroy it in some of the dispatcher code.
      
      The added test cases and some existing test cases cover almost all of the
      code changes except the change in cdbdisp_dispatchX() (I cannot find a way to
      test this, and I'll keep it in mind to see how to test that or similar code).
      
      Reviewed-by: Pengzhou Tang
      Reviewed-by: Asim R P
      797065c5
  5. 22 Jul 2019 (1 commit)
    • N
      Keep the order of reusing idle gangs · 51a7ea27
      Committed by Ning Yu
      For example, in the same session:
      query 1 has 3 slices and creates gang 1, gang 2 and gang 3;
      query 2 has 2 slices, and we want it to reuse gang 1 and gang 2 rather than,
      say, gang 3 and gang 2.
      
      This way the two queries can use the same send-receive port pairs, which is
      useful on platforms like Azure, because Azure limits the number of distinct
      send-receive port pairs (a.k.a. flows) within a certain time period. A
      minimal sketch of the ordered reuse follows.
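      A small standalone model of the ordering property (not GPDB code): if idle
      gangs are handed out in creation order, the follow-up query with fewer slices
      gets gang 1 and gang 2 again, so the same send-receive port pairs are reused.
      
      #include <stdio.h>
      
      #define MAX_GANGS 8
      
      /* Illustrative FIFO of idle gang ids, kept in creation order. */
      static int idle[MAX_GANGS];
      static int n_idle = 0;
      
      static void recycle_gang(int gang_id) { idle[n_idle++] = gang_id; }
      
      static int
      reuse_gang(void)
      {
          /* Always reuse from the front, i.e. the earliest-created idle gang,
           * rather than from the back (which would hand out gang 3 first). */
          int gang_id = idle[0];
          for (int i = 1; i < n_idle; i++)
              idle[i - 1] = idle[i];
          n_idle--;
          return gang_id;
      }
      
      int
      main(void)
      {
          /* Query 1 finished and returned gangs 1, 2, 3 in creation order. */
          recycle_gang(1);
          recycle_gang(2);
          recycle_gang(3);
      
          /* Query 2 needs two gangs: it gets 1 and 2, matching query 1's ports. */
          int first = reuse_gang();
          int second = reuse_gang();
          printf("query 2 reuses gang %d and gang %d\n", first, second);
          return 0;
      }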
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Paul Guo <pguo@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      51a7ea27
  6. 28 Dec 2018 (1 commit)
  7. 14 Dec 2018 (1 commit)
  8. 13 Dec 2018 (1 commit)
  9. 14 Nov 2018 (1 commit)
    • H
      Fix dispatching of mixed two-phase and one-phase queries. · 9db30681
      Committed by Heikki Linnakangas
      Commit 576690f2 added tracking of which segments have been enlisted in
      a distributed transaction. However, it registered every query dispatched
      with CdbDispatch*() in the distributed transaction, even if the query was
      dispatched without the DF_NEED_TWO_PHASE flag to the segments. Without
      DF_NEED_TWO_PHASE, the QE will believe it's not part of a distributed
      transaction, and will throw an error when the QD tries to prepare it for
      commit:
      
      ERROR:  Distributed transaction 1542107193-0000000232 not found (cdbtm.c:3031)
      
      This can occur if a command updates a partially distributed table (a table
      with gp_distribution_policy.numsegments smaller than the cluster size),
      and uses one of the backend functions, like pg_relation_size(), that
      dispatches an internal query to all segments.
      
      Fix the confusion by registering in the distributed transaction only those
      commands that are dispatched with the DF_NEED_TWO_PHASE flag. A minimal
      sketch of the guard follows.
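      A tiny standalone model of the fix (the flag value and function names are
      illustrative): only dispatches that carry DF_NEED_TWO_PHASE enlist their
      target segments in the distributed transaction, so PREPARE is later sent
      only to those segments.
      
      #include <stdio.h>
      
      #define DF_NEED_TWO_PHASE 0x4   /* illustrative value, not GPDB's actual one */
      
      static int twophase_segments[64];
      static int n_twophase_segments = 0;
      
      static void
      dispatch_to_segment(int segindex, int flags)
      {
          /* The fix: enlist the segment for two-phase commit only when the
           * dispatch actually told the QE it is part of a distributed transaction. */
          if (flags & DF_NEED_TWO_PHASE)
              twophase_segments[n_twophase_segments++] = segindex;
      
          printf("dispatch to seg%d (two-phase: %s)\n",
                 segindex, (flags & DF_NEED_TWO_PHASE) ? "yes" : "no");
      }
      
      int
      main(void)
      {
          dispatch_to_segment(0, DF_NEED_TWO_PHASE);  /* the UPDATE on the partial table */
          dispatch_to_segment(1, 0);                  /* internal pg_relation_size() query */
          printf("PREPARE will be sent to %d segment(s)\n", n_twophase_segments);
          return 0;
      }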
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      9db30681
  10. 07 Nov 2018 (1 commit)
    • Z
      Adjust GANG size according to numsegments · 6dd2759a
      Committed by ZhangJackey
      Now that we have partial tables and a flexible gang API, we can allocate
      gangs according to numsegments.
      
      With commit 4eb65a53, GPDB supports tables distributed on a subset of the
      segments, and with the series of commits (a3ddac06, 576690f2) GPDB supports a
      flexible gang API. Now is a good time to combine the two features: the goal
      is to create gangs only on the necessary segments for each slice. This commit
      also improves singleQE gang scheduling and does some code cleanup. However,
      if ORCA is enabled, the behavior is unchanged.
      
      The outline of this commit is:
      
        * Modify the FillSliceGangInfo API so that gang_size is truly flexible.
        * Remove numOutputSegs and outputSegIdx fields in motion node. Add a new
           field isBroadcast to mark if the motion is a broadcast motion.
        * Remove the global variable gp_singleton_segindex and make the singleQE
           segment_id random (derived from gp_sess_id).
        * Remove the field numGangMembersToBeActive in Slice because it is now
           exactly slice->gangsize.
        * Modify the message printed if the GUC Test_print_direct_dispatch_info
           is set.
        * An explicit BEGIN now creates a full gang.
        * Reformat code and remove destSegIndex.
        * The isReshuffle flag in ModifyTable is useless, because it is only used
           when inserting a tuple into a segment that is outside the range of
           numsegments.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      6dd2759a
  11. 29 Oct 2018 (1 commit)
    • T
      Simplify direct dispatch related code (#6080) · 576690f2
      Committed by Tang Pengzhou
      * Simplify direct dispatch related code
      
      This commit includes two parts:
      * simplify the direct-dispatch dispatching code
      * simplify the direct-dispatch DTM related code
      
      Previously, cdbdisp_dispatchToGang needed a CdbDispatchDirectDesc; now a gang
      contains only the in-use segments, so the direct-dispatch info is useless.
      
      Another thing is that we need to decide within dtmPreCommand whether DTM can
      use direct dispatch. The logic is complex: you need to know whether the main
      plan is direct-dispatched and whether the init plans contain direct dispatch.
      
      One example is:
      "update foo set foo.c2 = 2
      where foo.c1 = 1 and exists (select * from bar where bar.c1=4)"
      
      The main plan can be direct-dispatched to segment 1 and the init plan to
      segment 2. With the old logic, DTM commands like PREPARE need to be
      dispatched to all segments, so dtmPreCommand needs to dispatch a DTM command
      named 'DTX_PROTOCOL_COMMAND_STAY_AT_OR_BECOME_IMPLIED' to all segments, so
      that segments like segment 3, which did not receive the plan, can be ready
      for two-phase commit.
      
      With the new gang API we can simplify this process: add a list in
      currentGxact to record which segments actually got involved in the two-phase
      commit, then dispatch the DTM commands to them directly. This is also very
      useful for queries on tables that are not fully expanded yet.
      
      * Support direct dispatch to more than one segment
      576690f2
  12. 25 Oct 2018 (1 commit)
    • T
      Unify the way to fetch/manage the number of segments (#6034) · 8eed4217
      Committed by Tang Pengzhou
      * Don't use GpIdentity.numsegments directly for the number of segments
      
      Use getgpsegmentCount() instead.
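      For illustration, the number of primary segments can be read straight from
      gp_segment_configuration; a minimal libpq client doing so might look like the
      sketch below (trivial connection handling, assumed to run against a GPDB
      coordinator). getgpsegmentCount() is the in-backend equivalent.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <libpq-fe.h>
      
      int
      main(void)
      {
          /* Connection parameters come from the usual PG* environment variables. */
          PGconn *conn = PQconnectdb("");
      
          if (PQstatus(conn) != CONNECTION_OK)
          {
              fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
              PQfinish(conn);
              return 1;
          }
      
          /* Primaries have role = 'p'; content = -1 is the coordinator, so skip it. */
          PGresult *res = PQexec(conn,
              "SELECT count(*) FROM gp_segment_configuration "
              "WHERE role = 'p' AND content >= 0");
      
          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              printf("number of primary segments: %s\n", PQgetvalue(res, 0, 0));
          else
              fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
      
          PQclear(res);
          PQfinish(conn);
          return 0;
      }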
      
      * Unify the way to fetch/manage the number of segments
      
      Commit e0b06678 lets us expand a GPDB cluster without a restart; the number
      of segments may now change during a transaction, so we need to take care
      with numsegments.
      
      We now have two ways to get the number of segments: 1) from
      GpIdentity.numsegments, and 2) from gp_segment_configuration
      (cdb_component_dbs), which the dispatcher uses to decide the range of
      segments to dispatch to. We did some hard work in e0b06678 to keep
      GpIdentity.numsegments up to date, which made segment management more
      complicated; now we want an easier way to do it:
      
      1. We only allow getting segment info (including the number of segments)
      through gp_segment_configuration, which has the newest segment info; there is
      no need to update GpIdentity.numsegments. GpIdentity.numsegments is left only
      for debugging and can be removed entirely in the future.
      
      2. Each global transaction fetches/updates the newest snapshot of
      gp_segment_configuration and never changes it until the end of the
      transaction unless a writer gang is lost, so a global transaction sees a
      consistent state of segments. We used to use gxidDispatched for the same
      purpose; it can now be removed.
      
      * Remove GpIdentity.numsegments
      
      GpIdentity.numsegments has no effect now, so remove it. This commit
      does not remove gp_num_contents_in_cluster because that would require
      modifying utilities like gpstart, gpstop, gprecoverseg, etc.; let's
      do that cleanup work in another PR.
      
      * Exchange the default UP/DOWN value in fts cache
      
      Previously, the FTS prober read gp_segment_configuration, checked the
      status of segments and then set the status of segments in the shared-memory
      struct ftsProbeInfo->fts_status[], so that other components (mainly the
      dispatcher) could detect that a segment was down.
      
      All segments were initialized as DOWN and then, in the most common case,
      updated to UP; this brings two problems:
      
      1. fts_status is invalid until FTS completes its first loop, so the QD
      needs to check ftsProbeInfo->fts_statusVersion > 0.
      2. When gpexpand adds a new segment to gp_segment_configuration, the
      newly added segment may be marked DOWN if FTS has not scanned it yet.
      
      This commit changes the default value from DOWN to UP, which resolves
      the problems mentioned above.
      
      * Fts should not be used to notify backends that a gpexpand occurs
      
      As Ashwin mentioned in PR#5679, "I don't think giving FTS responsibility to
      provide new segment count is right. FTS should only be responsible for HA
      of the segments. The dispatcher should independently figure out the count
      based on catalog. gp_segment_configuration should be the only way to get
      the segment count"; FTS should be decoupled from gpexpand.
      
      * Access gp_segment_configuration inside a transaction
      
      * Upgrade the log level from ERROR to FATAL if the expand version changed
      
      * Modify gpexpand test cases according to the new design
      8eed4217
  13. 15 Oct 2018 (1 commit)
    • N
      Retire threaded dispatcher · 87394a7b
      Committed by Ning Yu
      Now there is only the async dispatcher. The dispatcher API interface is
      kept so we may add a new backend in the future.
      
      The GUC gp_connections_per_thread, which was used to switch between the
      async and threaded backends, is also retired.
      87394a7b
  14. 27 Sep 2018 (1 commit)
    • T
      Dispatcher can create flexible size gang (#5701) · a3ddac06
      Committed by Tang Pengzhou
      * Change the type of db_descriptors to SegmentDatabaseDescriptor **
      
      A new gang definition may consist of cached segdbDescs and newly created
      segdbDescs, so there is no need to palloc every segdbDesc struct as new.
      
      * Remove an unnecessary allocate-gang unit test
      
      * Manage idle segment dbs using CdbComponentDatabases instead of available* lists.
      
      To support variable-size gangs, we now need to manage segment dbs at a lower
      granularity. Previously, idle QEs were managed by a bunch of lists like
      availablePrimaryWriterGang and availableReaderGangsN, which restricted the
      dispatcher to creating only N-size (N = number of segments) or 1-size gangs.
      
      CdbComponentDatabases is a snapshot of the segment components in the current
      cluster, and it now maintains a freelist for each segment component. When
      creating a gang, the dispatcher assembles it from the segment components
      (taking a segment db from the freelist or creating a new one). When cleaning
      up a gang, the dispatcher returns the idle segment dbs to their segment
      components.
      
      CdbComponentDatabases provides a few functions to manipulate segment dbs
      (SegmentDatabaseDescriptor *); a minimal standalone model of the
      allocate/recycle cycle follows the list:
      * cdbcomponent_getCdbComponents
      * cdbcomponent_destroyCdbComponents
      * cdbcomponent_allocateIdleSegdb
      * cdbcomponent_recycleIdleSegdb
      * cdbcomponent_cleanupIdleSegdbs
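      A standalone model of that per-component freelist (illustrative structures;
      the real ones live in the cdbutil/cdbgang code): allocate from the freelist
      when possible, otherwise create a new segment db, and recycle it back when
      the gang is cleaned up.
      
      #include <stdio.h>
      #include <stdlib.h>
      
      typedef struct IdleQE
      {
          int            qe_id;
          struct IdleQE *next;
      } IdleQE;
      
      typedef struct SegmentComponent
      {
          int     segindex;
          int     next_qe_id;     /* used when the freelist is empty */
          IdleQE *freelist;       /* idle QEs available for reuse */
      } SegmentComponent;
      
      /* Allocate a QE for this component: pop the freelist, or create a new one. */
      static int
      allocate_qe(SegmentComponent *comp)
      {
          if (comp->freelist)
          {
              IdleQE *qe = comp->freelist;
              int     id = qe->qe_id;
      
              comp->freelist = qe->next;
              free(qe);
              return id;
          }
          return comp->next_qe_id++;
      }
      
      /* Recycle a QE back onto the component's freelist when its gang is cleaned up. */
      static void
      recycle_qe(SegmentComponent *comp, int qe_id)
      {
          IdleQE *qe = malloc(sizeof(IdleQE));
      
          qe->qe_id = qe_id;
          qe->next = comp->freelist;
          comp->freelist = qe;
      }
      
      int
      main(void)
      {
          SegmentComponent seg0 = {0, 0, NULL};
      
          int a = allocate_qe(&seg0);   /* creates QE 0 */
          recycle_qe(&seg0, a);         /* gang cleaned up, QE 0 goes idle */
          int b = allocate_qe(&seg0);   /* reuses QE 0 instead of creating a new one */
      
          printf("seg0: first QE %d, reused QE %d\n", a, b);
          return 0;
      }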
      
      CdbComponentDatabases is also FTS-version sensitive: once the FTS version
      changes, CdbComponentDatabases destroys all idle segment dbs and allocates
      QEs on the newly promoted segments. This makes mirror failover transparent
      to users.
      
      Since segment dbs (SegmentDatabaseDescriptor *) are managed by
      CdbComponentDatabases now, we can simplify the memory context
      management by replacing GangContext & perGangContext with
      DispatcherContext & CdbComponentsContext.
      
      * Postpone error handling when creating a gang
      
      Now that we have AtAbort_DispatcherState, one advantage is that we can
      postpone gang error handling to this function and make the code cleaner.
      
      * Handle FTS version change correctly
      
      In some cases, when the FTS version changes, we cannot update the current
      snapshot of segment components; more specifically, we cannot destroy the
      current writer segment dbs and create new segment dbs.
      
      These cases include:
      * the session has created temp tables;
      * the query needs two-phase commit and the gxid has already been dispatched
        to the segments.
      
      * Replace <gangId, sliceId> map with <qeIdentifier, sliceId> map
      
      We used to dispatch a <gangId, sliceId> map along with the query so that
      segment dbs knew which slice they should execute.
      
      Now gangId is useless for a segment db, because a segment db can be reused
      by different gangs, so we need a new way to convey this information. To
      resolve this, CdbComponentDatabases assigns a unique identifier to each
      segment db and builds, for each slice, a bitmap set of the segment
      identifiers; a segment db can then walk the slice table and find the right
      slice to execute.
      
      * Allow the dispatcher to create variable-size gangs and refine AssignGangs()
      
      Previously, the dispatcher could only create N-size gangs for
      GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the
      dispatcher in many ways. One example is direct dispatch: it always created
      an N-size gang even when it dispatched the command to only one segment.
      Another example is that some operations could use a gang larger than N;
      for a hash join, if both the inner and outer plans are redistributed,
      the hash join node could be associated with a larger gang to execute.
      This commit changes the API of createGang() so the caller can specify a
      list of segments (partial or even duplicate segments), and
      CdbComponentDatabases guarantees that each segment has only one writer
      in a session. This also resolves another pain point of AssignGangs():
      the caller no longer needs to promote a GANGTYPE_PRIMARY_READER to
      GANGTYPE_PRIMARY_WRITER, or a GANGTYPE_SINGLETON_READER to
      GANGTYPE_PRIMARY_WRITER for replicated tables (see FinalizeSliceTree()).
      
      With this commit, AssignGangs() is very clear now.
      a3ddac06
  15. 06 Sep 2018 (1 commit)
    • T
      Integrate Gang management from portal to Dispatcher and simplify AssignGangs for init plans (#5555) · 78a4890a
      Committed by Tang Pengzhou
      * Simplify the AssignGangs() logic for init plans
      
      Previously, AssignGangs() assigned gangs for both the main plan and the
      init plans in one shot. Because init plans and the main plan are executed
      sequentially, the gangs can be reused between them; the function
      AccumSliceReq() was designed for this.
      
      This process can be simplified: since we already know the root slice
      index will be adjusted according to the init plan id, init plans only
      need to assign their own slices.
      
      * Integrate Gang management from portal to Dispatcher
      
      Previously, gangs were managed by the portal: freeGangsForPortal()
      was used to clean up gang resources, and DTM-related commands, which
      need a gang to dispatch commands outside of a portal, used
      freeGangsForPortal() too. Multiple commands/plans/utilities may be
      executed within one portal, and all of them rely on a dispatcher
      routine like CdbDispatchCommand / CdbDispatchPlan / CdbDispatchUtility...
      to dispatch. Gangs were created by each dispatcher routine, but were not
      recycled or destroyed when the routine finished, except for the primary
      writer gang. One defect of this is that gang resources could not be
      reused between dispatcher routines. GPDB already had an optimization
      for init plans: if a plan contained init plans, AssignGangs was called
      before executing any of them; it walked the whole slice tree and created
      the maximum gang that both the main plan and the init plans needed. This
      was doable because init plans and the main plan were executed
      sequentially, but it also made the AssignGangs logic complex; meanwhile,
      reusing an unclean gang was not safe.
      
      Another confusing thing was that the gangs and the dispatcher were
      managed separately, which caused context inconsistencies: when a
      dispatcher state was destroyed, its gangs were not recycled; when a
      gang was destroyed by the portal, the dispatcher state was still in
      use and might refer to the context of a destroyed gang.
      
      As described above, this commit integrates gang management into the
      dispatcher: a dispatcher state is responsible for creating and
      tracking gangs as needed and destroys them when the dispatcher
      state is destroyed.
      
      * Handle the case when the primary writer gang is gone
      
      When members of the primary writer gang are gone, the writer gang
      is destroyed immediately (primaryWriterGang is set to NULL) when a
      dispatcher routine (e.g. CdbDispatchCommand) finishes. So when
      dispatching a two-phase DTM/DTX related command, the QD doesn't
      know the writer gang is gone, and it may get unexpected errors
      like 'savepoint not exist', 'subtransaction level not match' or
      'temp file not exist'.
      
      Previously, primaryWriterGang was not reset when DTM/DTX commands
      started even if it pointed to invalid segments, so those DTM/DTX
      commands were not actually sent to the segments; the normal error
      reported on the QD looked like 'could not connect to segment:
      initialization of segworker'.
      
      So we need a way to inform the global transaction that its writer
      gang has been lost, so that when aborting the transaction the QD can:
      1. disconnect all reader gangs; this is useful for skipping the
      dispatch of "ABORT_NO_PREPARE";
      2. reset the session and drop temp files, because the temp files on
      the segments are gone;
      3. report an error when dispatching the "rollback savepoint" DTX,
      because the savepoint on the segments is gone;
      4. report an error when dispatching the "abort subtransaction" DTX,
      because the subtransaction is rolled back when the writer segment
      is down.
      78a4890a
  16. 14 Aug 2018 (2 commits)
    • P
      Remove cdbdisp_finishCommand · 957629d1
      Committed by Pengzhou Tang
      Previously, cdbdisp_finishCommand did three things:
      1. cdbdisp_checkDispatchResult
      2. cdbdisp_getDispatchResult
      3. cdbdisp_destroyDispatcherState
      
      However, cdbdisp_finishCommand didn't make the code cleaner or more
      convenient to use; on the contrary, it made error handling more
      difficult and the code more complicated and inconsistent.
      
      This commit also resets estate->dispatcherState to NULL to avoid
      re-entry of cdbdisp_* functions.
      957629d1
    • P
      Rename CdbCheckDispatchResult for name convention · 60bd3ab2
      Committed by Pengzhou Tang
      Use cdbdisp_checkDispatchResult instead of CdbCheckDispatchResult
      to be consistent with the other cdbdisp_* functions.
      60bd3ab2
  17. 09 May 2018 (1 commit)
  18. 01 Mar 2018 (1 commit)
    • H
      Give a better error message, if preparing an xact fails. · b3c50e40
      Committed by Heikki Linnakangas
      If an error happens in the prepare phase of two-phase commit, relay the
      original error back to the client, instead of the fairly opaque
      "'Abort [Prepared]' broadcast failed to one or more segments" message you
      got previously. A lot of things happen during the prepare phase that
      can legitimately fail, like checking deferred constraints, as in the
      'constraints' regression test. But even without that, there can be
      triggers, ON COMMIT actions, etc., any of which can fail.
      
      This commit consists of several parts:
      
      * Pass 'true' for the 'raiseError' argument when dispatching the prepare
        dtx command in doPrepareTransaction(), so that the error is emitted to
        the client.
      
      * Bubble up an ErrorData struct, with as many fields intact as possible,
        to the caller,  when dispatching a dtx command. (Instead of constructing
        a message in a StringInfo). So that we can re-throw the message to
        the client, with its original formatting.
      
      * Don't throw an error in performDtxProtocolCommand(), if we try to abort
        a prepared transaction that doesn't exist. That is business-as-usual,
        if a transaction throws an error before finishing the prepare phase.
      
      * Suppress the "NOTICE: Releasing segworker groups to retry broadcast."
        message, when aborting a prepared transaction.
      
      Put together, the effect is if an error happens during prepare phase, the
      client receives a message that is largely indistinguishable from the
      message you'd get if the same failure happened while running a normal
      statement.
      
      Fixes github issue #4530.
      b3c50e40
  19. 25 Jan 2018 (1 commit)
    • D
      Propagate segment errcodes to dispatcher · 58003bc7
      Committed by Daniel Gustafsson
      The errcode thrown in an ereport() on a segment was passed back to
      the dispatcher, but then dropped and replaced with a default errcode
      of ERRCODE_DATA_EXCEPTION. This works for most situations, but when
      trapping errors the exact errcode must be propagated. This extends
      the API to extract the errcode as well. The below case illustrates
      the previous issue:
      
        CREATE TABLE test1(id int primary key);
        CREATE TABLE test2(id int primary key);
        INSERT INTO test1 VALUES(1);
        INSERT INTO test2 VALUES(1);
        CREATE OR REPLACE FUNCTION merge_table() RETURNS void AS $$
        DECLARE
      	v_insert_sql varchar;
        BEGIN
      	v_insert_sql :='INSERT INTO test1 SELECT * FROM test2';
      	EXECUTE v_insert_sql;
      	EXCEPTION WHEN unique_violation THEN
      		RAISE NOTICE 'unique_violation';
      	END;
        $$ LANGUAGE plpgsql volatile;
        SELECT merge_table();
      58003bc7
  20. 02 Nov 2017 (1 commit)
    • H
      Wake up faster, if a segment returns an error. · 3bbedbe9
      Committed by Heikki Linnakangas
      Previously, if a segment reported an error after starting up the
      interconnect, it would take up to 250 ms for the main thread in the QD
      process to wake up and poll the dispatcher connections, and to see that
      there was an error. Shorten that time, by waking up immediately if the
      QD->QE libpq socket becomes readable while we're waiting for data to
      arrive in a Motion node.
      
      This isn't a complete solution, because this will only wake up if one
      arbitrarily chosen connection becomes readable, and we still rely on
      polling for the others. But this greatly speeds up many common scenarios.
      In particular, the "qp_functions_in_select" test now runs in under 5 s
      on my laptop, when it took about 60 seconds before.
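      A standalone sketch of the waiting pattern (illustrative file descriptors;
      the real code waits inside the interconnect receive path): poll both the
      interconnect fd and the QD->QE libpq socket, so an error reported by a QE
      wakes the QD immediately instead of after the next poll timeout.
      
      #include <poll.h>
      #include <stdio.h>
      #include <unistd.h>
      
      /* Wait for either interconnect data or dispatcher (libpq) traffic.
       * Returns 1 if the dispatcher socket became readable (e.g. a QE error),
       * 0 if interconnect data arrived, -1 on timeout or poll failure. */
      static int
      wait_for_data(int motion_fd, int libpq_fd, int timeout_ms)
      {
          struct pollfd fds[2];
      
          fds[0].fd = motion_fd;
          fds[0].events = POLLIN;
          fds[1].fd = libpq_fd;   /* previously not watched: errors waited out the timeout */
          fds[1].events = POLLIN;
      
          int rc = poll(fds, 2, timeout_ms);
      
          if (rc <= 0)
              return -1;              /* timeout or error: caller re-checks state */
          if (fds[1].revents & POLLIN)
              return 1;               /* QE reported something: check dispatch results now */
          return 0;                   /* normal tuple data on the interconnect */
      }
      
      int
      main(void)
      {
          /* Demo with a pipe standing in for each socket. */
          int motion_pipe[2], libpq_pipe[2];
      
          pipe(motion_pipe);
          pipe(libpq_pipe);
          write(libpq_pipe[1], "E", 1);   /* pretend the QE sent an error message */
      
          int what = wait_for_data(motion_pipe[0], libpq_pipe[0], 250);
          printf("woke up: %s\n", what == 1 ? "dispatcher socket readable" :
                                  what == 0 ? "motion data" : "timeout");
          return 0;
      }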
      3bbedbe9
  21. 30 Oct 2017 (1 commit)
  22. 10 Oct 2017 (1 commit)
  23. 01 Sep 2017 (1 commit)
  24. 09 Aug 2017 (1 commit)
    • P
      Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Committed by Pengzhou Tang
      The whole cdb directory is shipped to end users, and all header files
      that cdb*.h includes also need to be shipped to make checkinc.py
      pass. However, exposing gp_libpq_fe/*.h would confuse customers because
      those headers are almost the same as libpq/*; per Heikki's suggestion, we
      keep gp_libpq_fe/* unchanged. So, to make the system work, we include
      gp-libpq-fe.h and gp-libpq-int.h directly in the C files that need them.
      cf7cddf7
  25. 08 Aug 2017 (1 commit)
    • H
      Remove unnecessary use of PQExpBuffer. · cc38f526
      Committed by Heikki Linnakangas
      StringInfo is more appropriate in backend code. (Unless the buffer needs to
      be used in a thread.)
      
      In passing, rename the 'conn' static variable in cdbfilerepconnclient.c.
      It seemed overly generic.
      cc38f526
  26. 18 Nov 2016 (2 commits)
    • H
      Use proper error code for errors. · 0bf31cd6
      Committed by Heikki Linnakangas
      Attach a suitable error code to the many errors that were previously reported
      as "internal errors". GPDB's elog.c prints a source file name and line
      number for any internal error, which is a bit ugly for errors that are
      not in fact unexpected internal errors but user-facing errors that happen
      as a result of e.g. an invalid query.
      
      To make sure we don't accumulate more of these, adjust the regression tests
      to not ignore the source file and line number in error messages. There are
      a few exceptions, which are listed explicitly.
      0bf31cd6
    • H
      Remove errOmitLocation. · f6f5c9ef
      Committed by Heikki Linnakangas
      It was somewhat broken. The order of evaluation of function arguments is
      implementation-specific, so in a statement like:
      
      ereport(ERROR,
              (errcode(ERRCODE_INTERNAL_ERROR),
               errOmitLocation(true)));
      
      We cannot assume that errOmitLocation() is evaluated after errcode(). If
      errOmitLocation() is evaluated first, the errcode() call could overwrite
      the omit_location field, seeing that the error code was "internal error".
      
      Almost all of the errOmitLocation() calls in the codebase were superfluous
      anyway. The default logic is to omit the location for anything other than an
      ERRCODE_INTERNAL_ERROR, so it is not necessary to call errOmitLocation(true)
      if an errcode (other than ERRCODE_INTERNAL_ERROR) is given. Likewise,
      errOmitLocation(false) is not needed for internal errors.
      
      Remove the whole errOmitLocation() function. It's not really needed. The
      most notable callsite where it mattered was in cdbdisp.c, but that one
      was broken by the order-of-evaluation issue. Use a different error code
      there. What we really should do there is to pass the error code from the
      segment back to the client, but I'll leave that for another day.
      f6f5c9ef
  27. 14 Nov 2016 (1 commit)
    • X
      Use nonblocking mechanism to send data in async dispatcher. · 2516eac6
      Committed by xiong-gang
      pqFlush sends data synchronously even though the socket is set to
      O_NONBLOCK, which degrades performance. This commit uses
      pqFlushNonBlocking instead, and waits for dispatching to all gangs to
      complete before query execution.
      Signed-off-by: Kenan Yao <kyao@pivotal.io>
      2516eac6
  28. 04 Nov 2016 (1 commit)
  29. 29 Aug 2016 (1 commit)
    • P
      Fix few dispatch related bugs · eb40e073
      Committed by Pengzhou Tang
      1. Fix a primary writer gang leak: PrimaryWriterGang was accidentally set to
         NULL, which prevented disconnectAndDestroyAllGangs() from destroying the
         primary writer gang.
      2. Fix a gang leak: when creating a gang, if the retry count exceeded the
         limit, the failed gang was not destroyed.
      3. Remove a duplicate sanity check before dispatchCommand().
      4. Remove an unnecessary error-out when a broken gang is no longer needed.
      5. Fix a thread leak problem.
      6. Enhance error handling for cdbdisp_finishCommand.
      eb40e073
  30. 25 Jul 2016 (1 commit)
    • P
      Refactor utility statement dispatch interfaces · 01769ada
      Committed by Pengzhou Tang
      Refactor CdbDispatchUtilityStatement() to make it flexible enough for
      cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags like
      DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT and DF_CANCEL_ON_ERROR to make the
      function calls much clearer.
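      A minimal model of such a flag-based interface (the flag names are taken from
      the message above; the numeric values and the dispatch_utility function are
      illustrative): callers combine behavior flags instead of passing a string of
      booleans.
      
      #include <stdio.h>
      
      /* Illustrative flag values; the real definitions live in the dispatcher headers. */
      #define DF_CANCEL_ON_ERROR   0x01   /* cancel remaining QEs if one errors out */
      #define DF_WITH_SNAPSHOT     0x02   /* dispatch the statement with a snapshot */
      #define DF_NEED_TWO_SNAPSHOT 0x04   /* flag named in the commit message above */
      
      static void
      dispatch_utility(const char *stmt, int flags)
      {
          printf("dispatch \"%s\"%s%s%s\n", stmt,
                 (flags & DF_CANCEL_ON_ERROR) ? " [cancel-on-error]" : "",
                 (flags & DF_WITH_SNAPSHOT) ? " [with-snapshot]" : "",
                 (flags & DF_NEED_TWO_SNAPSHOT) ? " [two-snapshot]" : "");
      }
      
      int
      main(void)
      {
          /* e.g. what a COPY start or VACUUM dispatch might request. */
          dispatch_utility("VACUUM foo", DF_CANCEL_ON_ERROR | DF_WITH_SNAPSHOT);
          return 0;
      }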
      01769ada
  31. 17 Jul 2016 (3 commits)
  32. 28 Jun 2016 (1 commit)
  33. 22 Jun 2016 (1 commit)
  34. 20 Jun 2016 (2 commits)
  35. 19 May 2016 (1 commit)