    Dispatcher can create flexible size gang (#5701) · a3ddac06
    Tang Pengzhou committed
    * Change the type of db_descriptors to SegmentDatabaseDescriptor **
    
    A new gang definition may consist of cached segdbDescs and newly
    created segdbDescs, so there is no need to palloc every segdbDesc
    struct anew.
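
    The sketch below shows why an array of pointers helps. It uses
    simplified stand-in types: SegDbDesc, Gang and make_gang are
    hypothetical and are not Greenplum's definitions, and malloc stands
    in for palloc. Cached descriptors are shared by reference, and only
    the missing slots are allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for SegmentDatabaseDescriptor. */
typedef struct SegDbDesc
{
    int segindex;   /* segment this QE connection belongs to */
    int cached;     /* 1 if reused from an earlier gang, 0 if new */
} SegDbDesc;

/* Hypothetical gang: an array of POINTERS, not an array of structs. */
typedef struct Gang
{
    int         size;
    SegDbDesc **db_descriptors;
} Gang;

/*
 * Build a gang with `size` slots. Slot i reuses cached[i] when it is
 * non-NULL; otherwise a fresh descriptor is allocated for segment i.
 * Cached descriptors are referenced, never copied.
 */
static Gang *make_gang(int size, SegDbDesc **cached)
{
    assert(size > 0);
    Gang *gang = malloc(sizeof(Gang));
    assert(gang != NULL);
    gang->size = size;
    gang->db_descriptors = malloc((size_t) size * sizeof(SegDbDesc *));
    assert(gang->db_descriptors != NULL);

    for (int i = 0; i < size; i++)
    {
        if (cached[i] != NULL)
        {
            gang->db_descriptors[i] = cached[i];  /* share existing struct */
        }
        else
        {
            SegDbDesc *fresh = malloc(sizeof(SegDbDesc));
            assert(fresh != NULL);
            fresh->segindex = i;
            fresh->cached = 0;
            gang->db_descriptors[i] = fresh;      /* only new ones allocated */
        }
    }
    return gang;
}
```

    With an array of structs, each cached descriptor would have to be
    copied into the new array. That would duplicate the state of a
    connection that already exists.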
    
    * Remove the unnecessary gang allocation unit test
    
    * Manage idle segment dbs using CdbComponentDatabases instead of available* lists.
    
    To support variable-size gangs, we now need to manage segment dbs
    at a finer granularity. Previously, idle QEs were managed by a set
    of lists such as availablePrimaryWriterGang and
    availableReaderGangsN, which restricted the dispatcher to creating
    only N-size (N = number of segments) or 1-size gangs.
    
    CdbComponentDatabases is a snapshot of the segment components in
    the current cluster, and it now maintains a freelist for each
    segment component. When creating a gang, the dispatcher assembles
    it from the segment components, taking an idle segment db from each
    component's freelist or creating a new one. When cleaning up a
    gang, the dispatcher returns the idle segment dbs to their segment
    components.
    
    CdbComponentDatabases provides a few functions to manipulate segment
    dbs (SegmentDatabaseDescriptor *):
    * cdbcomponent_getCdbComponents
    * cdbcomponent_destroyCdbComponents
    * cdbcomponent_allocateIdleSegdb
    * cdbcomponent_recycleIdleSegdb
    * cdbcomponent_cleanupIdleSegdbs
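
    A toy model of the per-component freelist may help here. Component,
    SegDb and the component_* functions are hypothetical, simplified
    stand-ins. They loosely mirror the roles of
    cdbcomponent_allocateIdleSegdb, cdbcomponent_recycleIdleSegdb and
    cdbcomponent_cleanupIdleSegdbs, but they do not reproduce those
    functions' real signatures. malloc/free stand in for palloc/pfree
    and for real QE connection setup and teardown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical idle QE handle, linked into its component's freelist. */
typedef struct SegDb
{
    int           segindex;
    struct SegDb *next;
} SegDb;

/* Hypothetical segment component with its own freelist of idle QEs. */
typedef struct Component
{
    int    segindex;
    SegDb *freelist;     /* idle QEs ready for reuse */
    int    num_created;  /* QEs ever created on this component */
} Component;

/* Reuse an idle QE from the freelist, or create a new one if it is empty. */
static SegDb *component_allocate_idle(Component *c)
{
    SegDb *db = c->freelist;
    if (db != NULL)
    {
        c->freelist = db->next;
        db->next = NULL;
        return db;
    }
    db = malloc(sizeof(SegDb));
    assert(db != NULL);
    db->segindex = c->segindex;
    db->next = NULL;
    c->num_created++;
    return db;
}

/* When a gang is cleaned up, push its QE back onto the owning component. */
static void component_recycle_idle(Component *c, SegDb *db)
{
    assert(db->segindex == c->segindex);
    db->next = c->freelist;
    c->freelist = db;
}

/* Destroy every idle QE cached on this component. */
static void component_cleanup_idle(Component *c)
{
    while (c->freelist != NULL)
    {
        SegDb *db = c->freelist;
        c->freelist = db->next;
        free(db);
    }
}
```

    Because each component keeps its own list, a gang can take one QE
    from any subset of components. It is no longer limited to whole
    N-size or 1-size gangs.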
    
    CdbComponentDatabases is also sensitive to the FTS version: once
    the FTS version changes, CdbComponentDatabases destroys all idle
    segment dbs and allocates QEs on the newly promoted segments. This
    makes mirror failover transparent to users.
    
    Since segment dbs (SegmentDatabaseDescriptor *) are managed by
    CdbComponentDatabases now, we can simplify the memory context
    management by replacing GangContext & perGangContext with
    DispatcherContext & CdbComponentsContext.
    
    * Postpone error handling when creating a gang
    
    Now that we have AtAbort_DispatcherState, we can postpone gang
    error handling to that function, which makes the code cleaner.
    
    * Handle FTS version change correctly
    
    In some cases, when the FTS version changes, we cannot update the
    current snapshot of segment components; more specifically, we
    cannot destroy the current writer segment dbs and create new ones.
    
    These cases include:
    * the session has created a temp table;
    * the query needs two-phase commit and the gxid has already been
      dispatched to segments.
    
    * Replace <gangId, sliceId> map with <qeIdentifier, sliceId> map
    
    We used to dispatch a <gangId, sliceId> map along with the query to
    segment dbs so that they knew which slice to execute.
    
    Now gangId is meaningless to a segment db, because a segment db can
    be reused by different gangs, so we need a new way to pass this
    information to segment dbs. To resolve this, CdbComponentDatabases
    assigns a unique identifier to each segment db and, for each slice,
    builds a bitmap set of the identifiers of the segment dbs that
    execute it. A segment db can then go through the slice table and
    find the right slice to execute.
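
    The lookup on the segment db side can be sketched as follows. The
    sketch assumes a 64-bit mask as a stand-in for a real Bitmapset and
    a flat array as the slice table. Slice, slice_add_qe and
    find_my_slice are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice entry: bit i set => QE with identifier i runs it. */
typedef struct Slice
{
    int      slice_index;
    uint64_t qe_bitmap;   /* stand-in for a Bitmapset of qeIdentifiers */
} Slice;

/* Dispatcher side: record that a QE belongs to this slice. */
static void slice_add_qe(Slice *s, int qe_identifier)
{
    assert(qe_identifier >= 0 && qe_identifier < 64);
    s->qe_bitmap |= UINT64_C(1) << qe_identifier;
}

/* QE side: scan the slice table for the slice containing my identifier.
 * Returns the slice index, or -1 if this QE is not in any slice. */
static int find_my_slice(const Slice *table, int nslices, int my_identifier)
{
    assert(my_identifier >= 0 && my_identifier < 64);
    for (int i = 0; i < nslices; i++)
    {
        if (table[i].qe_bitmap & (UINT64_C(1) << my_identifier))
            return table[i].slice_index;
    }
    return -1;
}
```

    The key point is that the identifier belongs to the segment db, not
    to a gang. It therefore stays valid when the same segment db is
    reused by a different gang.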
    
    * Allow dispatcher to create variable-size gangs and refine AssignGangs()
    
    Previously, the dispatcher could only create N-size gangs for
    GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This
    restricted the dispatcher in many ways. One example is direct
    dispatch: it always created an N-size gang even when it dispatched
    the command to only one segment. Another example is that some
    operations may be able to use a gang larger than N: if both the
    inner and outer plans of a hash join are redistributed, the hash
    join node can be associated with an N+ size gang. This commit
    changes the API of createGang() so that the caller can specify a
    list of segments (partial or even duplicate segments), and
    CdbComponentDatabases guarantees that each segment has only one
    writer in a session. This also resolves another pain point of
    AssignGangs(): the caller no longer needs to promote a
    GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or a
    GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for replicated
    tables (see FinalizeSliceTree()).
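
    The one-writer-per-segment rule can be sketched for a segment list
    that may contain duplicates. The types and functions below are
    hypothetical. The policy shown is an assumption: a duplicate entry
    for a segment whose writer is already busy in the gang gets a reader
    QE. The commit only states the guarantee, not how duplicates are
    resolved.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEG 4

/* Hypothetical QE slot in a gang. */
typedef struct QE
{
    int  segindex;
    bool is_writer;
} QE;

/* Hypothetical per-segment state: the session's single writer is in use. */
typedef struct Component
{
    bool writer_busy;
} Component;

/* Build a gang over an arbitrary list of segments (partial or with
 * duplicates). For a writer gang, each segment's writer is handed out at
 * most once; further entries for that segment get reader QEs. */
static void create_gang(Component *comps, const int *segs, int n,
                        bool want_writer, QE *out)
{
    for (int i = 0; i < n; i++)
    {
        int seg = segs[i];
        assert(seg >= 0 && seg < NSEG);
        out[i].segindex = seg;
        out[i].is_writer = false;
        if (want_writer && !comps[seg].writer_busy)
        {
            out[i].is_writer = true;
            comps[seg].writer_busy = true;
        }
    }
}

/* Release the writers held by a gang so later gangs can reuse them. */
static void cleanup_gang(Component *comps, const QE *gang, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (gang[i].is_writer)
            comps[gang[i].segindex].writer_busy = false;
    }
}
```

    For example, a writer gang over segments {2, 2, 0} would get a
    writer on segment 2, a reader on segment 2, and a writer on
    segment 0.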
    
    With this commit, AssignGangs() is much clearer.