    Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
    Committed by Ning Yu
    This method was introduced to improve data redistribution
    performance during gpexpand phase 2, but benchmark results show that
    it does not meet our expectations.  For example, when expanding a
    table from 7 segments to 8 segments the reshuffle method is only 30%
    faster than the traditional CTAS method, and when expanding from 4 to
    8 segments reshuffle is even 10% slower than CTAS.  When there are
    indexes on the table the reshuffle performance can be worse, and an
    extra VACUUM is needed to actually free the disk space.  According to
    our experiments the bottleneck of the reshuffle method is the tuple
    deletion operation, which is much slower than the insertion operation
    used by CTAS.
    
    The reshuffle method does have some benefits: it requires less extra
    disk space, and it requires less network bandwidth (similar to the
    CTAS method with the new JCH reduce method, but less than CTAS + MOD).
    It can also be faster in some cases, but since we cannot automatically
    determine when it is faster, it is hard to benefit from it in
    practice.
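
    The bandwidth comparison above can be illustrated with a rough
    sketch.  It assumes that "JCH" refers to jump consistent hashing and
    "MOD" to plain modulo reduction of the hash value; the function names
    and the random sample keys are hypothetical and are not taken from
    the Greenplum source.  It only estimates the fraction of rows that
    change segment under each reduce method, not actual gpexpand cost.

```python
import random

MASK64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key, num_buckets):
    """Jump consistent hash (Lamping and Veach): map a 64-bit key to a bucket."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, truncated like C unsigned arithmetic.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def modulo_hash(key, num_buckets):
    """Plain modulo reduction of the hash value."""
    return key % num_buckets


def moved_fraction(reduce_fn, old_n, new_n, keys):
    """Fraction of keys whose bucket changes when growing old_n -> new_n."""
    moved = sum(1 for k in keys if reduce_fn(k, old_n) != reduce_fn(k, new_n))
    return moved / len(keys)


def sample_keys(count=20000, seed=0):
    """Deterministic pseudo-random 64-bit keys standing in for row hashes."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(count)]


if __name__ == "__main__":
    keys = sample_keys()
    for old_n, new_n in [(7, 8), (4, 8)]:
        jch = moved_fraction(jump_consistent_hash, old_n, new_n, keys)
        mod = moved_fraction(modulo_hash, old_n, new_n, keys)
        print(f"{old_n} -> {new_n}: jump={jch:.3f} modulo={mod:.3f}")
```

    Under these assumptions, jump consistent hashing moves about
    (new - old) / new of the rows, roughly 1/8 when going from 7 to 8
    segments, and moved rows only ever land on the newly added segments.
    Modulo reduction moves about 7/8 of the rows in the same 7 to 8 case.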
    
    On the other hand, the reshuffle method is less tested and may have
    bugs in corner cases, so it is not production ready yet.
    
    We have therefore decided to retire it entirely for now.  We might add
    it back in the future if we can get rid of the slow deletion or find
    a reliable way to automatically choose between the reshuffle and CTAS
    methods.
    
    Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
    Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>