    Implement XXH3 block checksum type (#9069) · a7d4bea4
    Peter Dillinger committed
    Summary:
    XXH3 is a recent hash function that is extremely fast on large
    data, easily faster than crc32c on almost any x86_64 hardware. In
    integrating this hash function, I have handled the compression type byte
    in a non-standard way to avoid using the streaming API (which would mean
    extra data movement and a larger active code size because of the hash
    function's complexity). This approach got a thumbs-up from Yann Collet.
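
As a rough illustration of the idea above, the sketch below hashes the block contents in one shot and then folds the trailing compression type byte into the 32-bit result, so no streaming state is needed. It is not taken from the commit: `zlib.crc32` is only a stand-in for the lower 32 bits of XXH3, and the names `block_hash32`, `checksum_with_last_byte`, and the multiplier constant are assumptions made for this example.

```python
import zlib

MASK32 = 0xFFFFFFFF
# Illustrative odd multiplier (an assumption for this sketch). Any odd
# constant makes byte -> byte * constant mod 2**32 a one-to-one mapping.
MIX_MULTIPLIER = 0x6B9083D9


def block_hash32(data: bytes) -> int:
    """Stand-in for a one-shot 32-bit hash of the block contents."""
    return zlib.crc32(data) & MASK32


def checksum_with_last_byte(data: bytes, last_byte: int) -> int:
    """Hash `data` in one call, then mix in a single trailing byte."""
    if not 0 <= last_byte <= 0xFF:
        raise ValueError("last_byte must fit in one byte")
    return block_hash32(data) ^ ((last_byte * MIX_MULTIPLIER) & MASK32)
```

Because the multiplier is odd, two different type bytes can never produce the same checksum for the same block contents, which is the property a one-byte mix-in needs here.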
    
    Existing functionality change:
    * reject bad ChecksumType in options with InvalidArgument
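
A minimal sketch of this kind of validation, written in Python rather than RocksDB's C++. The numeric values follow the `-checksum_type` mapping listed in the macrobenchmark results below; the function name and the use of `ValueError` in place of an InvalidArgument status are assumptions for illustration.

```python
from enum import IntEnum


class ChecksumType(IntEnum):
    NO_CHECKSUM = 0  # kNoChecksum
    CRC32C = 1       # kCRC32c
    XXHASH = 2       # kxxHash
    XXHASH64 = 3     # kxxHash64
    XXH3 = 4         # kXXH3


def validate_checksum_type(raw: int) -> ChecksumType:
    """Return the matching enum member, or reject unrecognized values."""
    try:
        return ChecksumType(raw)
    except ValueError:
        # Stand-in for returning an InvalidArgument status.
        raise ValueError(
            f"InvalidArgument: unrecognized ChecksumType {raw}"
        ) from None
```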
    
    This change was split off from https://github.com/facebook/rocksdb/issues/9058 because context-aware checksum is
    likely to be handled through a different configuration than ChecksumType.
    
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/9069
    
    Test Plan:
    Tests updated and substantially expanded. Unit tests now check
    that we don't accidentally change the values generated by the checksum
    algorithms ("schema test") and that we properly handle
    invalid/unrecognized checksum types in options or in the file footer.
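
The "schema test" idea can be sketched as pinning known outputs for fixed inputs, so that any accidental change to the algorithm shows up as a mismatch. This example is an assumption-laden illustration, not the commit's test: it uses Python's `zlib.crc32` (standard CRC-32, not CRC32c or XXH3) with its well-known check value for `b"123456789"`, and `schema_mismatches` is a hypothetical helper name.

```python
import zlib

# Pinned outputs for fixed inputs. 0xCBF43926 is the standard CRC-32
# check value for the ASCII string "123456789".
PINNED_CRC32 = {
    b"": 0x00000000,
    b"123456789": 0xCBF43926,
}


def schema_mismatches(checksum_fn, pinned):
    """Return (input, expected, actual) for every pinned value that differs."""
    mismatches = []
    for data, expected in pinned.items():
        actual = checksum_fn(data)
        if actual != expected:
            mismatches.append((data, expected, actual))
    return mismatches
```

An empty result means the on-disk format implied by the checksum is unchanged; swapping in a different function (for example `zlib.adler32`) produces mismatches immediately.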
    
    DBTestBase::ChangeOptions (etc.) reduced from two configurations to one
    that changes ChecksumType from the default CRC32c. The point of this test
    code is to detect possible interactions among features, and the likelihood
    of some bad interaction being detected by including configurations other
    than XXH3 and CRC32c (and then not detected by the stress/crash test) is
    extremely low.
    
    Stress/crash test also updated (run manually for long enough to see that
    it accepts the new checksum type). db_bench also updated for
    microbenchmarking checksums.
    
     ### Performance microbenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)
    
    ./db_bench -benchmarks=crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3
    crc32c       :       0.200 micros/op 5005220 ops/sec; 19551.6 MB/s (4096 per op)
    xxhash       :       0.807 micros/op 1238408 ops/sec; 4837.5 MB/s (4096 per op)
    xxhash64     :       0.421 micros/op 2376514 ops/sec; 9283.3 MB/s (4096 per op)
    xxh3         :       0.171 micros/op 5858391 ops/sec; 22884.3 MB/s (4096 per op)
    crc32c       :       0.206 micros/op 4859566 ops/sec; 18982.7 MB/s (4096 per op)
    xxhash       :       0.793 micros/op 1260850 ops/sec; 4925.2 MB/s (4096 per op)
    xxhash64     :       0.410 micros/op 2439182 ops/sec; 9528.1 MB/s (4096 per op)
    xxh3         :       0.161 micros/op 6202872 ops/sec; 24230.0 MB/s (4096 per op)
    crc32c       :       0.203 micros/op 4924686 ops/sec; 19237.1 MB/s (4096 per op)
    xxhash       :       0.839 micros/op 1192388 ops/sec; 4657.8 MB/s (4096 per op)
    xxhash64     :       0.424 micros/op 2357391 ops/sec; 9208.6 MB/s (4096 per op)
    xxh3         :       0.162 micros/op 6182678 ops/sec; 24151.1 MB/s (4096 per op)
    
    As you can see, especially once warmed up, xxh3 is fastest.
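
The columns in the output above are related by simple arithmetic, which the following sketch reproduces. It assumes that "MB" here means 2^20 bytes, which is inferred from the reported numbers rather than stated in the commit; the function names are made up for this example.

```python
BYTES_PER_OP = 4096  # "(4096 per op)" in the benchmark output


def throughput_mb_per_sec(ops_per_sec: float,
                          bytes_per_op: int = BYTES_PER_OP) -> float:
    """Convert ops/sec to MB/s, assuming 1 MB = 2**20 bytes."""
    return ops_per_sec * bytes_per_op / (1 << 20)


def micros_per_op(ops_per_sec: float) -> float:
    """Convert ops/sec to microseconds per operation."""
    return 1_000_000 / ops_per_sec


def speedup(ops_a: float, ops_b: float) -> float:
    """How many times faster A is than B, by ops/sec."""
    return ops_a / ops_b
```

For instance, `throughput_mb_per_sec(5005220)` matches the 19551.6 MB/s reported for the first crc32c run, and `speedup(6182678, 4924686)` compares xxh3 with crc32c in the final round.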
    
     ### Performance macrobenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)
    
    Test
    
        for I in `seq 1 50`; do for CHK in 0 1 2 3 4; do TEST_TMPDIR=/dev/shm/rocksdb$CHK ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=$CHK 2>&1 | grep 'micros/op' | tee -a results-$CHK & done; wait; done
    
    Results (ops/sec)
    
        for FILE in results*; do echo -n "$FILE "; awk '{ s += $5; c++; } END { print 1.0 * s / c; }' < $FILE; done
    
    results-0 252118 # kNoChecksum
    results-1 251588 # kCRC32c
    results-2 251863 # kxxHash
    results-3 252016 # kxxHash64
    results-4 252038 # kXXH3
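
For readers who prefer not to parse the awk one-liner, the sketch below does the same averaging in Python and then expresses each reported average relative to the kNoChecksum baseline. The averages are copied from the results above; the helper names are hypothetical, and the parsing assumes ops/sec is the fifth whitespace-separated field, as the `$5` in the awk command implies.

```python
def mean_ops_per_sec(lines):
    """Average the fifth field (ops/sec) over lines containing 'micros/op'."""
    values = [float(line.split()[4]) for line in lines if "micros/op" in line]
    return sum(values) / len(values)


# Averages reported above, keyed by checksum type.
REPORTED = {
    "kNoChecksum": 252118,
    "kCRC32c": 251588,
    "kxxHash": 251863,
    "kxxHash64": 252016,
    "kXXH3": 252038,
}


def relative_to_baseline(results, baseline="kNoChecksum"):
    """Fractional difference of each result from the baseline."""
    base = results[baseline]
    return {name: (value - base) / base for name, value in results.items()}
```

With these numbers, every configuration lands within about 0.3% of the no-checksum baseline.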
    
    Reviewed By: mrambacher
    
    Differential Revision: D31905249
    
    Pulled By: pdillinger
    
    fbshipit-source-id: cb9b998ebe2523fc7c400eedf62124a78bf4b4d1