    Rewritten system for scheduling background work · fdb6be4e
    Igor Canadi committed
    Summary:
    When scaling to a higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which looped over all column families while holding a mutex. This patch addresses the issue.
    
    The approach is similar to our earlier efforts: instead of a pull model, where we do something for every column family, we use a push model -- when we detect that a column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction().
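
    A minimal sketch of the push model follows. Only flush_queue_ comes from this patch; ColumnFamily, FlushScheduler, EnqueueForFlush, TakeReadyFlushes and QueueLength are hypothetical names used for illustration, and the sketch is single-threaded, whereas the real code would guard the queue with the DB mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a column family (not RocksDB's real type).
struct ColumnFamily {
  std::string name;
  bool pending_flush = false;  // true while queued; prevents duplicates
};

// Hypothetical push-model scheduler. Single-threaded for brevity; real
// code would hold the DB mutex around every queue access.
class FlushScheduler {
 public:
  // Push side: called at the moment a column family is detected as
  // ready to be flushed. Enqueueing the same family twice is a no-op.
  void EnqueueForFlush(ColumnFamily* cf) {
    assert(cf != nullptr);
    if (!cf->pending_flush) {
      cf->pending_flush = true;
      flush_queue_.push_back(cf);
    }
  }

  // Scheduling side: hands out at most free_slots queued families.
  // The work done is proportional to the number taken from the queue,
  // not to the total number of column families.
  std::vector<ColumnFamily*> TakeReadyFlushes(std::size_t free_slots) {
    std::vector<ColumnFamily*> ready;
    while (!flush_queue_.empty() && ready.size() < free_slots) {
      ColumnFamily* cf = flush_queue_.front();
      flush_queue_.pop_front();
      cf->pending_flush = false;
      ready.push_back(cf);
    }
    return ready;
  }

  std::size_t QueueLength() const { return flush_queue_.size(); }

 private:
  std::deque<ColumnFamily*> flush_queue_;  // families waiting for a flush
};
```

    With 5000 column families of which only a handful are ready, the scheduling call in this sketch touches only those few queue entries. The pending flag is one possible way to keep a family from being queued more than once.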
    
    Here are the performance results:
    
    Command:
    
        ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
    
    Before the patch:
    
         fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s
    
    After the patch:
    
          fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s
    
    The next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. A fix is coming in the next patch, but when I removed that code, here's what I got:
    
          fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s
    
    Test Plan:
    make check
    
    Two stress tests:
    
    A large number of compactions and flushes:
    
        ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
    
    max_background_flushes=0, to verify that this case also works correctly:
    
        ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
    
    Reviewers: ljin, rven, yhchiang, sdong
    
    Reviewed By: sdong
    
    Subscribers: dhruba, leveldb
    
    Differential Revision: https://reviews.facebook.net/D30123