1. 22 Dec 2014 (3 commits)
    • Speed up FindObsoleteFiles() · 0acc7388
      Committed by Igor Canadi
      Summary:
      There are two versions of FindObsoleteFiles():
      * full scan, which is executed every 6 hours (and it's terribly slow)
      * no full scan, which is executed every time a background process finishes and iterator is deleted
      
      This diff optimizes the second case (no full scan). Here's what we do before the diff:
      * Get the list of obsolete files (files with ref==0). Some files in the obsolete_files set might actually be live.
      * Get the list of live files to avoid deleting files that are live.
      * Delete files that are in obsolete_files and not in live_files.
      
      After this diff (see the sketch below):
      * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files.
      * Get the list of obsolete files (which exclude moved files).
      * No need to get the list of live files, since all files in obsolete_files need to be deleted.
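
      A minimal sketch of the new bookkeeping, under assumed member names (files_, obsolete_files_); the real ~Version carries much more state:

        #include <vector>

        // One entry per SST file; refs counts the Versions referencing it.
        struct FileMetaData {
          int refs = 0;
          bool moved = false;  // set when the file was part of a move compaction
        };

        class Version {
         public:
          ~Version() {
            for (FileMetaData* f : files_) {
              if (--f->refs == 0 && !f->moved) {
                // ref==0 and never moved: truly obsolete, so the file can be
                // deleted without cross-checking against a live-files list.
                obsolete_files_->push_back(f);
              }
            }
          }

         private:
          std::vector<FileMetaData*> files_;
          std::vector<FileMetaData*>* obsolete_files_;  // owned by the VersionSet
        };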
      
      I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123
      
      This depends on D30123.
      
      P.S. We should do the full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff.
      
      Test Plan:
      One new unit test. Made sure that the unit test fails if we don't have an `if (!f->moved)` safeguard in ~Version.
      
      make check
      
      Big number of compactions and flushes:
      
        ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30249
    • Merge pull request #442 from alabid/alabid/fix-example-typo · d8c4ce6b
      Committed by Igor Canadi
      fix really trivial typo in column families example
    • fix really trivial typo · 949bd71f
      Committed by alabid
  2. 21 Dec 2014 (1 commit)
    • Fix a SIGSEGV in BackgroundFlush · f8999fcf
      Committed by Igor Canadi
      Summary:
      This one wasn't easy to find :)
      
      What happens is we go through all cfds on flush_queue_ and find no cfds to flush, *but* cfd is left set to the last CF we looped through, and the following code assumes we want it flushed.
      
      BTW @sdong do you think we should also make BackgroundFlush() only check a single cfd for flushing instead of doing this `while (!flush_queue_.empty())`?
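
      A self-contained sketch of the fixed loop, using stand-in names (flush_queue_, flush_pending) rather than the real DBImpl members:

        #include <deque>

        struct ColumnFamilyData {
          bool flush_pending = false;  // stand-in for imm()->IsFlushPending()
        };

        std::deque<ColumnFamilyData*> flush_queue_;

        // Returns a CF that actually needs flushing, or nullptr if none does.
        // The bug: the original loop left cfd pointing at the last CF popped
        // even when no CF needed a flush, and the code after the loop then
        // treated that stale cfd as a valid flush candidate.
        ColumnFamilyData* PickColumnFamilyToFlush() {
          ColumnFamilyData* cfd = nullptr;
          while (!flush_queue_.empty()) {
            ColumnFamilyData* first = flush_queue_.front();
            flush_queue_.pop_front();
            if (first->flush_pending) {
              cfd = first;  // found real work
              break;
            }
            // fixed: do NOT keep `first` in cfd when it needs no flush
          }
          return cfd;  // caller must check for nullptr before flushing
        }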
      
      Test Plan: regression test no longer fails
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: sdong, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30591
  3. 20 Dec 2014 (3 commits)
    • MultiGet for DBWithTTL · ade4034a
      Committed by Igor Canadi
      Summary: This is a feature request from a RocksDB user. I didn't even realize we didn't support MultiGet on a TTL DB :)
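
      A usage sketch; the header path and exact signatures are assumptions based on the public RocksDB API of the time:

        #include <string>
        #include <vector>

        #include "rocksdb/utilities/db_ttl.h"

        int main() {
          rocksdb::DBWithTTL* db;
          rocksdb::Options options;
          options.create_if_missing = true;
          // Entries older than ttl seconds are lazily purged during compaction.
          rocksdb::Status s =
              rocksdb::DBWithTTL::Open(options, "/tmp/ttl_db", &db, /*ttl=*/3600);
          if (!s.ok()) return 1;

          std::vector<rocksdb::Slice> keys = {"key1", "key2"};
          std::vector<std::string> values;
          // Like Get(), MultiGet on a TTL DB strips the internal timestamp
          // suffix that TTL appends to each stored value.
          std::vector<rocksdb::Status> statuses =
              db->MultiGet(rocksdb::ReadOptions(), keys, &values);
          delete db;
          return statuses[0].ok() ? 0 : 1;
        }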
      
      Test Plan: added a unit test
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30561
    • Rewritten system for scheduling background work · fdb6be4e
      Committed by Igor Canadi
      Summary:
      When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue.
      
      The approach is similar to our earlier efforts: instead of a pull model, where we do something for every column family, we use a push model; when we detect that a column family is ready to be flushed or compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction (see the sketch below).
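
      A rough, self-contained sketch of the push model; names like flush_queue_ mirror the patch, but the types are simplified stand-ins:

        #include <deque>

        struct ColumnFamilyData {
          bool pending_flush = false;  // true while this CF sits on flush_queue_
          bool needs_flush = false;    // stand-in for imm()->IsFlushPending()
        };

        std::deque<ColumnFamilyData*> flush_queue_;

        // Called when an event makes this one CF ready: enqueue it in O(1)
        // instead of rescanning all column families under the mutex.
        void SchedulePendingFlush(ColumnFamilyData* cfd) {
          if (!cfd->pending_flush && cfd->needs_flush) {
            cfd->pending_flush = true;  // guard against double-queueing
            flush_queue_.push_back(cfd);
          }
        }

        // The background thread pops one ready CF instead of scanning all.
        ColumnFamilyData* PopFirstFromFlushQueue() {
          ColumnFamilyData* cfd = flush_queue_.front();
          flush_queue_.pop_front();
          cfd->pending_flush = false;
          return cfd;
        }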
      
      Here are the performance results:
      
      Command:
      
          ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
      
      Before the patch:
      
        fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s

      After the patch:

        fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s

      The next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. A fix is coming in the next patch, but when I removed that code, here's what I got:

        fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s
      
      Test Plan:
      make check
      
      two stress tests:
      
      Big number of compactions and flushes:
      
          ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      max_background_flushes=0, to verify that this case also works correctly
      
          ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      Reviewers: ljin, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30123
    • Remove -mtune=native because it's redundant · a3001b1d
      Committed by Igor Canadi
  4. 19 Dec 2014 (8 commits)
  5. 18 Dec 2014 (3 commits)
  6. 17 Dec 2014 (3 commits)
  7. 16 Dec 2014 (6 commits)
  8. 15 Dec 2014 (1 commit)
    • Optimize default compile to compilation platform by default · 06eed650
      Committed by Igor Canadi
      Summary:
      This diff changes the compile to optimize for the native platform by default. This automatically turns on crc32 optimizations for modern processors, which greatly improves RocksDB's performance.
      
      I also made some more changes to the compilation documentation.
      
      Test Plan:
      compile with `make`, observe -march=native
      compile with `PORTABLE=1 make`, observe no -march=native
      
      Reviewers: sdong, rven, yhchiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30225
  9. 13 Dec 2014 (2 commits)
    • Added 'dump_live_files' command to ldb tool. · cef6f843
      Committed by Qiao Yang
      Summary:
      Preliminary diff to solicit comments.
      Given a DB path, dump all SST files (key/value and properties), the WAL file, and the
      manifest files. What command options do we need to support for this command? Maybe
      output_hex for keys?
      
      Test Plan: Create additional ldb unit tests.
      
      Reviewers: sdong, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D29547
    • Add an assert and avoid std::sort(autovector) to investigate an ASAN issue · 7ab1526c
      Committed by sdong
      Summary:
      The ASAN build failed once with this error:
      
      14:04:52 ==== Test DBTest.CompactFilesOnLevelCompaction
      14:04:52 db_test: db/version_set.cc:1062: void rocksdb::VersionStorageInfo::AddFile(int, rocksdb::FileMetaData*): Assertion `level <= 0 || level_files->empty() || internal_comparator_->Compare( (*level_files)[level_files->size() - 1]->largest, f->smallest) < 0' failed.
      
      Not able to figure out the reason. Use std::vector for the sort to be safe, and add one more assert to help figure out whether the sorting is the problem.
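
      A hypothetical sketch of the workaround: sort a std::vector copy instead of calling std::sort on the autovector directly, and assert that the result is strictly ordered:

        #include <algorithm>
        #include <cassert>
        #include <cstddef>
        #include <vector>

        // Copy the elements out of the custom autovector into a plain
        // std::vector before sorting, so std::sort operates on a container
        // we fully trust, then double-check the resulting order.
        template <typename AutoVec, typename Cmp>
        std::vector<typename AutoVec::value_type> SortedCopy(const AutoVec& av,
                                                             Cmp cmp) {
          std::vector<typename AutoVec::value_type> v(av.begin(), av.end());
          std::sort(v.begin(), v.end(), cmp);
          for (size_t i = 1; i < v.size(); ++i) {
            // Extra assert in the spirit of the diff: neighbors must be
            // strictly increasing, mirroring the check in AddFile.
            assert(cmp(v[i - 1], v[i]));
          }
          return v;
        }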
      
      Test Plan: make all check
      
      Reviewers: yhchiang, rven, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D30117
  10. 12 Dec 2014 (2 commits)
  11. 11 Dec 2014 (3 commits)
    • Modified the LRU cache eviction code so that it doesn't evict blocks which have external references · ee95cae9
      Committed by Alexey Maykov
      Summary:
      Currently, blocks which have more than one reference (i.e. referenced by something other than the cache itself) are evicted from the cache. This doesn't make much sense:
      - blocks are still in RAM, so the RAM usage reported by the cache is incorrect
      - if the same block is needed by another iterator, it will be loaded and decompressed again
      
      This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount only tracks external references. There is a boolean flag which indicates whether or not the block is contained in the cache.
      This diff also changes how the LRU list is used. Previously, both the hashtable and the LRU list contained all blocks. After this change, the LRU list contains only blocks with refcount==0, i.e. those which can be evicted from the cache.
      
      Note that this change still allows the cache to grow beyond its capacity. This happens when all blocks are pinned (i.e. refcount>0), which is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with an insert that might fail; it turned out to be pretty hard, and it might really destabilize some customers, so I finally decided against doing it.
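
      A minimal sketch of the new invariant, with simplified handle fields; the real LRUHandle and eviction path have more machinery:

        #include <cstddef>
        #include <list>

        struct LRUHandle {
          int refs = 0;           // external references only; the cache itself
                                  // is no longer counted here
          bool in_cache = false;  // true while the hash table owns the entry
          size_t charge = 0;
        };

        std::list<LRUHandle*> lru_;  // only handles with refs == 0 live here
        size_t usage_ = 0;
        size_t capacity_ = 0;

        // Drop an external reference; once the last one is gone the block
        // becomes evictable and joins the LRU list.
        void Release(LRUHandle* e) {
          if (--e->refs == 0 && e->in_cache) {
            lru_.push_back(e);
          }
        }

        // Evict only unreferenced blocks. Pinned blocks (refs > 0) are never
        // on lru_, so usage_ can legitimately exceed capacity_.
        void EvictIfNeeded() {
          while (usage_ > capacity_ && !lru_.empty()) {
            LRUHandle* old = lru_.front();
            lru_.pop_front();
            old->in_cache = false;
            usage_ -= old->charge;
            delete old;
          }
        }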
      
      The table_cache_remove_scan_count_limit option will be unneeded after this change; I will remove it in a following diff if this one gets approved.
      
      Test Plan: Ran tests, made sure they pass
      
      Reviewers: sdong, ljin
      
      Differential Revision: https://reviews.facebook.net/D25503
    • VersionBuilder to use unordered set and map to store added and deleted files · 0ab0242f
      Committed by sdong
      Summary: Set operations in VersionBuilder show up as a performance bottleneck when restarting a DB with lots of files. Make both added_files and deleted_files use an unordered set or map, and sort the added files only when they are applied to the new version (see the sketch below).
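
      A rough sketch under assumed names; the real sort orders by the internal key comparator, so the file-number comparison below is only a stand-in:

        #include <algorithm>
        #include <cstdint>
        #include <unordered_map>
        #include <unordered_set>
        #include <vector>

        struct FileMetaData {
          uint64_t number;  // stand-in sort key; the real code compares keys
        };

        // Edit replay now costs O(1) per add/delete instead of keeping an
        // ordered container balanced on every operation.
        struct LevelState {
          std::unordered_set<uint64_t> deleted_files;
          std::unordered_map<uint64_t, FileMetaData*> added_files;
        };

        // The sort is paid once, when the new Version is materialized.
        std::vector<FileMetaData*> SortedAddedFiles(const LevelState& s) {
          std::vector<FileMetaData*> files;
          files.reserve(s.added_files.size());
          for (const auto& kv : s.added_files) {
            files.push_back(kv.second);
          }
          std::sort(files.begin(), files.end(),
                    [](const FileMetaData* a, const FileMetaData* b) {
                      return a->number < b->number;
                    });
          return files;
        }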
      
      Test Plan: make all check
      
      Reviewers: yhchiang, rven, igor
      
      Reviewed By: igor
      
      Subscribers: hermanlee4, leveldb, dhruba, ljin
      
      Differential Revision: https://reviews.facebook.net/D30051
    • add range scan test to benchmark script · e93f044d
      Committed by Lei Jin
      Summary: as title
      
      Test Plan: ran it
      
      Reviewers: yhchiang, igor, sdong, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D25563
  12. 10 Dec 2014 (1 commit)
    • Fix #434 · cb82d7b0
      Committed by Igor Canadi
      Summary: Why do we assert here? This doesn't seem like a user-friendly thing to do :)
      
      Test Plan: none
      
      Reviewers: sdong, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30027
  13. 09 Dec 2014 (3 commits)
  14. 06 Dec 2014 (1 commit)