1. 23 8月, 2013 2 次提交
    • X
      Pull from https://reviews.facebook.net/D10917 · 17dc1280
      Xing Jin 提交于
      Summary: Pull Mark's patch and slightly revise it. I revised another place in db_impl.cc with similar new formula.
      
      Test Plan:
      make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has run for 20+ hours and hasn't finished. Looks good so far:
      
      Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT
      LevelDB:    version 2.0
      Date:       Tue Aug 20 23:11:55 2013
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    2500000000
      RawSize:    276565.6 MB (estimated)
      FileSize:   157356.3 MB (estimated)
      Write rate limit: 0
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillseq      :    7202.000 micros/op 138 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillsync     :    7148.000 micros/op 139 ops/sec; (2500000 ops)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillrandom   :    7105.000 micros/op 140 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      overwrite    :    6930.000 micros/op 144 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980507 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.021 micros/op 979620 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     113.000 micros/op 8849 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :     102.000 micros/op 9803 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      Created bg thread 0x7f0ac17f7700
      compact      :  111701.000 micros/op 8 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980376 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     120.000 micros/op 8333 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :      29.000 micros/op 34482 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      ... finished 618100000 ops
      
      Reviewers: MarkCallaghan, haobo, dhruba, chip
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D12441
      17dc1280
    • S
      Add APIs to query SST file metadata and to delete specific SST files · 60bf2b7d
      Simha Venkataramaiah 提交于
      Summary: An api to query the level, key ranges, size etc for each SST file and an api to delete a specific file from the db and all associated state in the bookkeeping datastructures.
      
      Notes: Editing the manifest version does not release the obsolete files right away. However deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion api.
      
      I have used std::unique_ptr - will switch to boost:: since this is external. thoughts?
      
      Unit test is fragile right now as it expects the compaction at certain levels.
      
      Test Plan: unittest
      
      Reviewers: dhruba, vamsi, emayanke
      
      CC: zshao, leveldb, haobo
      
      Task ID: #
      
      Blame Rev:
      60bf2b7d
  2. 21 8月, 2013 1 次提交
  3. 20 8月, 2013 1 次提交
  4. 16 8月, 2013 1 次提交
    • M
      Expose statistic for sequence number and implement setTickerCount · 387ac0f1
      Mayank Agarwal 提交于
      Summary: statistic for sequence number is needed by wormhole. setTickerCount is demanded for this statistic. I can't simply recordTick(max_sequence) when db recovers because the statistic iobject is owned by client and may/may not be reset during reopen. Eg. statistic is reset in mcrocksdb whereas it is not in db_stress. Therefore it is best to go with setTickerCount
      
      Test Plan: ./db_stress ... --statistics=1 and observed expected sequence number
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12327
      387ac0f1
  5. 15 8月, 2013 1 次提交
    • X
      Minor fix to current codes · 0a5afd1a
      Xing Jin 提交于
      Summary:
      Minor fix to current codes, including: coding style, output format,
      comments. No major logic change. There are only 2 real changes, please see my inline comments.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12297
      0a5afd1a
  6. 14 8月, 2013 3 次提交
    • M
      Counter for merge failure · f1bf1694
      Mayank Agarwal 提交于
      Summary:
      With Merge returning bool, it can keep failing silently(eg. While faling to fetch timestamp in TTL). We need to detect this through a rocksdb counter which can get bumped whenever Merge returns false. This will also be super-useful for the mcrocksdb-counter service where Merge may fail.
      Added a counter NUMBER_MERGE_FAILURES and appropriately updated db/merge_helper.cc
      
      I felt that it would be better to directly add counter-bumping in Merge as a default function of MergeOperator class but user should not be aware of this, so this approach seems better to me.
      
      Test Plan: make all check
      
      Reviewers: dnicholas, haobo, dhruba, vamsi
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12129
      f1bf1694
    • T
      Prefix filters for scans (v4) · f5f18422
      Tyler Harter 提交于
      Summary: Similar to v2 (db and table code understands prefixes), but use ReadOptions as in v3.  Also, make the CreateFilter code faster and cleaner.
      
      Test Plan: make db_test; export LEVELDB_TESTS=PrefixScan; ./db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: haobo, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12027
      f5f18422
    • S
      Separate compaction filter for each compaction · 3b81df34
      sumeet 提交于
      Summary:
      If we have same compaction filter for each compaction,
      application cannot know about the different compaction processes.
      Later on, we can put in more details in compaction filter for the
      application to consume and use it according to its needs. For e.g. In
      the universal compaction, we have a compaction process involving all the
      files while others don't involve all the files. Applications may want to
      collect some stats only when during full compaction.
      
      Test Plan: run existing unit tests
      
      Reviewers: haobo, dhruba
      
      Reviewed By: dhruba
      
      CC: xinyaohu, leveldb
      
      Differential Revision: https://reviews.facebook.net/D12057
      3b81df34
  7. 10 8月, 2013 2 次提交
    • D
      Universal Compaction should keep DeleteMarkers unless it is the earliest file. · 93d77a27
      Dhruba Borthakur 提交于
      Summary:
      The pre-existing code was purging a DeleteMarker if thay key did not
      exist in deeper levels.  But in the Universal Compaction Style, all
      files are in Level0. For compaction runs that did not include the
      earliest file, we were erroneously purging the DeleteMarkers.
      
      The fix is to purge DeleteMarkers only if the compaction includes
      the earlist file.
      
      Test Plan: DBTest.Randomized triggers this code path.
      
      Differential Revision: https://reviews.facebook.net/D12081
      93d77a27
    • X
      Fix unit tests for universal compaction (step 2) · 8ae905ed
      Xing Jin 提交于
      Summary:
      Continue fixing existing unit tests for universal compaction. I have
      tried to apply universal compaction to all unit tests those haven't
      called ChangeOptions(). I left a few which are either apparently not
      applicable to universal compaction (because they check files/keys/values
      at level 1 or above levels), or apparently not related to compaction
      (e.g., open a file, open a db).
      
      I also add a new unit test for universal compaction.
      
      Good news is I didn't see any bugs during this round.
      
      Test Plan: Ran "make all check" yesterday. Has rebased and is rerunning
      
      Reviewers: haobo, dhruba
      
      Differential Revision: https://reviews.facebook.net/D12135
      8ae905ed
  8. 08 8月, 2013 1 次提交
    • X
      Fix unit tests/bugs for universal compaction (first step) · 17b8f786
      Xing Jin 提交于
      Summary:
      This is the first step to fix unit tests and bugs for universal
      compactiion. I added universal compaction option to ChangeOptions(), and
      fixed all unit tests calling ChangeOptions(). Some of these tests
      obviously assume more than 1 level and check file number/values in level
      1 or above levels. I set kSkipUniversalCompaction for these tests.
      
      The major bug I found is manual compaction with universal compaction never stops. I have put a fix for
      it.
      
      I have also set universal compaction as the default compaction and found
      at least 20+ unit tests failing. I haven't looked into the details. The
      next step is to check all unit tests without calling ChangeOptions().
      
      Test Plan: make all check
      
      Reviewers: dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D12051
      17b8f786
  9. 06 8月, 2013 3 次提交
    • D
      [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. · c2d7826c
      Deon Nicholas 提交于
      Summary:
      Here are the major changes to the Merge Interface. It has been expanded
      to handle cases where the MergeOperator is not associative. It does so by stacking
      up merge operations while scanning through the key history (i.e.: during Get() or
      Compaction), until a valid Put/Delete/end-of-history is encountered; it then
      applies all of the merge operations in the correct sequence starting with the
      base/sentinel value.
      
      I have also introduced an "AssociativeMerge" function which allows the user to
      take advantage of associative merge operations (such as in the case of counters).
      The implementation will always attempt to merge the operations/operands themselves
      together when they are encountered, and will resort to the "stacking" method if
      and only if the "associative-merge" fails.
      
      This implementation is conjectured to allow MergeOperator to handle the general
      case, while still providing the user with the ability to take advantage of certain
      efficiencies in their own merge-operator / data-structure.
      
      NOTE: This is a preliminary diff. This must still go through a lot of review,
      revision, and testing. Feedback welcome!
      
      Test Plan:
        -This is a preliminary diff. I have only just begun testing/debugging it.
        -I will be testing this with the existing MergeOperator use-cases and unit-tests
      (counters, string-append, and redis-lists)
        -I will be "desk-checking" and walking through the code with the help gdb.
        -I will find a way of stress-testing the new interface / implementation using
      db_bench, db_test, merge_test, and/or db_stress.
        -I will ensure that my tests cover all cases: Get-Memtable,
      Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
      Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
      end-of-history, end-of-file, etc.
        -A lot of feedback from the reviewers.
      
      Reviewers: haobo, dhruba, zshao, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11499
      c2d7826c
    • J
      Add soft_rate_limit stats · 8e792e58
      Jim Paton 提交于
      Summary: This diff adds histogram stats for soft_rate_limit stalls. It also renames the old rate_limit stats to hard_rate_limit.
      
      Test Plan: make -j32 check
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12021
      8e792e58
    • J
      Add soft and hard rate limit support · 1036537c
      Jim Paton 提交于
      Summary:
      This diff adds support for both soft and hard rate limiting. The following changes are included:
      
      1) Options.rate_limit is renamed to Options.hard_rate_limit.
      2) Options.rate_limit_delay_milliseconds is renamed to Options.rate_limit_delay_max_milliseconds.
      3) Options.soft_rate_limit is added.
      4) If the maximum compaction score is > hard_rate_limit and rate_limit_delay_max_milliseconds == 0, then writes are delayed by 1 ms at a time until the max compaction score falls below hard_rate_limit.
      5) If the max compaction score is > soft_rate_limit but <= hard_rate_limit, then writes are delayed by 0-1 ms depending on how close we are to hard_rate_limit.
      6) Users can disable 4 by setting hard_rate_limit = 0. They can add a limit to the maximum amount of time waited by setting rate_limit_delay_max_milliseconds > 0. Thus, the old behavior can be preserved by setting soft_rate_limit = 0, which is the default.
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12003
      1036537c
  10. 02 8月, 2013 1 次提交
    • M
      Expand KeyMayExist to return the proper value if it can be found in memory and... · 59d0b02f
      Mayank Agarwal 提交于
      Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache
      
      Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test
      
      Test Plan: make all check;db_stress for 1 hour
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11853
      59d0b02f
  11. 01 8月, 2013 2 次提交
    • J
      Slow down writes gradually rather than suddenly · 9700677a
      Jim Paton 提交于
      Summary:
      Currently, when a certain number of level0 files (level0_slowdown_writes_trigger) are present, RocksDB will slow down each write by 1ms. There is a second limit of level0 files at which RocksDB will stop writes altogether (level0_stop_writes_trigger).
      
      This patch enables the user to supply a third parameter specifying the number of files at which Rocks will start slowing down writes (level0_start_slowdown_writes). When this number is reached, Rocks will slow down writes as a quadratic function of level0_slowdown_writes_trigger - num_level0_files.
      
      For some workloads, this improves latency and throughput. I will post some stats momentarily in https://our.intern.facebook.com/intern/tasks/?t=2613384.
      
      Test Plan:
      make -j32 check
      ./db_stress
      ./db_bench
      
      Reviewers: dhruba, haobo, MarkCallaghan, xjin
      
      Reviewed By: xjin
      
      CC: leveldb, xjin, zshao
      
      Differential Revision: https://reviews.facebook.net/D11859
      9700677a
    • X
      Make arena block size configurable · 0f0a24e2
      Xing Jin 提交于
      Summary:
      Add an option for arena block size, default value 4096 bytes. Arena will allocate blocks with such size.
      
      I am not sure about passing parameter to skiplist in the new virtualized framework, though I talked to Jim a bit. So add Jim as reviewer.
      
      Test Plan:
      new unit test, I am running db_test.
      
      For passing paramter from configured option to Arena, I tried tests like:
      
        TEST(DBTest, Arena_Option) {
        std::string dbname = test::TmpDir() + "/db_arena_option_test";
        DestroyDB(dbname, Options());
      
        DB* db = nullptr;
        Options opts;
        opts.create_if_missing = true;
        opts.arena_block_size = 1000000; // tested 99, 999999
        Status s = DB::Open(opts, dbname, &db);
        db->Put(WriteOptions(), "a", "123");
        }
      
      and printed some debug info. The results look good. Any suggestion for such a unit-test?
      
      Reviewers: haobo, dhruba, emayanke, jpaton
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D11799
      0f0a24e2
  12. 30 7月, 2013 2 次提交
    • J
      Don't use redundant Env::NowMicros() calls · 6db52b52
      Jim Paton 提交于
      Summary: After my patch for stall histograms, there are redundant calls to NowMicros() by both the stop watches and DBImpl::MakeRoomForWrites. So I removed the redundant calls such that the information is gotten from the stopwatch.
      
      Test Plan:
      make clean
      make -j32 check
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11883
      6db52b52
    • J
      Add stall counts to statistics · 18afff2e
      Jim Paton 提交于
      Summary: Previously, statistics are kept on how much time is spent on stalls of different types. This patch adds support for keeping number of stalls of each type. For example, instead of just reporting how many microseconds are spent waiting for memtables to be compacted, it will also report how many times a write stalled for that to occur.
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      # Not really sure what else should be done...
      
      Reviewers: dhruba, MarkCallaghan, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11841
      18afff2e
  13. 25 7月, 2013 1 次提交
    • D
      Revert 6fbe4e98: If disable wal is set, then... · d7ba5bce
      Dhruba Borthakur 提交于
      Revert 6fbe4e98: If disable wal is set, then batch commits are avoided
      
      Summary:
      Revert "If disable wal is set, then batch commits are avoided" because
      keeping the mutex while inserting into the skiplist means that readers
      and writes are all serialized on the mutex.
      
      Test Plan:
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
      d7ba5bce
  14. 24 7月, 2013 3 次提交
    • J
      Virtualize SkipList Interface · 52d7ecfc
      Jim Paton 提交于
      Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations.
      
      Test Plan:
      make clean
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, emayanke, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11739
      52d7ecfc
    • D
      If disable wal is set, then batch commits are avoided. · 6fbe4e98
      Dhruba Borthakur 提交于
      Summary:
      rocksdb uses batch commit to write to transaction log. But if
      disable wal is set, then writes to transaction log are anyways
      avoided. In this case, there is not much value-add to batch things,
      batching can cause unnecessary delays to Puts().
      This patch avoids batching when disableWal is set.
      
      Test Plan:
      make check.
      
      I am running db_stress now.
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11763
      6fbe4e98
    • M
      Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Mayank Agarwal 提交于
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  15. 23 7月, 2013 1 次提交
  16. 20 7月, 2013 1 次提交
  17. 12 7月, 2013 1 次提交
    • M
      Make rocksdb-deletes faster using bloom filter · 2a986919
      Mayank Agarwal 提交于
      Summary:
      Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
      1. Put of delete type
      2. Space in the db,and
      3. Compaction time
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
  18. 09 7月, 2013 1 次提交
    • H
      [RocksDB] Provide contiguous sequence number even in case of write failure · 9ba82786
      Haobo Xu 提交于
      Summary: Replication logic would be simplifeid if we can guarantee that write sequence number is always contiguous, even if write failure occurs. Dhruba and I looked at the sequence number generation part of the code. It seems fixable. Note that if WAL was successful and insert into memtable was not, we would be in an unfortunate state. The approach in this diff is : IO error is expected and error status will be returned to client, sequence number will not be advanced; In-mem error is not expected and we panic.
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba, sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11439
      9ba82786
  19. 04 7月, 2013 1 次提交
  20. 01 7月, 2013 2 次提交
    • D
      Reduce write amplification by merging files in L0 back into L0 · 47c4191f
      Dhruba Borthakur 提交于
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. This meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refers to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      47c4191f
    • D
      Reduce write amplification by merging files in L0 back into L0 · 554c06dd
      Dhruba Borthakur 提交于
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. This meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refers to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      554c06dd
  21. 19 6月, 2013 2 次提交
  22. 15 6月, 2013 1 次提交
  23. 13 6月, 2013 2 次提交
    • D
      [Rocksdb] [Multiget] Introduced multiget into db_bench · 4985a9f7
      Deon Nicholas 提交于
      Summary:
      Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n
      flags for db_bench. Also updated and tested the ReadRandom() method
      to include an option to use multiget. By default,
      keys_per_multiget=100.
      
      Preliminary tests imply that multiget is at least 1.25x faster per
      key than regular get.
      
      Will continue adding Multiget for ReadMissing, ReadHot,
      RandomWithVerify, ReadRandomWriteRandom; soon. Will also think
      about ways to better verify benchmarks.
      
      Test Plan:
      1. make db_bench
      2. ./db_bench --benchmarks=fillrandom
      3. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --use_multiget=1 --threads=4 --keys_per_multiget=100
      4. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --threads=4
      5. Verify ops/sec (and 1000000 of 1000000 keys found)
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11127
      4985a9f7
    • H
      [RocksDB] cleanup EnvOptions · bdf10859
      Haobo Xu 提交于
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
      bdf10859
  24. 10 6月, 2013 1 次提交
    • M
      Max_mem_compaction_level can have maximum value of num_levels-1 · 184343a0
      Mayank Agarwal 提交于
      Summary:
      Without this files could be written out to a level greater than the maximum level possible and is the source of the segfaults that wormhole awas getting. The sequence of steps that was followed:
      1. WriteLevel0Table was called when memtable was to be flushed for a file.
      2. PickLevelForMemTableOutput was called to determine the level to which this file should be pushed.
      3. PickLevelForMemTableOutput returned a wrong result because max_mem_compaction_level was equal to 2 even when num_levels was equal to 0.
      The fix to re-initialize max_mem_compaction_level based on num_levels passed seems correct.
      
      Test Plan: make all check; Also made a dummy file to mimic the wormhole-file behaviour which was causing the segfaults and found that the same segfault occurs without this change and not with this.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11157
      184343a0
  25. 06 6月, 2013 2 次提交
    • D
      Very basic Multiget and simple test cases. · d8c7c45e
      Deon Nicholas 提交于
      Summary:
      Implemented the MultiGet operator which takes in a list of keys
      and returns their associated values. Currently uses std::vector as its
      container data structure. Otherwise, it works identically to "Get".
      
      Test Plan:
       1. make db_test      ; compile it
       2. ./db_test         ; test it
       3. make all check    ; regress / run all tests
       4. make release      ; (optional) compile with release settings
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10875
      d8c7c45e
    • A
      [Rocksdb] Measure all FSYNC/SYNC times · d91b42ee
      Abhishek Kona 提交于
      Summary: Add stop watches around all sync calls.
      
      Test Plan: db_bench check if respective histograms are printed
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11073
      d91b42ee
  26. 04 6月, 2013 1 次提交
    • M
      Improve output for GetProperty('leveldb.stats') · d9f538e1
      Mark Callaghan 提交于
      Summary:
      Display separate values for read, write & total compaction IO.
      Display compaction amplification and write amplification.
      Add similar values for the period since the last call to GetProperty. Results since the server started
      are reported as "cumulative" stats. Results since the last call to GetProperty are reported as
      "interval" stats.
      
      Level  Files Size(MB) Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        7       13        21         0       211         0         0       211     0.0       0.0        10.1        0        0        0        0      113       0.0
        1       79      157        88       993       989       198       795       194     9.0      11.3        11.2      106      405      502       97       14       0.0
        2       19       36         5        63        63        37        27        36     2.4      12.3        12.2       19       14       32       18       12       0.0
      >>>>>>>>>>>>>>>>>>>>>>>>> text below has been is new and/or reformatted
      Uptime(secs): 122.2 total, 0.9 interval
      Compaction IO cumulative (GB): 0.21 new, 1.03 read, 1.23 write, 2.26 read+write
      Compaction IO cumulative (MB/sec): 1.7 new, 8.6 read, 10.3 write, 19.0 read+write
      Amplification cumulative: 6.0 write, 11.0 compaction
      Compaction IO interval (MB): 5.59 new, 0.00 read, 5.59 write, 5.59 read+write
      Compaction IO interval (MB/sec): 6.5 new, 0.0 read, 6.5 write, 6.5 read+write
      Amplification interval: 1.0 write, 1.0 compaction
      >>>>>>>>>>>>>>>>>>>>>>>> text above is new and/or reformatted
      Stalls(secs): 90.574 level0_slowdown, 0.000 level0_numfiles, 10.165 memtable_compaction, 0.000 leveln_slowdown
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      make check, run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11049
      d9f538e1