1. 09 Aug, 2013 1 commit
  2. 07 Aug, 2013 3 commits
  3. 06 Aug, 2013 7 commits
    • D
      [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. · c2d7826c
      Deon Nicholas committed
      Summary:
      Here are the major changes to the Merge Interface. It has been expanded
      to handle cases where the MergeOperator is not associative. It does so by stacking
      up merge operations while scanning through the key history (i.e.: during Get() or
      Compaction), until a valid Put/Delete/end-of-history is encountered; it then
      applies all of the merge operations in the correct sequence starting with the
      base/sentinel value.
      
      I have also introduced an "AssociativeMerge" function which allows the user to
      take advantage of associative merge operations (such as in the case of counters).
      The implementation will always attempt to merge the operations/operands themselves
      together when they are encountered, and will resort to the "stacking" method if
      and only if the "associative-merge" fails.
      
      This implementation is conjectured to allow MergeOperator to handle the general
      case, while still providing the user with the ability to take advantage of certain
      efficiencies in their own merge-operator / data-structure.
      
      NOTE: This is a preliminary diff. This must still go through a lot of review,
      revision, and testing. Feedback welcome!
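The stacking behavior described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `Kind`, `Entry`, `FullMerge`, `Resolve`, and `AppendMerge` are hypothetical names, and the key history is assumed to be given newest first, as a Get() would scan it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record kinds in one key's history (not RocksDB's real types).
enum class Kind { kPut, kDelete, kMerge };

struct Entry {
  Kind kind;
  std::string value;  // Put value or merge operand; unused for Delete
};

// Full merge: `base` is null after a Delete or at end-of-history;
// `operands` are ordered oldest to newest.
using FullMerge = std::function<std::string(
    const std::string* base, const std::vector<std::string>& operands)>;

// Scans `history` (newest first), stacking merge operands until a
// Put/Delete/end-of-history is found, then applies them in order.
// Returns false if the key does not exist.
bool Resolve(const std::vector<Entry>& history, const FullMerge& merge,
             std::string* result) {
  std::vector<std::string> stack;     // operands seen so far, newest first
  const std::string* base = nullptr;  // stays null for Delete / end of history
  for (const Entry& e : history) {
    if (e.kind == Kind::kMerge) {
      stack.push_back(e.value);
      continue;
    }
    if (e.kind == Kind::kPut) base = &e.value;
    break;  // a Put or Delete ends the scan
  }
  if (stack.empty()) {
    if (base == nullptr) return false;  // deleted or never written
    *result = *base;
    return true;
  }
  // Apply the operands oldest first, starting from the base value.
  const std::vector<std::string> operands(stack.rbegin(), stack.rend());
  *result = merge(base, operands);
  return true;
}

// Example operator: comma-separated string append.
std::string AppendMerge(const std::string* base,
                        const std::vector<std::string>& operands) {
  std::string out = base ? *base : std::string();
  for (const std::string& op : operands) {
    if (!out.empty()) out += ',';
    out += op;
  }
  return out;
}
```

Entries older than the first Put or Delete are never visited, which is what lets the same loop serve both Get() and compaction in this sketch.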
      
      Test Plan:
        -This is a preliminary diff. I have only just begun testing/debugging it.
        -I will be testing this with the existing MergeOperator use-cases and unit-tests
      (counters, string-append, and redis-lists)
        -I will be "desk-checking" and walking through the code with the help of gdb.
        -I will find a way of stress-testing the new interface / implementation using
      db_bench, db_test, merge_test, and/or db_stress.
        -I will ensure that my tests cover all cases: Get-Memtable,
      Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
      Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
      end-of-history, end-of-file, etc.
        -A lot of feedback from the reviewers.
      
      Reviewers: haobo, dhruba, zshao, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11499
      c2d7826c
    • M
      Fix build · 73f9518b
      Mayank Agarwal committed
      Summary: remove reference
      
      Test Plan: make OPT=-g
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
      73f9518b
    • J
      Add soft_rate_limit stats · 8e792e58
      Jim Paton committed
      Summary: This diff adds histogram stats for soft_rate_limit stalls. It also renames the old rate_limit stats to hard_rate_limit.
      
      Test Plan: make -j32 check
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12021
      8e792e58
    • M
      Expose base db object from ttl wrapper · 1d7b4765
      Mayank Agarwal committed
      Summary: RocksDB replication will need this when writing value+TS from master to slave 'as is'
      
      Test Plan: make
      
      Reviewers: dhruba, vamsi, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11919
      1d7b4765
    • J
      Add soft and hard rate limit support · 1036537c
      Jim Paton committed
      Summary:
      This diff adds support for both soft and hard rate limiting. The following changes are included:
      
      1) Options.rate_limit is renamed to Options.hard_rate_limit.
      2) Options.rate_limit_delay_milliseconds is renamed to Options.rate_limit_delay_max_milliseconds.
      3) Options.soft_rate_limit is added.
      4) If the maximum compaction score is > hard_rate_limit and rate_limit_delay_max_milliseconds == 0, then writes are delayed by 1 ms at a time until the max compaction score falls below hard_rate_limit.
      5) If the max compaction score is > soft_rate_limit but <= hard_rate_limit, then writes are delayed by 0-1 ms depending on how close we are to hard_rate_limit.
      6) Users can disable 4 by setting hard_rate_limit = 0. They can add a limit to the maximum amount of time waited by setting rate_limit_delay_max_milliseconds > 0. Thus, the old behavior can be preserved by setting soft_rate_limit = 0, which is the default.
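The delay rules in points 4 to 6 can be sketched as a single function. This is only an illustration under stated assumptions: `WriteDelayMicros` is a hypothetical name, the 0-1 ms soft delay is assumed to scale linearly between the two limits (the summary does not give the exact formula), and the soft delay is assumed to apply only when both limits are set.

```cpp
#include <cassert>
#include <cstdint>

// Returns the delay, in microseconds, applied to one write for a given
// maximum compaction score. A limit of 0 disables that limit.
uint64_t WriteDelayMicros(double score, double soft_rate_limit,
                          double hard_rate_limit) {
  const double kMaxStepMicros = 1000.0;  // 1 ms per delay step
  if (hard_rate_limit > 0.0 && score > hard_rate_limit) {
    // Hard limit exceeded: the caller keeps delaying 1 ms at a time until
    // the score drops (or until a configured maximum wait is reached).
    return static_cast<uint64_t>(kMaxStepMicros);
  }
  if (soft_rate_limit > 0.0 && hard_rate_limit > soft_rate_limit &&
      score > soft_rate_limit) {
    // Assumption: linear ramp from 0 ms at the soft limit to 1 ms at the
    // hard limit.
    const double fraction =
        (score - soft_rate_limit) / (hard_rate_limit - soft_rate_limit);
    return static_cast<uint64_t>(fraction * kMaxStepMicros);
  }
  return 0;  // below both limits, or rate limiting disabled
}
```

With soft_rate_limit left at 0, only the hard-limit branch can fire, which matches the statement that the old behavior is preserved by default.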
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12003
      1036537c
    • M
      Support user's compaction filter in TTL logic · cacd812f
      Mayank Agarwal committed
      Summary: TTL uses a compaction filter to purge key-values and previously required the user not to pass one. This diff makes it accommodate the user's compaction filter. Added a test to ttl_test.
      
      Test Plan: make; ./ttl_test
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11973
      cacd812f
    • M
      Changing Makefile to have rocksdb instead of leveldb in binary-names · 7c9093ab
      Mayank Agarwal committed
      Summary: did a find-replace
      
      Test Plan: make
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11979
      7c9093ab
  4. 02 Aug, 2013 2 commits
    • M
      Merge operator for ttl · c42485f6
      Mayank Agarwal committed
      Summary: Implemented a TtlMergeOperator class which inherits from MergeOperator and is TTL aware. It strips out timestamp from existing_value and attaches timestamp to new_value, calling user-provided-Merge in between.
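The strip, merge, and re-attach flow described here could look roughly like the sketch below. It is not the TtlMergeOperator from the diff: the helper names are hypothetical, the user merge is reduced to a plain function, and the timestamp is assumed to be a fixed-width 4-byte suffix on the stored value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Assumption: the timestamp is stored as a 4-byte suffix of the value.
const size_t kTsLength = sizeof(int32_t);

std::string AppendTs(const std::string& value, int32_t ts) {
  char buf[kTsLength];
  std::memcpy(buf, &ts, kTsLength);
  std::string out = value;
  out.append(buf, kTsLength);
  return out;
}

std::string StripTs(const std::string& value_with_ts) {
  assert(value_with_ts.size() >= kTsLength);
  return value_with_ts.substr(0, value_with_ts.size() - kTsLength);
}

// Stand-in for the user-provided merge; `existing` is null if no value exists.
using UserMerge = std::function<std::string(const std::string* existing,
                                            const std::string& operand)>;

// TTL-aware wrapper: hide the timestamp from the user merge, then stamp the
// merged result with the current time.
std::string TtlMerge(const UserMerge& user_merge,
                     const std::string* existing_with_ts,
                     const std::string& operand, int32_t now) {
  if (existing_with_ts == nullptr) {
    return AppendTs(user_merge(nullptr, operand), now);
  }
  const std::string existing = StripTs(*existing_with_ts);
  return AppendTs(user_merge(&existing, operand), now);
}
```

The user merge never sees the timestamp bytes, so an operator written for a plain database can be reused unchanged in this sketch.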
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11775
      c42485f6
    • M
      Expand KeyMayExist to return the proper value if it can be found in memory and... · 59d0b02f
      Mayank Agarwal committed
      Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache
      
      Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test
      
      Test Plan: make all check;db_stress for 1 hour
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11853
      59d0b02f
  5. 01 Aug, 2013 2 commits
    • J
      Slow down writes gradually rather than suddenly · 9700677a
      Jim Paton committed
      Summary:
      Currently, when a certain number of level0 files (level0_slowdown_writes_trigger) are present, RocksDB will slow down each write by 1ms. There is a second limit of level0 files at which RocksDB will stop writes altogether (level0_stop_writes_trigger).
      
      This patch enables the user to supply a third parameter specifying the number of files at which Rocks will start slowing down writes (level0_start_slowdown_writes). When this number is reached, Rocks will slow down writes as a quadratic function of level0_slowdown_writes_trigger - num_level0_files.
      
      For some workloads, this improves latency and throughput. I will post some stats momentarily in https://our.intern.facebook.com/intern/tasks/?t=2613384.
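One way to picture the gradual slowdown is the sketch below. The exact formula is not given in the summary, so this assumes the delay grows quadratically from 0 at the start threshold to the existing 1 ms at level0_slowdown_writes_trigger; `SlowdownMicros` and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns the per-write delay in microseconds for a given level-0 file count.
// Assumption: quadratic ramp between the two thresholds.
uint64_t SlowdownMicros(int num_level0_files, int start_slowdown,
                        int slowdown_trigger) {
  const uint64_t kMaxDelayMicros = 1000;  // the original fixed 1 ms delay
  assert(start_slowdown < slowdown_trigger);
  if (num_level0_files <= start_slowdown) return 0;
  if (num_level0_files >= slowdown_trigger) return kMaxDelayMicros;
  // How far we are into the ramp, and how long the ramp is.
  const uint64_t filled =
      static_cast<uint64_t>(num_level0_files - start_slowdown);
  const uint64_t span =
      static_cast<uint64_t>(slowdown_trigger - start_slowdown);
  return kMaxDelayMicros * filled * filled / (span * span);
}
```

A quadratic ramp keeps delays small just past the start threshold and steepens as the file count approaches the trigger, so writes slow down gradually rather than all at once.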
      
      Test Plan:
      make -j32 check
      ./db_stress
      ./db_bench
      
      Reviewers: dhruba, haobo, MarkCallaghan, xjin
      
      Reviewed By: xjin
      
      CC: leveldb, xjin, zshao
      
      Differential Revision: https://reviews.facebook.net/D11859
      9700677a
    • X
      Make arena block size configurable · 0f0a24e2
      Xing Jin committed
      Summary:
      Add an option for arena block size, default value 4096 bytes. Arena will allocate blocks with such size.
      
      I am not sure about passing parameter to skiplist in the new virtualized framework, though I talked to Jim a bit. So add Jim as reviewer.
      
      Test Plan:
      new unit test, I am running db_test.
      
      For passing the parameter from the configured option to Arena, I tried tests like:
      
        TEST(DBTest, Arena_Option) {
          std::string dbname = test::TmpDir() + "/db_arena_option_test";
          DestroyDB(dbname, Options());
      
          DB* db = nullptr;
          Options opts;
          opts.create_if_missing = true;
          opts.arena_block_size = 1000000;  // also tested 99 and 999999
          Status s = DB::Open(opts, dbname, &db);
          ASSERT_OK(s);  // stop here rather than dereference a null db
          ASSERT_OK(db->Put(WriteOptions(), "a", "123"));
          delete db;  // close the database so the test does not leak it
        }
      
      and printed some debug info. The results look good. Any suggestion for such a unit-test?
      
      Reviewers: haobo, dhruba, emayanke, jpaton
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D11799
      0f0a24e2
  6. 30 Jul, 2013 4 commits
    • D
      Fix README contents. · 542cc10b
      Dhruba Borthakur committed
      Summary:
      Fix README contents.
      
      Test Plan:
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
      542cc10b
    • J
      Don't use redundant Env::NowMicros() calls · 6db52b52
      Jim Paton committed
      Summary: After my patch for stall histograms, there are redundant calls to NowMicros() by both the stop watches and DBImpl::MakeRoomForWrites. So I removed the redundant calls such that the information is gotten from the stopwatch.
      
      Test Plan:
      make clean
      make -j32 check
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11883
      6db52b52
    • J
      Use specific DB name in merge_test · abc90b06
      Jim Paton committed
      Summary: Currently, merge_test uses /tmp/testdb for the test database. It should really use something more specific to merge_test. Most of the other tests use test::TmpDir() + "/<test name>db". This patch implements such behavior for merge_test; it makes merge_test use test::TmpDir() + "/merge_testdb"
      
      Test Plan:
      make clean
      make -j32 merge_test
      ./merge_test
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11877
      abc90b06
    • J
      Add stall counts to statistics · 18afff2e
      Jim Paton committed
      Summary: Previously, statistics were kept only on how much time is spent on stalls of different types. This patch adds support for counting the number of stalls of each type. For example, instead of just reporting how many microseconds are spent waiting for memtables to be compacted, it will also report how many times a write stalled for that to occur.
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      # Not really sure what else should be done...
      
      Reviewers: dhruba, MarkCallaghan, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11841
      18afff2e
  7. 25 Jul, 2013 1 commit
    • D
      Revert 6fbe4e98: If disable wal is set, then... · d7ba5bce
      Dhruba Borthakur committed
      Revert 6fbe4e98: If disable wal is set, then batch commits are avoided
      
      Summary:
      Revert "If disable wal is set, then batch commits are avoided" because
      keeping the mutex while inserting into the skiplist means that readers
      and writers are all serialized on the mutex.
      
      Test Plan:
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
      d7ba5bce
  8. 24 Jul, 2013 4 commits
    • J
      Virtualize SkipList Interface · 52d7ecfc
      Jim Paton committed
      Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations.
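The idea of a pluggable backing store can be illustrated with a small abstract interface. Everything here is hypothetical (`MemTableBackingStore`, `SortedSetStore`, and the two-method interface are illustrative only and are not the classes introduced by the diff); the sketch just shows a MemTable that depends on an interface rather than on a concrete skiplist.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical interface a MemTable could use instead of a concrete skiplist.
class MemTableBackingStore {
 public:
  virtual ~MemTableBackingStore() {}
  virtual void Insert(const std::string& key) = 0;
  virtual bool Contains(const std::string& key) const = 0;
};

// One possible implementation, backed by an ordered std::set.
class SortedSetStore : public MemTableBackingStore {
 public:
  void Insert(const std::string& key) override { keys_.insert(key); }
  bool Contains(const std::string& key) const override {
    return keys_.count(key) > 0;
  }

 private:
  std::set<std::string> keys_;
};

// The MemTable only sees the interface, so implementations can be swapped.
class MemTable {
 public:
  explicit MemTable(std::unique_ptr<MemTableBackingStore> store)
      : store_(std::move(store)) {}
  void Add(const std::string& key) { store_->Insert(key); }
  bool Has(const std::string& key) const { return store_->Contains(key); }

 private:
  std::unique_ptr<MemTableBackingStore> store_;
};
```

Synchronization is left out of this sketch; the summary notes that the backing store is meant to take over that responsibility eventually.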
      
      Test Plan:
      make clean
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, emayanke, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11739
      52d7ecfc
    • D
      If disable wal is set, then batch commits are avoided. · 6fbe4e98
      Dhruba Borthakur committed
      Summary:
      RocksDB uses batch commit to write to the transaction log. But if
      disable wal is set, then writes to the transaction log are avoided
      anyway. In this case, there is not much value in batching, and
      batching can cause unnecessary delays to Puts().
      This patch avoids batching when disableWal is set.
      
      Test Plan:
      make check.
      
      I am running db_stress now.
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11763
      6fbe4e98
    • M
      Adding filter_deletes to crash_tests run in jenkins · f3baeecd
      Mayank Agarwal committed
      Summary: The filter_deletes option introduced in db_stress makes it drop a Delete on a key if KeyMayExist(key) returns false for that key. The code change was simple and tested, so not wasting reviewers' time.
      
      Test Plan: make crash_test; python tools/db_crashtest[1|2].py
      
      CC: dhruba, vamsi
      
      Differential Revision: https://reviews.facebook.net/D11769
      f3baeecd
    • M
      Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Mayank Agarwal committed
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking since a specified sequence number in GetImpl; this is useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  9. 23 Jul, 2013 1 commit
  10. 20 Jul, 2013 1 commit
  11. 13 Jul, 2013 1 commit
  12. 12 Jul, 2013 2 commits
    • M
      Make rocksdb-deletes faster using bloom filter · 2a986919
      Mayank Agarwal committed
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on, which makes Get return false only if bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped, saving:
      1. Put of delete type
      2. Space in the db, and
      3. Compaction time
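The delete-dropping idea can be sketched with a toy bloom filter. This is an illustration only: `TinyBloom` and `ToyDb` are hypothetical stand-ins, the filter uses two hash probes over a fixed bitset, and a real database would consult per-table filters rather than a single in-memory one.

```cpp
#include <bitset>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy bloom filter: no false negatives, occasional false positives.
class TinyBloom {
 public:
  void Add(const std::string& key) {
    bits_.set(Hash1(key));
    bits_.set(Hash2(key));
  }
  // false means the key is definitely absent; true means it may be present.
  bool MayContain(const std::string& key) const {
    return bits_.test(Hash1(key)) && bits_.test(Hash2(key));
  }

 private:
  static const size_t kBits = 1024;
  static size_t Hash1(const std::string& k) {
    return std::hash<std::string>()(k) % kBits;
  }
  static size_t Hash2(const std::string& k) {
    return std::hash<std::string>()(k + "#") % kBits;
  }
  std::bitset<kBits> bits_;
};

struct ToyDb {
  TinyBloom filter;
  std::vector<std::string> write_log;  // records that would reach the memtable
  bool deletes_use_filter = true;

  void Put(const std::string& key) {
    filter.Add(key);
    write_log.push_back("PUT " + key);
  }

  // Returns false when the delete was dropped because the filter guarantees
  // that the key is not in the database.
  bool Delete(const std::string& key) {
    if (deletes_use_filter && !filter.MayContain(key)) return false;
    write_log.push_back("DEL " + key);
    return true;
  }
};
```

A false positive only means a delete marker is written unnecessarily, so correctness does not depend on the filter being exact.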
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
    • X
      Newbie code question · 8a5341ec
      Xing Jin committed
      Summary:
      This diff is more about my question when reading compaction codes,
      instead of a normal diff. I don't quite understand the logic here.
      
      Test Plan: I didn't do any test. If this is a bug, I will continue doing some test.
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11661
      8a5341ec
  13. 11 Jul, 2013 1 commit
    • M
      Print complete statistics in db_stress · 821889e2
      Mayank Agarwal committed
      Summary: db_stress should also print complete statistics like db_bench. Needed this when I wanted to measure the number of delete-IOs dropped due to CheckKeyMayExist, which is to be introduced to the rocksdb codebase later to make deletes in rocksdb faster.
      
      Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
      
      Reviewers: sheki, dhruba, vamsi, haobo
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D11655
      821889e2
  14. 10 Jul, 2013 1 commit
  15. 09 Jul, 2013 1 commit
    • H
      [RocksDB] Provide contiguous sequence number even in case of write failure · 9ba82786
      Haobo Xu committed
      Summary: Replication logic would be simplified if we can guarantee that the write sequence number is always contiguous, even if a write failure occurs. Dhruba and I looked at the sequence number generation part of the code. It seems fixable. Note that if the WAL write was successful and the insert into the memtable was not, we would be in an unfortunate state. The approach in this diff is: an IO error is expected, so the error status is returned to the client and the sequence number is not advanced; an in-memory error is not expected, so we panic.
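The policy in this summary can be sketched as follows. `Writer`, `wal_append`, and `memtable_insert` are hypothetical names used only for illustration; the point is that the sequence number advances only after both steps succeed, and that a memtable failure after a successful WAL write aborts instead of returning.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>

struct Writer {
  uint64_t last_sequence = 0;

  // Each hook receives the first sequence number of the batch and returns
  // true on success. Both hooks are hypothetical stand-ins.
  bool Write(uint64_t batch_count,
             const std::function<bool(uint64_t)>& wal_append,
             const std::function<bool(uint64_t)>& memtable_insert) {
    const uint64_t first = last_sequence + 1;
    if (!wal_append(first)) {
      return false;  // IO error: report it and leave the sequence untouched
    }
    if (!memtable_insert(first)) {
      std::abort();  // in-memory error is unexpected: panic
    }
    last_sequence += batch_count;  // advance only after both steps succeed
    return true;
  }
};
```

Because a failed WAL write leaves last_sequence unchanged, the next successful batch reuses the same starting number and no gap appears.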
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba, sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11439
      9ba82786
  16. 04 Jul, 2013 1 commit
    • H
      [RocksDB] Support internal key/value dump for ldb · 92ca816a
      Haobo Xu committed
      Summary: This diff added a command 'idump' to ldb tool, which dumps the internal key/value pairs. It could be useful for diagnosis and estimating the per user key 'overhead'. Also cleaned up the ldb code a bit where I touched.
      
      Test Plan: make check; ldb idump
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11517
      92ca816a
  17. 02 Jul, 2013 1 commit
  18. 27 Jun, 2013 2 commits
    • H
      [RocksDB] Expose count for WriteBatch · 71e0f695
      Haobo Xu committed
      Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence number check.
      
      Test Plan: make check;
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11523
      71e0f695
    • D
      Added stringappend_test back into the unit tests. · 34ef8732
      Deon Nicholas committed
      Summary:
      With the Makefile now updated to correctly update all .o files, this
      should fix the issues recompiling stringappend_test. This should also fix the
      "segmentation-fault" that we were getting earlier. Now, stringappend_test should
      be clean, and I have added it back to the unit-tests. Also made some minor updates
      to the tests themselves.
      
      Test Plan:
      1. make clean; make stringappend_test -j 32	(will test it by itself)
      2. make clean; make all check -j 32		(to run all unit tests)
      3. make clean; make release			(test in release mode)
      4. valgrind ./stringappend_test 		(valgrind tests)
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11505
      34ef8732
  19. 26 Jun, 2013 1 commit
    • D
      Updated "make clean" to remove all .o files · 6894a50a
      Deon Nicholas committed
      Summary:
      The old Makefile did not remove ALL .o and .d files, but rather only
      those that happened to be in the root folder and one-level deep. This was causing
      issues when recompiling files in deeper folders. This fix now causes make clean
      to find ALL .o and .d files via a unix "find" command, and then remove them.
      
      Test Plan:
      make clean;
      make all -j 32;
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11493
      6894a50a
  20. 22 Jun, 2013 1 commit
    • M
      Simplify bucketing logic in ldb-ttl · b858da70
      Mayank Agarwal committed
      Summary: [start_time, end_time) is what I'm following for the buckets and the whole time-range. Also cleaned up some code in db_ttl.* Not correcting the spacing/indenting convention for util/ldb_cmd.cc in this diff.
      
      Test Plan: python ldb_test.py, make ttl_test, Run mcrocksdb-backup tool, Run the ldb tool on 2 mcrocksdb production backups from sigmafio033.prn1
      
      Reviewers: vamsi, haobo
      
      Reviewed By: vamsi
      
      Differential Revision: https://reviews.facebook.net/D11433
      b858da70
  21. 20 Jun, 2013 2 commits
    • M
      Introducing timeranged scan, timeranged dump in ldb. Also the ability to count... · 61f1baae
      Mayank Agarwal committed
      Introducing timeranged scan, timeranged dump in ldb. Also the ability to count in time-batches during Dump
      
      Summary:
      Scan and Dump commands in ldb use an iterator. We also need to print the timestamp for ttl databases for debugging. For this I create a TtlIterator class pointer in these functions and assign it the value of the Iterator pointer, which actually points to a TtlIterator object, and access the new function ValueWithTS, which can also return the TS. Buckets feature for the dump command: gives a count of different key-values in the specified time-range, distributed across the time-range partitioned according to bucket-size. start_time and end_time are specified as unix timestamps and bucket in seconds on the user command line.
      Have commented out 3 lines from ldb_test.py so that the test does not break right now. It breaks because the timestamp is also printed now and I have to look at wildcards in python to compare properly.
      
      Test Plan: python tools/ldb_test.py
      
      Reviewers: vamsi, dhruba, haobo, sheki
      
      Reviewed By: vamsi
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11403
      61f1baae
    • H
      [RocksDB] add back --mmap_read options to crashtest · 0f78fad9
      Haobo Xu committed
      Summary: As title, now that db_stress supports --mmap_read properly
      
      Test Plan: make crash_test
      
      Reviewers: vamsi, emayanke, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11391
      0f78fad9