1. 25 Jul 2013 (1 commit)
    • Revert 6fbe4e98: If disable wal is set, then... · d7ba5bce
      Dhruba Borthakur authored
      Revert 6fbe4e98: If disable wal is set, then batch commits are avoided
      
      Summary:
      Revert "If disable wal is set, then batch commits are avoided" because
      keeping the mutex while inserting into the skiplist means that readers
      and writers are all serialized on the mutex.
      
      d7ba5bce
  2. 24 Jul 2013 (4 commits)
    • Virtualize SkipList Interface · 52d7ecfc
      Jim Paton authored
      Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations.
      
      Test Plan:
      make clean
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, emayanke, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11739
      52d7ecfc
    • If disable wal is set, then batch commits are avoided. · 6fbe4e98
      Dhruba Borthakur authored
      Summary:
      rocksdb uses batch commit to write to the transaction log. But if
      disable wal is set, then writes to the transaction log are skipped
      anyway. In this case there is not much value in batching, and
      batching can cause unnecessary delays to Puts().
      This patch avoids batching when disableWal is set.
      
      Test Plan:
      make check.
      
      I am running db_stress now.
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11763
      6fbe4e98
    • Adding filter_deletes to crash_tests run in jenkins · f3baeecd
      Mayank Agarwal authored
      Summary: The filter_deletes option introduced in db_stress makes it drop a Delete on a key if KeyMayExist(key) returns false for that key. The code change was simple and tested, so not wasting reviewers' time.
      
      Test Plan: make crash_test; python tools/db_crashtest[1|2].py
      
      CC: dhruba, vamsi
      
      Differential Revision: https://reviews.facebook.net/D11769
      f3baeecd
    • Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Mayank Agarwal authored
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; useful for checking a partially written write batch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  3. 23 Jul 2013 (1 commit)
  4. 20 Jul 2013 (1 commit)
  5. 13 Jul 2013 (1 commit)
  6. 12 Jul 2013 (2 commits)
    • Make rocksdb-deletes faster using bloom filter · 2a986919
      Mayank Agarwal authored
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on which makes Get return false only if the bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the deletes_use_filter option is turned on and CheckKeyMayExist returns false, the delete will be dropped, saving:
      1. Put of delete type
      2. Space in the db, and
      3. Compaction time
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
    • Newbie code question · 8a5341ec
      Xing Jin authored
      Summary:
      This diff is more of a question I had while reading the compaction code
      than a normal diff. I don't quite understand the logic here.
      
      Test Plan: I didn't do any tests. If this is a bug, I will continue with some tests.
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11661
      8a5341ec
  7. 11 Jul 2013 (1 commit)
    • Print complete statistics in db_stress · 821889e2
      Mayank Agarwal authored
      Summary: db_stress should also print complete statistics like db_bench. Needed this when I wanted to measure the number of delete I/Os dropped due to CheckKeyMayExist, to be introduced into the rocksdb codebase later to make deletes in rocksdb faster.
      
      Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
      
      Reviewers: sheki, dhruba, vamsi, haobo
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D11655
      821889e2
  8. 10 Jul 2013 (1 commit)
  9. 09 Jul 2013 (1 commit)
    • [RocksDB] Provide contiguous sequence number even in case of write failure · 9ba82786
      Haobo Xu authored
      Summary: Replication logic would be simplified if we could guarantee that the write sequence number is always contiguous, even if a write failure occurs. Dhruba and I looked at the sequence number generation part of the code; it seems fixable. Note that if the WAL write was successful but the insert into the memtable was not, we would be in an unfortunate state. The approach in this diff is: an IO error is expected, so an error status will be returned to the client and the sequence number will not be advanced; an in-memory error is not expected, and we panic.
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba, sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11439
      9ba82786
  10. 04 Jul 2013 (1 commit)
    • [RocksDB] Support internal key/value dump for ldb · 92ca816a
      Haobo Xu authored
      Summary: This diff added a command 'idump' to the ldb tool, which dumps the internal key/value pairs. It could be useful for diagnosis and for estimating the per-user-key 'overhead'. Also cleaned up the ldb code a bit where I touched it.
      
      Test Plan: make check; ldb idump
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11517
      92ca816a
  11. 02 Jul 2013 (1 commit)
  12. 27 Jun 2013 (2 commits)
    • [RocksDB] Expose count for WriteBatch · 71e0f695
      Haobo Xu authored
      Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence-number checks.
      
      Test Plan: make check;
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11523
      71e0f695
    • Added stringappend_test back into the unit tests. · 34ef8732
      Deon Nicholas authored
      Summary:
      With the Makefile now updated to correctly update all .o files, this
      should fix the issues recompiling stringappend_test. This should also fix the
      "segmentation-fault" that we were getting earlier. Now, stringappend_test should
      be clean, and I have added it back to the unit-tests. Also made some minor updates
      to the tests themselves.
      
      Test Plan:
      1. make clean; make stringappend_test -j 32	(will test it by itself)
      2. make clean; make all check -j 32		(to run all unit tests)
      3. make clean; make release			(test in release mode)
      4. valgrind ./stringappend_test 		(valgrind tests)
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11505
      34ef8732
  13. 26 Jun 2013 (1 commit)
    • Updated "make clean" to remove all .o files · 6894a50a
      Deon Nicholas authored
      Summary:
      The old Makefile did not remove ALL .o and .d files, but rather only
      those that happened to be in the root folder and one-level deep. This was causing
      issues when recompiling files in deeper folders. This fix now causes make clean
      to find ALL .o and .d files via a unix "find" command, and then remove them.
      
      Test Plan:
      make clean;
      make all -j 32;
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11493
      6894a50a
  14. 22 Jun 2013 (1 commit)
    • Simplify bucketing logic in ldb-ttl · b858da70
      Mayank Agarwal authored
      Summary: [start_time, end_time) is what I'm following for the buckets and for the whole time range. Also cleaned up some code in db_ttl.*. Not correcting the spacing/indenting convention for util/ldb_cmd.cc in this diff.
      
      Test Plan: python ldb_test.py, make ttl_test, run the mcrocksdb-backup tool, run the ldb tool on 2 mcrocksdb production backups from sigmafio033.prn1
      
      Reviewers: vamsi, haobo
      
      Reviewed By: vamsi
      
      Differential Revision: https://reviews.facebook.net/D11433
      b858da70
  15. 20 Jun 2013 (4 commits)
    • Introducing timeranged scan, timeranged dump in ldb. Also the ability to count... · 61f1baae
      Mayank Agarwal authored
      Introducing timeranged scan, timeranged dump in ldb. Also the ability to count in time-batches during Dump
      
      Summary:
      The Scan and Dump commands in ldb use an iterator. We need to also print the timestamp for ttl databases for debugging. For this I create a TtlIterator class pointer in these functions and assign it the value of the Iterator pointer, which actually points to a TtlIterator object, and access the new function ValueWithTS, which can also return the TS. Buckets feature for the Dump command: gives a count of the different key-values in the specified time range, distributed across the time range partitioned according to bucket size. start_time and end_time are specified as unix timestamps, and bucket in seconds, on the user command line.
      Have commented out 3 lines from ldb_test.py so that the test does not break right now. It breaks because the timestamp is also printed now and I have to look at wildcards in python to compare properly.
      
      Test Plan: python tools/ldb_test.py
      
      Reviewers: vamsi, dhruba, haobo, sheki
      
      Reviewed By: vamsi
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11403
      61f1baae
    • [RocksDB] add back --mmap_read options to crashtest · 0f78fad9
      Haobo Xu authored
      Summary: As title, now that db_stress supports --mmap_read properly
      
      Test Plan: make crash_test
      
      Reviewers: vamsi, emayanke, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11391
      0f78fad9
    • [RocksDB] Minor change to statistics.h · 4deaa0d4
      Haobo Xu authored
      Summary: As title; use an initializer list so that lines fit in 80 chars.
      
      Test Plan: make check;
      
      Reviewers: sheki, dhruba
      
      Differential Revision: https://reviews.facebook.net/D11385
      4deaa0d4
    • [RocksDB] Add mmap_read option for db_stress · 96be2c4e
      Haobo Xu authored
      Summary: As title; also removed an incorrect assertion
      
      Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11367
      96be2c4e
  16. 19 Jun 2013 (5 commits)
  17. 18 Jun 2013 (4 commits)
  18. 15 Jun 2013 (4 commits)
  19. 14 Jun 2013 (2 commits)
  20. 13 Jun 2013 (2 commits)
    • [RocksDB] Sync file to disk incrementally · 778e1790
      Haobo Xu authored
      Summary:
      During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic.
      This diff simply asks the OS to sync data incrementally as it is written, in the background. The hope is that, at the final sync, most of the data is already on disk and we block less on the sync call. Thus each compaction runs faster, and we can use fewer compaction threads to saturate IO.
      In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too.
      
      Some quick tests show 10~20% improvement in per thread compaction throughput. Combined with posix advice on compaction read, just 5 threads are enough to almost saturate the udb flash bandwidth for 800 bytes write only benchmark.
      What's more promising is that, with saturated IO, iostat shows average wait time is actually smoother and much smaller.
      For the write-only 800-byte test:
      Before the change: await oscillates between 10ms and 3ms
      After the change: await ranges from 1-3ms
      
      Will test against read-modify-write workload too, see if high read latency P99 could be resolved.
      
      Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions.
      
      Test Plan: make check; db_bench; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11115
      778e1790
    • [Rocksdb] [Multiget] Introduced multiget into db_bench · 4985a9f7
      Deon Nicholas authored
      Summary:
      Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n
      flags for db_bench. Also updated and tested the ReadRandom() method
      to include an option to use multiget. By default,
      keys_per_multiget=100.
      
      Preliminary tests imply that multiget is at least 1.25x faster per
      key than regular get.
      
      Will continue adding Multiget for ReadMissing, ReadHot,
      RandomWithVerify, ReadRandomWriteRandom; soon. Will also think
      about ways to better verify benchmarks.
      
      Test Plan:
      1. make db_bench
      2. ./db_bench --benchmarks=fillrandom
      3. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --use_multiget=1 --threads=4 --keys_per_multiget=100
      4. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --threads=4
      5. Verify ops/sec (and 1000000 of 1000000 keys found)
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11127
      4985a9f7