1. 13 Mar, 2014 4 commits
  2. 12 Mar, 2014 2 commits
  3. 11 Mar, 2014 1 commit
    • L
      Consolidate SliceTransform object ownership · 8d007b4a
      Authored by Lei Jin
      Summary:
      (1) Fix SanitizeOptions() to also check HashLinkList. The current
      dynamic cast just happens to work because the 2 classes have the same
      layout.
      (2) Do not delete the SliceTransform object in the HashSkipListFactory and
      HashLinkListFactory destructors. Reason: SanitizeOptions() enforces
      prefix_extractor and SliceTransform to be the same object when
      Hash**Factory is used. This makes the behavior strange: when
      Hash**Factory is used, prefix_extractor is released by RocksDB, but if
      another memtable factory is used, prefix_extractor must be released by
      the user.
      
      Test Plan: db_bench && make asan_check
      
      Reviewers: haobo, igor, sdong
      
      Reviewed By: igor
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16587
      8d007b4a
  4. 09 Feb, 2014 1 commit
  5. 25 Jan, 2014 3 commits
    • S
      Moving Some includes from options.h to forward declaration · 8477255d
      Authored by Siying Dong
      Summary: By removing some includes from options.h and relying on forward declarations, we can more easily reason about the dependencies.
      
      Test Plan: make all check
      
      Reviewers: kailiu, haobo, igor, dhruba
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15411
      8477255d
    • I
      Revert "Moving to glibc-fb" · e832e72b
      Authored by Igor Canadi
      This reverts commit d24961b6.
      
      For some reason, glibc2.17-fb breaks gflags. Reverting for now.
      e832e72b
    • I
      Moving to glibc-fb · d24961b6
      Authored by Igor Canadi
      Summary:
      It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.
      
      This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
      
      Test Plan:
      Compiled
      Ran ./db_bench ./db_stress ./db_repl_stress
      
      Reviewers: kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15405
      d24961b6
  6. 18 Jan, 2014 1 commit
  7. 04 Dec, 2013 1 commit
    • I
      Killing Transform Rep · eb12e47e
      Authored by Igor Canadi
      Summary:
      Let's get rid of TransformRep and its children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.
      
      This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release.
      
      I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14397
      eb12e47e
  8. 20 Nov, 2013 1 commit
    • I
      Fix two nasty use-after-free-bugs · 469a9f32
      Authored by Igor Canadi
      Summary:
      These bugs were caught by ASAN crash test.
      1. The first one, in table/filter_block.cc is very nasty. We first reference entries_ and store the reference to Slice prev. Then, we call entries_.append(), which can change the reference. The Slice prev now points to junk.
      2. The second one is a bug in a test, so it's not very serious. Once we set read_opts.prefix, we never clear it, so some other function might still reference it.
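The first bug can be illustrated with a minimal sketch. It assumes `entries_` is a `std::string` buffer that keys are appended to; the class and method names are hypothetical and this is not the actual table/filter_block.cc code. The sketch remembers the previous key by offset and length, and only reads it back from the buffer after the append, so a reallocation cannot leave it pointing at freed memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a builder that packs keys into one buffer.
class KeyBuffer {
 public:
  // Appends a key and returns true if it equals the previously added key.
  bool AddAndCheckDuplicate(const std::string& key) {
    // Remember where the previous key lives as offset/length, not as a
    // pointer: a pointer into entries_ may dangle once append() reallocates.
    const std::size_t prev_offset = last_offset_;
    const std::size_t prev_size = last_size_;

    last_offset_ = entries_.size();
    last_size_ = key.size();
    entries_.append(key);  // may move the underlying storage

    if (!has_prev_) {
      has_prev_ = true;
      return false;
    }
    assert(prev_offset + prev_size <= entries_.size());
    // Compare against the current buffer contents, after the append.
    return entries_.compare(prev_offset, prev_size, key) == 0;
  }

 private:
  std::string entries_;
  std::size_t last_offset_ = 0;
  std::size_t last_size_ = 0;
  bool has_prev_ = false;
};
```

The unsafe variant would take a pointer or slice into `entries_` before calling `append()` and dereference it afterwards, which is the pattern ASAN flags.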
      
      Test Plan: asan crash test now runs more than 5 mins. Before, it failed immediately. I will run the full one, but the full one takes quite some time (5 hours)
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14223
      469a9f32
  9. 17 Nov, 2013 1 commit
  10. 13 Nov, 2013 1 commit
    • K
      Add the index/filter block cache · 88ba331c
      Authored by Kai Liu
      Summary: This diff leverages the existing block cache and extends it to cache index/filter blocks.
      
      Test Plan:
      Added new tests in db_test and table_test
      
      The correctness is checked by:
      
      1. make check
      2. make valgrind_check
      
      Performance is tested by:
      
      1. 10 runs of build_tools/regression_build_test.sh on two versions of rocksdb, before and after the code change. Test results suggest no significant difference between them. For the two key operations `overwrite` and `readrandom`, the average IOPS of both versions are 20k and ~260k respectively, with very small variance.
      2. db_stress.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo, xjin
      
      Differential Revision: https://reviews.facebook.net/D13167
      88ba331c
  11. 02 Nov, 2013 1 commit
    • D
      Implement a compressed block cache. · b4ad5e89
      Authored by Dhruba Borthakur
      Summary:
      Rocksdb can now support an uncompressed block cache, a compressed
      block cache, or both. Lookups first look for a block in the
      uncompressed cache; only if it is not found there is it looked up
      in the compressed cache. If it is found in the compressed cache,
      it is uncompressed and inserted into the uncompressed cache.
      
      It is possible that the same block resides in the compressed cache
      as well as the uncompressed cache at the same time. Both caches
      have their own individual LRU policy.
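The lookup order described above can be sketched as follows. This is a toy model under stated assumptions, not the RocksDB implementation: the class name, the integer block ids, and the byte-reversing "codec" are all hypothetical placeholders, and LRU eviction is omitted.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Placeholder "compression" so the sketch is self-contained: it just
// reverses the bytes. This is an assumption, not a real codec.
inline std::string ToyCompress(const std::string& raw) {
  return std::string(raw.rbegin(), raw.rend());
}
inline std::string ToyUncompress(const std::string& packed) {
  return std::string(packed.rbegin(), packed.rend());
}

// Hypothetical two-tier cache keyed by block id. Eviction is omitted.
class TwoTierBlockCache {
 public:
  void InsertCompressed(int block_id, const std::string& raw) {
    compressed_[block_id] = ToyCompress(raw);
  }

  // Returns true and fills *out on a hit in either tier.
  bool Lookup(int block_id, std::string* out) {
    assert(out != nullptr);
    // 1. Try the uncompressed cache first.
    auto u = uncompressed_.find(block_id);
    if (u != uncompressed_.end()) {
      *out = u->second;
      return true;
    }
    // 2. Only on a miss, consult the compressed cache.
    auto c = compressed_.find(block_id);
    if (c == compressed_.end()) return false;
    // 3. Uncompress and promote; the block now lives in both tiers.
    *out = ToyUncompress(c->second);
    uncompressed_[block_id] = *out;
    return true;
  }

  bool InUncompressed(int block_id) const {
    return uncompressed_.count(block_id) > 0;
  }
  bool InCompressed(int block_id) const {
    return compressed_.count(block_id) > 0;
  }

 private:
  std::unordered_map<int, std::string> uncompressed_;
  std::unordered_map<int, std::string> compressed_;
};
```

After a hit in the compressed tier, the same block is present in both maps, which mirrors the situation the summary describes.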
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
      b4ad5e89
  12. 24 Oct, 2013 1 commit
  13. 17 Oct, 2013 1 commit
  14. 06 Oct, 2013 1 commit
  15. 05 Oct, 2013 1 commit
  16. 03 Oct, 2013 1 commit
  17. 01 Oct, 2013 1 commit
    • N
      Phase 2 of iterator stress test · 7edb92b8
      Authored by Natalie Hildebrandt
      Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.
      
      Test Plan:
      ./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000
      
      To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:
      
          // Should abort with "Unexpected value found"
          shared.Delete(start);
      
          // Should abort with "Value not found"
          WriteOptions write_opts;
          db_->Delete(write_opts, Key(start));
      
          // Should succeed
          WriteOptions write_opts;
          shared.Delete(start);
          db_->Delete(write_opts, Key(start));
      
          // Should abort with "Value not found"
          WriteOptions write_opts;
          db_->Delete(write_opts, Key(start + (end-start)/2));
      
          // Should abort with "Value not found"
          WriteOptions write_opts;
          db_->Delete(write_opts, Key(end-1));
      
          // Should abort with "Unexpected value"
          shared.Delete(end-1);
      
          // Should abort with "Unexpected value"
          shared.Delete(start + (end-start)/2);
      
          // Should abort with "Value not found"
          WriteOptions write_opts;
          db_->Delete(write_opts, Key(start));
          shared.Delete(start);
          db_->Delete(write_opts, Key(end-1));
          db_->Delete(write_opts, Key(end-2));
      
      To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.
      
      Reviewers: emayanke
      
      Reviewed By: emayanke
      
      CC: dhruba, xjin
      
      Differential Revision: https://reviews.facebook.net/D13071
      7edb92b8
  18. 20 Sep, 2013 1 commit
    • N
      Phase 1 of an iterator stress test · 43354182
      Authored by Natalie Hildebrandt
      Summary:
      Added MultiIterate(), which does a seek and some Next/Prev
      calls. Only the iterator status is checked; there is no data integrity check.
      
      Test Plan:
      make db_stress
      ./db_stress --iterpercent=<nonzero value> --readpercent=, etc.
      
      Reviewers: emayanke, dhruba, xjin
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12915
      43354182
  19. 14 Sep, 2013 1 commit
    • D
      Added a parameter to limit the maximum space amplification for universal compaction. · 4012ca1c
      Authored by Dhruba Borthakur
      Summary:
      Added a new field called max_size_amplification_ratio in the
      CompactionOptionsUniversal structure. This determines the maximum
      percentage overhead of space amplification.
      
      The size amplification is defined as the ratio of the sum of the sizes
      of all other files to the size of the oldest file, expressed as a
      percentage. If the size amplification exceeds the specified value, then
      min_merge_width and max_merge_width are ignored and a full compaction of
      all files is done. A value of 10 means that a database that stores 100
      bytes of user data could occupy 110 bytes of physical storage.
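A minimal sketch of the check, assuming the percentage is computed as the total size of all newer files relative to the oldest file. That reading is an assumption chosen because it matches the 100-byte/110-byte example; the function name and the newest-first ordering of the input are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// file_sizes is ordered newest first; the last element is the oldest file.
// Returns true if a full compaction should be triggered, assuming
// amplification (%) = 100 * (sum of all newer files) / (oldest file).
inline bool ExceedsSizeAmplification(
    const std::vector<std::uint64_t>& file_sizes, std::uint64_t max_percent) {
  assert(!file_sizes.empty());
  const std::uint64_t oldest = file_sizes.back();
  const std::uint64_t newer = std::accumulate(
      file_sizes.begin(), file_sizes.end() - 1, std::uint64_t{0});
  // Cross-multiply instead of dividing, so an empty oldest file is safe.
  return newer * 100 > max_percent * oldest;
}
```

With an oldest file of 100 bytes and newer files totalling 10 bytes, the amplification is exactly 10%, so a limit of 10 is not exceeded; one more byte in the newer files would trigger the full compaction.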
      
      Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
      
      Reviewers: haobo, emayanke, xjin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12825
      4012ca1c
  20. 24 Aug, 2013 1 commit
  21. 23 Aug, 2013 1 commit
    • J
      Add three new MemTableRep's · 74781a0c
      Authored by Jim Paton
      Summary:
      This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
      
      UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
      
      VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
      
      PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
      
      I also made one API change: I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like it could be useful. In particular, for VectorRep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is that the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
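The VectorRep behavior can be sketched roughly as below. The class is a hypothetical simplification that stores plain strings and ignores the arena allocator; `SortedInPlace()` illustrates the in-place optimization that the summary mentions as possible but not yet done.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of a vector-backed memtable representation.
class VectorRepSketch {
 public:
  void Insert(const std::string& key) { keys_.push_back(key); }

  // Called once the rep moves to the immutable list.
  void MarkImmutable() { immutable_ = true; }

  // Iterator request on a mutable rep: copy the vector, then sort the copy.
  // The live vector stays in insertion order.
  std::vector<std::string> SortedSnapshot() const {
    std::vector<std::string> copy(keys_);
    std::sort(copy.begin(), copy.end());
    return copy;
  }

  // Possible optimization once immutable: no more inserts will arrive,
  // so the extra copy can be elided by sorting in place.
  const std::vector<std::string>& SortedInPlace() {
    assert(immutable_);
    std::sort(keys_.begin(), keys_.end());
    return keys_;
  }

 private:
  std::vector<std::string> keys_;
  bool immutable_ = false;
};
```

The trade-off is cheap inserts (a `push_back`) against an O(n log n) sort on every iterator request while the rep is still mutable.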
      
      Test Plan:
      make -j32 check
      ./db_stress --memtablerep=vector
      ./db_stress --memtablerep=unsorted
      ./db_stress --memtablerep=prefixhash --prefix_size=10
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12117
      74781a0c
  22. 21 Aug, 2013 1 commit
  23. 16 Aug, 2013 1 commit
    • D
      Benchmarking for Merge Operator · ad48c3c2
      Authored by Deon Nicholas
      Summary:
      Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
      of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
      a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
      so that the tester can easily benchmark different merge operators. Currently supports
      the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
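As a rough illustration of what the two operators compute, the sketch below models a merge as a function from an existing value and an operand to a new value. The function names and the native-endian 8-byte encoding are assumptions made for the example and do not reproduce the code in utilities/merge_operators.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// "put" semantics: the newest operand simply replaces the existing value.
inline std::string PutMerge(const std::string& /*existing*/,
                            const std::string& operand) {
  return operand;
}

// Helpers for a fixed-width 8-byte encoding (native endianness assumed).
inline std::string EncodeU64(std::uint64_t v) {
  std::string s(sizeof(v), '\0');
  std::memcpy(&s[0], &v, sizeof(v));
  return s;
}
inline std::uint64_t DecodeU64(const std::string& s) {
  assert(s.size() == sizeof(std::uint64_t));
  std::uint64_t v = 0;
  std::memcpy(&v, s.data(), sizeof(v));
  return v;
}

// "uint64add" semantics: treat both values as 8-byte integers and add them.
// A missing (empty) existing value counts as zero.
inline std::string UInt64AddMerge(const std::string& existing,
                                  const std::string& operand) {
  const std::uint64_t base = existing.empty() ? 0 : DecodeU64(existing);
  return EncodeU64(base + DecodeU64(operand));
}
```

Under this encoding every value is exactly 8 bytes, which is one plausible reason the uint64add test plan passes `--value_size=8`.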
      
      Test Plan:
      1. make db_bench
      2. Test the PutOperator (simulating Put) as follows:
         ./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put --threads=2
      3. Test the UInt64AddOperator (simulating numeric addition) similarly:
         ./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=uint64add --threads=2
      
      Reviewers: haobo, dhruba, zshao, MarkCallaghan
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11535
      ad48c3c2
  24. 15 Aug, 2013 2 commits
  25. 06 Aug, 2013 1 commit
  26. 02 Aug, 2013 1 commit
    • M
      Expand KeyMayExist to return the proper value if it can be found in memory and... · 59d0b02f
      Authored by Mayank Agarwal
      Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache
      
      Summary: Removed KeyMayExistImpl because KeyMayExist now demands Get-like semantics. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see a Merge in the memtable. Added checks to block_cache. Updated documentation and unit tests.
      
      Test Plan: make all check;db_stress for 1 hour
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11853
      59d0b02f
  27. 24 Jul, 2013 1 commit
    • M
      Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Authored by Mayank Agarwal
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed it from the outer Delete API, because that API uses writebatch-delete.
      Added code to skip getting the Table from disk if it is not already present in table_cache.
      Renamed some variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; this is useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  28. 12 Jul, 2013 1 commit
    • M
      Make rocksdb-deletes faster using bloom filter · 2a986919
      Authored by Mayank Agarwal
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on, which makes Get return false only if the bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete is dropped, saving:
      1. A Put of delete type
      2. Space in the db, and
      3. Compaction time
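The idea can be sketched with a tiny Bloom filter. Everything here is a hypothetical illustration (the filter class, its size, and the two hash functions are assumptions); only the option name `deletes_use_filter` comes from the summary. The property that matters is that a Bloom filter never reports a false negative, so a "definitely absent" answer makes it safe to drop the delete.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter with two hash probes (sizes chosen arbitrarily).
class TinyBloom {
 public:
  void Add(const std::string& key) {
    bits_.set(Hash1(key));
    bits_.set(Hash2(key));
  }

  // false means "definitely absent"; true means "possibly present".
  bool KeyMayExist(const std::string& key) const {
    return bits_.test(Hash1(key)) && bits_.test(Hash2(key));
  }

 private:
  static constexpr std::size_t kBits = 1024;
  static std::size_t Hash1(const std::string& k) {
    return std::hash<std::string>{}(k) % kBits;
  }
  static std::size_t Hash2(const std::string& k) {
    return std::hash<std::string>{}(k + "#") % kBits;
  }
  std::bitset<kBits> bits_;
};

// Returns true if the delete must be written, false if it can be dropped.
inline bool ShouldWriteDelete(const TinyBloom& filter, const std::string& key,
                              bool deletes_use_filter) {
  if (!deletes_use_filter) return true;  // option off: always write
  return filter.KeyMayExist(key);        // drop only when definitely absent
}
```

A false positive only costs an unnecessary delete marker, so correctness is preserved either way.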
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
  29. 11 Jul, 2013 1 commit
    • M
      Print complete statistics in db_stress · 821889e2
      Authored by Mayank Agarwal
      Summary: db_stress should also print complete statistics like db_bench. I needed this to measure the number of delete I/Os dropped due to CheckKeyMayExist, which will be introduced to the rocksdb codebase later to make deletes in rocksdb faster.
      
      Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
      
      Reviewers: sheki, dhruba, vamsi, haobo
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D11655
      821889e2
  30. 04 Jul, 2013 1 commit
  31. 01 Jul, 2013 2 commits
    • D
      Reduce write amplification by merging files in L0 back into L0 · 47c4191f
      Authored by Dhruba Borthakur
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. The meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refer to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
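The candidate selection rule can be sketched as follows. This follows the wording of the summary literally and is not taken from PickCompactionHybrid(); the function name and the plain vector of file sizes are hypothetical, and the summary itself notes that the rule still needs to be debated and validated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// sizes is ordered from most recent file to oldest. Returns how many
// leading files join the compaction run: keep adding the next candidate
// while the total chosen so far is smaller than that candidate's size.
inline std::size_t PickRunLength(const std::vector<std::uint64_t>& sizes) {
  assert(!sizes.empty());
  std::uint64_t chosen_total = sizes[0];  // always start with the newest file
  std::size_t count = 1;
  while (count < sizes.size() && chosen_total < sizes[count]) {
    chosen_total += sizes[count];
    ++count;
  }
  return count;
}
```

For sizes {1, 2, 3, 10} the run stops after two files, because the chosen total of 3 is not smaller than the next candidate of size 3.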
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      47c4191f
    • D
      Reduce write amplification by merging files in L0 back into L0 · 554c06dd
      Authored by Dhruba Borthakur
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. The meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refer to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      554c06dd
  32. 20 Jun, 2013 1 commit