1. 20 Oct 2017, 1 commit
  2. 10 Oct 2017, 1 commit
    • WritePrepared Txn: Iterator · 8c392a31
      Committed by Yi Wu
      Summary:
      On iterator creation, take a snapshot, create a ReadCallback, and pass the ReadCallback to the underlying DBIter so it can check whether a key is committed.
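      
      A minimal sketch of the idea, with assumed names (this is not the exact RocksDB ReadCallback interface): the iterator surfaces an entry only if its sequence number is at or below the snapshot and the entry is committed.
      ```
      #include <cstdint>
      
      class IterReadCallbackSketch {
       public:
        explicit IterReadCallbackSketch(uint64_t snapshot_seq)
            : snapshot_seq_(snapshot_seq) {}
      
        // Called for each internal entry the iterator considers surfacing.
        bool IsVisible(uint64_t seq) const {
          return seq <= snapshot_seq_ && IsCommitted(seq);
        }
      
       private:
        // In WritePrepared transactions this would consult the commit cache;
        // stubbed out here for illustration.
        bool IsCommitted(uint64_t /*seq*/) const { return true; }
      
        const uint64_t snapshot_seq_;
      };
      ```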
      Closes https://github.com/facebook/rocksdb/pull/2981
      
      Differential Revision: D6001471
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe
  3. 04 Oct 2017, 1 commit
    • Add ValueType::kTypeBlobIndex · d1cab2b6
      Committed by Yi Wu
      Summary:
      Add a kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
      1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex.
      2. Make rocksdb able to detect whether the db contains values written by blob db and, if so, return an error.
      3. Make it possible for blob db to optionally store a value in an SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).
      
      The root db (DBImpl) basically treats kTypeBlobIndex entries as normal values on write. On Get, if an is_blob out-parameter is provided, it returns whether the value read is of kTypeBlobIndex type; if is_blob is not provided, a Status::NotSupported() status is returned for such values. On scan, an allow_blob flag is passed, and if the flag is true the iterator reports whether the value is of kTypeBlobIndex type via iter->IsBlob().
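      
      A self-contained sketch of the Get-side contract just described (enum values and return codes simplified for illustration; not the actual DBImpl code):
      ```
      #include <cstdint>
      
      enum ValueTypeSketch : uint8_t { kSketchTypeValue, kSketchTypeBlobIndex };
      enum class GetResult { kOk, kNotSupported };
      
      // `is_blob == nullptr` models a caller that is not blob-aware.
      GetResult HandleFoundEntry(ValueTypeSketch type, bool* is_blob) {
        if (type == kSketchTypeBlobIndex) {
          // Refuse rather than hand a blob index to a caller expecting a value.
          if (is_blob == nullptr) return GetResult::kNotSupported;
          *is_blob = true;
        } else if (is_blob != nullptr) {
          *is_blob = false;
        }
        return GetResult::kOk;
      }
      ```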
      
      Changes on blob db side will be in a separate patch.
      Closes https://github.com/facebook/rocksdb/pull/2886
      
      Differential Revision: D5838431
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
  4. 15 Sep 2017, 1 commit
    • Three code-level optimizations to Iterator::Next() · edcbb369
      Committed by Siying Dong
      Summary:
      Three small optimizations:
      (1) iter_->IsKeyPinned() shouldn't be called if read_options.pin_data is not true; the call may otherwise propagate all the way down the iterator tree.
      (2) Reuse the iterator key object in DBIter::FindNextUserEntryInternal(); the constructor of the class has some overhead.
      (3) Move the direction-switching logic in MergingIterator::Next() to a separate function.
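      
      A minimal self-contained sketch of optimization (1), with assumed names: the short-circuiting && ensures the virtual call is only reached when pin_data is set.
      ```
      struct IterSketch {
        virtual ~IterSketch() = default;
        // In the real tree this call fans out to child iterators.
        virtual bool IsKeyPinned() const { return true; }
      };
      struct ReadOptionsSketch { bool pin_data = false; };
      
      bool KeyPinned(const ReadOptionsSketch& ro, const IterSketch& it) {
        // When pin_data is false, the virtual call (and everything it would
        // trigger below it) is skipped entirely.
        return ro.pin_data && it.IsKeyPinned();
      }
      ```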
      
      These three in total improve readseq performance by about 3% in my benchmark setting.
      Closes https://github.com/facebook/rocksdb/pull/2880
      
      Differential Revision: D5829252
      
      Pulled By: siying
      
      fbshipit-source-id: 991aea10c6d6c3b43769cb4db168db62954ad1e3
  5. 12 Sep 2017, 1 commit
    • Make DBIter class final · 2dd22e54
      Committed by Siying Dong
      Summary:
      DBIter is referenced in ArenaWrappedDBIter, which is a simple wrapper. If DBIter is final, some virtual function calls can be avoided, and some can even be inlined, like DBIter::value() into ArenaWrappedDBIter::value() and DBIter::key() into ArenaWrappedDBIter::key(). The performance gain is hard to measure; I ran the memory-only benchmark for readseq and saw it didn't regress. There shouldn't be any harm in doing it. It just gives the compiler more choices.
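      
      A simplified sketch of the devirtualization this enables (toy types, not the real RocksDB declarations):
      ```
      #include <string>
      using SliceSketch = std::string;  // stand-in for rocksdb::Slice
      
      struct IteratorSketch {
        virtual ~IteratorSketch() = default;
        virtual SliceSketch value() const = 0;
      };
      
      // `final` lets the compiler prove db_iter_->value() resolves to
      // DBIterSketch::value(), so it can devirtualize and inline the call
      // through the wrapper.
      struct DBIterSketch final : IteratorSketch {
        SliceSketch value() const override { return value_; }
        SliceSketch value_;
      };
      
      struct ArenaWrappedDBIterSketch : IteratorSketch {
        SliceSketch value() const override { return db_iter_->value(); }
        DBIterSketch* db_iter_ = nullptr;
      };
      ```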
      Closes https://github.com/facebook/rocksdb/pull/2859
      
      Differential Revision: D5799888
      
      Pulled By: siying
      
      fbshipit-source-id: 829788f91310c40282dcfb7e412e6ef489931143
  6. 19 Aug 2017, 1 commit
    • perf_context measure user bytes read · ed0a4c93
      Committed by Andrew Kryczka
      Summary:
      With this PR, we can measure read-amp for queries where perf_context is enabled as follows:
      
      ```
      SetPerfLevel(kEnableCount);
      Get(1, "foo");
      double read_amp = static_cast<double>(get_perf_context()->block_read_byte) / get_perf_context()->get_read_bytes;  // cast one operand so the division is floating-point
      SetPerfLevel(kDisable);
      ```
      
      Our internal infra enables perf_context for a sampling of queries. So we'll be able to compute the read-amp for the sample set, which can give us a good estimate of read-amp.
      Closes https://github.com/facebook/rocksdb/pull/2749
      
      Differential Revision: D5647240
      
      Pulled By: ajkr
      
      fbshipit-source-id: ad73550b06990cf040cc4528fa885360f308ec12
  7. 25 Jul 2017, 1 commit
    • Add Iterator::Refresh() · e67b35c0
      Committed by Siying Dong
      Summary:
      Add and implement Iterator::Refresh(). When this function is called, if the super version doesn't change, update the sequence number of the iterator to the latest one and invalidate the iterator; if the super version changed, recreate the whole iterator. This can help users reuse the iterator more easily.
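      
      A usage sketch (assuming `db` is an open rocksdb::DB*): reuse one iterator across rounds of writes instead of re-creating it each time.
      ```
      #include <memory>
      #include "rocksdb/db.h"
      
      void ScanRepeatedly(rocksdb::DB* db) {
        std::unique_ptr<rocksdb::Iterator> it(
            db->NewIterator(rocksdb::ReadOptions()));
        for (int round = 0; round < 10; ++round) {
          // ... writes may happen between rounds ...
          rocksdb::Status s = it->Refresh();  // advance to the latest state
          if (!s.ok()) break;
          // Refresh() invalidates the position, so reposition before reading.
          for (it->SeekToFirst(); it->Valid(); it->Next()) {
            // process it->key() / it->value()
          }
        }
      }
      ```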
      Closes https://github.com/facebook/rocksdb/pull/2621
      
      Differential Revision: D5464500
      
      Pulled By: siying
      
      fbshipit-source-id: f548bd35e85c1efca2ea69273802f6704eba6ba9
  8. 16 Jul 2017, 1 commit
  9. 29 Jun 2017, 1 commit
    • Make "make analyze" happy · 18c63af6
      Committed by Siying Dong
      Summary:
      "make analyze" is reporting some errors. It's complicated to look but it seems to me that they are all false positive. Anyway, I think cleaning them up is a good idea. Some of the changes are hacky but I don't know a better way.
      Closes https://github.com/facebook/rocksdb/pull/2508
      
      Differential Revision: D5341710
      
      Pulled By: siying
      
      fbshipit-source-id: 6070e430e0e41a080ef441e05e8ec827d45efab6
  10. 31 May 2017, 1 commit
  11. 24 May 2017, 1 commit
    • Fix errors in clang-analyzer builds · 7d8207f1
      Committed by Sagar Vemuri
      Summary:
      Fix build error in db_iter.cc when running clang-analyzer.
      ```
        CC       db/db_iter.o
      db/db_iter.cc:938:21: error: no matching constructor for initialization of 'rocksdb::ParsedInternalKey'
        ParsedInternalKey ikey(Slice(), 0, 0);
                          ^    ~~~~~~~~~~~~~
      ./db/dbformat.h:84:3: note: candidate constructor not viable: no known conversion from 'int' to 'rocksdb::ValueType' for 3rd argument
        ParsedInternalKey(const Slice& u, const SequenceNumber& seq, ValueType t)
        ^
      ./db/dbformat.h:78:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
      struct ParsedInternalKey {
             ^
      ./db/dbformat.h:78:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
      ./db/dbformat.h:83:3: note: candidate constructor not viable: requires 0 arguments, but 3 were provided
        ParsedInternalKey() { }  // Intentionally left uninitialized (for speed)
        ^
      1 error generated.
      ```
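      
      One plausible fix, shown for illustration only (the patch itself is not reproduced here): pass a real ValueType enumerator so the three-argument constructor is viable.
      ```
      ParsedInternalKey ikey(Slice(), 0 /* sequence */, kTypeDeletion);
      ```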
      Closes https://github.com/facebook/rocksdb/pull/2354
      
      Differential Revision: D5115751
      
      Pulled By: sagar0
      
      fbshipit-source-id: b0e386d4e935e4725b07761c3ca5f7a8cbde3692
  12. 20 May 2017, 1 commit
    • Suppress clang-analyzer false positive · d746aead
      Committed by Yi Wu
      Summary:
      Fixing two types of clang-analyzer false positives:
      * The db is deleted and then reopened, and clang-analyzer thinks we are reusing the pointer after it has been deleted. Add asserts to hint to clang-analyzer that the pointer has been recreated.
      * ParsedInternalKey is (intentionally) uninitialized. Initialize the struct only when clang-analyzer is running.
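      
      A self-contained sketch of the second workaround (field set reduced for illustration): `__clang_analyzer__` is predefined by the clang static analyzer, so the struct gets defined values only during analysis.
      ```
      #include <cstdint>
      using SequenceNumberSketch = uint64_t;
      enum ValueTypeSketch : unsigned char { kSketchTypeDeletion = 0x0 };
      
      struct ParsedInternalKeySketch {
      #ifdef __clang_analyzer__
        // Defined values silence "uninitialized read" reports during analysis.
        ParsedInternalKeySketch() : sequence(0), type(kSketchTypeDeletion) {}
      #else
        ParsedInternalKeySketch() {}  // intentionally uninitialized, for speed
      #endif
        SequenceNumberSketch sequence;
        ValueTypeSketch type;
      };
      ```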
      Closes https://github.com/facebook/rocksdb/pull/2334
      
      Differential Revision: D5093801
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f51355382098eb3da5ab9f64e094c6d03e6bdf7d
  13. 28 Apr 2017, 1 commit
  14. 11 Apr 2017, 1 commit
    • Reduce the number of params needed to construct DBIter · 7124268a
      Committed by Sagar Vemuri
      Summary:
      DBIter, and in turn NewDBIterator and NewArenaWrappedDBIterator, take a bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every param separately. It also seems much cleaner, as a bunch of the params towards the end are optional.
      
      (Recently I introduced max_skippable_internal_keys, which added one more to the already huge count).
      
      Idea courtesy IslamAbdelRahman
      Closes https://github.com/facebook/rocksdb/pull/2116
      
      Differential Revision: D4857128
      
      Pulled By: sagar0
      
      fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8
  15. 06 Apr 2017, 1 commit
  16. 05 Apr 2017, 1 commit
  17. 04 Apr 2017, 1 commit
  18. 31 Mar 2017, 1 commit
    • Option to fail a request as incomplete when skipping too many internal keys · c6d04f2e
      Committed by Sagar Vemuri
      Summary:
      Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to be skipped. Adding an option, max_skippable_internal_keys, which can be used to set a threshold for the maximum number of keys that can be skipped, will help address these cases, where it is much better to fail a request (as incomplete) than to wait a considerable time for the request to complete.
      
      This feature (failing an iterator seek request as incomplete) is disabled by default: max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.
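      
      A usage sketch (assuming `db` is an open rocksdb::DB*): set the threshold in ReadOptions and treat Status::Incomplete() as "too many skipped keys, give up".
      ```
      #include <memory>
      #include "rocksdb/db.h"
      
      void BoundedSeek(rocksdb::DB* db, const rocksdb::Slice& start_key) {
        rocksdb::ReadOptions ro;
        ro.max_skippable_internal_keys = 10000;  // 0 (the default) disables this
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        it->Seek(start_key);
        if (!it->Valid() && it->status().IsIncomplete()) {
          // The seek skipped more than 10000 internal keys; fail fast or retry.
        }
      }
      ```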
      
      This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
      Closes https://github.com/facebook/rocksdb/pull/2000
      
      Differential Revision: D4753223
      
      Pulled By: sagar0
      
      fbshipit-source-id: 1c973f7
  19. 16 Mar 2017, 1 commit
    • Add macros to include file name and line number during Logging · e1916368
      Committed by Islam AbdelRahman
      Summary:
      Current logging:
      ```
      2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
      2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
      2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
      2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
      2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
      2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
      2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
      2017/03/14-14:20:31.
      ```
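      
      A hedged sketch of the approach (the macros this PR actually adds are RocksDB's ROCKS_LOG_* family; the macro name below is illustrative, and the classic rocksdb::Log(Logger*, format, ...) helper is assumed): capture __FILE__ and __LINE__ at the call site and prepend them to the format string.
      ```
      #include "rocksdb/env.h"  // for rocksdb::Logger and rocksdb::Log
      
      // Prepends "[file:line] " to every message logged through this macro.
      #define LOG_WITH_LOCATION(logger, fmt, ...) \
        rocksdb::Log(logger, "[%s:%d] " fmt, __FILE__, __LINE__, ##__VA_ARGS__)
      ```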
      Closes https://github.com/facebook/rocksdb/pull/1990
      
      Differential Revision: D4708695
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: cb8968f
  20. 09 Mar 2017, 1 commit
  21. 06 Jan 2017, 1 commit
    • Maintain position in range deletions map · b104b878
      Committed by Andrew Kryczka
      Summary:
      When deletion-collapsing mode is enabled (i.e., for DBIter/CompactionIterator), we maintain position in the tombstone maps across calls to ShouldDelete(). Since iterators often access keys sequentially (or reverse-sequentially), scanning forward/backward from the last position can be faster than binary-searching the map for every key.
      
      - When Next() is invoked on an iterator, we use kForwardTraversal to scan forwards, if needed, until arriving at the range deletion containing the next key.
      - Similarly for Prev(), we use kBackwardTraversal to scan backwards in the range deletion map.
      - When the iterator seeks, we use kBinarySearch for repositioning.
      - After tombstones are added, or before the first ShouldDelete() invocation, the current position is set to invalid, which forces kBinarySearch to be used.
      - Non-iterator users (i.e., Get()) use kFullScan, which has the same behavior as before: scan the whole map for every key passed to ShouldDelete().
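      
      A self-contained sketch of the positioned-lookup idea (simplified to a map from start key to tombstone seqnum; not the real TombstoneMap):
      ```
      #include <cstdint>
      #include <iterator>
      #include <map>
      #include <string>
      
      struct PositionedTombstoneMap {
        std::map<std::string, uint64_t> map_;  // start_key -> tombstone seqnum
        std::map<std::string, uint64_t>::const_iterator pos_;
        bool pos_valid_ = false;  // invalidated whenever tombstones are added
      
        // kForwardTraversal: on sequential access, walk from the cached
        // position; fall back to kBinarySearch when the position is invalid
        // (e.g. after a seek or after new tombstones were added).
        const uint64_t* SeekForward(const std::string& key) {
          if (!pos_valid_) {
            pos_ = map_.upper_bound(key);  // kBinarySearch
            pos_valid_ = true;
          } else {
            while (pos_ != map_.end() && pos_->first <= key) ++pos_;
          }
          if (pos_ == map_.begin()) return nullptr;  // no tombstone starts <= key
          return &std::prev(pos_)->second;           // candidate covering tombstone
        }
      };
      ```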
      Closes https://github.com/facebook/rocksdb/pull/1701
      
      Differential Revision: D4350318
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5129b76
  22. 20 Dec 2016, 1 commit
    • Collapse range deletions · 50e305de
      Committed by Andrew Kryczka
      Summary:
      Added a tombstone-collapsing mode to RangeDelAggregator, which eliminates overlap in the TombstoneMap. In this mode, we can check whether a tombstone covers a user key using upper_bound() (i.e., binary search). However, the tradeoff is that the overhead to add tombstones is now higher, so at first I've only enabled it for range scans (compaction/flush/user iterators), where we expect a high number of calls to ShouldDelete() for the same tombstones. Point queries like Get() will still use the linear scan approach.
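      
      A self-contained sketch of a coverage check over a collapsed map (representation assumed for illustration: each start key maps to the tombstone seqnum in effect until the next start key, with 0 marking an uncovered gap):
      ```
      #include <cstdint>
      #include <map>
      #include <string>
      
      using CollapsedMap = std::map<std::string, uint64_t>;
      
      // A key is deleted if the tombstone in effect at that key is newer than
      // the key's own sequence number.
      bool Covered(const CollapsedMap& m, const std::string& user_key,
                   uint64_t key_seq) {
        auto it = m.upper_bound(user_key);  // first entry strictly after user_key
        if (it == m.begin()) return false;  // user_key precedes all tombstones
        --it;                               // entry whose span contains user_key
        return it->second > key_seq;
      }
      ```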
      
      Also in this diff I changed RangeDelAggregator's TombstoneMap to use a multimap keyed by user keys instead of a map keyed by internal keys. Callers sometimes provide a ParsedInternalKey directly, and deriving an internal key Slice from it to search the map would have required a string copy.
      Closes https://github.com/facebook/rocksdb/pull/1614
      
      Differential Revision: D4270397
      
      Pulled By: ajkr
      
      fbshipit-source-id: 93092c7
  23. 17 Dec 2016, 1 commit
  24. 30 Nov 2016, 1 commit
  25. 29 Nov 2016, 1 commit
    • Less linear search in DBIter::Seek() when keys are overwritten a lot · 236d4c67
      Committed by Mike Kolupaev
      Summary:
      In one deployment we saw high latencies (presumably from slow iterator operations) and a lot of CPU time reported by perf with this stack:
      
      ```
        rocksdb::MergingIterator::Next
        rocksdb::DBIter::FindNextUserEntryInternal
        rocksdb::DBIter::Seek
      ```
      
      I think what's happening is:
      1. we create a snapshot iterator,
      2. we do lots of Put()s for the same key x; this creates lots of entries in memtable,
      3. we seek the iterator to a key slightly smaller than x,
      4. the seek walks over lots of entries in memtable for key x, skipping them because of high sequence numbers.
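      
      A self-contained toy illustrating step 4: with N overwrites of key x after the snapshot, one Seek must step over all N invisible versions before giving up on x.
      ```
      #include <cstdint>
      #include <iostream>
      #include <string>
      #include <vector>
      
      struct InternalEntry {
        std::string user_key;
        uint64_t seq;  // newer entries have higher sequence numbers
      };
      
      int main() {
        // Memtable order for one user key: newest (highest seq) first.
        std::vector<InternalEntry> memtable;
        for (uint64_t s = 100000; s >= 1; --s) memtable.push_back({"x", s});
      
        const uint64_t snapshot_seq = 0;  // snapshot taken before all the Puts
        size_t skipped = 0;
        for (const auto& e : memtable) {
          if (e.seq > snapshot_seq) { ++skipped; continue; }  // invisible version
          break;  // first visible entry (never reached in this toy)
        }
        std::cout << "entries skipped by one Seek: " << skipped << "\n";  // 100000
      }
      ```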
      
      CC IslamAbdelRahman
      Closes https://github.com/facebook/rocksdb/pull/1413
      
      Differential Revision: D4083879
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: a83ddae
  26. 22 Nov 2016, 1 commit
    • Range deletion microoptimizations · fd43ee09
      Committed by Andrew Kryczka
      Summary:
      - Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter.
      - Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly.
      Closes https://github.com/facebook/rocksdb/pull/1548
      
      Differential Revision: D4208169
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2fd65cf
  27. 19 Nov 2016, 1 commit
    • Lazily initialize RangeDelAggregator's map and pinning manager · 3f622152
      Committed by Andrew Kryczka
      Summary:
      Since a RangeDelAggregator is created for each read request, these heap-allocating member variables were consuming significant CPU (~3% total), which slowed down request throughput. The map and pinning manager are only necessary when range deletions exist, so we can defer their initialization until the first range deletion is encountered. Currently, lazy initialization is done for reads only, since reads pass us a single snapshot, which is easier to store on the stack for later insertion into the map than the vector passed to us by flush or compaction.
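      
      A generic sketch of the lazy-initialization pattern described here (toy types, not the real RangeDelAggregator):
      ```
      #include <cstdint>
      #include <map>
      #include <memory>
      #include <string>
      
      class LazyAggregatorSketch {
       public:
        void AddTombstone(std::string start_key, uint64_t seq) {
          // Heap allocation happens only on the first tombstone.
          if (map_ == nullptr) {
            map_ = std::make_unique<std::map<std::string, uint64_t>>();
          }
          (*map_)[std::move(start_key)] = seq;
        }
        // Cheap hot-path check for the common no-tombstones case.
        bool IsEmpty() const { return map_ == nullptr; }
      
       private:
        std::unique_ptr<std::map<std::string, uint64_t>> map_;  // null until needed
      };
      ```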
      
      Note the Arena member variable is still expensive, I will figure out what to do with it in a subsequent diff. It cannot be lazily initialized because we currently use this arena even to allocate empty iterators, which is necessary even when no range deletions exist.
      Closes https://github.com/facebook/rocksdb/pull/1539
      
      Differential Revision: D4203488
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3b36279
  28. 05 Nov 2016, 1 commit
    • DeleteRange user iterator support · 9e7cf346
      Committed by Andrew Kryczka
      Summary:
      Note: reviewed in  https://reviews.facebook.net/D65115
      
      - DBIter maintains a range tombstone accumulator. We don't clean up obsolete tombstones yet, so if the user seeks back and forth, the same tombstones will be added to the accumulator multiple times.
      - DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
      - DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones
      Closes https://github.com/facebook/rocksdb/pull/1464
      
      Differential Revision: D4131753
      
      Pulled By: ajkr
      
      fbshipit-source-id: be86559
  29. 19 Oct 2016, 1 commit
    • fix db_stress assertion failure · 5e0d6b4c
      Committed by Aaron Gao
      Summary: In rocksdb::DBIter::FindValueForCurrentKey(), last_not_merge_type could also be SingleDelete(), which the assertion omitted.
      
      Test Plan: db_iter_test
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D65187
  30. 14 Oct 2016, 1 commit
    • fix assertion failure in Prev() · 21e8dace
      Committed by Aaron Gao
      Summary:
      Fix an assertion failure in db_stress.
      It happens because the prefix seek key is larger than the merge iterator key when they have the same user key.
      
      Test Plan: ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --log2_keys_per_lock=10 --block_size=16384 --allow_concurrent_memtable_write=0 --test_batches_snapshots=0 --max_bytes_for_level_base=67108864 --progress_reports=0 --mmap_read=0 --writepercent=35 --disable_data_sync=0 --readpercent=50 --subcompactions=4 --ops_per_thread=20000000 --memtablerep=skip_list --prefix_size=0 --target_file_size_multiplier=1 --column_families=1 --threads=32 --disable_wal=0 --open_files=500000 --destroy_db_initially=0 --target_file_size_base=16777216 --nooverwritepercent=1 --iterpercent=10 --max_key=100000000 --prefixpercent=0 --use_clock_cache=false --kill_random_test=888887 --cache_size=1048576 --verify_checksum=1
      
      Reviewers: sdong, andrewkr, yiwu, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D65025
  31. 12 Oct 2016, 1 commit
    • new Prev() prefix support using SeekForPrev() · 447f1712
      Committed by Aaron Gao
      Summary:
      1) The previous solution for Prev() prefix support was not clean.
      Since I added the API SeekForPrev(), Prev() can now be symmetric to Next(),
      and we no longer need SeekToLast() to be called in Prev().
      
      Also, Next() will Seek(prefix_seek_key_) to solve the problem of possible inconsistency between db_iter and merge_iter when
      there is a merge_operator. And prefix_seek_key_ is only refreshed when changing direction to forward.
      
      2) This diff also fixes the bug of Iterator::SeekToLast() with iterate_upper_bound_ with a prefix extractor.
      
      Test cases are added for the above two cases.
      
      There are some tests for SeekToLast() in Prev(); I will clean them up later.
      
      Test Plan: make all check
      
      Reviewers: IslamAbdelRahman, andrewkr, yiwu, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63933
  32. 30 Sep 2016, 1 commit
  33. 28 Sep 2016, 1 commit
    • Add SeekForPrev() to Iterator · f517d9dd
      Committed by Aaron Gao
      Summary:
      Add a new Iterator API, `SeekForPrev`: find the last key that is <= the target key.
      - supports prefix_extractor
      - supports prefix_same_as_start
      - supports upper_bound
      - not supported in iterators without Prev()
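      
      A usage sketch (assuming `db` is an open rocksdb::DB*): SeekForPrev() is the mirror image of Seek(), which positions at the first key >= target.
      ```
      #include <memory>
      #include "rocksdb/db.h"
      
      void ReverseScanFrom(rocksdb::DB* db, const rocksdb::Slice& target) {
        std::unique_ptr<rocksdb::Iterator> it(
            db->NewIterator(rocksdb::ReadOptions()));
        it->SeekForPrev(target);  // positions at the last key <= target
        for (; it->Valid(); it->Prev()) {
          // visits keys <= target in descending order
        }
      }
      ```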
      
      Also add tests in db_iter_test and db_iterator_test
      
      Pass all tests
      Cheers!
      
      Test Plan: make all check -j64
      
      Reviewers: andrewkr, yiwu, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64149
  34. 09 Sep 2016, 1 commit
  35. 30 Aug 2016, 1 commit
    • support Prev() in prefix seek mode · 2482d5fb
      Committed by Aaron Gao
      Summary: As the title says: make sure Prev() works as expected with Next() when the current iter->key() is in the range of the same prefix in prefix seek mode.
      
      Test Plan: make all check -j64 (add prefix_test with PrefixSeekModePrev test case)
      
      Reviewers: andrewkr, sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D61419
  36. 12 Aug 2016, 1 commit
    • Minor PinnedIteratorsManager Refactoring · b693ba68
      Committed by Islam AbdelRahman
      Summary:
      This diff includes these simple changes:
      - Rename ReleasePinnedIterators to ReleasePinnedData
      - Rename PinIteratorIfNeeded to PinIterator
      - Use std::vector directly in PinnedIteratorsManager instead of std::unique_ptr<std::vector>
      - Generalize PinnedIteratorsManager by adding PinPtr which can pin any pointer
      
      Test Plan: existing tests
      
      Reviewers: sdong, yiwu, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D61305
  37. 21 Jul 2016, 1 commit
    • Introduce FullMergeV2 (eliminate memcpy from merge operators) · 68a8e6b8
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the code to pin the merge operator operands while the merge operation is performed, so that we can eliminate the memcpy cost. To do that we need a new public API for FullMerge that replaces the std::deque<std::string> with std::vector<Slice>.
      
      This diff is stacked on top of D56493 and D56511
      
      In this diff we
      - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
      - Replace std::deque<std::string> with std::vector<Slice> to pass operands
      - Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
      - Allow FullMergeV2 output to be an existing operand
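      
      A hedged sketch of a merge operator written against FullMergeV2, in the spirit of the "max" operator used in the benchmarks below (simplified; error handling omitted):
      ```
      #include <string>
      #include "rocksdb/merge_operator.h"
      #include "rocksdb/slice.h"
      
      class MaxOperator : public rocksdb::MergeOperator {
       public:
        bool FullMergeV2(const MergeOperationInput& merge_in,
                         MergeOperationOutput* merge_out) const override {
          // Operands arrive as pinned Slices, so we can compare them in place.
          const rocksdb::Slice* max = nullptr;
          for (const rocksdb::Slice& op : merge_in.operand_list) {
            if (max == nullptr || op.compare(*max) > 0) max = &op;
          }
          if (merge_in.existing_value != nullptr &&
              (max == nullptr || merge_in.existing_value->compare(*max) > 0)) {
            // The base value wins; copy it into the output string.
            merge_out->new_value.assign(merge_in.existing_value->data(),
                                        merge_in.existing_value->size());
          } else if (max != nullptr) {
            // Point at the winning operand instead of copying it (the new
            // "output can be an existing operand" capability).
            merge_out->existing_operand = *max;
          }
          return true;
        }
      
        const char* Name() const override { return "MaxOperator"; }
      };
      ```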
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
      readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
      readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
      readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
      readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s
      
      [master]
      readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
      readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
      readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
      readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
      readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
      ```
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
      readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
      readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
      readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
      readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s
      
      [master]
      readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
      readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
      readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
      readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
      readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s
      
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]
      
      [FullMergeV2]
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
      readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
      readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
      readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
      readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s
      
      [master]
      readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
      readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
      readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
      readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
      readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      
      [FullMergeV2]
      readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
      readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
      readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
      readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
      readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s
      
      [master]
      readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
      readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
      readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
      readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
      readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: lovro, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57075
  38. 14 Jun 2016, 1 commit
  39. 04 May 2016, 1 commit
    • Fix Iterator::Prev memory pinning bug · ff4b3fb5
      Committed by Islam AbdelRahman
      Summary: We should not use IterKey::SetKey with copy = false unless we are pinning the iterator through its lifetime; otherwise we may release the temporarily pinned blocks, and in that case the IterKey will be pointing to freed memory.
      
      Test Plan: added a new test
      
      Reviewers: sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57561
  40. 03 May 2016, 1 commit
    • Eliminate memcpy in Iterator::Prev() by pinning blocks for keys spanning multiple blocks · 6e801b0b
      Committed by Islam AbdelRahman
      Summary:
      This diff is stacked on top of this diff https://reviews.facebook.net/D56493
      The current Iterator::Prev() implementation needs to copy every value, since the underlying iterator may move after reading the value.
      This can be optimized by making sure that the block containing the value is pinned until the iterator moves, which will improve the throughput by up to 1.5X.
      
      master
      ```
      ==> 1000000_Keys_100Byte.txt <==
      readreverse  :       0.449 micros/op 2225887 ops/sec;  246.2 MB/s
      readreverse  :       0.433 micros/op 2311508 ops/sec;  255.7 MB/s
      readreverse  :       0.436 micros/op 2294335 ops/sec;  253.8 MB/s
      readreverse  :       0.471 micros/op 2121295 ops/sec;  234.7 MB/s
      readreverse  :       0.465 micros/op 2152227 ops/sec;  238.1 MB/s
      readreverse  :       0.454 micros/op 2203011 ops/sec;  243.7 MB/s
      readreverse  :       0.451 micros/op 2216095 ops/sec;  245.2 MB/s
      readreverse  :       0.462 micros/op 2162447 ops/sec;  239.2 MB/s
      readreverse  :       0.476 micros/op 2099151 ops/sec;  232.2 MB/s
      readreverse  :       0.472 micros/op 2120710 ops/sec;  234.6 MB/s
      
      avg : 242.34 MB/s
      
      ==> 1000000_Keys_1KB.txt <==
      readreverse  :       1.013 micros/op 986793 ops/sec;  978.7 MB/s
      readreverse  :       0.942 micros/op 1061136 ops/sec; 1052.5 MB/s
      readreverse  :       0.951 micros/op 1051901 ops/sec; 1043.3 MB/s
      readreverse  :       0.932 micros/op 1072894 ops/sec; 1064.1 MB/s
      readreverse  :       1.024 micros/op 976720 ops/sec;  968.7 MB/s
      readreverse  :       0.935 micros/op 1069169 ops/sec; 1060.4 MB/s
      readreverse  :       1.012 micros/op 988132 ops/sec;  980.1 MB/s
      readreverse  :       0.962 micros/op 1039579 ops/sec; 1031.1 MB/s
      readreverse  :       0.991 micros/op 1008924 ops/sec; 1000.7 MB/s
      readreverse  :       1.004 micros/op 996144 ops/sec;  988.0 MB/s
      
      avg : 1016.76 MB/s
      
      ==> 1000000_Keys_10KB.txt <==
      readreverse  :       4.167 micros/op 239952 ops/sec; 2346.9 MB/s
      readreverse  :       4.070 micros/op 245713 ops/sec; 2403.3 MB/s
      readreverse  :       4.572 micros/op 218733 ops/sec; 2139.4 MB/s
      readreverse  :       4.497 micros/op 222388 ops/sec; 2175.2 MB/s
      readreverse  :       4.203 micros/op 237920 ops/sec; 2327.1 MB/s
      readreverse  :       4.206 micros/op 237756 ops/sec; 2325.5 MB/s
      readreverse  :       4.181 micros/op 239149 ops/sec; 2339.1 MB/s
      readreverse  :       4.157 micros/op 240552 ops/sec; 2352.8 MB/s
      readreverse  :       4.187 micros/op 238848 ops/sec; 2336.1 MB/s
      readreverse  :       4.106 micros/op 243575 ops/sec; 2382.4 MB/s
      
      avg : 2312.78 MB/s
      
      ==> 100000_Keys_100KB.txt <==
      readreverse  :      41.281 micros/op 24224 ops/sec; 2366.0 MB/s
      readreverse  :      39.722 micros/op 25175 ops/sec; 2458.9 MB/s
      readreverse  :      40.319 micros/op 24802 ops/sec; 2422.5 MB/s
      readreverse  :      39.762 micros/op 25149 ops/sec; 2456.4 MB/s
      readreverse  :      40.916 micros/op 24440 ops/sec; 2387.1 MB/s
      readreverse  :      41.188 micros/op 24278 ops/sec; 2371.4 MB/s
      readreverse  :      40.061 micros/op 24962 ops/sec; 2438.1 MB/s
      readreverse  :      40.221 micros/op 24862 ops/sec; 2428.4 MB/s
      readreverse  :      40.084 micros/op 24947 ops/sec; 2436.7 MB/s
      readreverse  :      40.655 micros/op 24597 ops/sec; 2402.4 MB/s
      
      avg : 2416.79 MB/s
      
      ==> 10000_Keys_1MB.txt <==
      readreverse  :     298.038 micros/op 3355 ops/sec; 3355.3 MB/s
      readreverse  :     335.001 micros/op 2985 ops/sec; 2985.1 MB/s
      readreverse  :     286.956 micros/op 3484 ops/sec; 3484.9 MB/s
      readreverse  :     329.954 micros/op 3030 ops/sec; 3030.8 MB/s
      readreverse  :     306.428 micros/op 3263 ops/sec; 3263.5 MB/s
      readreverse  :     330.749 micros/op 3023 ops/sec; 3023.5 MB/s
      readreverse  :     328.903 micros/op 3040 ops/sec; 3040.5 MB/s
      readreverse  :     324.853 micros/op 3078 ops/sec; 3078.4 MB/s
      readreverse  :     320.488 micros/op 3120 ops/sec; 3120.3 MB/s
      readreverse  :     320.536 micros/op 3119 ops/sec; 3119.8 MB/s
      
      avg : 3150.21 MB/s
      ```
      
      After memcpy elimination
      ```
      
      ==> 1000000_Keys_100Byte.txt <==
      readreverse  :       0.395 micros/op 2529890 ops/sec;  279.9 MB/s
      readreverse  :       0.368 micros/op 2715922 ops/sec;  300.5 MB/s
      readreverse  :       0.384 micros/op 2603929 ops/sec;  288.1 MB/s
      readreverse  :       0.375 micros/op 2663286 ops/sec;  294.6 MB/s
      readreverse  :       0.357 micros/op 2802180 ops/sec;  310.0 MB/s
      readreverse  :       0.363 micros/op 2757684 ops/sec;  305.1 MB/s
      readreverse  :       0.372 micros/op 2689603 ops/sec;  297.5 MB/s
      readreverse  :       0.379 micros/op 2638599 ops/sec;  291.9 MB/s
      readreverse  :       0.375 micros/op 2663803 ops/sec;  294.7 MB/s
      readreverse  :       0.375 micros/op 2665579 ops/sec;  294.9 MB/s
      
      avg: 295.72 MB/s (1.22 X)
      
      ==> 1000000_Keys_1KB.txt <==
      readreverse  :       0.879 micros/op 1138112 ops/sec; 1128.8 MB/s
      readreverse  :       0.842 micros/op 1187998 ops/sec; 1178.3 MB/s
      readreverse  :       0.837 micros/op 1194915 ops/sec; 1185.1 MB/s
      readreverse  :       0.845 micros/op 1182983 ops/sec; 1173.3 MB/s
      readreverse  :       0.877 micros/op 1140308 ops/sec; 1131.0 MB/s
      readreverse  :       0.849 micros/op 1177581 ops/sec; 1168.0 MB/s
      readreverse  :       0.915 micros/op 1093284 ops/sec; 1084.3 MB/s
      readreverse  :       0.863 micros/op 1159418 ops/sec; 1149.9 MB/s
      readreverse  :       0.895 micros/op 1117670 ops/sec; 1108.5 MB/s
      readreverse  :       0.852 micros/op 1174116 ops/sec; 1164.5 MB/s
      
      avg: 1147.17 MB/s (1.12 X)
      
      ==> 1000000_Keys_10KB.txt <==
      readreverse  :       3.870 micros/op 258386 ops/sec; 2527.2 MB/s
      readreverse  :       3.568 micros/op 280296 ops/sec; 2741.5 MB/s
      readreverse  :       4.005 micros/op 249694 ops/sec; 2442.2 MB/s
      readreverse  :       3.550 micros/op 281719 ops/sec; 2755.5 MB/s
      readreverse  :       3.562 micros/op 280758 ops/sec; 2746.1 MB/s
      readreverse  :       3.507 micros/op 285125 ops/sec; 2788.8 MB/s
      readreverse  :       3.463 micros/op 288739 ops/sec; 2824.1 MB/s
      readreverse  :       3.428 micros/op 291734 ops/sec; 2853.4 MB/s
      readreverse  :       3.553 micros/op 281491 ops/sec; 2753.2 MB/s
      readreverse  :       3.535 micros/op 282885 ops/sec; 2766.9 MB/s
      
      avg : 2719.89 MB/s (1.17 X)
      
      ==> 100000_Keys_100KB.txt <==
      readreverse  :      22.815 micros/op 43830 ops/sec; 4281.0 MB/s
      readreverse  :      29.957 micros/op 33381 ops/sec; 3260.4 MB/s
      readreverse  :      25.334 micros/op 39473 ops/sec; 3855.4 MB/s
      readreverse  :      23.037 micros/op 43409 ops/sec; 4239.8 MB/s
      readreverse  :      27.810 micros/op 35958 ops/sec; 3512.1 MB/s
      readreverse  :      30.327 micros/op 32973 ops/sec; 3220.6 MB/s
      readreverse  :      29.704 micros/op 33665 ops/sec; 3288.2 MB/s
      readreverse  :      29.423 micros/op 33987 ops/sec; 3319.6 MB/s
      readreverse  :      23.334 micros/op 42856 ops/sec; 4185.9 MB/s
      readreverse  :      29.969 micros/op 33368 ops/sec; 3259.1 MB/s
      
      avg : 3642.21 MB/s (1.5 X)
      
      ==> 10000_Keys_1MB.txt <==
      readreverse  :     244.748 micros/op 4085 ops/sec; 4085.9 MB/s
      readreverse  :     230.208 micros/op 4343 ops/sec; 4344.0 MB/s
      readreverse  :     235.655 micros/op 4243 ops/sec; 4243.6 MB/s
      readreverse  :     235.730 micros/op 4242 ops/sec; 4242.2 MB/s
      readreverse  :     237.346 micros/op 4213 ops/sec; 4213.3 MB/s
      readreverse  :     227.306 micros/op 4399 ops/sec; 4399.4 MB/s
      readreverse  :     194.957 micros/op 5129 ops/sec; 5129.4 MB/s
      readreverse  :     238.359 micros/op 4195 ops/sec; 4195.4 MB/s
      readreverse  :     221.588 micros/op 4512 ops/sec; 4513.0 MB/s
      readreverse  :     235.911 micros/op 4238 ops/sec; 4239.0 MB/s
      
      avg : 4360.52 MB/s (1.38 X)
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56511