1. 18 Feb, 2021 1 commit
  2. 05 Dec, 2020 1 commit
    • Add blob support to DBIter (#7731) · 61932cdf
      Committed by Levi Tamasi
      Summary:
      The patch adds iterator support to the integrated BlobDB implementation.
      Whenever a blob reference is encountered during iteration, the corresponding
      blob is retrieved by calling `Version::GetBlob`, assuming the `expose_blob_index`
      (formerly `allow_blob`) flag is *not* set. (Note: the flag is set by the old stacked
      BlobDB implementation, which has its own blob file handling/blob retrieval logic.)
      
      In addition, `DBIter` now uniformly returns `Status::NotSupported` with the error
      message `"BlobDB does not support merge operator."` when encountering a
      blob reference while performing a merge (instead of potentially returning a
      message that implies the database should be opened using the stacked BlobDB's
      `Open`.)
      
      TODO: We can implement support for lazily retrieving the blob value (or in other
      words, bypassing the retrieval of blob values based on key) by extending the `Iterator`
      API with a new `PrepareValue` method (similarly to `InternalIterator`, which already
      supports lazy values).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7731
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D25256293
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c39cd782011495a526cdff99c16f5fca400c4811
  3. 18 Sep, 2020 1 commit
  4. 07 Aug, 2020 1 commit
  5. 04 Aug, 2020 1 commit
    • dedup ReadOptions in iterator hierarchy (#7210) · a4a4a2da
      Committed by Andrew Kryczka
      Summary:
      Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator`
      and every `LevelIterator`. This redundancy consumes extra memory,
      resulting in the `Arena` making more allocations, and iteration
      observing worse cache performance.
      
      This PR migrates callers of `NewInternalIterator()` and
      `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to
      outlive the returned iterator. When the iterator's lifetime will be managed by the
      user, this lifetime guarantee is achieved by storing the `ReadOptions`
      value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and
      `MakeInputIterator()` can hold a reference-to-const `ReadOptions`.
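
The ownership pattern described above can be sketched in plain C++. This is only an illustration under stated assumptions: `ReadOptionsLike`, `SubIterator`, and `OwningIterator` are hypothetical stand-ins, not RocksDB classes. The owner keeps the single copy of the options (the role the summary assigns to `ArenaWrappedDBIter`), and each child holds a reference-to-const.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ReadOptions; not the RocksDB struct.
struct ReadOptionsLike {
  bool verify_checksums = true;
  std::size_t readahead_size = 0;
};

// Sub-iterator that borrows the options instead of storing its own copy.
class SubIterator {
 public:
  explicit SubIterator(const ReadOptionsLike& opts) : opts_(opts) {}
  const ReadOptionsLike& options() const { return opts_; }

 private:
  const ReadOptionsLike& opts_;  // must not outlive the referenced object
};

// Owner that stores the only copy and hands out references to its children.
class OwningIterator {
 public:
  OwningIterator(const ReadOptionsLike& opts, std::size_t num_children)
      : opts_(opts) {
    children_.reserve(num_children);
    for (std::size_t i = 0; i < num_children; ++i) {
      children_.emplace_back(opts_);  // bind to the owned copy, not the argument
    }
  }
  // Copying would leave the children referring to another owner's options.
  OwningIterator(const OwningIterator&) = delete;
  OwningIterator& operator=(const OwningIterator&) = delete;

  const ReadOptionsLike& options() const { return opts_; }
  const SubIterator& child(std::size_t i) const {
    assert(i < children_.size());
    return children_[i];
  }

 private:
  ReadOptionsLike opts_;  // declared first, so it is constructed before children_
  std::vector<SubIterator> children_;
};
```

Because every child refers to the owner's copy, the caller's original options object can go out of scope or change without affecting the iterator tree.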
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210
      
      Test Plan:
      - `make check` under ASAN and valgrind
      - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes.
      
      Reviewed By: anand1976
      
      Differential Revision: D22861323
      
      Pulled By: ajkr
      
      fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
  6. 03 Jul, 2020 1 commit
  7. 04 Jun, 2020 1 commit
    • API change: DB::OpenForReadOnly will not write to the file system unless... · 02df00d9
      Committed by Zitan Chen
      API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900)
      
      Summary:
      DB::OpenForReadOnly will not write anything to the file system (i.e., create directories or files for the DB) unless create_if_missing is true.
      
      This change also fixes some ldb subcommands that wrote to the file system even when used for read-only purposes.
      
      Two tests for this updated behavior of DB::OpenForReadOnly are also added.
      
      Other minor changes:
      1. Updated HISTORY.md to include this API change of DB::OpenForReadOnly;
      2. Updated the help information for the put and batchput subcommands of ldb with the option [--create_if_missing];
      3. Updated the comment of Env::DeleteDir to emphasize that it returns OK only if the directory to be deleted is empty.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6900
      
      Test Plan: passed make check; also manually tested a few ldb subcommands
      
      Reviewed By: pdillinger
      
      Differential Revision: D21822188
      
      Pulled By: gg814
      
      fbshipit-source-id: 604cc0f0d0326a937ee25a32cdc2b512f9a3be6e
  8. 16 Apr, 2020 1 commit
    • Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) · e45673de
      Committed by Mike Kolupaev
      Summary:
      Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype.
      
      Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling.
      
      It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas.
      
      Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now.
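
As an illustration of the calling convention described above, here is a minimal sketch with a hypothetical toy class. `LazyValueIterator` is not RocksDB's `InternalIterator`, and the failing read is simulated with a constructor flag. The point is that the error surfaces from `PrepareValue()`, so `value()` can stay a plain accessor.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy iterator: the key is available immediately (as if taken
// from an index entry), while the value needs a separate read that can fail.
class LazyValueIterator {
 public:
  LazyValueIterator(std::string key, std::string stored_value, bool read_fails)
      : key_(std::move(key)),
        stored_value_(std::move(stored_value)),
        read_fails_(read_fails) {}

  bool Valid() const { return valid_; }
  const std::string& key() const { return key_; }

  // Performs the deferred load. On failure it records an error, invalidates
  // the iterator, and returns false.
  bool PrepareValue() {
    if (prepared_) return true;
    if (read_fails_) {
      error_ = "IO error: could not read data block";
      valid_ = false;
      return false;
    }
    value_ = stored_value_;
    prepared_ = true;
    return true;
  }

  // Only legal after PrepareValue() has returned true.
  const std::string& value() const {
    assert(prepared_);
    return value_;
  }

  const std::string& error() const { return error_; }

 private:
  std::string key_;
  std::string stored_value_;
  bool read_fails_;
  bool prepared_ = false;
  bool valid_ = true;
  std::string value_;
  std::string error_;
};
```

A caller that opted into unprepared values would check `PrepareValue()` first and consult `error()` if it returns false. Callers that never opt in keep calling `value()` directly, which is why most existing call sites did not need to change.
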
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621
      
      Test Plan: `make -j5 check`. Will also deploy to some LogDevice test clusters and look at stats.
      
      Reviewed By: siying
      
      Differential Revision: D20786930
      
      Pulled By: al13n321
      
      fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
  9. 03 Mar, 2020 1 commit
    • return timestamp from get (#6409) · 904a60ff
      Committed by Huisheng Liu
      Summary:
      Added new Get() methods that return timestamp. Dummy implementation is given so that classes derived from DB don't need to be touched to provide their implementation. MultiGet is not included.
      
      ReadRandom perf test (10 minutes) on the same development machine RAM drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
          baseline (commit 72ee067b):
              101.712 micros/op 314602 ops/sec;   36.0 MB/s (5658999 of 5658999 found)
          This PR:
              100.288 micros/op 319071 ops/sec;   36.5 MB/s (5674999 of 5674999 found)
      
      ./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409
      
      Differential Revision: D20200086
      
      Pulled By: riversand963
      
      fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50
  10. 21 Feb, 2020 1 commit
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      Committed by sdong
      Summary:
      When two binaries are dynamically linked together, different builds of RocksDB from two sources might cause errors. To give users a way to solve the problem, the RocksDB namespace is now a macro, `ROCKSDB_NAMESPACE`, which can be overridden at build time.
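
A minimal sketch of the macro-based namespace technique follows. It assumes the default name is `rocksdb`. The functions `NamespaceName()` and `LibraryTag()` and the `DEMO_*` helper macros are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Default namespace name. A build can override it, for example by passing
// -DROCKSDB_NAMESPACE=my_rocksdb to the compiler.
#ifndef ROCKSDB_NAMESPACE
#define ROCKSDB_NAMESPACE rocksdb
#endif

// Hypothetical helpers that turn the macro's expansion into a string.
#define DEMO_STRINGIFY_IMPL(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_IMPL(x)

namespace ROCKSDB_NAMESPACE {

// Hypothetical functions; every symbol declared here lives in whichever
// namespace the macro expands to.
inline std::string NamespaceName() { return DEMO_STRINGIFY(ROCKSDB_NAMESPACE); }
inline std::string LibraryTag() { return "demo"; }

}  // namespace ROCKSDB_NAMESPACE
```

Two copies of the library built with different values of the macro produce differently mangled symbols, so they can coexist in one process without clashing.
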
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  11. 20 Sep, 2019 1 commit
  12. 14 Sep, 2019 1 commit
  13. 01 Jun, 2019 2 commits
  14. 01 Mar, 2019 1 commit
    • Add two more StatsLevel (#5027) · 5e298f86
      Committed by Siying Dong
      Summary:
      Statistics cost too much CPU for some use cases. Add two stats levels
      so that people can choose to skip two types of expensive stats, timers and
      histograms.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027
      
      Differential Revision: D14252765
      
      Pulled By: siying
      
      fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
  15. 18 Dec, 2018 1 commit
  16. 29 Nov, 2018 1 commit
    • Clean up FragmentedRangeTombstoneList (#4692) · 8fe1e06c
      Committed by Abhishek Madan
      Summary:
      Removed `one_time_use` flag, which removed the need for some
      tests, and changed all `NewRangeTombstoneIterator` methods to return
      `FragmentedRangeTombstoneIterators`.
      
      These changes also led to removing `RangeDelAggregatorV2::AddUnfragmentedTombstones`
      and one of the `MemTableListVersion::AddRangeTombstoneIterators` methods.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4692
      
      Differential Revision: D13106570
      
      Pulled By: abhimadan
      
      fbshipit-source-id: cbab5432d7fc2d9cdfd8d9d40361a1bffaa8f845
  17. 25 Oct, 2018 1 commit
    • Use only "local" range tombstones during Get (#4449) · 8c78348c
      Committed by Abhishek Madan
      Summary:
      Previously, range tombstones were accumulated from every level, which
      was necessary if a range tombstone in a higher level covered a key in a lower
      level. However, RangeDelAggregator::AddTombstones's complexity is based on
      the number of tombstones that are currently stored in it, which is wasteful in
      the Get case, where we only need to know the highest sequence number of range
      tombstones that cover the key from higher levels, and compute the highest covering
      sequence number at the current level. This change introduces this optimization, and
      removes the use of RangeDelAggregator from the Get path.
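
The idea can be sketched with a toy model. All types and functions below are hypothetical, not RocksDB classes, and levels are simply ordered from newest to oldest data. Instead of accumulating every tombstone, the lookup carries a single running maximum of covering sequence numbers from level to level.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of a range tombstone covering [start, end).
struct RangeTombstone {
  std::string start;  // inclusive
  std::string end;    // exclusive
  std::uint64_t seq;
};

struct Level {
  std::map<std::string, std::uint64_t> point_seq;  // key -> seq of its value
  std::vector<RangeTombstone> tombstones;
};

// Highest sequence number among this level's tombstones that cover `key`
// (0 if none covers it).
std::uint64_t MaxCoveringSeq(const Level& level, const std::string& key) {
  std::uint64_t max_seq = 0;
  for (const RangeTombstone& t : level.tombstones) {
    if (t.start <= key && key < t.end) max_seq = std::max(max_seq, t.seq);
  }
  return max_seq;
}

// Walks the levels from newest to oldest, consulting only each level's own
// ("local") tombstones and carrying one number between levels.
bool KeyIsVisible(const std::vector<Level>& newest_to_oldest,
                  const std::string& key) {
  std::uint64_t max_covering_seq = 0;
  for (const Level& level : newest_to_oldest) {
    max_covering_seq = std::max(max_covering_seq, MaxCoveringSeq(level, key));
    auto it = level.point_seq.find(key);
    if (it != level.point_seq.end()) {
      // The newest version of the key lives here. It is deleted if a
      // covering tombstone is more recent than it.
      return it->second > max_covering_seq;
    }
  }
  return false;  // key not present in any level
}
```

The linear scan in `MaxCoveringSeq` is a simplification for readability and says nothing about how the per-level lookup is implemented in the PR.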
      
      In the benchmark results, the following command was used to initialize the database:
      ```
      ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
      ```
      
      ...and the following command was used to measure read throughput:
      ```
      ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
      ```
      
      The filluniquerandom command was only run once, and the resulting database was used
      to measure read performance before and after the PR. Both binaries were compiled with
      `DEBUG_LEVEL=0`.
      
      Readrandom results before PR:
      ```
      readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
      ```
      
      Readrandom results after PR:
      ```
      readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
      ```
      
      So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).
      
      ----
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449
      
      Differential Revision: D10370575
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
  18. 23 Aug, 2018 1 commit
    • add missing counters in readonly mode (#4260) · f1f5ba08
      Committed by Zhongyi Xie
      Summary:
      User reported (https://github.com/facebook/rocksdb/issues/4168) that when opening RocksDB in read-only mode, some statistics are not correctly reported. After some investigation, we believe the following counters are indeed not reported during Get() call in a read-only DB:
      rocksdb.memtable.hit
      rocksdb.memtable.miss
      rocksdb.number.keys.read
      rocksdb.bytes.read
      As well as histogram rocksdb.bytes.per.read
      and perf context get_read_bytes
      This PR will add the necessary counter reporting logic in the Get() call path
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4260
      
      Differential Revision: D9476431
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 7ab409d4e59df05d09ae8b69fe75554e5aa240d6
  19. 01 Aug, 2018 1 commit
    • Trace and Replay for RocksDB (#3837) · 12b6cdee
      Committed by Sagar Vemuri
      Summary:
      A framework for tracing and replaying RocksDB operations.
      
      A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.
      
      - Column-families are supported
      - Multi-threaded tracing is supported.
      - TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
      - This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.
      
      Currently supported DB operations:
      - Writes:
      -- Put
      -- Merge
      -- Delete
      -- SingleDelete
      -- DeleteRange
      -- Write
      - Reads:
      -- Get (point lookups)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837
      
      Differential Revision: D7974837
      
      Pulled By: sagar0
      
      fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
  20. 22 May, 2018 1 commit
    • Move prefix_extractor to MutableCFOptions · c3ebc758
      Committed by Zhongyi Xie
      Summary:
      Currently it is not possible to change the bloom filter config without restarting the DB, which causes a lot of operational complexity for users.
      This PR aims to make it possible to dynamically change bloom filter config.
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
  21. 13 Apr, 2018 1 commit
  22. 06 Mar, 2018 1 commit
  23. 23 Feb, 2018 2 commits
  24. 10 Oct, 2017 1 commit
    • WritePrepared Txn: Iterator · 8c392a31
      Committed by Yi Wu
      Summary:
      On iterator creation, take a snapshot, create a ReadCallback, and pass the ReadCallback to the underlying DBIter to check whether a key is committed.
      Closes https://github.com/facebook/rocksdb/pull/2981
      
      Differential Revision: D6001471
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe
  25. 06 Oct, 2017 1 commit
  26. 25 Jul, 2017 1 commit
    • Add Iterator::Refresh() · e67b35c0
      Committed by Siying Dong
      Summary:
      Add and implement Iterator::Refresh(). When this function is called, if the super version hasn't changed, update the sequence number of the iterator to the latest one and invalidate the iterator. If the super version has changed, recreate the whole iterator. This can help users reuse the iterator more easily.
      Closes https://github.com/facebook/rocksdb/pull/2621
      
      Differential Revision: D5464500
      
      Pulled By: siying
      
      fbshipit-source-id: f548bd35e85c1efca2ea69273802f6704eba6ba9
  27. 22 Jul, 2017 2 commits
  28. 16 Jul, 2017 1 commit
  29. 25 Jun, 2017 1 commit
    • Optimize for serial commits in 2PC · 499ebb3a
      Committed by Maysam Yabandeh
      Summary:
      Throughput: 46k tps in our sysbench settings (filling the details later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file internal buffer is not flushed after each write. Instead
      it is flushed before critical operations (WAL copy via fs) or when
      FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
      via mutex_.
      2. Use two sequence numbers: last seq, and last seq for write. Last seq
      is the last visible sequence number for reads. Last seq for write is the
      next sequence number that should be used to write to WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes.
      3. BatchGroup is not used for writes. This means that we can have
      parallel writers, which changes a major assumption in the code base. To
      accommodate that, i) allow only 1 WriteImpl that intends to write to
      memtable via mem_mutex_ (which is fine since in 2PC almost all of the memtable writes
      come via the group commit phase, which is serial anyway), ii) make all the
      parts of the code base that assumed they were the only writer (via
      EnterUnbatched) also acquire mem_mutex_, iii) protect stat updates
      via a stat_mutex_.
      
      Note: the first commit has the approach figured out but is not clean.
      Submitting the PR anyway to get the early feedback on the approach. If
      we are ok with the approach I will go ahead with these updates:
      0) Rebase with Yi's pipelining changes
      1) Currently batching is disabled by default to make sure that it will be
      consistent with all unit tests. Will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated with the
      serial commit of 2PC taken into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
  30. 28 Apr, 2017 1 commit
  31. 11 Apr, 2017 1 commit
    • Reduce the number of params needed to construct DBIter · 7124268a
      Committed by Sagar Vemuri
      Summary:
      DBIter, and in turn NewDBIterator and NewArenaWrappedDBIterator, take a bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every new param separately. It also seems much cleaner, as a bunch of the params towards the end seem to be optional.
      
      (Recently I introduced max_skippable_internal_keys, which added one more to the already huge count).
      
      Idea courtesy IslamAbdelRahman
      Closes https://github.com/facebook/rocksdb/pull/2116
      
      Differential Revision: D4857128
      
      Pulled By: sagar0
      
      fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8
  32. 06 Apr, 2017 1 commit
  33. 31 Mar, 2017 1 commit
    • Option to fail a request as incomplete when skipping too many internal keys · c6d04f2e
      Committed by Sagar Vemuri
      Summary:
      Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to be skipped. Adding an option, max_skippable_internal_keys -- which could be used to set a threshold for the maximum number of keys that can be skipped, will help to address these cases where it is much better to fail a request (as incomplete) than to wait for a considerable time for the request to complete.
      
      This feature -- to fail an iterator seek request as incomplete, is disabled by default when max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.
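
The threshold behaviour can be sketched with a toy scan. All names below are hypothetical and this is not the `DBIter` code. In particular, treating "more than `max_skippable` keys skipped" as the failure condition is an assumption about the exact boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of an internal key: a user key plus a flag saying
// whether the entry is a deletion marker that the scan has to skip.
struct InternalEntry {
  std::string user_key;
  bool is_deletion;
};

enum class ScanStatus { kOk, kNotFound, kIncomplete };

struct ScanResult {
  ScanStatus status;
  std::string key;
  std::size_t skipped;
};

// Returns the first visible entry. If max_skippable > 0 and more than
// max_skippable entries were skipped, gives up with kIncomplete.
// max_skippable == 0 disables the limit.
ScanResult FirstVisible(const std::vector<InternalEntry>& entries,
                        std::size_t max_skippable) {
  std::size_t skipped = 0;
  for (const InternalEntry& e : entries) {
    if (!e.is_deletion) return {ScanStatus::kOk, e.user_key, skipped};
    ++skipped;
    if (max_skippable > 0 && skipped > max_skippable) {
      return {ScanStatus::kIncomplete, "", skipped};
    }
  }
  return {ScanStatus::kNotFound, "", skipped};
}
```

With the limit disabled the scan always runs to the first visible entry, however many deletion markers precede it. With a small limit the caller gets a prompt `kIncomplete` instead of a long wait.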
      
      This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
      Closes https://github.com/facebook/rocksdb/pull/2000
      
      Differential Revision: D4753223
      
      Pulled By: sagar0
      
      fbshipit-source-id: 1c973f7
  34. 16 Mar, 2017 1 commit
    • Add macros to include file name and line number during Logging · e1916368
      Committed by Islam AbdelRahman
      Summary:
      current logging
      ```
      2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
      2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
      2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
      2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
      2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
      2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
      2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
      2017/03/14-14:20:31.
      ```
      Closes https://github.com/facebook/rocksdb/pull/1990
      
      Differential Revision: D4708695
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: cb8968f
  35. 14 Mar, 2017 1 commit
    • Pinnableslice (2nd attempt) · 11526252
      Committed by Maysam Yabandeh
      Summary:
      PinnableSlice
      
          Summary:
          Currently the point lookup values are copied to a string provided by the
          user. This incurs an extra memcpy cost. This patch allows doing point lookup
          via a PinnableSlice which pins the source memory location (instead of
          copying their content) and releases them after the content is consumed
          by the user. The old API of Get(string) is translated to the new API
          underneath.
      
          Here is the summary for improvements:
      
          value 100 byte: 1.8% regular, 1.2% merge values
          value 1k byte: 11.5% regular, 7.5% merge values
          value 10k byte: 26% regular, 29.9% merge values
          The improvement for merge could be more if we extend this approach to
          pin the merge output and delay the full merge operation until the user
          actually needs it. We have put that for future work.
      
          PS:
          Sometimes we observe a small decrease in performance when switching from
          t5452014 to this patch but with the old Get(string) API. The d
      Closes https://github.com/facebook/rocksdb/pull/1756
      
      Differential Revision: D4391738
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6f3edd3
  36. 09 Jan, 2017 2 commits
    • Revert "PinnableSlice" · d0ba8ec8
      Committed by Maysam Yabandeh
      Summary:
      This reverts commit 54d94e9c.
      
      The pull request was landed by mistake.
      Closes https://github.com/facebook/rocksdb/pull/1755
      
      Differential Revision: D4391678
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 36d5149
    • PinnableSlice · 54d94e9c
      Committed by Maysam Yabandeh
      Summary:
      Currently the point lookup values are copied to a string provided by the user.
      This incurs an extra memcpy cost. This patch allows doing point lookup
      via a PinnableSlice which pins the source memory location (instead of
      copying their content) and releases them after the content is consumed
      by the user. The old API of Get(string) is translated to the new API
      underneath.
      
       Here is the summary for improvements:
       1. value 100 byte: 1.8%  regular, 1.2% merge values
       2. value 1k   byte: 11.5% regular, 7.5% merge values
       3. value 10k byte: 26% regular,    29.9% merge values
      
       The improvement for merge could be more if we extend this approach to
       pin the merge output and delay the full merge operation until the user
       actually needs it. We have put that for future work.
      
      PS:
      Sometimes we observe a small decrease in performance when switching from
      t5452014 to this patch but with the old Get(string) API. The difference
      is small and could be noise. More importantly it is safely
      cancelled
      Closes https://github.com/facebook/rocksdb/pull/1732
      
      Differential Revision: D4374613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a077f1a