1. 16 Mar 2023, 1 commit
    • Add new stat rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit} (#11265) · bab5f9a6
      Hui Xiao committed
      
      Summary:
      **Context/Summary:**
      We are adding new stats to measure the size of the prefetched tail and the lookups into this buffer.
      
      Stat collection is done in FilePrefetchBuffer, but for now only for the prefetched tail buffer during table open, distinguished via a FilePrefetchBuffer usage enum. This is cleaner than the alternative of implementing it at the upper-level call sites of FilePrefetchBuffer for table open, and it has the benefit of being extensible to other FilePrefetchBuffer usages if needed. See the db_bench results below for the perf regression concern.
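      For reference, a minimal sketch of how a client could read the new stats; the ticker/histogram enum names are assumed to mirror the stat names above:
      ```
      #include <cstdint>
      #include <iostream>

      #include "rocksdb/db.h"
      #include "rocksdb/statistics.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.statistics = rocksdb::CreateDBStatistics();  // enable stats

        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, "/tmp/testdb", &db).ok()) return 1;
        // ... read some data so tables get opened ...

        // Tickers: lookups into the prefetched tail buffer during table open.
        uint64_t hit = options.statistics->getTickerCount(
            rocksdb::TABLE_OPEN_PREFETCH_TAIL_HIT);   // assumed enum name
        uint64_t miss = options.statistics->getTickerCount(
            rocksdb::TABLE_OPEN_PREFETCH_TAIL_MISS);  // assumed enum name

        // Histogram: size of the tail reads issued by the prefetch buffer.
        rocksdb::HistogramData tail_bytes;
        options.statistics->histogramData(
            rocksdb::TABLE_OPEN_PREFETCH_TAIL_READ_BYTES, &tail_bytes);  // assumed

        std::cout << "tail hit=" << hit << " miss=" << miss
                  << " read bytes p99=" << tail_bytes.percentile99 << "\n";
        delete db;
        return 0;
      }
      ```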
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11265
      
      Test Plan:
      **- Piggyback on existing tests**
      **- rocksdb.table.open.prefetch.tail.miss is harder to unit-test, so I manually set the prefetched tail read size to be small and ran db_bench.**
      ```
      ./db_bench -db=/tmp/testdb -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3  -use_direct_reads=true
      ```
      ```
      rocksdb.table.open.prefetch.tail.read.bytes P50 : 4096.000000 P95 : 4096.000000 P99 : 4096.000000 P100 : 4096.000000 COUNT : 225 SUM : 921600
      rocksdb.table.open.prefetch.tail.miss COUNT : 91
      rocksdb.table.open.prefetch.tail.hit COUNT : 1034
      ```
      **- No perf regression observed in db_bench**
      
      SETUP command: create the same db with ~900 files for both the pre-change and post-change binaries.
      ```
      ./db_bench -db=/tmp/testdb -benchmarks="fillseq" -key_size=32 -value_size=512 -num=500000 -write_buffer_size=655360  -disable_auto_compactions=true -target_file_size_base=16777216 -compression_type=none
      ```
      TEST command (60 runs or until convergence): as suggested by anand1976 and akankshamahajan15, vary `seek_nexts` and `async_io` in testing.
      ```
      ./db_bench -use_existing_db=true -db=/tmp/testdb -statistics=false -cache_size=0 -cache_index_and_filter_blocks=false -benchmarks=seekrandom[-X60] -num=50000 -seek_nexts={10, 500, 1000} -async_io={0|1} -use_direct_reads=true
      ```
      async io = 0, direct io read = true
      
      |  | seek_nexts = 10, 30 runs | seek_nexts = 500, 12 runs | seek_nexts = 1000, 6 runs |
      | -- | -- | -- | -- |
      | pre-change | 4776 (± 28) ops/sec; 24.8 (± 0.1) MB/sec | 288 (± 1) ops/sec; 74.8 (± 0.4) MB/sec | 145 (± 4) ops/sec; 75.6 (± 2.2) MB/sec |
      | post-change | 4790 (± 32) ops/sec; 24.9 (± 0.2) MB/sec | 288 (± 3) ops/sec; 74.7 (± 0.8) MB/sec | 143 (± 3) ops/sec; 74.5 (± 1.6) MB/sec |
      
      async io = 1, direct io read = true
      |  | seek_nexts = 10, 54 runs | seek_nexts = 500, 6 runs | seek_nexts = 1000, 4 runs |
      | -- | -- | -- | -- |
      | pre-change | 3350 (± 36) ops/sec; 17.4 (± 0.2) MB/sec | 264 (± 0) ops/sec; 68.7 (± 0.2) MB/sec | 138 (± 1) ops/sec; 71.8 (± 1.0) MB/sec |
      | post-change | 3358 (± 27) ops/sec; 17.4 (± 0.1) MB/sec | 263 (± 2) ops/sec; 68.3 (± 0.8) MB/sec | 139 (± 1) ops/sec; 72.6 (± 0.6) MB/sec |
      
      Reviewed By: ajkr
      
      Differential Revision: D43781467
      
      Pulled By: hx235
      
      fbshipit-source-id: a706a18472a8edb2b952bac3af40eec803537f2a
  2. 01 Mar 2023, 1 commit
  3. 18 Feb 2023, 1 commit
    • Remove FactoryFunc from LoadXXXObject (#11203) · b6640c31
      mrambacher committed
      Summary:
      The primary purpose of the FactoryFunc was to support LITE mode where the ObjectRegistry was not available.  With the removal of LITE mode, the function was no longer required.
      
      Note that the MergeOperator had some private classes defined in source files. To gain access to their constructors (and name methods), those class definitions were moved into header files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11203
      
      Reviewed By: cbi42
      
      Differential Revision: D43160255
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f3a465fd5d1a7049b73ecf31e4b8c3762f6dae6c
  4. 28 Jan 2023, 1 commit
    • Remove RocksDB LITE (#11147) · 4720ba43
      sdong committed
      Summary:
      We haven't been actively maintaining RocksDB LITE recently, and its size must have grown significantly. We are removing the support.
      
      Most of the changes were made with the following command:
      
      unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`
      
      by Peter Dillinger. Other changes were applied manually to build scripts, CircleCI manifests, places where ROCKSDB_LITE is used in an expression, and the file db_stress_test_base.cc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147
      
      Test Plan: See CI
      
      Reviewed By: pdillinger
      
      Differential Revision: D42796341
      
      fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
  5. 26 Jan 2023, 2 commits
  6. 25 Jan 2023, 1 commit
    • Remove some deprecated/obsolete statistics from the API (#11123) · 99e55953
      Levi Tamasi committed
      Summary:
      These tickers/histograms have been obsolete (and not populated) for a long time.
      The patch removes them from the API completely. Note that this means that the
      numeric values of the remaining tickers change in the C++ code as they get shifted up.
      This should be OK: the values of some existing tickers have changed many times
      over the years as items have been added in the middle. (In contrast, the convention
      in the Java bindings is to keep the ids, which are not guaranteed to be the same
      as the ids on the C++ side, the same across releases.)
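      Because the C++ enum values shift, a client that persists stats externally can resolve tickers by their stable string names instead; a small sketch using the public `TickersNameMap` from `rocksdb/statistics.h`:
      ```
      #include <cstdint>
      #include <memory>
      #include <string>

      #include "rocksdb/statistics.h"

      // Resolve a ticker by its stable string name (e.g. "rocksdb.block.cache.add")
      // rather than by its numeric enum value, which can shift between releases.
      uint64_t TickerCountByName(const std::shared_ptr<rocksdb::Statistics>& stats,
                                 const std::string& name) {
        for (const auto& entry : rocksdb::TickersNameMap) {
          if (entry.second == name) {
            return stats->getTickerCount(entry.first);
          }
        }
        return 0;  // unknown (possibly removed) ticker
      }
      ```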
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11123
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D42727793
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e058a155a20b05b45f53e67ee380aece1b43b6c5
  7. 14 Nov 2022, 1 commit
  8. 25 Oct 2022, 1 commit
  9. 30 Aug 2022, 1 commit
  10. 29 Jun 2022, 1 commit
  11. 17 Jun 2022, 1 commit
    • Update stats to help users estimate MultiGet async IO impact (#10182) · a6691d0f
      anand76 committed
      Summary:
      Add a couple of stats to help users estimate the impact of potential MultiGet perf improvements:
      1. NUM_LEVEL_READ_PER_MULTIGET - A histogram stat for the number of levels from which a MultiGet had to read a file
      2. MULTIGET_COROUTINE_COUNT - A ticker stat counting the number of times the coroutine version of MultiGetFromSST was used
      
      The NUM_DATA_BLOCKS_READ_PER_LEVEL stat is obsoleted as it doesn't provide useful information for MultiGet optimization.
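      As an illustration, a sketch of reading the two new stats after a MultiGet-heavy run, assuming a `Statistics` object was attached to the DB options:
      ```
      #include <cinttypes>
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Sketch: summarize the two new MultiGet stats named above.
      void ReportMultiGetAsyncStats(rocksdb::Statistics* stats) {
        rocksdb::HistogramData levels_per_multiget;
        stats->histogramData(rocksdb::NUM_LEVEL_READ_PER_MULTIGET,
                             &levels_per_multiget);
        uint64_t coroutine_count =
            stats->getTickerCount(rocksdb::MULTIGET_COROUTINE_COUNT);
        // A high median means MultiGet often read files from several levels,
        // which is where async IO can overlap those reads.
        std::printf("levels read per MultiGet p50=%.1f, coroutine calls=%" PRIu64
                    "\n", levels_per_multiget.median, coroutine_count);
      }
      ```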
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10182
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37213296
      
      Pulled By: anand1976
      
      fbshipit-source-id: 5d2b7708017c0e278578ae4bffac3926f6530efb
  12. 20 May 2022, 1 commit
    • Multi file concurrency in MultiGet using coroutines and async IO (#9968) · 57997dda
      anand76 committed
      Summary:
      This PR implements a coroutine version of batched MultiGet in order to concurrently read from multiple SST files in a level using async IO, thus reducing the latency of the MultiGet. The API from the user perspective is still synchronous and single threaded, with the RocksDB part of the processing happening in the context of the caller's thread. In Version::MultiGet, the decision is made whether to call synchronous or coroutine code.
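      From the application side, opting in needs only a flag on ReadOptions plus the batched MultiGet API; a minimal sketch, assuming the `ReadOptions::async_io` flag as in current public headers (keys illustrative):
      ```
      #include "rocksdb/db.h"

      // Sketch: issue a batched MultiGet with async IO enabled so that reads from
      // multiple SST files in a level can be performed concurrently. The API is
      // still synchronous from the caller's perspective.
      void BatchedAsyncMultiGet(rocksdb::DB* db) {
        rocksdb::ReadOptions ro;
        ro.async_io = true;  // opt in to the coroutine/async read path

        constexpr size_t kNumKeys = 3;
        rocksdb::Slice keys[kNumKeys] = {"k1", "k2", "k3"};  // illustrative keys
        rocksdb::PinnableSlice values[kNumKeys];
        rocksdb::Status statuses[kNumKeys];

        db->MultiGet(ro, db->DefaultColumnFamily(), kNumKeys, keys, values,
                     statuses);
      }
      ```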
      
      A good way to review this PR is to review the first 4 commits in order - de773b3, 70c2f70, 10b50e1, and 377a597 - before reviewing the rest.
      
      TODO:
      1. Figure out how to build it in CircleCI (requires some dependencies to be installed)
      2. Do some stress testing with coroutines enabled
      
      No regression in synchronous MultiGet between this branch and main -
      ```
      ./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,multireadrandom" -key_size=32 -value_size=512 -num=5000000 -batch_size=64 -multiread_batched=true -use_direct_reads=false -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=true -threads=16 -cache_size=10485760000 -async_io=false -multiread_stride=40000 -statistics
      ```
      Branch - ```multireadrandom :       4.025 micros/op 3975111 ops/sec 60.001 seconds 238509056 operations; 2062.3 MB/s (14767808 of 14767808 found)```
      
      Main - ```multireadrandom :       3.987 micros/op 4013216 ops/sec 60.001 seconds 240795392 operations; 2082.1 MB/s (15231040 of 15231040 found)```
      
      More benchmarks in various scenarios are given below. The measurements were taken with ```async_io=false``` (no coroutines) and ```async_io=true``` (use coroutines). For an IO bound workload (with every key requiring an IO), the coroutines version shows a clear benefit, being ~2.6X faster. For CPU bound workloads, the coroutines version has ~6-15% higher CPU utilization, depending on how many keys overlap an SST file.
      
      1. Single thread IO bound workload on remote storage with sparse MultiGet batch keys (~1 key overlap/file) -
      No coroutines - ```multireadrandom :     831.774 micros/op 1202 ops/sec 60.001 seconds 72136 operations;    0.6 MB/s (72136 of 72136 found)```
      Using coroutines - ```multireadrandom :     318.742 micros/op 3137 ops/sec 60.003 seconds 188248 operations;    1.6 MB/s (188248 of 188248 found)```
      
      2. Single thread CPU bound workload (all data cached) with ~1 key overlap/file -
      No coroutines - ```multireadrandom :       4.127 micros/op 242322 ops/sec 60.000 seconds 14539384 operations;  125.7 MB/s (14539384 of 14539384 found)```
      Using coroutines - ```multireadrandom :       4.741 micros/op 210935 ops/sec 60.000 seconds 12656176 operations;  109.4 MB/s (12656176 of 12656176 found)```
      
      3. Single thread CPU bound workload with ~2 key overlap/file -
      No coroutines - ```multireadrandom :       3.717 micros/op 269000 ops/sec 60.000 seconds 16140024 operations;  139.6 MB/s (16140024 of 16140024 found)```
      Using coroutines - ```multireadrandom :       4.146 micros/op 241204 ops/sec 60.000 seconds 14472296 operations;  125.1 MB/s (14472296 of 14472296 found)```
      
      4. CPU bound multi-threaded (16 threads) with ~4 key overlap/file -
      No coroutines - ```multireadrandom :       4.534 micros/op 3528792 ops/sec 60.000 seconds 211728728 operations; 1830.7 MB/s (12737024 of 12737024 found) ```
      Using coroutines - ```multireadrandom :       4.872 micros/op 3283812 ops/sec 60.000 seconds 197030096 operations; 1703.6 MB/s (12548032 of 12548032 found) ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9968
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36348563
      
      Pulled By: anand1976
      
      fbshipit-source-id: c0ce85a505fd26ebfbb09786cbd7f25202038696
  13. 26 Apr 2022, 1 commit
    • Add stats related to async prefetching (#9845) · 3653029d
      Akanksha Mahajan committed
      Summary:
      Add stats PREFETCHED_BYTES_DISCARDED and POLL_WAIT_MICROS.
      PREFETCHED_BYTES_DISCARDED records the number of prefetched bytes discarded by
      FilePrefetchBuffer. POLL_WAIT_MICROS records the time taken by the underlying
      file_system Poll API.
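      A sketch of reading the two new stats (ticker and histogram names as listed above):
      ```
      #include <cinttypes>
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Report how much prefetched data was wasted and how long Poll() blocked.
      void ReportPrefetchStats(rocksdb::Statistics* stats) {
        uint64_t discarded =
            stats->getTickerCount(rocksdb::PREFETCHED_BYTES_DISCARDED);

        rocksdb::HistogramData poll_wait;
        stats->histogramData(rocksdb::POLL_WAIT_MICROS, &poll_wait);

        std::printf("prefetched bytes discarded=%" PRIu64 ", poll wait p99=%.1fus\n",
                    discarded, poll_wait.percentile99);
      }
      ```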
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9845
      
      Test Plan: Update existing tests
      
      Reviewed By: anand1976
      
      Differential Revision: D35909694
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e009ef940bb9ed72c9446f5529095caabb8a1e36
  14. 07 Apr 2022, 1 commit
  15. 30 Mar 2022, 1 commit
    • Fb 9718 verify checksums is ignored (#9767) · b6ad0d95
      Alan Paxton committed
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9718
      
      The verify_checksums flag of read_options should be passed to the read options used by the BlockFetcher in a couple of cases where it previously was not. It will now happen (but did not previously) on iteration and on [multi]get, where a fetcher is created as part of the iterate/get call.
      
      This may result in much better performance in the few workloads where the client chooses to disable verification.
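      For illustration, the knob whose handling this fixes is just a field on ReadOptions; a sketch:
      ```
      #include <memory>
      #include <string>

      #include "rocksdb/db.h"

      // With this fix, disabling checksum verification also applies to the
      // BlockFetcher used by iterators and [multi]get, not just other read paths.
      void ReadWithoutChecksumVerification(rocksdb::DB* db) {
        rocksdb::ReadOptions ro;
        ro.verify_checksums = false;  // default is true

        std::string value;
        db->Get(ro, "some_key", &value).PermitUncheckedError();

        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // blocks fetched during iteration now skip checksum verification too
        }
      }
      ```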
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9767
      
      Reviewed By: mrambacher
      
      Differential Revision: D35218986
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 329d29764bb70fbc7f2673440bc46c107a813bc8
  16. 19 Feb 2022, 1 commit
  17. 11 Jan 2022, 1 commit
    • Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) · 1973fcba
      mrambacher committed
      
      Summary:
      In order to support old-style regex function registration, restored the original "Register<T>(string, Factory)" method using regular expressions.  The PatternEntry methods were left in place but renamed to AddFactory.  The goal is to allow for the deprecation of the original regex Registry method in an upcoming release.
      
      Added the modes kMatchZeroOrMore and kMatchAtLeastOne to PatternEntry to match * and +, respectively (kMatchAtLeastOne was the original behavior).
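      A rough sketch of the renamed registration API, with the factory signature as in `rocksdb/utilities/object_registry.h`; `NewMyMergeOperator` and the pattern string are illustrative:
      ```
      #include <memory>
      #include <string>

      #include "rocksdb/merge_operator.h"
      #include "rocksdb/utilities/object_registry.h"

      // Illustrative only: a hypothetical user-defined merge operator.
      extern rocksdb::MergeOperator* NewMyMergeOperator();

      static int RegisterMyObjects(rocksdb::ObjectLibrary& library,
                                   const std::string& /*arg*/) {
        // New style: an explicit PatternEntry (here matching the exact name).
        library.AddFactory<rocksdb::MergeOperator>(
            rocksdb::ObjectLibrary::PatternEntry("mymerge"),
            [](const std::string& /*uri*/,
               std::unique_ptr<rocksdb::MergeOperator>* guard,
               std::string* /*errmsg*/) {
              guard->reset(NewMyMergeOperator());
              return guard->get();
            });
        // The restored old style is the regex-based
        // Register<T>("mymerge.*", factory) described above.
        return 1;
      }
      ```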
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9362
      
      Reviewed By: pdillinger
      
      Differential Revision: D33432562
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed88ab3f9a2ad0d525c7bd1692873f9bb3209d02
  18. 17 Nov 2021, 1 commit
  19. 29 Sep 2021, 1 commit
  20. 11 Sep 2021, 1 commit
  21. 08 Sep 2021, 2 commits
  22. 17 Aug 2021, 1 commit
    • Add a stat to count secondary cache hits (#8666) · add68bd2
      anand76 committed
      Summary:
      Add a stat for secondary cache hits. The ```Cache::Lookup``` API had an unused ```stats``` parameter. This PR uses that to pass the pointer to a ```Statistics``` object that ```LRUCache``` uses to record the stat.
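      A sketch of reading the new stat, assuming the ticker enum is named SECONDARY_CACHE_HITS and a secondary cache was configured on the block cache:
      ```
      #include <cinttypes>
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Sketch: after configuring a block cache with a secondary cache (e.g. via
      // LRUCacheOptions::secondary_cache), report hits served by the secondary tier.
      void ReportSecondaryCacheHits(rocksdb::Statistics* stats) {
        uint64_t hits = stats->getTickerCount(rocksdb::SECONDARY_CACHE_HITS);
        std::printf("secondary cache hits=%" PRIu64 "\n", hits);
      }
      ```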
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666
      
      Test Plan: Update a unit test in lru_cache_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30353816
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77
  23. 18 Jun 2021, 1 commit
    • Added memtable garbage statistics (#8411) · e817bc96
      Baptiste Lemaire committed
      Summary:
      **Summary**:
      2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present in the memtable at flush time, whereas the latter tracks how many of those raw bytes are considered garbage, meaning that they ended up not being included in the SSTables resulting from the flush operations.
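      A sketch of deriving a garbage ratio from the two new counters:
      ```
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Estimate the fraction of memtable bytes that were garbage at flush time,
      // using the two new tickers named above.
      void ReportMemtableGarbageRatio(rocksdb::Statistics* stats) {
        const double payload = static_cast<double>(
            stats->getTickerCount(rocksdb::MEMTABLE_PAYLOAD_BYTES_AT_FLUSH));
        const double garbage = static_cast<double>(
            stats->getTickerCount(rocksdb::MEMTABLE_GARBAGE_BYTES_AT_FLUSH));
        if (payload > 0) {
          std::printf("memtable garbage at flush: %.1f%%\n",
                      100.0 * garbage / payload);
        }
      }
      ```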
      
      **Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
      This executable includes 3 tests that verify support and correct stat calculation for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is called manually in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR can be exactly predicted, showing that we have a full understanding of the underlying operations.
      
      **Performance testing**:
      `./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
      Timing done using "date" function in a bash script.
      _Results_:
      Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
      This feature branch: mean 67.4 sec, std 1.35 sec.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D29150629
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
  24. 18 Mar 2021, 1 commit
    • Add the statistics and info log for Error handler (#8050) · 08ec5e73
      Zhichao Cao committed
      Summary:
      Add statistics and info log for the error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume success; a histogram for the auto resume retry count in each recovery call.
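      A sketch of reading these stats; the ERROR_HANDLER_* enum names are assumed to follow the counters described above:
      ```
      #include <cinttypes>
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Summarize background-error and auto-resume activity.
      void ReportErrorHandlerStats(rocksdb::Statistics* stats) {
        std::printf(
            "bg errors=%" PRIu64 ", auto resume=%" PRIu64 ", success=%" PRIu64 "\n",
            stats->getTickerCount(rocksdb::ERROR_HANDLER_BG_ERROR_COUNT),
            stats->getTickerCount(rocksdb::ERROR_HANDLER_AUTORESUME_COUNT),
            stats->getTickerCount(rocksdb::ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));

        rocksdb::HistogramData retries;  // retry count per recovery call
        stats->histogramData(rocksdb::ERROR_HANDLER_AUTORESUME_RETRY_COUNT,
                             &retries);
        std::printf("auto resume retries p99=%.1f\n", retries.percentile99);
      }
      ```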
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050
      
      Test Plan: make check and add test to error_handler_fs_test
      
      Reviewed By: anand1976
      
      Differential Revision: D26990565
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f
  25. 29 Oct 2020, 1 commit
    • Remove unused includes (#7604) · 394210f2
      Yanqin Jin committed
      Summary:
      This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D24579392
      
      Pulled By: riversand963
      
      fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
  26. 08 Oct 2020, 1 commit
    • Add Stats for MultiGet (#7366) · 38d0a365
      Akanksha Mahajan committed
      Summary:
      Add the following stats for MultiGet as histograms, to get more insight into MultiGet (a read-side sketch follows the list):
          1. Number of index and filter blocks read from file as part of a MultiGet
          request, per level.
          2. Number of data blocks read from file, per level.
          3. Number of SST files loaded from the file system, per level.
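      The sketch below reads the three new per-level histograms; the enum names are assumed to match the stats listed above:
      ```
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Read the new per-level MultiGet histograms.
      void ReportMultiGetLevelStats(rocksdb::Statistics* stats) {
        rocksdb::HistogramData index_filter, data_blocks, sst_files;
        stats->histogramData(rocksdb::NUM_INDEX_AND_FILTER_BLOCKS_READ_PER_LEVEL,
                             &index_filter);
        stats->histogramData(rocksdb::NUM_DATA_BLOCKS_READ_PER_LEVEL, &data_blocks);
        stats->histogramData(rocksdb::NUM_SST_READ_PER_LEVEL, &sst_files);
        std::printf("per level: index/filter p95=%.1f data p95=%.1f sst p95=%.1f\n",
                    index_filter.percentile95, data_blocks.percentile95,
                    sst_files.percentile95);
      }
      ```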
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366
      
      Reviewed By: anand1976
      
      Differential Revision: D24127040
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
  27. 05 Sep 2020, 1 commit
    • Add a new stats level to exclude tickers (#7329) · ab202e8d
      Yanqin Jin committed
      Summary:
      Currently, an application may pass a statistics object to the db but later
      want to reduce stats tracking overhead by setting the stats level to
      kExceptHistogramOrTimers (the current lowest level). Tickers will still
      be incremented, costing up to 1% CPU. We can add a new lowest stats
      level `kExceptTickers` to disable ticker incrementing as well, thus
      reducing CPU cycles spent on tickers.
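      A sketch of opting into the new lowest level:
      ```
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"

      // Keep a statistics object attached (so stats can be re-enabled later)
      // while paying almost nothing: disable tickers as well as
      // histograms/timers.
      void MinimizeStatsOverhead(rocksdb::Options* options) {
        if (!options->statistics) {
          options->statistics = rocksdb::CreateDBStatistics();
        }
        options->statistics->set_stats_level(rocksdb::StatsLevel::kExceptTickers);
      }
      ```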
      
      Test Plan (devserver):
      ```
      make check
      make clean
      DEBUG_LEVEL=0 make db_bench
      ./db_bench -perf_level=1 -stats_level=0 -statistics -benchmarks=fillseq,readrandom -duration=120
      ```
      
      Measure CPU util (%) before and after change:
      CPU util by rocksdb::RecordTick: 1.1 vs (<0.1)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7329
      
      Reviewed By: pdillinger
      
      Differential Revision: D23434014
      
      Pulled By: riversand963
      
      fbshipit-source-id: 72ff0f02a192ac476d4b0044b9f37fd4a22ff0d4
  28. 30 Jul 2020, 1 commit
    • Compaction Read/Write Stats by Compaction Type (#7165) · 56ed601d
      Aaron Kabcenell committed
      Summary:
      Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
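      A sketch of reading the per-reason compaction IO counters; the ticker names are assumed to follow the pattern COMPACT_{READ,WRITE}_BYTES_{MARKED,PERIODIC,TTL}:
      ```
      #include <cinttypes>
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Break down compaction IO by compaction reason.
      void ReportCompactionIoByReason(rocksdb::Statistics* stats) {
        std::printf(
            "marked:   read=%" PRIu64 " write=%" PRIu64 "\n"
            "periodic: read=%" PRIu64 " write=%" PRIu64 "\n"
            "ttl:      read=%" PRIu64 " write=%" PRIu64 "\n",
            stats->getTickerCount(rocksdb::COMPACT_READ_BYTES_MARKED),
            stats->getTickerCount(rocksdb::COMPACT_WRITE_BYTES_MARKED),
            stats->getTickerCount(rocksdb::COMPACT_READ_BYTES_PERIODIC),
            stats->getTickerCount(rocksdb::COMPACT_WRITE_BYTES_PERIODIC),
            stats->getTickerCount(rocksdb::COMPACT_READ_BYTES_TTL),
            stats->getTickerCount(rocksdb::COMPACT_WRITE_BYTES_TTL));
      }
      ```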
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
      
      Test Plan:
      TTL and periodic compactions can be checked by running db_bench with the options activated:
      
      ./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
      ./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
      
      Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
      
      Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
      
      Reviewed By: ajkr
      
      Differential Revision: D22693050
      
      Pulled By: akabcenell
      
      fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
  29. 06 Jun 2020, 1 commit
  30. 28 Apr 2020, 1 commit
    • Stats for redundant insertions into block cache (#6681) · 249eff0f
      Peter Dillinger committed
      Summary:
      Since read threads do not coordinate on loading data into block
      cache, two threads between Lookup and Insert can end up loading and
      inserting the same data. This is particularly concerning with
      cache_index_and_filter_blocks since those are hot and more likely to
      be race targets if ejected from (or not pre-populated in) the cache.
      
      Particularly with moves toward disaggregated / network storage, the cost
      of redundant retrieval might be high, and we should at least have some
      hard statistics from which we can estimate impact.
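      A sketch of turning the new counters into an overall redundancy ratio; the db_bench output below breaks it out per block type:
      ```
      #include <cstdio>

      #include "rocksdb/statistics.h"

      // Fraction of block cache insertions that were redundant, i.e. the same
      // block was loaded and inserted by two racing readers.
      void ReportRedundantCacheAdds(rocksdb::Statistics* stats) {
        const double adds = static_cast<double>(
            stats->getTickerCount(rocksdb::BLOCK_CACHE_ADD));
        const double redundant = static_cast<double>(
            stats->getTickerCount(rocksdb::BLOCK_CACHE_ADD_REDUNDANT));
        if (adds > 0) {
          std::printf("redundant cache adds: %.2f%%\n", 100.0 * redundant / adds);
        }
      }
      ```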
      
      Example with full filter thrashing "cliff":
      
          $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10
          ...
          $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((130 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
          rocksdb.block.cache.add COUNT : 14181
          rocksdb.block.cache.add.failures COUNT : 0
          rocksdb.block.cache.add.redundant COUNT : 476
          rocksdb.block.cache.data.add COUNT : 12749
          rocksdb.block.cache.data.add.redundant COUNT : 18
          rocksdb.block.cache.filter.add COUNT : 1003
          rocksdb.block.cache.filter.add.redundant COUNT : 217
          rocksdb.block.cache.index.add COUNT : 429
          rocksdb.block.cache.index.add.redundant COUNT : 241
          $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((120 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
          rocksdb.block.cache.add COUNT : 1182223
          rocksdb.block.cache.add.failures COUNT : 0
          rocksdb.block.cache.add.redundant COUNT : 302728
          rocksdb.block.cache.data.add COUNT : 31425
          rocksdb.block.cache.data.add.redundant COUNT : 12
          rocksdb.block.cache.filter.add COUNT : 795455
          rocksdb.block.cache.filter.add.redundant COUNT : 130238
          rocksdb.block.cache.index.add COUNT : 355343
          rocksdb.block.cache.index.add.redundant COUNT : 172478
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6681
      
      Test Plan: Some manual testing (above) and unit test covering key metrics is included
      
      Reviewed By: ltamasi
      
      Differential Revision: D21134113
      
      Pulled By: pdillinger
      
      fbshipit-source-id: c11497b5f00f4ffdfe919823904e52d0a1a91d87
  31. 21 Feb 2020, 1 commit
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To give users a tool to solve the problem, the RocksDB namespace is changed to a macro that can be overridden at build time.
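      A sketch of what client code looks like when it spells the namespace through the macro; the alternate namespace name is hypothetical:
      ```
      // Build one copy of RocksDB with, e.g., -DROCKSDB_NAMESPACE=my_rocksdb
      // (hypothetical name) so its symbols cannot collide with another build
      // linked into the same binary. Client code uses the macro instead of
      // hard-coding `rocksdb::`; the macro defaults to "rocksdb".
      #include "rocksdb/db.h"

      int OpenDb(const char* path) {
        ROCKSDB_NAMESPACE::DB* db = nullptr;
        ROCKSDB_NAMESPACE::Options options;
        options.create_if_missing = true;
        ROCKSDB_NAMESPACE::Status s =
            ROCKSDB_NAMESPACE::DB::Open(options, path, &db);
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```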
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all, and jtest. Try building with ROCKSDB_NAMESPACE overridden to another value.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  32. 21 Sep 2019, 1 commit
  33. 06 Aug 2019, 1 commit
    • WritePrepared: fix Get without snapshot (#5664) · 208556ee
      Maysam Yabandeh committed
      Summary:
      If read_options.snapshot is not set, ::Get will take the last sequence number after taking a super-version and use that as the sequence number. Theoretically max_evicted_seq_ could advance beyond this sequence number. This could lead ::IsInSnapshot, which will be invoked by the ReadCallback, to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-null snap_released so that it could be set by ::IsInSnapshot. The patch does that, and adds a unit test to verify it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664
      
      Differential Revision: D16614033
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
  34. 07 Jun 2019, 1 commit
  35. 12 Apr 2019, 1 commit
    • Introduce a new MultiGet batching implementation (#5011) · fefd4b98
      anand76 committed
      Summary:
      This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.
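      A sketch of the new API shape described above (keys illustrative):
      ```
      #include "rocksdb/db.h"

      // The new overload lets callers keep statuses and values on the stack and
      // returns values as PinnableSlices to avoid copies.
      void StackAllocatedMultiGet(rocksdb::DB* db) {
        constexpr size_t kNumKeys = 4;
        rocksdb::Slice keys[kNumKeys] = {"a", "b", "c", "d"};
        rocksdb::PinnableSlice values[kNumKeys];
        rocksdb::Status statuses[kNumKeys];

        rocksdb::ReadOptions ro;
        // sorted_input=true lets the implementation skip re-sorting the batch;
        // the keys above are already in comparator order.
        db->MultiGet(ro, db->DefaultColumnFamily(), kNumKeys, keys, values,
                     statuses, /*sorted_input=*/true);
      }
      ```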
      
      Batching is useful when there is some spatial locality to the keys being queried, as well as larger batch sizes. The main benefits are due to -
      1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
      2. Bloom filter cachelines can be prefetched, hiding the cache miss latency
      
      The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.
      
      Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).
      
      Benchmark results (micros/op) by batch size:
      
      Random pattern (Stride length 0)
      |  | 1 | 2 | 4 | 8 | 16 | 32 |
      | -- | -- | -- | -- | -- | -- | -- |
      | Get | 4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074 |
      | MultiGet (no batching) | 4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 |
      | MultiGet (w/ batching) | 4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14 |
      
      Good locality (Stride length 16)
      |  | 1 | 2 | 4 | 8 | 16 | 32 |
      | -- | -- | -- | -- | -- | -- | -- |
      | Get | 4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753 |
      | MultiGet (no batching) | 4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781 |
      | MultiGet (w/ batching) | 4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135 |
      
      Good locality (Stride length 256)
      |  | 1 | 2 | 4 | 8 | 16 | 32 |
      | -- | -- | -- | -- | -- | -- | -- |
      | Get | 4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232 |
      | MultiGet (no batching) | 4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268 |
      | MultiGet (w/ batching) | 4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62 |
      
      Medium locality (Stride length 4096)
      |  | 1 | 2 | 4 | 8 | 16 | 32 |
      | -- | -- | -- | -- | -- | -- | -- |
      | Get | 4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555 |
      | MultiGet (no batching) | 4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465 |
      | MultiGet (w/ batching) | 4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891 |
      
      db_bench command used (on a DB with 4 levels, 12 million keys):
      TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011
      
      Differential Revision: D14348703
      
      Pulled By: anand1976
      
      fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
  36. 02 Mar 2019, 1 commit
  37. 01 Mar 2019, 1 commit
    • Add two more StatsLevel (#5027) · 5e298f86
      Siying Dong committed
      Summary:
      Statistics cost too much CPU for some use cases. Add two stats levels
      so that people can choose to skip two types of expensive stats, timers and
      histograms.
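      A sketch of selecting a reduced level that keeps cheap ticker counts but skips timer and histogram collection:
      ```
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"

      // Keep tickers but skip the expensive timers and histograms.
      void UseReducedStatsLevel(rocksdb::Options* options) {
        options->statistics = rocksdb::CreateDBStatistics();
        options->statistics->set_stats_level(
            rocksdb::StatsLevel::kExceptHistogramOrTimers);
      }
      ```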
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027
      
      Differential Revision: D14252765
      
      Pulled By: siying
      
      fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
  38. 21 Feb 2019, 1 commit
    • add GetStatsHistory to retrieve stats snapshots (#4748) · c4f5d0aa
      Zhongyi Xie committed
      Summary:
      This PR adds a public `GetStatsHistory` API to retrieve stats history in the form of a std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken; the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the interval between two consecutive snapshots; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0.
      
      (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535)
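      A sketch of consuming the API, with iterator method names per `rocksdb/stats_history.h`; `auto` is used for the map since the value representation is described above:
      ```
      #include <cstdint>
      #include <limits>
      #include <memory>

      #include "rocksdb/db.h"
      #include "rocksdb/stats_history.h"

      // Walk all in-memory stats snapshots collected so far. Requires
      // stats_persist_period_sec > 0 in DBOptions so snapshots are collected.
      void DumpStatsHistory(rocksdb::DB* db) {
        std::unique_ptr<rocksdb::StatsHistoryIterator> it;
        rocksdb::Status s = db->GetStatsHistory(
            0 /* start_time */, std::numeric_limits<uint64_t>::max(), &it);
        if (!s.ok()) return;
        for (; it->Valid(); it->Next()) {
          uint64_t snapshot_time = it->GetStatsTime();
          const auto& stats_map = it->GetStatsMap();  // stats name -> value
          (void)snapshot_time;
          (void)stats_map;
        }
      }
      ```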
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748
      
      Differential Revision: D13961471
      
      Pulled By: miasantreble
      
      fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04