1. 07 Apr 2022 (1 commit)
    • Account memory of big memory users in BlockBasedTable in global memory limit (#9748) · 49623f9c
      Committed by Hui Xiao
      Summary:
      **Context:**
      Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g., with `max_open_files = -1`). This memory is currently not tracked, not constrained, and not evictable by the cache. As a first step toward improving this, and similar to https://github.com/facebook/rocksdb/pull/8428, this PR tracks an estimate of each `BlockBasedTableReader` object's memory in the block cache and fails future table creation if memory usage would exceed the cache's available space at creation time.
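
      For context, a minimal sketch of how a user might opt in, assuming a `BlockBasedTableOptions::reserve_table_reader_memory` option that mirrors the `-reserve_table_reader_memory` db_bench flag used in the test plan below:

      ```
      #include <cassert>

      #include "rocksdb/cache.h"
      #include "rocksdb/db.h"
      #include "rocksdb/table.h"

      int main() {
        rocksdb::BlockBasedTableOptions table_options;
        table_options.block_cache = rocksdb::NewLRUCache(64 << 20 /* 64MB */);
        // Assumed option name: charge an estimate of each table reader's memory
        // to the block cache; opening a table fails if the cache lacks space.
        table_options.reserve_table_reader_memory = true;

        rocksdb::Options options;
        options.create_if_missing = true;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
        assert(s.ok());
        delete db;
        return 0;
      }
      ```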
      
      **Summary:**
      - Approximate the memory usage of the big memory users (`BlockBasedTable::Rep` and `TableProperties`), in addition to the existing estimates (filter block / index block / uncompression dictionary)
      - Charge all of these memory usages to the block cache in `BlockBasedTable::Open()` and release them in `~BlockBasedTable()`, as there is no memory usage fluctuation of concern in between
      - Refactor `CacheReservationManager` (and its call sites) to add the concurrency support that `BlockBasedTable` needs in this PR
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748
      
      Test Plan:
      - New unit tests
      - db_bench `OpenDb`: **-0.52% in ms**
        - Setup: `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
        - Repeated runs pre-change (feature off) and post-change (feature on), benchmarking `OpenDb`: `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this flag when running without the feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true| egrep 'OpenDb:'`
      
      #-run | (feature-off) avg milliseconds | std milliseconds | (feature-on) avg milliseconds | std milliseconds | change (%)
      -- | -- | -- | -- | -- | --
      10 | 11.4018 | 5.95173 | 9.47788 | 1.57538 | -16.87382694
      20 | 9.23746 | 0.841053 | 9.32377 | 1.14074 | 0.9343477536
      40 | 9.0876 | 0.671129 | 9.35053 | 1.11713 | 2.893283155
      80 | 9.72514 | 2.28459 | 9.52013 | 1.0894 | -2.108041632
      160 | 9.74677 | 0.991234 | 9.84743 | 1.73396 | 1.032752389
      320 | 10.7297 | 5.11555 | 10.547 | 1.97692 | **-1.70275031**
      640 | 11.7092 | 2.36565 | 11.7869 | 2.69377 | **0.6635807741**
      
      -  db_bench writes with WriteBufferManager charging to the block cache (to check that this PR's CacheReservationManager refactoring doesn't accidentally slow down anything in WriteBufferManager): `fillseq`: **+0.54% in micros/op**
      `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`
      
      #-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  avg micros/op | std micros/op | change (%)
      -- | -- | -- | -- | -- | --
      10 | 6.15 | 0.260187 | 6.289 | 0.371192 | 2.260162602
      20 | 7.28025 | 0.465402 | 7.37255 | 0.451256 | 1.267813605
      40 | 7.06312 | 0.490654 | 7.13803 | 0.478676 | **1.060579461**
      80 | 7.14035 | 0.972831 | 7.14196 | 0.92971 | **0.02254791432**
      
      -  filter bench, bloom filter: **-0.78% in ns/key**
          - ` ./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`
      
      #-run | (pre-PR) avg ns/key | std ns/key | (post-PR) avg ns/key | std ns/key | change (%)
      -- | -- | -- | -- | -- | --
      10 | 26.4369 | 0.442182 | 26.3273 | 0.422919 | **-0.4145720565**
      20 | 26.4451 | 0.592787 | 26.1419 | 0.62451 | **-1.1465262**
      
      - Crash test: `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as expected
      
      Reviewed By: ajkr
      
      Differential Revision: D35136549
      
      Pulled By: hx235
      
      fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28
  2. 06 Apr 2022 (7 commits)
  3. 05 Apr 2022 (6 commits)
    • Add Env::IOPriority to IOOptions (#9806) · 9cd47ce5
      Committed by Hui Xiao
      Summary:
      **Context/Todo:**
      As requested, allow `IOOptions` to carry an `Env::IOPriority`, both as a convenient way to pass rate-limiter-related hints down to the file system level and to enable future interaction between RocksDB's internal rate limiting and a custom file system's own rate limiting.
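
      A minimal sketch of the intended plumbing; the member name `rate_limiter_priority` is an assumption here, and a custom FileSystem could read the hint per request:

      ```
      #include "rocksdb/env.h"
      #include "rocksdb/file_system.h"

      // Tag a write with an Env::IOPriority so a file-system-level rate
      // limiter can classify it (assumed field: IOOptions::rate_limiter_priority).
      rocksdb::IOStatus AppendLowPri(rocksdb::FSWritableFile* file,
                                     const rocksdb::Slice& data) {
        rocksdb::IOOptions io_opts;
        io_opts.rate_limiter_priority = rocksdb::Env::IO_LOW;  // e.g. background I/O
        rocksdb::IODebugContext dbg;
        return file->Append(data, io_opts, &dbg);
      }
      ```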
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9806
      
      Test Plan: No actual code changes in RocksDB internals
      
      Reviewed By: ajkr
      
      Differential Revision: D35388966
      
      Pulled By: hx235
      
      fbshipit-source-id: 5891c97c3f9184cd221a9ab8536ce8dfa8526c08
    • Fix segfault in FilePrefetchBuffer with async_io enabled (#9777) · 36bc3da9
      Committed by Akanksha Mahajan
      Summary:
      If a `FilePrefetchBuffer` object is destroyed and `Poll()` later invokes a callback on the destroyed object, accessing it causes a segfault. This was caught after adding unit tests covering the POSIX implementation of the `ReadAsync` and `Poll` APIs.
      This PR also updates and fixes the existing IOUring tests, which were not running locally because the `RocksDbIOUringEnable` function wasn't defined and IOUring was therefore disabled for those tests.
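
      An illustrative sketch of the bug class, with invented names (this is not RocksDB's actual code): the async completion callback captures `this`, so the destructor must cancel pending reads before the object's storage is released:

      ```
      #include <functional>
      #include <vector>

      class AsyncReadQueue {
       public:
        void Submit(std::function<void()> cb) { pending_.push_back(std::move(cb)); }
        void Poll() {  // delivers queued completions
          for (auto& cb : pending_) cb();
          pending_.clear();
        }
        void Cancel() { pending_.clear(); }

       private:
        std::vector<std::function<void()>> pending_;
      };

      class PrefetchBufferSketch {
       public:
        explicit PrefetchBufferSketch(AsyncReadQueue* q) : queue_(q) {}
        ~PrefetchBufferSketch() {
          queue_->Cancel();  // without this, Poll() may still run OnReadDone on *this
        }
        void StartAsyncRead() {
          queue_->Submit([this]() { OnReadDone(); });  // callback captures `this`
        }

       private:
        void OnReadDone() { bytes_ready_ = true; }
        AsyncReadQueue* queue_;
        bool bytes_ready_ = false;
      };
      ```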
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777
      
      Test Plan: Added new unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D35254002
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1
    • Fix commit_prereq and other targets (#9797) · ec77a928
      Committed by Jay Zhuang
      Summary:
      Make `commit_prereq` work, plus a few other improvements:
      * Remove gcc 4.8.1 and gcc 5.x, which are no longer supported
      * Remove platform007, which is gone
      * Make `make clean` work on both macOS and Linux
      * Port `precommit_checker.py` to Python 3
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9797
      
      Test Plan: `make commit_prereq`
      
      Reviewed By: ajkr
      
      Differential Revision: D35338536
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 1e159962ab9d31c43c4b85de7d0f582d3e881ffe
    • Fix typo in file/sst_file_manager_impl.h (#9799) · f6870640
      Committed by SGZW
      Summary:
      Fix typo: deletition -> deletion
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9799
      
      Reviewed By: ajkr
      
      Differential Revision: D35341617
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 32bc384b99e5564f6a673076c6a4f160ee6c2e46
    • build_tools/rocksdb-lego-determinator to pass parallelism information for no_compression (#9796) · d4159c80
      Committed by sdong
      Summary:
      Right now, the parallelism information passed to "build_tools/rocksdb-lego-determinator no_compression" isn't effective when the test actually runs, because the information is dropped along the way. Fix it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9796
      
      Test Plan: Run `build_tools/rocksdb-lego-determinator no_compression`, execute the generated command line, and observe the parallelism.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35330085
      
      fbshipit-source-id: e9b32d0520d61fbc2697ebd841099485f64482e3
    • Fix some typos in comments and HISTORY.md (#9798) · cd59b139
      Committed by Chen Lixiang
      Summary:
      compation --> compaction
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9798
      
      Reviewed By: ajkr
      
      Differential Revision: D35341611
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 5ea07527c311de75cade219456b6ee52b23020f6
  4. 03 Apr 2022 (1 commit)
  5. 02 Apr 2022 (4 commits)
  6. 01 Apr 2022 (4 commits)
    • Add benchmark for GetMergeOperands() (#9785) · bfea9e7c
      Committed by Andrew Kryczka
      Summary:
      There's an existing benchmark, "getmergeoperands", but it is unconventional in that it has multiple phases and hardcoded setup parameters.
      
      This PR adds a different one, "readrandomoperands", that follows the pattern of other benchmarks: it has a single phase and takes its configuration from existing flags.
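
      For reference, a minimal sketch of the API this benchmark exercises; `GetMergeOperands()` returns a key's unmerged operands without invoking the merge operator (the operand-buffer sizing below is an arbitrary choice for the example):

      ```
      #include <iostream>
      #include <vector>

      #include "rocksdb/db.h"

      void PrintOperands(rocksdb::DB* db, const rocksdb::Slice& key) {
        rocksdb::GetMergeOperandsOptions opts;
        opts.expected_max_number_of_operands = 16;  // example sizing
        std::vector<rocksdb::PinnableSlice> operands(
            opts.expected_max_number_of_operands);
        int num_operands = 0;
        rocksdb::Status s = db->GetMergeOperands(
            rocksdb::ReadOptions(), db->DefaultColumnFamily(), key,
            operands.data(), &opts, &num_operands);
        if (s.ok()) {
          for (int i = 0; i < num_operands; ++i) {
            std::cout << operands[i].ToString() << "\n";
          }
        }
      }
      ```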
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9785
      
      Test Plan:
      ```
      $ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -compression_type=none -disable_auto_compactions=true
      $ ./db_bench -use_existing_db=true -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -disable_auto_compactions=true -duration=10
      ...
      readrandomoperands :     542.082 micros/op 1844 ops/sec;    0.2 MB/s (11980 of 18999 found)
      ```
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35290412
      
      Pulled By: ajkr
      
      fbshipit-source-id: fb367ca614b128cef844a75f0e5d9dd7c3328d85
    • Encode min_log_number_to_keep and delete_wals_before in one version edit (#9766) · 6eafdf13
      Committed by Yanqin Jin
      Summary:
      min_log_number_to_keep denotes that WALs whose numbers are below
      this value **will** be deleted by RocksDB.
      delete_wals_before is used by RocksDB when
      track_and_verify_wals_in_manifest is set to true. During recovery,
      RocksDB uses the info encoded in delete_wals_before to reconstruct
      which WALs it should expect to exist.
      If these two tags are not encoded in the same VersionEdit, then it's
      possible for min_log_number_to_keep=100 to be persisted while
      delete_wals_before=100 is lost due to a power failure. The subsequent
      recovery will delete 99.log. If the db crashes again, the next
      recovery will expect to see 99.log, since there is no
      delete_wals_before=100 in the MANIFEST, but the WAL has already been deleted.
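
      A conceptual sketch of the invariant, using a stand-in struct rather than RocksDB's internal VersionEdit: both tags must travel in one edit so that a single atomic MANIFEST write records them together or not at all.

      ```
      #include <cstdint>

      // Stand-in for the internal VersionEdit, with fields mirroring the two
      // MANIFEST tags discussed above (names here are illustrative).
      struct VersionEditStub {
        uint64_t min_log_number_to_keep = 0;
        uint64_t delete_wals_before = 0;
      };

      // Setting both tags on ONE edit closes the power-failure window: either
      // both reach the MANIFEST or neither does.
      VersionEditStub MakeWalRetentionEdit(uint64_t wal_number) {
        VersionEditStub edit;
        edit.min_log_number_to_keep = wal_number;
        edit.delete_wals_before = wal_number;
        return edit;
      }
      ```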
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9766
      
      Test Plan:
      First of all, `make check`.
      Second, format compatibility:
      `SHORT_TEST=1 ./tools/check_format_compatible.sh`
      
      Reviewed By: ltamasi
      
      Differential Revision: D35203623
      
      Pulled By: riversand963
      
      fbshipit-source-id: 45623fc4b4b50d299d5e0f9559a3a4c5e9522c8f
    • Add microbench document (#9781) · 76383bea
      Committed by Jay Zhuang
      Summary:
      Add basic microbenchmark document
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9781
      
      Reviewed By: gitbw95
      
      Differential Revision: D35272866
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: f482e652151fd05ca46e29629261833f038a6075
    • Fix DB::Open() error logging (#9784) · bbcf7b19
      Committed by sdong
      Summary:
      Right now we log the wrong error when DB::Open() fails. Fix it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9784
      
      Test Plan: CI runs should pass
      
      Reviewed By: ajkr, riversand963
      
      Differential Revision: D35290203
      
      fbshipit-source-id: ffc640afa27f6b0a2382ee153dc43f28d9e242be
  7. 31 Mar 2022 (6 commits)
  8. 30 Mar 2022 (5 commits)
  9. 29 Mar 2022 (1 commit)
  10. 26 Mar 2022 (5 commits)
    • Fix some errors in async prefetching in FilePrefetchBuffer (#9734) · 33f8a08a
      Committed by Akanksha Mahajan
      Summary:
      With the ReadOptions flag `async_io`, which prefetches data asynchronously, db_bench and db_stress runs were failing with "Error: Checksum mismatched" because the wrong data was prefetched: the buffer's capacity was less than the size actually needed, so the wrong bytes were copied. This PR fixes that.

      Since there are two separate methods for async and sync prefetching, these changes are confined to the async prefetching methods and do not affect normal prefetching. I ran the regressions to make sure normal prefetching is fine.
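
      An illustrative sketch of the bug class, with invented names (not RocksDB's actual code): the copy destination must be sized for the full read before copying, otherwise it holds truncated or misplaced bytes:

      ```
      #include <cstddef>
      #include <cstring>
      #include <vector>

      struct PrefetchBufSketch {
        std::vector<char> data;
        size_t used = 0;

        // The fix, in spirit: grow the buffer to the actual read size rather
        // than assuming the existing capacity is still sufficient.
        void CopyFromReadResult(const char* src, size_t len) {
          if (data.size() < len) {
            data.resize(len);
          }
          std::memcpy(data.data(), src, len);
          used = len;
        }
      };
      ```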
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9734
      
      Test Plan:
      1. CircleCI jobs
      
      2.  Ran db_bench
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1
      ```
      3. Ran db_stress test
      ```
      export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1"
      make crash_test -j
      ```
      
      4. Ran regressions with async_io disabled.
      
      Old flow without any async changes:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      With the async prefetching changes but async_io disabled, to make sure there is no regression in normal prefetching:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 --async_io=0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.1
      Date:       Wed Mar 23 15:56:37 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  481819.816 micros/op 2 ops/sec;  340.2 MB/s (250 of 250 found)
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D35058471
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 9233a1e6d97cea0c7a8111bfb9e8ac3251c341ce
    • Correctly set ThreadState::tid (#9757) · 37de4e1d
      Committed by Mark Callaghan
      Summary:
      Fixes a bug introduced by me in https://github.com/facebook/rocksdb/pull/9733
      That PR added a counter so that the per-thread seeds in ThreadState would
      be unique even when --benchmarks had more than one test, but it incorrectly
      used this counter as the value for ThreadState::tid as well.
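
      An illustrative sketch of the separation the fix restores, with invented names: the phase counter may feed the seed, but the thread index alone belongs in tid:

      ```
      #include <cstdint>

      struct ThreadStateSketch {
        int tid;        // stable thread index within a benchmark phase
        uint64_t seed;  // unique across phases and threads
      };

      ThreadStateSketch MakeThreadState(int thread_index, uint64_t base_seed,
                                        uint64_t phase_counter) {
        ThreadStateSketch ts;
        ts.tid = thread_index;  // the bug stored phase_counter here instead
        ts.seed = base_seed + phase_counter * 997 + thread_index;
        return ts;
      }
      ```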
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9757
      
      Test Plan:
      Confirm that unexpectedly good QPS results on the regression tests return
      to normal with this fix. I have confirmed that the QPS increase starts with
      the PR 9733 diff.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35149303
      
      Pulled By: mdcallag
      
      fbshipit-source-id: dee5cc36b7faaba6c3be6d6a253d3c2eaad72864
    • Clarify Options::rate_limiter api doc for #9607 Rate-limit automatic WAL flush... · e2cb9aa2
      Committed by Hui Xiao
      Clarify Options::rate_limiter api doc for #9607 Rate-limit automatic WAL flush after each user write (#9745)
      
      Summary:
      As title for https://github.com/facebook/rocksdb/pull/9607
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9745
      
      Test Plan: No code change
      
      Reviewed By: ajkr
      
      Differential Revision: D35096901
      
      Pulled By: hx235
      
      fbshipit-source-id: 6bd3671baecfdc04579b0a81a957bfaa7bed81e1
    • jni: uniformly use GetByteArrayRegion() to copy bytes (#9380) · b83263bb
      Committed by Jermy Li
      Summary:
      Uniformly use GetByteArrayRegion() instead of GetByteArrayElements()
      to copy bytes; this also avoids an inefficient
      ReleaseByteArrayElements() operation.
      Some benefits of GetByteArrayRegion() are described at:
      https://stackoverflow.com/a/2480493
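
      A minimal sketch of the pattern, assuming the call site just needs the bytes in a caller-owned buffer:

      ```
      #include <jni.h>

      #include <vector>

      // GetByteArrayRegion() copies straight into our buffer; nothing is
      // pinned, so no ReleaseByteArrayElements() call is needed afterwards.
      std::vector<char> CopyJByteArray(JNIEnv* env, jbyteArray jarr) {
        const jsize len = env->GetArrayLength(jarr);
        std::vector<char> buf(static_cast<size_t>(len));
        env->GetByteArrayRegion(jarr, 0, len,
                                reinterpret_cast<jbyte*>(buf.data()));
        return buf;
      }
      ```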
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9380
      
      Reviewed By: ajkr
      
      Differential Revision: D35135474
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a32c1774d37f2d22b9bcd105d83e0bb984b71b54
    • db_bench should use a good seed when --seed is not set or set to 0 (#9740) · 1a130fa3
      Committed by Mark Callaghan
      Summary:
      This is for https://github.com/facebook/rocksdb/issues/9737
      
      I have wasted more than a few hours running db_bench benchmarks where --seed was not set
      and getting better-than-expected results, because cache hit rates were great: multiple
      invocations of db_bench either used the same value for --seed or did not set it,
      in which case they all used 0. The result is that all runs see the same sequence of keys.

      Others have done the same. The problem is worse in that it is easy to miss, and the result is a benchmark whose results are misleading.
      
      A good way to avoid this is to set the seed to the equivalent of gettimeofday() when
      --seed is either not set or set to 0 (the default).
      
      With this change the actual seed is printed when it was 0 at process start:
        Set seed to 1647992570365606 because --seed was 0
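
      A minimal sketch of the described behavior, assuming a gflags-style `FLAGS_seed` (names here are illustrative):

      ```
      #include <sys/time.h>

      #include <cstdint>
      #include <cstdio>

      uint64_t ChooseSeed(uint64_t flags_seed) {
        if (flags_seed != 0) {
          return flags_seed;  // honor an explicit non-zero --seed
        }
        // --seed was 0 (the default): derive a seed from the wall clock so
        // repeated runs don't share a key sequence.
        timeval tv;
        gettimeofday(&tv, nullptr);
        const uint64_t seed = static_cast<uint64_t>(tv.tv_sec) * 1000000 +
                              static_cast<uint64_t>(tv.tv_usec);
        std::printf("Set seed to %llu because --seed was 0\n",
                    static_cast<unsigned long long>(seed));
        return seed;
      }
      ```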
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9740
      
      Test Plan:
      Perf results:
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
        readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
        readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
        readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
        readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35145361
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267