1. June 8, 2022 (3 commits)
    • Set db_stress defaults for TSAN deadlock detector (#10131) · ff323464
      Committed by Andrew Kryczka
      Summary:
After https://github.com/facebook/rocksdb/issues/9357, we began seeing the following error when attempting
to acquire locks for file ingestion:
      
      ```
      FATAL: ThreadSanitizer CHECK failed: /home/engshare/third-party2/llvm-fb/12/src/llvm/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)
      ```
      
      The command was using default values for `ingest_external_file_width`
      (1000) and `log2_keys_per_lock` (2). The expected number of locks needed
      to update those keys is then (1000 / 2^2) = 250, which is above the 0x40 (64)
      limit. This PR reduces the default value of `ingest_external_file_width`
      to 100 so the expected number of locks is 25, which is within the limit.
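The arithmetic above can be sketched as a quick check (the helper name below is illustrative, not part of db_stress):

```python
def expected_locks(ingest_external_file_width: int, log2_keys_per_lock: int) -> int:
    """Approximate number of key-range locks one ingestion batch acquires."""
    return ingest_external_file_width >> log2_keys_per_lock

# TSAN's deadlock detector allows at most 0x40 (64) locks held at once.
TSAN_LOCK_LIMIT = 0x40

print(expected_locks(1000, 2))  # old defaults: 250, above the limit
print(expected_locks(100, 2))   # new default: 25, within the limit
```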
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10131
      
      Reviewed By: ltamasi
      
      Differential Revision: D36986307
      
      Pulled By: ajkr
      
      fbshipit-source-id: e918cdb2fcc39517d585f1e5fd2539e185ada7c1
    • Add unit test to verify that the dynamic priority can be passed from compaction to FS (#10088) · 5cbee1f6
      Committed by gitbw95
      Summary:
Add unit tests to verify that the dynamic priority can be passed from compaction to FS. Compaction reads/writes and other DB reads/writes share the same read/write paths to FSRandomAccessFile or FSWritableFile, so a MockTestFileSystem is added to replace the default filesystem from Env in order to intercept and verify the io_priority. To prepare the compaction input files, the default filesystem from Env is used. To test the I/O priority of the compaction reads and writes, db_options_.fs is set to MockTestFileSystem.
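A minimal sketch of this interception pattern (the class names and signatures here are illustrative, not RocksDB's actual FileSystem API):

```python
class NullFileSystem:
    """Stand-in for the default filesystem from Env."""
    def write(self, data: bytes, io_priority: str) -> int:
        return len(data)

class MockFileSystem:
    """Wraps a base filesystem and records the io_priority of each write,
    so a test can assert that compaction forwarded its dynamic priority."""
    def __init__(self, base):
        self.base = base
        self.seen_priorities = []

    def write(self, data: bytes, io_priority: str) -> int:
        self.seen_priorities.append(io_priority)
        return self.base.write(data, io_priority)

fs = MockFileSystem(NullFileSystem())
fs.write(b"sst-block", io_priority="IO_LOW")    # e.g. a compaction write
fs.write(b"wal-record", io_priority="IO_HIGH")  # e.g. a foreground write
print(fs.seen_priorities)  # ['IO_LOW', 'IO_HIGH']
```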
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10088
      
      Test Plan: Add unit tests.
      
      Reviewed By: anand1976
      
      Differential Revision: D36882528
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 120adc15801966f2b8c9fc45285f590a3fff96d1
    • Handle "NotSupported" status by default implementation of Close() in … (#10127) · b6de139d
      Committed by zczhu
      Summary:
The default implementation of the Close() function in the Directory/FSDirectory classes returns a `NotSupported` status. However, we don't want operations that worked in older versions to begin failing after an upgrade when run on FileSystems that have not implemented Directory::Close() yet. So we require that the upper layers calling the Close() function properly handle the "NotSupported" status instead of treating it as an error.
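The caller-side pattern described above might look like this sketch (a toy Status class, not RocksDB's):

```python
class Status:
    def __init__(self, code: str = "OK"):
        self.code = code
    def ok(self) -> bool:
        return self.code == "OK"
    def is_not_supported(self) -> bool:
        return self.code == "NotSupported"

class LegacyDirectory:
    """A Directory whose FileSystem has not implemented Close() yet."""
    def close(self) -> Status:
        return Status("NotSupported")

def close_directory(directory) -> Status:
    s = directory.close()
    if s.is_not_supported():
        # Older FileSystems have no Close(); treat this as success so
        # operations that worked before the upgrade keep working.
        return Status()
    return s

print(close_directory(LegacyDirectory()).ok())  # True
```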
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10127
      
      Reviewed By: ajkr
      
      Differential Revision: D36971112
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 100f0e6ad1191e1acc1ba6458c566a11724cf466
  2. June 7, 2022 (5 commits)
    • Consolidate manual_compaction_paused_ check (#10070) · 3ee6c9ba
      Committed by zczhu
      Summary:
As pointed out by [https://github.com/facebook/rocksdb/pull/8351#discussion_r645765422](https://github.com/facebook/rocksdb/pull/8351#discussion_r645765422), the checks of `manual_compaction_paused` and `manual_compaction_canceled` can be consolidated by setting `*canceled` to true in `DisableManualCompaction()` and `*canceled` to false in the last call to `EnableManualCompaction()`.
      
Changed Tests: The original `DBTest2.PausingManualCompaction1` used a callback function to increment `manual_compaction_paused`, which the original CompactionJob/CompactionIterator detected via `manual_compaction_paused`. The callback function now sets `*canceled` to true if `canceled` is not `nullptr` (to notify CompactionJob/CompactionIterator that the compaction has been canceled).
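A rough sketch of the consolidated state, with illustrative names (not the actual RocksDB members):

```python
class ManualCompactionState:
    """A single canceled flag shared with CompactionJob/CompactionIterator,
    replacing separate paused/canceled checks."""
    def __init__(self):
        self.canceled = False
        self._disabled = 0

    def disable_manual_compaction(self):
        self._disabled += 1
        self.canceled = True          # compactions in flight see this and stop

    def enable_manual_compaction(self):
        self._disabled -= 1
        if self._disabled == 0:
            self.canceled = False     # only the last Enable clears the flag
```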
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10070
      
      Test Plan: This change does not introduce new features, but some slight difference in compaction implementation. Run the same manual compaction unit tests as before (e.g., PausingManualCompaction[1-4], CancelManualCompaction[1-2], CancelManualCompactionWithListener in db_test2, and db_compaction_test).
      
      Reviewed By: ajkr
      
      Differential Revision: D36949133
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: c5dc4c956fbf8f624003a0f5ad2690240063a821
    • Return "invalid argument" when read timestamp is too old (#10109) · a101c9de
      Committed by Yu Zhang
      Summary:
With this change, when a given read timestamp is smaller than the column family's full_history_ts_low, the Get(), MultiGet() and iterator APIs will return Status::InvalidArgument().
Test Plan:
      ```
$ COMPILE_WITH_ASAN=1 make -j24 all
$ ./db_with_timestamp_basic_test --gtest_filter=DBBasicTestWithTimestamp.UpdateFullHistoryTsLow
$ make -j24 check
      ```
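The new validation boils down to a comparison like this sketch (toy function, not the real API):

```python
def validate_read_timestamp(read_ts: int, full_history_ts_low: int) -> str:
    """History below full_history_ts_low may already be collapsed by
    garbage collection, so reads there cannot be answered consistently."""
    if read_ts < full_history_ts_low:
        return "InvalidArgument"
    return "OK"

print(validate_read_timestamp(5, 10))   # InvalidArgument
print(validate_read_timestamp(10, 10))  # OK: reading exactly at the low bound
```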
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10109
      
      Reviewed By: riversand963
      
      Differential Revision: D36901126
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 255feb1a66195351f06c1d0e42acb1ff74527f86
    • Fix default implementation of close() function for Directory/FSDirecto… (#10123) · 9f244b21
      Committed by zczhu
      Summary:
As pointed out by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), the previous implementation (adding a Close() function to the Directory/FSDirectory classes) is not backward-compatible. We also mistakenly added the default implementation `return Status::NotSupported("Close")` or `return IOStatus::NotSupported("Close")` to the WritableFile class in this [pull request](https://github.com/facebook/rocksdb/pull/10101). This pull request fixes the above issues.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10123
      
      Reviewed By: ajkr
      
      Differential Revision: D36943661
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 9dc45f4d2ab3a9d51c30bdfde679f1d13c4d5509
    • Fix overflow bug in standard deviation computation. (#10100) · 2af132c3
      Committed by Guido Tagliavini Ponce
      Summary:
      There was an overflow bug when computing the variance in the HistogramStat class.
      
This manifests, for instance, when running cache_bench with default arguments. This executes 32M lookups/inserts/deletes in a block cache, and then computes (among other things) the variance of the latencies. The variance is computed as ``variance = (cur_sum_squares * cur_num - cur_sum * cur_sum) / (cur_num * cur_num)``, where ``cur_sum_squares`` is the sum of the squares of the samples, ``cur_num`` is the number of samples, and ``cur_sum`` is the sum of the samples. Because the median latency in a typical run is around 3800 nanoseconds, both the ``cur_sum_squares * cur_num`` and ``cur_sum * cur_sum`` terms overflow as uint64_t.
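The overflow and one standard way around it can be sketched as follows (the rearranged formula below is a common fix, not necessarily the exact one used in the PR):

```python
MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound

def variance_uint64(cur_sum_squares: int, cur_sum: int, cur_num: int) -> int:
    """The original formula, with intermediates wrapped modulo 2**64."""
    num = ((cur_sum_squares * cur_num) - (cur_sum * cur_sum)) & MASK64
    return num // ((cur_num * cur_num) & MASK64)

def variance_float(cur_sum_squares: int, cur_sum: int, cur_num: int) -> float:
    """Overflow-free rearrangement: E[X^2] - E[X]^2 in floating point."""
    mean = cur_sum / cur_num
    return cur_sum_squares / cur_num - mean * mean

# Small example around the 3800 ns median mentioned above.
samples = [3700, 3800, 3900, 4000]
n, s = len(samples), sum(samples)
sq = sum(x * x for x in samples)
print(variance_float(sq, s, n))  # 12500.0

# With cur_num = 32M lookups, cur_sum alone is ~1.2e11, so cur_sum * cur_sum
# (~1.5e22) already exceeds 2**64 - 1 (~1.8e19) and wraps in uint64_t.
assert (3800 * 32_000_000) ** 2 > MASK64
```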
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10100
      
      Test Plan: Added a unit test. Run ``make -j24 histogram_test && ./histogram_test``.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36942738
      
      Pulled By: guidotag
      
      fbshipit-source-id: 0af5fb9e2a297a284e8e74c24e604d302906006e
    • Refactor: Add BlockTypes to make them imply C++ type in block cache (#10098) · 4f78f969
      Committed by Peter Dillinger
      Summary:
      We have three related concepts:
      * BlockType: an internal enum conceptually indicating a type of SST file
      block
* CacheEntryRole: a user-facing enum for categorizing block cache entries,
which is also involved in associating cache entries with an appropriate
deleter. Can include categories for non-block cache entries (e.g. memory
reservations).
      * TBlocklike: a C++ type for the actual type behind a void* cache entry.
      
      We had some existing code ugliness because BlockType did not imply
      TBlocklike, because of various kinds of "filter" block. This refactoring
      fixes that with new BlockTypes.
      
      More clean-up can come in later work.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10098
      
      Test Plan: existing tests
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36897945
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3ae496b5caa81e0a0ed85e873eb5b525e2d9a295
  3. June 6, 2022 (1 commit)
  4. June 5, 2022 (1 commit)
    • CI Benchmarking with CircleCI Runner and OpenSearch Dashboard (EB 1088) (#9723) · 2f4a0ffe
      Committed by Alan Paxton
      Summary:
CircleCI runner based benchmarking. A runner is a dedicated machine configured for CircleCI to perform work on. Our work is a repeatable benchmark: the `benchmark-linux` job in `config.yml`.
      
      A runner, in CircleCI terminology, is a machine that is managed by the client (us) rather than running on CircleCI resources in the cloud. This means that we define and configure the iron, and that therefore the performance is repeatable and predictable. Which is what we need for performance regression benchmarking.
      
On a time schedule (or on commit, during branch development), benchmarks are run on the runner, and then the script `benchmark_log_tool.py` parses the benchmark output and pushes it into a pre-configured OpenSearch document connected to an OpenSearch dashboard. Members of the team can examine benchmark performance changes on the dashboard.
      
      As time progresses we can add different benchmarks to the suite which gets run.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9723
      
      Reviewed By: pdillinger
      
      Differential Revision: D35555626
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: c6a905ca04494495c3784cfbb991f5ab90c807ee
  5. June 4, 2022 (11 commits)
    • Add a simple example of backup and restore (#10054) · 560906ab
      Committed by yite.gu
      Summary:
      Add a simple example of backup and restore
Signed-off-by: YiteGu <ess_gyt@qq.com>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10054
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36678141
      
      Pulled By: ajkr
      
      fbshipit-source-id: 43545356baddb4c2c76c62cd63d7a3238d1f8a00
    • Add wide column serialization primitives (#9915) · e9c74bc4
      Committed by Levi Tamasi
      Summary:
      The patch adds some low-level logic that can be used to serialize/deserialize
      a sorted vector of wide columns to/from a simple binary searchable string
      representation. Currently, there is no user-facing API; this will be implemented in
      subsequent stages.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9915
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D35978076
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 33f5f6628ec3bcd8c8beab363b1978ac047a8788
    • Point-lookup returns timestamps of Delete and SingleDelete (#10056) · 3e02c6e0
      Committed by Yanqin Jin
      Summary:
      If caller specifies a non-null `timestamp` argument in `DB::Get()` or a non-null `timestamps` in `DB::MultiGet()`,
      RocksDB will return the timestamps of the point tombstones.
      
      Note: DeleteRange is still unsupported.
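A toy MVCC sketch of the behavior described above (not RocksDB's actual data structures): a point lookup now reports the timestamp of the tombstone that shadows the key, instead of NotFound with no timestamp.

```python
def get_with_timestamp(versions, key, read_ts):
    """versions: {key: [(ts, kind, value), ...]} sorted by descending ts.
    Returns (status, timestamp, value)."""
    for ts, kind, value in versions.get(key, []):
        if ts <= read_ts:
            if kind == "delete":
                return ("NotFound", ts, None)  # timestamp of the tombstone
            return ("OK", ts, value)
    return ("NotFound", None, None)

versions = {"k": [(7, "delete", None), (3, "put", "v")]}
print(get_with_timestamp(versions, "k", 10))  # ('NotFound', 7, None)
print(get_with_timestamp(versions, "k", 5))   # ('OK', 3, 'v')
```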
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10056
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D36677956
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2d7af02cc7237b1829cd269086ea895a49d501ae
    • Increase ChargeTableReaderTest/ChargeTableReaderTest.Basic error tolerance... · 4bdcc801
      Committed by Hui Xiao
      Increase ChargeTableReaderTest/ChargeTableReaderTest.Basic error tolerance rate from 1% to 5% (#10113)
      
      Summary:
      **Context:**
https://github.com/facebook/rocksdb/pull/9748 added support for charging table reader memory to the block cache. The test `ChargeTableReaderTest/ChargeTableReaderTest.Basic` estimates the table reader memory, calculates the expected number of table readers opened based on this estimate, and asserts that it matches the actual number. The expected number, being based on an estimate, will not be 100% accurate and needs some error tolerance. It was previously set to 1%, and we recently encountered an assertion failure of `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374`, where `opened_table_reader_num` is the actual number and `max_table_reader_num_capped_upper_bound` is the estimated one (= 371 * 1.01). I believe it's safe to increase the error tolerance from 1% to 5%, hence this PR.
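The bound arithmetic from the failure can be sketched directly (helper name is illustrative):

```python
def capped_upper_bound(estimated_num: int, tolerance: float) -> int:
    """Upper bound on opened table readers given an error tolerance."""
    return int(estimated_num * (1 + tolerance))

# The failure above: 375/376 readers opened vs a 1% bound of 374.
print(capped_upper_bound(371, 0.01))  # 374
print(capped_upper_bound(371, 0.05))  # 389, which covers 375 and 376
```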
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10113
      
      Test Plan: - CI again succeeds.
      
      Reviewed By: ajkr
      
      Differential Revision: D36911556
      
      Pulled By: hx235
      
      fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
    • cmake: add an option to skip thirdparty.inc on Windows (#10110) · c1018b75
      Committed by Zeyi (Rice) Fan
      Summary:
When building RocksDB with getdeps on Windows, `thirdparty.inc` gets in the way since the `FindXXXX.cmake` modules are working properly now.
      
This PR adds an option to skip that file when building RocksDB.
      
      FB: see [D36905191](https://www.internalfb.com/diff/D36905191).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10110
      
      Reviewed By: siying
      
      Differential Revision: D36913882
      
      Pulled By: fanzeyi
      
      fbshipit-source-id: 33d36841dc0d4fe87f51e1d9fd2b158a3adab88f
    • Fix some bugs in verify_random_db.sh (#10112) · 7d36bc42
      Committed by Levi Tamasi
      Summary:
      The patch attempts to fix three bugs in `verify_random_db.sh`:
      1) https://github.com/facebook/rocksdb/pull/9937 changed the default for
      `--try_load_options` to true in the script's use case, so we have to
      explicitly set it to false if the corresponding argument of the script
      is 0. This should fix the issue we've been seeing with our forward
      compatibility tests where 7.3 is unable to open a database created by
      the version on main after adding a new configuration option.
      2) The script seems to support two "extra parameters"; however,
      in practice, if the second one was set, only that one was passed on to
      `ldb`. Now both get forwarded.
      3) When running the `diff` command, the base DB directory was passed as
      the second argument instead of the file containing the `ldb` output
      (this actually seems to work, probably accidentally though).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10112
      
      Reviewed By: pdillinger
      
      Differential Revision: D36911363
      
      Pulled By: ltamasi
      
      fbshipit-source-id: fe29db4e28d373cee51a12322c59050fc50e926d
    • Fix a bug in WAL tracking (#10087) · d739de63
      Committed by Yanqin Jin
      Summary:
      Closing https://github.com/facebook/rocksdb/issues/10080
      
      When `SyncWAL()` calls `MarkLogsSynced()`, even if there is only one active WAL file,
      this event should still be added to the MANIFEST.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10087
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D36797580
      
      Pulled By: riversand963
      
      fbshipit-source-id: 24184c9dd606b3939a454ed41de6e868d1519999
    • Add support for FastLRUCache in cache_bench (#10095) · eb99e080
      Committed by Guido Tagliavini Ponce
      Summary:
      cache_bench can now run with FastLRUCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10095
      
      Test Plan:
      - Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 cache_bench``. Then test the appropriate code is used by running ``./cache_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) has a latency distribution similar to LRUCache's, by comparing the outputs of ``./cache_bench -cache_type=fast_lru_cache`` and ``./cache_bench -cache_type=lru_cache``.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36875834
      
      Pulled By: guidotag
      
      fbshipit-source-id: eb2ad0bb32c2717a258a6ac66ed736e06f826cd8
    • Add default impl to dir close (#10101) · 21906d66
      Committed by zczhu
      Summary:
As pointed out by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), the previous implementation is not backward-compatible. In this implementation, the default implementation `return Status::NotSupported("Close")` or `return IOStatus::NotSupported("Close")` is added to the `Close()` function of the `*Directory` classes.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10101
      
      Test Plan: DBBasicTest.DBCloseAllDirectoryFDs
      
      Reviewed By: anand1976
      
      Differential Revision: D36899346
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 430624793362f330cbb8837960f0e8712a944ab9
    • Add support for FastLRUCache in db_bench. (#10096) · cf856077
      Committed by Guido Tagliavini Ponce
      Summary:
      db_bench can now run with FastLRUCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10096
      
      Test Plan:
      - Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 db_bench``. Then test the appropriate code is used by running ``./db_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) produces benchmark data similar to LRUCache's, by comparing the outputs of ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=fast_lru_cache`` and ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=lru_cache``.
      
      Reviewed By: gitbw95
      
      Differential Revision: D36898774
      
      Pulled By: guidotag
      
      fbshipit-source-id: f9f6b6f6da124f88b21b3c8dee742fbb04eff773
    • Temporarily disable wal compression (#10108) · 2b3c50c4
      Committed by Yanqin Jin
      Summary:
      Will re-enable after fixing the bug in https://github.com/facebook/rocksdb/issues/10099 and https://github.com/facebook/rocksdb/issues/10097.
      Right now, the priority is https://github.com/facebook/rocksdb/issues/10087, but the bug in WAL compression prevents the mini crash test from passing.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10108
      
      Reviewed By: pdillinger
      
      Differential Revision: D36897214
      
      Pulled By: riversand963
      
      fbshipit-source-id: d64dc52738222d5f66003f7731dc46eaeed812be
  6. June 3, 2022 (7 commits)
    • Enhance to support more tuning options, and universal and integrated… (#9704) · 5506954b
      Committed by Mark Callaghan
      Summary:
      … BlobDB for all tests
      
      This does two big things:
      * provides more tuning options
* supports universal compaction and integrated BlobDB for all of the benchmarks that were previously leveled-only
      
      It does several smaller things, and I will list a few
      * sets l0_slowdown_writes_trigger which wasn't set before this diff.
      * improves readability in report.tsv by using smaller field names in the header
      * adds more columns to report.tsv
      
      report.tsv before this diff:
      ```
      ops_sec mb_sec  total_size_gb   level0_size_gb  sum_gb  write_amplification     write_mbps      usec_op percentile_50   percentile_75   percentile_99   percentile_99.9 percentile_99.99        uptime  stall_time      stall_percent   test_name       test_date      rocksdb_version  job_id
      823294  329.8   0.0     21.5    21.5    1.0     183.4   1.2     1.0     1.0     3       6       14      120     00:00:0.000     0.0     fillseq.wal_disabled.v400       2022-03-16T15:46:45.000-07:00   7.0
      326520  130.8   0.0     0.0     0.0     0.0     0       12.2    139.8   155.1   170     234     250     60      00:00:0.000     0.0     multireadrandom.t4      2022-03-16T15:48:47.000-07:00   7.0
      86313   345.7   0.0     0.0     0.0     0.0     0       46.3    44.8    50.6    75      84      108     60      00:00:0.000     0.0     revrangewhilewriting.t4 2022-03-16T15:50:48.000-07:00   7.0
      101294  405.7   0.0     0.1     0.1     1.0     1.6     39.5    40.4    45.9    64      75      103     62      00:00:0.000     0.0     fwdrangewhilewriting.t4 2022-03-16T15:52:50.000-07:00   7.0
      258141  103.4   0.0     0.1     1.2     18.2    19.8    15.5    14.3    18.1    28      34      48      62      00:00:0.000     0.0     readwhilewriting.t4     2022-03-16T15:54:51.000-07:00   7.0
      334690  134.1   0.0     7.6     18.7    4.2     308.8   12.0    11.8    13.7    21      30      62      62      00:00:0.000     0.0     overwrite.t4.s0 2022-03-16T15:56:53.000-07:00   7.0
      ```
      report.tsv with this diff:
      ```
      ops_sec mb_sec  lsm_sz  blob_sz c_wgb   w_amp   c_mbps  c_wsecs c_csecs b_rgb   b_wgb   usec_op p50     p99     p99.9   p99.99  pmax    uptime  stall%  Nstall  u_cpu   s_cpu   rss     test    date    version job_id
      831144  332.9   22GB    0.0GB,  21.7    1.0     185.1   264     262     0       0       1.2     1.0     3       6       14      9198    120     0.0     0       0.4     0.0     0.7     fillseq.wal_disabled.v400       2022-03-16T16:21:23     7.0
      325229  130.3   22GB    0.0GB,  0.0             0.0     0       0       0       0       12.3    139.8   170     237     249     572     60      0.0     0       0.4     0.1     1.2     multireadrandom.t4      2022-03-16T16:23:25     7.0
      312920  125.3   26GB    0.0GB,  11.1    2.6     189.3   115     113     0       0       12.8    11.8    21      34      1255    6442    60      0.2     1       0.7     0.1     0.6     overwritesome.t4.s0     2022-03-16T16:25:27     7.0
      81698   327.2   25GB    0.0GB,  0.0             0.0     0       0       0       0       48.9    46.2    79      246     369     9445    60      0.0     0       0.4     0.1     1.4     revrangewhilewriting.t4 2022-03-16T16:30:21     7.0
      92484   370.4   25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       43.2    42.3    75      103     110     9512    62      0.0     0       0.4     0.1     1.4     fwdrangewhilewriting.t4 2022-03-16T16:32:24     7.0
      241661  96.8    25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       16.5    17.1    30      34      49      9092    62      0.0     0       0.4     0.1     1.4     readwhilewriting.t4     2022-03-16T16:34:27     7.0
      305234  122.3   30GB    0.0GB,  12.1    2.7     201.7   127     124     0       0       13.1    11.8    21      128     1934    6339    62      0.0     0       0.7     0.1     0.7     overwrite.t4.s0 2022-03-16T16:36:30     7.0
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9704
      
      Test Plan: run it
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36864627
      
      Pulled By: mdcallag
      
      fbshipit-source-id: d5af1cfc258a16865210163fa6fd1b803ab1a7d3
    • Fix Java build (#10105) · 7b2c0140
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10105
      
      Reviewed By: cbi42
      
      Differential Revision: D36891073
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 16487ec708fc96add2a1ebc2d98f6439dfc852ca
    • Fix LITE build (#10106) · b8fe7df2
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10106
      
      Reviewed By: cbi42
      
      Differential Revision: D36891284
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 304ffa84549201659feb0b74d6ba54a83f08906b
    • Add comments/permit unchecked error to close_db_dir pull requests (#10093) · e88d8935
      Committed by zczhu
      Summary:
In the [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merge conflicts dropped a few comments and one line, `s.PermitUncheckedError()`. This pull request puts them back.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093
      
      Reviewed By: ajkr
      
      Differential Revision: D36884117
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3
    • Install zstd on CircleCI linux (#10102) · ed50ccd1
      Committed by Yanqin Jin
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10102
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36885468
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6ed5b62dda8fe0f4be4b66d09bdec0134cf4500c
    • Make it possible to enable blob files starting from a certain LSM tree level (#10077) · e6432dfd
      Committed by Gang Liao
      Summary:
      Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.
      
      In order to achieve this, we would like to do the following:
      - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
      - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
      - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
      - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
      - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
      - Ideally extend the C and Java bindings with the new option
      - Update the BlobDB wiki to document the new option.
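The gating condition described above can be sketched as (function name is illustrative, not RocksDB's):

```python
def use_blob_builder(enable_blob_files: bool, output_level: int,
                     blob_file_starting_level: int) -> bool:
    """Extract large values to blob files only when blob files are enabled
    and the write targets a high-enough LSM tree level."""
    return enable_blob_files and output_level >= blob_file_starting_level

# Flush/recovery writes go to L0, so with a starting level of 2, short-lived
# values stay inline until a compaction into L2 or below.
print(use_blob_builder(True, 0, 2))  # False: flush to L0, no extraction
print(use_blob_builder(True, 2, 2))  # True: compaction into L2 extracts
print(use_blob_builder(True, 0, 0))  # True: default 0 preserves old behavior
```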
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077
      
      Reviewed By: ltamasi
      
      Differential Revision: D36884156
      
      Pulled By: gangliao
      
      fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
    • Add kLastTemperature as temperature high bound (#10044) · a0200315
      Committed by Jay Zhuang
      Summary:
Currently only used as the exclusive high bound of the temperature enum; its
value may increase as more temperatures are added.
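The sentinel pattern can be illustrated with a toy enum (member values below are illustrative, not RocksDB's actual encoding):

```python
import enum

class Temperature(enum.IntEnum):
    kUnknown = 0
    kHot = 1
    kWarm = 2
    kCold = 3
    kLastTemperature = 4  # exclusive high bound, never stored in data

def all_temperatures():
    """Enumerate real temperatures without hard-coding their count; the
    list grows automatically when members are added before the sentinel."""
    return [t for t in Temperature if t is not Temperature.kLastTemperature]

print(len(all_temperatures()))  # 4
```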
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10044
      
      Test Plan: ci
      
      Reviewed By: siying
      
      Differential Revision: D36633410
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: eecdfa7623c31778c31d789902eacf78aad7b482
  7. June 2, 2022 (8 commits)
    • Support specifying blob garbage collection parameters when CompactRange() (#10073) · 3dc6ebaf
      Committed by Gang Liao
      Summary:
Garbage collection is generally controlled by the BlobDB configuration options `enable_blob_garbage_collection` and `blob_garbage_collection_age_cutoff`. However, there might be use cases where we would want to temporarily override these options while performing a manual compaction. (One use case would be doing a full key-space manual compaction with a full, i.e. 100%, garbage collection age cutoff in order to minimize the space occupied by the database.) Our goal here is to make it possible to override the configured GC parameters when using the `CompactRange` API to perform manual compactions. This PR would involve:
      
      - Extending the `CompactRangeOptions` structure so clients can both force-enable and force-disable GC, as well as use a different cutoff than what's currently configured
      - Storing whether blob GC should actually be enabled during a certain manual compaction and the cutoff to use in the `Compaction` object (considering the above overrides) and passing it to `CompactionIterator` via `CompactionProxy`
      - Updating the BlobDB wiki to document the new options.
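The override resolution can be sketched as follows (a hypothetical helper, not the actual `CompactRangeOptions` API):

```python
def resolve_gc_settings(cf_enable: bool, cf_cutoff: float,
                        override_enable=None, override_cutoff=None):
    """Per-compaction GC settings: a manual compaction may force-enable or
    force-disable GC and override the age cutoff; otherwise the column
    family's configured values apply."""
    enable = cf_enable if override_enable is None else override_enable
    cutoff = cf_cutoff if override_cutoff is None else override_cutoff
    return enable, cutoff

# A full-GC manual compaction on a CF that normally has GC disabled:
print(resolve_gc_settings(False, 0.25, override_enable=True, override_cutoff=1.0))
```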
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10073
      
      Test Plan: Adding unit tests and adding the new options to the stress test tool.
      
      Reviewed By: ltamasi
      
      Differential Revision: D36848700
      
      Pulled By: gangliao
      
      fbshipit-source-id: c878ef101d1c612429999f513453c319f75d78e9
    • Explicitly closing all directory file descriptors (#10049) · 65893ad9
      Committed by Zichen Zhu
      Summary:
Currently, the DB directory file descriptor is left open until destruction (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (lines 512-515 in ldb_cmd.cc) to leak the `db_` object, build the `ldb` tool and run
      ```
      strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
      ```
      There is one directory file descriptor that is not closed in the strace log.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049
      
      Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.
      
      Reviewed By: ajkr, hx235
      
      Differential Revision: D36722135
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
      65893ad9
    • G
      Add support for FastLRUCache in stress and crash tests. (#10081) · b4d0e041
      Committed by Guido Tagliavini Ponce
      Summary:
      Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache.
      
      Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags: when `--secondary_cache_uri` is set, `--cache_type` is forced to `lru_cache`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081
      
      Test Plan:
      - To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960" blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`.
      - To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`.
      
      Reviewed By: anand1976
      
      Differential Revision: D36839908
      
      Pulled By: guidotag
      
      fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725
      b4d0e041
    • A
      Update History.md for #9922 (#10092) · 45b1c788
      Committed by Akanksha Mahajan
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10092
      
      Reviewed By: riversand963
      
      Differential Revision: D36832311
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8fb1cf90b1d4dddebbfbeebeddb15f6905968e9b
      45b1c788
    • J
      Get current LogFileNumberSize the same as log_writer (#10086) · 5864900c
      Committed by Jay Zhuang
      Summary:
      `db_impl.alive_log_files_` is used to track the WAL size in `db_impl.logs_`.
      Fetch the `LogFileNumberSize` object in `alive_log_files_` at the same time as the `log_writer` so the two stay consistent.
      Relatedly, it is not safe to call `deque::reverse_iterator::operator*` and `deque::pop_front()` concurrently,
      so the tail cache is removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10086
      
      Test Plan:
      ```
      # on Windows
      gtest-parallel ./db_test --gtest_filter=DBTest.FileCreationRandomFailure -r 1000 -w 100
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D36822373
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 5e738051dfc7bcf6a15d85ba25e6365df6b6a6af
      5864900c
    • C
      Add bug fix to HISTORY.md (#10091) · 463873f1
      Committed by Changyu Bi
      Summary:
      Add to HISTORY.md the bug fixed in https://github.com/facebook/rocksdb/issues/10051
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10091
      
      Reviewed By: ajkr
      
      Differential Revision: D36821861
      
      Pulled By: cbi42
      
      fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6
      463873f1
    • P
      Reduce risk of backup or checkpoint missing a WAL file (#10083) · a00cffaf
      Committed by Peter Dillinger
      Summary:
      We recently saw a case in crash test in which a WAL file in the
      middle of the list of live WALs was not included in the backup, so the
      DB was not openable due to missing WAL. We are not sure why, but this
      change should at least turn that into a backup-time failure by ensuring
      all the WAL files expected by the manifest (according to VersionSet) are
      included in `GetSortedWalFiles()` (used by `GetLiveFilesStorageInfo()`,
      `BackupEngine`, and `Checkpoint`).
      
      Related: to maximize the effectiveness of
      track_and_verify_wals_in_manifest with GetSortedWalFiles() during
      checkpoint/backup, we will now sync WAL in GetLiveFilesStorageInfo()
      when track_and_verify_wals_in_manifest=true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10083
      
      Test Plan: added new unit test for the check in GetSortedWalFiles()
      
      Reviewed By: ajkr
      
      Differential Revision: D36791608
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a27bcf0213fc7ab177760fede50d4375d579afa6
      a00cffaf
    • A
      Persist the new MANIFEST after successfully syncing the new WAL during recovery (#9922) · d04df275
      Committed by Akanksha Mahajan
      Summary:
      In case of a non-TransactionDB with avoid_flush_during_recovery = true, RocksDB avoids
      flushing data from the WAL to L0 for all column families when possible. As a
      result, not all column families can advance their log_numbers, and
      min_log_number_to_keep won't change.
      For a transaction DB (allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions, so min_log_number_to_keep won't change.
      If we persist a new MANIFEST with
      advanced log_numbers for some column families, then during a second
      crash after persisting the MANIFEST, RocksDB will see those column
      families' log_numbers larger than the corrupted WAL's number, hit the "column family inconsistency" error, and fail recovery.
      
      As a solution, RocksDB now persists the new MANIFEST only after successfully syncing the new WAL.
      If a future recovery starts from the new MANIFEST, then the new WAL was successfully synced; due to the sentinel empty write batch at its beginning, kPointInTimeRecovery of the WAL is guaranteed to proceed past this point.
      If a future recovery starts from the old MANIFEST, it means writing the new MANIFEST failed, so we won't hit the "SST ahead of WAL" error.
      Previously, RocksDB DB::Open() could create and write two new MANIFEST files even before recovery succeeded. This PR buffers the edits in a structure and writes a new MANIFEST only after recovery is successful.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922
      
      Test Plan:
      1. Update unit tests to fail without this change
      2. make crash_test -j
      
      Branch with unit test and no fix  https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix)
      
      Reviewed By: riversand963
      
      Differential Revision: D36043701
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9
      d04df275
  8. 01 Jun, 2022 3 commits
  9. 31 May, 2022 1 commit