1. 06 Jun 2022 (1 commit)
  2. 05 Jun 2022 (1 commit)
    • CI Benchmarking with CircleCI Runner and OpenSearch Dashboard (EB 1088) (#9723) · 2f4a0ffe
      Alan Paxton committed
      Summary:
      CircleCI runner-based benchmarking. A runner is a dedicated machine configured for CircleCI to perform work on. Our work is a repeatable benchmark: the `benchmark-linux` job in `config.yml`.
      
      A runner, in CircleCI terminology, is a machine that is managed by the client (us) rather than running on CircleCI resources in the cloud. This means that we define and configure the hardware, and that therefore the performance is repeatable and predictable, which is what we need for performance regression benchmarking.
      
      On a time schedule (or on commit, during branch development), benchmarks are run on the runner, and then a script, `benchmark_log_tool.py`, parses the benchmark output and pushes it into a pre-configured OpenSearch document connected to an OpenSearch dashboard. Members of the team can examine benchmark performance changes on the dashboard.
      
      Over time, we can add more benchmarks to the suite that gets run.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9723
      
      Reviewed By: pdillinger
      
      Differential Revision: D35555626
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: c6a905ca04494495c3784cfbb991f5ab90c807ee
  3. 04 Jun 2022 (11 commits)
    • Add a simple example of backup and restore (#10054) · 560906ab
      yite.gu committed
      Summary:
      Add a simple example of backup and restore
      Signed-off-by: Yite Gu <ess_gyt@qq.com>
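      The example itself is not reproduced in this log; as a rough sketch of the backup-and-restore flow (assuming the `BackupEngine` API from `rocksdb/utilities/backup_engine.h`; the paths are made up for illustration):

      ```cpp
      #include <cassert>

      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/backup_engine.h"

      using namespace ROCKSDB_NAMESPACE;

      int main() {
        // Open a DB and write a key.
        DB* db = nullptr;
        Options options;
        options.create_if_missing = true;
        Status s = DB::Open(options, "/tmp/rocksdb_example_db", &db);
        assert(s.ok());
        s = db->Put(WriteOptions(), "key1", "value1");
        assert(s.ok());

        // Create a backup of the current DB state.
        BackupEngine* backup_engine = nullptr;
        s = BackupEngine::Open(Env::Default(),
                               BackupEngineOptions("/tmp/rocksdb_example_backup"),
                               &backup_engine);
        assert(s.ok());
        s = backup_engine->CreateNewBackup(db);
        assert(s.ok());
        delete db;

        // Restore the latest backup into a fresh directory (DB dir and WAL dir).
        s = backup_engine->RestoreDBFromLatestBackup("/tmp/rocksdb_example_restore",
                                                     "/tmp/rocksdb_example_restore");
        assert(s.ok());
        delete backup_engine;
        return 0;
      }
      ```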
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10054
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36678141
      
      Pulled By: ajkr
      
      fbshipit-source-id: 43545356baddb4c2c76c62cd63d7a3238d1f8a00
    • Add wide column serialization primitives (#9915) · e9c74bc4
      Levi Tamasi committed
      Summary:
      The patch adds some low-level logic that can be used to serialize/deserialize
      a sorted vector of wide columns to/from a simple binary searchable string
      representation. Currently, there is no user-facing API; this will be implemented in
      subsequent stages.
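      As an illustration of the kind of primitive described (not RocksDB's actual on-disk format, which this patch defines internally), a self-contained sketch that serializes sorted (name, value) columns with fixed-width length prefixes and then looks a column up by name via binary search:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <iostream>
      #include <optional>
      #include <string>
      #include <utility>
      #include <vector>

      using Column = std::pair<std::string, std::string>;  // (name, value)

      // Append a 32-bit little-endian length prefix followed by the bytes.
      void PutLengthPrefixed(std::string* out, const std::string& s) {
        uint32_t n = static_cast<uint32_t>(s.size());
        for (int i = 0; i < 4; ++i) {
          out->push_back(static_cast<char>((n >> (8 * i)) & 0xff));
        }
        out->append(s);
      }

      std::string GetLengthPrefixed(const std::string& in, size_t* pos) {
        uint32_t n = 0;
        for (int i = 0; i < 4; ++i) {
          n |= static_cast<uint32_t>(static_cast<unsigned char>(in[(*pos)++]))
               << (8 * i);
        }
        std::string s = in.substr(*pos, n);
        *pos += n;
        return s;
      }

      // Serialize columns sorted by name so lookups can binary search.
      std::string SerializeColumns(std::vector<Column> columns) {
        std::sort(columns.begin(), columns.end());
        std::string out;
        for (const Column& c : columns) {
          PutLengthPrefixed(&out, c.first);
          PutLengthPrefixed(&out, c.second);
        }
        return out;
      }

      // Deserialize, then binary search the sorted columns for `name`.
      std::optional<std::string> FindColumn(const std::string& blob,
                                            const std::string& name) {
        std::vector<Column> columns;
        size_t pos = 0;
        while (pos < blob.size()) {
          std::string n = GetLengthPrefixed(blob, &pos);
          std::string v = GetLengthPrefixed(blob, &pos);
          columns.emplace_back(std::move(n), std::move(v));
        }
        auto it = std::lower_bound(
            columns.begin(), columns.end(), name,
            [](const Column& c, const std::string& key) { return c.first < key; });
        if (it != columns.end() && it->first == name) return it->second;
        return std::nullopt;
      }

      int main() {
        std::string blob = SerializeColumns({{"city", "NYC"}, {"age", "42"}});
        std::cout << FindColumn(blob, "age").value() << "\n";      // 42
        std::cout << FindColumn(blob, "zip").has_value() << "\n";  // 0 (not found)
        return 0;
      }
      ```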
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9915
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D35978076
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 33f5f6628ec3bcd8c8beab363b1978ac047a8788
    • Point-lookup returns timestamps of Delete and SingleDelete (#10056) · 3e02c6e0
      Yanqin Jin committed
      Summary:
      If caller specifies a non-null `timestamp` argument in `DB::Get()` or a non-null `timestamps` in `DB::MultiGet()`,
      RocksDB will return the timestamps of the point tombstones.
      
      Note: DeleteRange is still unsupported.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10056
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D36677956
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2d7af02cc7237b1829cd269086ea895a49d501ae
    • Increase ChargeTableReaderTest/ChargeTableReaderTest.Basic error tolerance rate from 1% to 5% (#10113) · 4bdcc801
      Hui Xiao committed
      
      Summary:
      **Context:**
      https://github.com/facebook/rocksdb/pull/9748 added support for charging table reader memory to the block cache. The test `ChargeTableReaderTest/ChargeTableReaderTest.Basic` estimates the table reader memory, calculates the expected number of table readers opened based on this estimate, and asserts that number against the actual number. Since it is based on an estimate, the expected number will not be 100% accurate and needs some error tolerance. The tolerance was previously set to 1%, and we recently hit an assertion failure: `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374`, where `opened_table_reader_num` is the actual count and `max_table_reader_num_capped_upper_bound` is the estimated bound (= 371 * 1.01). I believe it's safe to increase the error tolerance from 1% to 5%, hence this PR.
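      To see why 1% was too tight, a sketch of the bound arithmetic (assuming the bound is truncated to an integer, which matches the 374 in the failure message):

      ```cpp
      #include <iostream>

      int main() {
        const int estimated_readers = 371;  // estimate from the failure message
        // Upper bound = estimated * (1 + tolerance), truncated to an integer.
        int bound_1pct = static_cast<int>(estimated_readers * 1.01);
        int bound_5pct = static_cast<int>(estimated_readers * 1.05);
        std::cout << bound_1pct << "\n";  // 374: rejects the observed 375/376
        std::cout << bound_5pct << "\n";  // 389: comfortably admits them
        return 0;
      }
      ```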
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10113
      
      Test Plan: - CI again succeeds.
      
      Reviewed By: ajkr
      
      Differential Revision: D36911556
      
      Pulled By: hx235
      
      fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
    • cmake: add an option to skip thirdparty.inc on Windows (#10110) · c1018b75
      Zeyi (Rice) Fan committed
      Summary:
      When building RocksDB with getdeps on Windows, `thirdparty.inc` gets in the way, since the `FindXXXX.cmake` modules now work properly.

      This PR adds an option to skip that file when building RocksDB.
      
      FB: see [D36905191](https://www.internalfb.com/diff/D36905191).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10110
      
      Reviewed By: siying
      
      Differential Revision: D36913882
      
      Pulled By: fanzeyi
      
      fbshipit-source-id: 33d36841dc0d4fe87f51e1d9fd2b158a3adab88f
    • Fix some bugs in verify_random_db.sh (#10112) · 7d36bc42
      Levi Tamasi committed
      Summary:
      The patch attempts to fix three bugs in `verify_random_db.sh`:
      1) https://github.com/facebook/rocksdb/pull/9937 changed the default for
      `--try_load_options` to true in the script's use case, so we have to
      explicitly set it to false if the corresponding argument of the script
      is 0. This should fix the issue we've been seeing with our forward
      compatibility tests where 7.3 is unable to open a database created by
      the version on main after adding a new configuration option.
      2) The script seems to support two "extra parameters"; however,
      in practice, if the second one was set, only that one was passed on to
      `ldb`. Now both get forwarded.
      3) When running the `diff` command, the base DB directory was passed as
      the second argument instead of the file containing the `ldb` output
      (this actually seems to work, probably accidentally though).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10112
      
      Reviewed By: pdillinger
      
      Differential Revision: D36911363
      
      Pulled By: ltamasi
      
      fbshipit-source-id: fe29db4e28d373cee51a12322c59050fc50e926d
    • Fix a bug in WAL tracking (#10087) · d739de63
      Yanqin Jin committed
      Summary:
      Closing https://github.com/facebook/rocksdb/issues/10080
      
      When `SyncWAL()` calls `MarkLogsSynced()`, even if there is only one active WAL file,
      this event should still be added to the MANIFEST.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10087
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D36797580
      
      Pulled By: riversand963
      
      fbshipit-source-id: 24184c9dd606b3939a454ed41de6e868d1519999
    • Add support for FastLRUCache in cache_bench (#10095) · eb99e080
      Guido Tagliavini Ponce committed
      Summary:
      cache_bench can now run with FastLRUCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10095
      
      Test Plan:
      - Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 cache_bench``. Then test the appropriate code is used by running ``./cache_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
      - Verify that FastLRUCache (currently a clone of LRUCache) has a latency distribution similar to LRUCache's, by comparing the outputs of ``./cache_bench -cache_type=fast_lru_cache`` and ``./cache_bench -cache_type=lru_cache``.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36875834
      
      Pulled By: guidotag
      
      fbshipit-source-id: eb2ad0bb32c2717a258a6ac66ed736e06f826cd8
    • Add default impl to dir close (#10101) · 21906d66
      zczhu committed
      Summary:
      As pointed out by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), the previous implementation was not backward compatible. In this implementation, a default implementation returning `Status::NotSupported("Close")` or `IOStatus::NotSupported("Close")` is added to the `Close()` function of the `*Directory` classes.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10101
      
      Test Plan: DBBasicTest.DBCloseAllDirectoryFDs
      
      Reviewed By: anand1976
      
      Differential Revision: D36899346
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 430624793362f330cbb8837960f0e8712a944ab9
    • Add support for FastLRUCache in db_bench. (#10096) · cf856077
      Guido Tagliavini Ponce committed
      Summary:
      db_bench can now run with FastLRUCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10096
      
      Test Plan:
      - Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 db_bench``. Then test the appropriate code is used by running ``./db_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
      - Verify that FastLRUCache (currently a clone of LRUCache) produces benchmark data similar to LRUCache's, by comparing the outputs of ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=fast_lru_cache`` and ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=lru_cache``.
      
      Reviewed By: gitbw95
      
      Differential Revision: D36898774
      
      Pulled By: guidotag
      
      fbshipit-source-id: f9f6b6f6da124f88b21b3c8dee742fbb04eff773
    • Temporarily disable wal compression (#10108) · 2b3c50c4
      Yanqin Jin committed
      Summary:
      Will re-enable after fixing the bug in https://github.com/facebook/rocksdb/issues/10099 and https://github.com/facebook/rocksdb/issues/10097.
      Right now, the priority is https://github.com/facebook/rocksdb/issues/10087, but the bug in WAL compression prevents the mini crash test from passing.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10108
      
      Reviewed By: pdillinger
      
      Differential Revision: D36897214
      
      Pulled By: riversand963
      
      fbshipit-source-id: d64dc52738222d5f66003f7731dc46eaeed812be
  4. 03 Jun 2022 (7 commits)
    • Enhance to support more tuning options, and universal and integrated BlobDB for all tests (#9704) · 5506954b
      Mark Callaghan committed
      Summary:
      
      This does two big things:
      * provides more tuning options
      * supports universal and integrated BlobDB for all of the benchmarks that are leveled-only
      
      It does several smaller things, and I will list a few
      * sets l0_slowdown_writes_trigger which wasn't set before this diff.
      * improves readability in report.tsv by using smaller field names in the header
      * adds more columns to report.tsv
      
      report.tsv before this diff:
      ```
      ops_sec mb_sec  total_size_gb   level0_size_gb  sum_gb  write_amplification     write_mbps      usec_op percentile_50   percentile_75   percentile_99   percentile_99.9 percentile_99.99        uptime  stall_time      stall_percent   test_name       test_date      rocksdb_version  job_id
      823294  329.8   0.0     21.5    21.5    1.0     183.4   1.2     1.0     1.0     3       6       14      120     00:00:0.000     0.0     fillseq.wal_disabled.v400       2022-03-16T15:46:45.000-07:00   7.0
      326520  130.8   0.0     0.0     0.0     0.0     0       12.2    139.8   155.1   170     234     250     60      00:00:0.000     0.0     multireadrandom.t4      2022-03-16T15:48:47.000-07:00   7.0
      86313   345.7   0.0     0.0     0.0     0.0     0       46.3    44.8    50.6    75      84      108     60      00:00:0.000     0.0     revrangewhilewriting.t4 2022-03-16T15:50:48.000-07:00   7.0
      101294  405.7   0.0     0.1     0.1     1.0     1.6     39.5    40.4    45.9    64      75      103     62      00:00:0.000     0.0     fwdrangewhilewriting.t4 2022-03-16T15:52:50.000-07:00   7.0
      258141  103.4   0.0     0.1     1.2     18.2    19.8    15.5    14.3    18.1    28      34      48      62      00:00:0.000     0.0     readwhilewriting.t4     2022-03-16T15:54:51.000-07:00   7.0
      334690  134.1   0.0     7.6     18.7    4.2     308.8   12.0    11.8    13.7    21      30      62      62      00:00:0.000     0.0     overwrite.t4.s0 2022-03-16T15:56:53.000-07:00   7.0
      ```
      report.tsv with this diff:
      ```
      ops_sec mb_sec  lsm_sz  blob_sz c_wgb   w_amp   c_mbps  c_wsecs c_csecs b_rgb   b_wgb   usec_op p50     p99     p99.9   p99.99  pmax    uptime  stall%  Nstall  u_cpu   s_cpu   rss     test    date    version job_id
      831144  332.9   22GB    0.0GB,  21.7    1.0     185.1   264     262     0       0       1.2     1.0     3       6       14      9198    120     0.0     0       0.4     0.0     0.7     fillseq.wal_disabled.v400       2022-03-16T16:21:23     7.0
      325229  130.3   22GB    0.0GB,  0.0             0.0     0       0       0       0       12.3    139.8   170     237     249     572     60      0.0     0       0.4     0.1     1.2     multireadrandom.t4      2022-03-16T16:23:25     7.0
      312920  125.3   26GB    0.0GB,  11.1    2.6     189.3   115     113     0       0       12.8    11.8    21      34      1255    6442    60      0.2     1       0.7     0.1     0.6     overwritesome.t4.s0     2022-03-16T16:25:27     7.0
      81698   327.2   25GB    0.0GB,  0.0             0.0     0       0       0       0       48.9    46.2    79      246     369     9445    60      0.0     0       0.4     0.1     1.4     revrangewhilewriting.t4 2022-03-16T16:30:21     7.0
      92484   370.4   25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       43.2    42.3    75      103     110     9512    62      0.0     0       0.4     0.1     1.4     fwdrangewhilewriting.t4 2022-03-16T16:32:24     7.0
      241661  96.8    25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       16.5    17.1    30      34      49      9092    62      0.0     0       0.4     0.1     1.4     readwhilewriting.t4     2022-03-16T16:34:27     7.0
      305234  122.3   30GB    0.0GB,  12.1    2.7     201.7   127     124     0       0       13.1    11.8    21      128     1934    6339    62      0.0     0       0.7     0.1     0.7     overwrite.t4.s0 2022-03-16T16:36:30     7.0
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9704
      
      Test Plan: run it
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36864627
      
      Pulled By: mdcallag
      
      fbshipit-source-id: d5af1cfc258a16865210163fa6fd1b803ab1a7d3
    • Fix Java build (#10105) · 7b2c0140
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10105
      
      Reviewed By: cbi42
      
      Differential Revision: D36891073
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 16487ec708fc96add2a1ebc2d98f6439dfc852ca
    • Fix LITE build (#10106) · b8fe7df2
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10106
      
      Reviewed By: cbi42
      
      Differential Revision: D36891284
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 304ffa84549201659feb0b74d6ba54a83f08906b
    • Add comments/permit unchecked error to close_db_dir pull requests (#10093) · e88d8935
      zczhu committed
      Summary:
      In the [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, a merge conflict dropped some comments and the line `s.PermitUncheckedError()`. This pull request puts them back.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093
      
      Reviewed By: ajkr
      
      Differential Revision: D36884117
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3
    • Install zstd on CircleCI linux (#10102) · ed50ccd1
      Yanqin Jin committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10102
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36885468
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6ed5b62dda8fe0f4be4b66d09bdec0134cf4500c
    • Make it possible to enable blob files starting from a certain LSM tree level (#10077) · e6432dfd
      Gang Liao committed
      Summary:
      Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.
      
      In order to achieve this, we would like to do the following:
      - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions`), and extend the related logic
      - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
      - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
      - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
      - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
      - Ideally extend the C and Java bindings with the new option
      - Update the BlobDB wiki to document the new option.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077
      
      Reviewed By: ltamasi
      
      Differential Revision: D36884156
      
      Pulled By: gangliao
      
      fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
    • Add kLastTemperature as temperature high bound (#10044) · a0200315
      Jay Zhuang committed
      Summary:
      Currently only used as the high bound for temperature values; may increase as more temperatures are added.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10044
      
      Test Plan: ci
      
      Reviewed By: siying
      
      Differential Revision: D36633410
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: eecdfa7623c31778c31d789902eacf78aad7b482
  5. 02 Jun 2022 (8 commits)
    • Support specifying blob garbage collection parameters when CompactRange() (#10073) · 3dc6ebaf
      Gang Liao committed
      Summary:
      Garbage collection is generally controlled by the BlobDB configuration options `enable_blob_garbage_collection` and `blob_garbage_collection_age_cutoff`. However, there might be use cases where we would want to temporarily override these options while performing a manual compaction. (One use case would be doing a full key-space manual compaction with a 100% garbage collection age cutoff in order to minimize the space occupied by the database.) Our goal here is to make it possible to override the configured GC parameters when using the `CompactRange` API to perform manual compactions. This PR involves:
      
      - Extending the `CompactRangeOptions` structure so clients can both force-enable and force-disable GC, as well as use a different cutoff than what's currently configured
      - Storing whether blob GC should actually be enabled during a certain manual compaction and the cutoff to use in the `Compaction` object (considering the above overrides) and passing it to `CompactionIterator` via `CompactionProxy`
      - Updating the BlobDB wiki to document the new options.
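      With the overrides described above, a manual full-GC compaction might look like the following sketch (member and enum names as introduced by this PR's description; treat the snippet as illustrative, not authoritative):

      ```cpp
      #include <cassert>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      using ROCKSDB_NAMESPACE::BlobGarbageCollectionPolicy;
      using ROCKSDB_NAMESPACE::CompactRangeOptions;
      using ROCKSDB_NAMESPACE::DB;

      // Force blob GC with a 100% age cutoff for one manual full-range
      // compaction, regardless of the configured
      // enable_blob_garbage_collection / blob_garbage_collection_age_cutoff.
      void FullManualGarbageCollection(DB* db) {
        CompactRangeOptions cro;
        cro.blob_garbage_collection_policy = BlobGarbageCollectionPolicy::kForce;
        cro.blob_garbage_collection_age_cutoff = 1.0;  // consider all blob files
        auto s = db->CompactRange(cro, /*begin=*/nullptr, /*end=*/nullptr);
        assert(s.ok());
      }
      ```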
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10073
      
      Test Plan: Adding unit tests and adding the new options to the stress test tool.
      
      Reviewed By: ltamasi
      
      Differential Revision: D36848700
      
      Pulled By: gangliao
      
      fbshipit-source-id: c878ef101d1c612429999f513453c319f75d78e9
    • Explicitly closing all directory file descriptors (#10049) · 65893ad9
      Zichen Zhu committed
      Summary:
      Currently, the DB directory file descriptor is left open until destruction (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (lines 512-515 in ldb_cmd.cc) to leak the `db_` object, build the `ldb` tool and run
      ```
      strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
      ```
      There is one directory file descriptor that is not closed in the strace log.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049
      
      Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.
      
      Reviewed By: ajkr, hx235
      
      Differential Revision: D36722135
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
    • Add support for FastLRUCache in stress and crash tests. (#10081) · b4d0e041
      Guido Tagliavini Ponce committed
      Summary:
      Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache.
      
      Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags: when `--secondary_cache_uri` is set, `--cache_type` is set to `lru_cache`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081
      
      Test Plan:
      - To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=--duration=960 blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`.
      - To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`.
      
      Reviewed By: anand1976
      
      Differential Revision: D36839908
      
      Pulled By: guidotag
      
      fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725
    • Update History.md for #9922 (#10092) · 45b1c788
      Akanksha Mahajan committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10092
      
      Reviewed By: riversand963
      
      Differential Revision: D36832311
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8fb1cf90b1d4dddebbfbeebeddb15f6905968e9b
    • Get current LogFileNumberSize the same as log_writer (#10086) · 5864900c
      Jay Zhuang committed
      Summary:
      `db_impl.alive_log_files_` is used to track the WAL size in `db_impl.logs_`.
      Get the `LogFileNumberSize` obj in `alive_log_files_` the same time as `log_writer` to keep them consistent.
      For this issue, it's not safe to do `deque::reverse_iterator::operator*` and `deque::pop_front()` concurrently,
      so remove the tail cache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10086
      
      Test Plan:
      ```
      # on Windows
      gtest-parallel ./db_test --gtest_filter=DBTest.FileCreationRandomFailure -r 1000 -w 100
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D36822373
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 5e738051dfc7bcf6a15d85ba25e6365df6b6a6af
    • Add bug fix to HISTORY.md (#10091) · 463873f1
      Changyu Bi committed
      Summary:
      Add to HISTORY.md the bug fixed in https://github.com/facebook/rocksdb/issues/10051
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10091
      
      Reviewed By: ajkr
      
      Differential Revision: D36821861
      
      Pulled By: cbi42
      
      fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6
    • Reduce risk of backup or checkpoint missing a WAL file (#10083) · a00cffaf
      Peter Dillinger committed
      Summary:
      We recently saw a case in crash test in which a WAL file in the
      middle of the list of live WALs was not included in the backup, so the
      DB was not openable due to missing WAL. We are not sure why, but this
      change should at least turn that into a backup-time failure by ensuring
      all the WAL files expected by the manifest (according to VersionSet) are
      included in `GetSortedWalFiles()` (used by `GetLiveFilesStorageInfo()`,
      `BackupEngine`, and `Checkpoint`)
      
      Related: to maximize the effectiveness of
      track_and_verify_wals_in_manifest with GetSortedWalFiles() during
      checkpoint/backup, we will now sync WAL in GetLiveFilesStorageInfo()
      when track_and_verify_wals_in_manifest=true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10083
      
      Test Plan: added new unit test for the check in GetSortedWalFiles()
      
      Reviewed By: ajkr
      
      Differential Revision: D36791608
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a27bcf0213fc7ab177760fede50d4375d579afa6
    • Persist the new MANIFEST after successfully syncing the new WAL during recovery (#9922) · d04df275
      Akanksha Mahajan committed
      Summary:
      In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
      flush the data from WAL to L0 for all column families if possible. As a
      result, not all column families can increase their log_numbers, and
      min_log_number_to_keep won't change.
      For a transaction DB (allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions, and min_log_number_to_keep won't change.
      If we persist a new MANIFEST with
      advanced log_numbers for some column families, then during a second
      crash after persisting the MANIFEST, RocksDB will see some column
      families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail.
      
      As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL.
      If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
      If a future recovery starts from the old MANIFEST, it means writing the new MANIFEST failed. We won't hit the "SST ahead of WAL" error.
      Currently, RocksDB's DB::Open() may create and write to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922
      
      Test Plan:
      1. Update unit tests to fail without this change
      2. make crash_test -j
      
      Branch with the unit test and no fix, to keep track of the unit test without the fix: https://github.com/facebook/rocksdb/pull/9942
      
      Reviewed By: riversand963
      
      Differential Revision: D36043701
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9
  6. 01 Jun 2022 (3 commits)
  7. 31 May 2022 (4 commits)
  8. 28 May 2022 (1 commit)
    • Fix compile error in Clang 13 (#10033) · 4eb7b35f
      Jaepil Jeong committed
      Summary:
      This PR fixes the following compilation error in Clang 13, which was tested on macOS 12.4.
      
      ```
      ❯ ninja clean && ninja
      [1/1] Cleaning all built files...
      Cleaning... 0 files.
      [198/315] Building CXX object CMakeFiles/rocksdb.dir/util/cleanable.cc.o
      FAILED: CMakeFiles/rocksdb.dir/util/cleanable.cc.o
      ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O3 -DNDEBUG -DROCKSDB_USE_RTTI -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk -std=gnu++17 -MD -MT CMakeFiles/rocksdb.dir/util/cleanable.cc.o -MF CMakeFiles/rocksdb.dir/util/cleanable.cc.o.d -o CMakeFiles/rocksdb.dir/util/cleanable.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc
      /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:24:65: error: no member named 'move' in namespace 'std'
      Cleanable::Cleanable(Cleanable&& other) noexcept { *this = std::move(other); }
                                                                 ~~~~~^
      /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:126:16: error: no member named 'move' in namespace 'std'
        *this = std::move(from);
                ~~~~~^
      2 errors generated.
      [209/315] Building CXX object CMakeFiles/rocksdb.dir/tools/block_cache_analyzer/block_cache_trace_analyzer.cc.o
      ninja: build stopped: subcommand failed.
      ```
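      The underlying cause is that the libc++ headers shipped with newer Clang no longer transitively pull in `<utility>`, where `std::move` is declared, so files using it must include it directly. A minimal standalone illustration of the pattern from `util/cleanable.cc` (the class body here is made up for demonstration):

      ```cpp
      #include <iostream>
      #include <string>
      #include <utility>  // declares std::move; the missing include behind the error

      struct Cleanable {
        std::string tag;
        Cleanable() = default;
        // Same pattern as util/cleanable.cc: the move constructor delegates
        // to the move assignment operator via std::move.
        Cleanable(Cleanable&& other) noexcept { *this = std::move(other); }
        Cleanable& operator=(Cleanable&& other) noexcept {
          tag = std::move(other.tag);
          return *this;
        }
      };

      int main() {
        Cleanable a;
        a.tag = "resource";
        Cleanable b(std::move(a));
        std::cout << b.tag << "\n";  // resource
        return 0;
      }
      ```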
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10033
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36580562
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f6b241d186ed528ad62d259af2857d2c2b4ded1
  9. 27 May 2022 (4 commits)
    • Fail DB::Open() if logger cannot be created (#9984) · 514f0b09
      Yanqin Jin committed
      Summary:
      For regular and secondary DB instances, we return an error and refuse to open the DB if Logger creation fails.
      
      Our current code allows it, but it is really difficult to debug because there will be no LOG files. The same goes for the OPTIONS file, which will be explored in another PR.
      
      Furthermore, `Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger)` has the following assertion:
      
      ```cpp
      #ifdef MAP_HUGETLB
      if (huge_page_size > 0 && bytes > 0) {
        assert(logger != nullptr);
      }
      #endif
      ```
      
      It can be removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9984
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36347754
      
      Pulled By: riversand963
      
      fbshipit-source-id: 529798c0511d2eaa2f0fd40cf7e61c4cbc6bc57e
    • Pass the size of blob files to SstFileManager during DB open (#10062) · e2285157
      Gang Liao committed
      Summary:
      RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062
      
      Test Plan:
      Add unit tests for the change to the test suite
      ltamasi riversand963  akankshamahajan15
      
      Reviewed By: ltamasi
      
      Differential Revision: D36726621
      
      Pulled By: gangliao
      
      fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a
    • Add timestamp support to secondary instance (#10061) · 8c4ea7b8
      Yu Zhang committed
      Summary:
      This PR adds timestamp support to the secondary DB instance.
      
      With this, these timestamp related APIs are supported:
      
      - `ReadOptions.timestamp`: a read returns the latest data visible at the specified timestamp
      - `Iterator::timestamp()`: returns the timestamp associated with the key/value pair
      - `DB::Get(..., std::string* timestamp)`: returns the timestamp associated with the key/value pair in `timestamp`
      
      Test plan (on devserver):
      ```
      $COMPILE_WITH_ASAN=1 make -j24 all
      $./db_secondary_test --gtest_filter=DBSecondaryTestWithTimestamp*
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10061
      
      Reviewed By: riversand963
      
      Differential Revision: D36722915
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 644ada39e4e51164a759593478c38285e0c1a666
    • Disable file ingestion in crash test for CF consistency (#10067) · f6e45382
      Andrew Kryczka committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10067
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36727948
      
      Pulled By: ajkr
      
      fbshipit-source-id: a3502730412c01ba63d822a5d4bf56f8bae8fcb2