1. 24 Jul, 2021 (1 commit)
    • Checkpoint dir options fix (#8572) · 55f7ded8
      Merlin Mao committed
      Summary:
      Originally, the two options `db_log_dir` and `wal_dir` would be reused by a snapshot db, since the options files are simply copied. By default, if `wal_dir` was not set when a db was created, it is set to the db's own directory, so the snapshot db would use the same WAL dir. If both the original db and the snapshot db write to or delete from that WAL dir, one may modify or delete files which belong to the other. The same applies to `db_log_dir`, but since info log files are not copied or linked, handling that option is simpler.
      
      Two arguments are added to `Checkpoint::CreateCheckpoint()`, allowing callers to override these two options.
      
      `wal_dir`:  If the function argument `wal_dir` is empty, or set to the original db location, or the checkpoint location, the snapshot's `wal_dir` option will be updated to the checkpoint location. Otherwise, the absolute path specified in the argument will be used. During checkpointing, live WAL files will be copied or linked to the new location, instead of the current WAL dir specified in the original db.
      
      `db_log_dir`: Same as `wal_dir`, but no files will be copied or linked.
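      The resolution rule described above can be sketched as a small standalone helper. These names are illustrative only, not the actual RocksDB API; the real logic lives inside `Checkpoint::CreateCheckpoint()`:
      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical helper mirroring the wal_dir resolution rule: empty, the
      // original db location, or the checkpoint location all resolve to the
      // checkpoint location, so the snapshot never shares the source's WAL dir.
      std::string ResolveCheckpointWalDir(const std::string& wal_dir_arg,
                                          const std::string& db_dir,
                                          const std::string& checkpoint_dir) {
        if (wal_dir_arg.empty() || wal_dir_arg == db_dir ||
            wal_dir_arg == checkpoint_dir) {
          return checkpoint_dir;
        }
        // Otherwise the caller-provided absolute path is used as-is.
        return wal_dir_arg;
      }

      int main() {
        assert(ResolveCheckpointWalDir("", "/db", "/ckpt") == "/ckpt");
        assert(ResolveCheckpointWalDir("/db", "/db", "/ckpt") == "/ckpt");
        assert(ResolveCheckpointWalDir("/other/wal", "/db", "/ckpt") == "/other/wal");
        return 0;
      }
      ```
      The same rule applies to `db_log_dir`, minus the file copying/linking.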
      
      A new unit test was added: `CheckpointTest.CheckpointWithOptionsDirsTest`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8572
      
      Test Plan:
      New unit test
      ```
      checkpoint_test --gtest_filter="CheckpointTest.CheckpointWithOptionsDirsTest"
      ```
      
      Output
      ```
      Note: Google Test filter = CheckpointTest.CheckpointWithOptionsDirsTest
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from CheckpointTest
      [ RUN      ] CheckpointTest.CheckpointWithOptionsDirsTest
      [       OK ] CheckpointTest.CheckpointWithOptionsDirsTest (11712 ms)
      [----------] 1 test from CheckpointTest (11712 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1 test from 1 test case ran. (11713 ms total)
      [  PASSED  ] 1 test.
      ```
      This test will fail without this patch: for example, modify the code to remove the two arguments introduced in this patch from `CreateCheckpoint()`.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29832761
      
      Pulled By: autopear
      
      fbshipit-source-id: e6a639b4d674380df82998c0839e79cab695fe29
  2. 23 Jul, 2021 (5 commits)
    • Fix a minor issue with initializing the test path (#8555) · 3b277252
      Drewryz committed
      Summary:
      `PerThreadDBPath` already appends a slash, so one does not need to be specified when initializing the test path.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8555
      
      Reviewed By: ajkr
      
      Differential Revision: D29758399
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6d2b878523e3e8580536e2829cb25489844d9011
    • Retire superfluous functions introduced in earlier mempurge PRs. (#8558) · c521a9ab
      Baptiste Lemaire committed
      Summary:
      The main challenge in making the memtable garbage collection prototype (nicknamed `mempurge`) work was to not get rid of WAL files that contain unflushed (but mempurged) data. That was successfully guaranteed by not writing the VersionEdit to the MANIFEST file after a successful mempurge.
      By not writing VersionEdits to the `MANIFEST` file after a successful mempurge operation, we do not change the earliest log file number that contains unflushed data: `cfd->GetLogNumber()` (`cfd->SetLogNumber()` is only called in `VersionSet::ProcessManifestWrites`). As a result, a number of functions introduced earlier solely for the mempurge operation are now obsolete/redundant (e.g., `FlushJob::ExtractEarliestLogFileNumber`), and this PR cleans up all these now-unnecessary functions. In particular, we no longer need to store the earliest log file number in the `MemTable` struct itself, so this PR also reverts the `MemTable` struct to its original form.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8558
      
      Test Plan: Already included in `db_flush_test.cc`.
      
      Reviewed By: anand1976
      
      Differential Revision: D29764351
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 0f43b260fa270251862512f397d3f24ee62e8437
    • Analyze MultiGet in trace_analyzer (#8575) · 61c9bd49
      Zhichao Cao committed
      Summary:
      Now we can analyze the MultiGet queries in the trace file and generate a set of statistic and analysis files. Note that when one MultiGet accesses N keys, we count each sub-get query individually, but the overall query count still reflects one MultiGet, not N sub-get queries.
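      The counting rule can be sketched as follows (hypothetical names, not the actual trace_analyzer types):
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Illustrative counters: every key in a MultiGet feeds per-key statistics,
      // but the query total increments once per MultiGet.
      struct TraceCounters {
        size_t total_queries = 0;
        size_t keys_accessed = 0;
        void OnMultiGet(const std::vector<std::string>& keys) {
          total_queries += 1;            // one MultiGet == one query
          keys_accessed += keys.size();  // each sub-get counted individually
        }
      };

      int main() {
        TraceCounters c;
        c.OnMultiGet({"a", "b", "c"});
        c.OnMultiGet({"d"});
        assert(c.total_queries == 2);
        assert(c.keys_accessed == 4);
        return 0;
      }
      ```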
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8575
      
      Test Plan: added new unit test and make check
      
      Reviewed By: anand1976
      
      Differential Revision: D29860633
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a132128527f36828d266df8e36e3ec626c2170be
    • Return error if trying to open secondary on missing or inaccessible primary (#8200) · 2e538817
      Yanqin Jin committed
      Summary:
      If the primary's CURRENT file is missing or inaccessible, the secondary should not hang,
      repeatedly trying to switch to the next MANIFEST.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8200
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27840627
      
      Pulled By: riversand963
      
      fbshipit-source-id: 071fed97cbab1bc5cdefd1dc235e5cd406c174e1
    • Fix a race condition during multiple DB opening (#8574) · c4a503f3
      Jay Zhuang committed
      Summary:
      ObjectLibrary is shared between multiple DB instances, so
      Register() could have a race condition.
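      A minimal sketch of the fix pattern, assuming the shared registry is a name-to-factory map guarded by a mutex (illustrative, not the actual ObjectLibrary code):
      ```cpp
      #include <cassert>
      #include <functional>
      #include <mutex>
      #include <string>
      #include <unordered_map>

      // A registry shared by several DB instances must serialize Register()
      // calls; concurrent DB opens may otherwise corrupt the underlying map.
      class SharedRegistry {
       public:
        void Register(const std::string& name, std::function<int()> factory) {
          std::lock_guard<std::mutex> lock(mu_);  // serialize concurrent opens
          factories_[name] = std::move(factory);
        }
        size_t Size() {
          std::lock_guard<std::mutex> lock(mu_);
          return factories_.size();
        }

       private:
        std::mutex mu_;
        std::unordered_map<std::string, std::function<int()>> factories_;
      };

      int main() {
        SharedRegistry reg;
        reg.Register("a", [] { return 1; });
        reg.Register("a", [] { return 2; });  // re-registration overwrites safely
        assert(reg.Size() == 1);
        return 0;
      }
      ```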
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8574
      
      Test Plan: pass the failed test
      
      Reviewed By: ajkr
      
      Differential Revision: D29855096
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 541eed0bd495d2c963d858d81e7eabf1ba16153c
  3. 22 Jul, 2021 (4 commits)
    • Remove TaskLimiterToken::ReleaseOnce for fix (#8567) · 84eef260
      Peter Dillinger committed
      Summary:
      Rare TSAN and valgrind failures are caused by unnecessary
      reading of a field on the TaskLimiterToken::limiter_ for an assertion
      after the token has been released and the limiter destroyed. To simplify,
      we can simply destroy the token before triggering DB shutdown
      (potentially destroying the limiter). This makes the ReleaseOnce logic
      unnecessary.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8567
      
      Test Plan: watch for more failures in CI
      
      Reviewed By: ajkr
      
      Differential Revision: D29811795
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 135549ebb98fe4f176d1542ed85d5bd6350a40b3
    • Complete the stress open WAL drop fix (#8570) · 9b41082d
      sdong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/8548 is not complete. We should instead cover all cases where writable files are buffered, not just when failures are ingested. Extend it to any case where failures are ingested during DB open.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8570
      
      Test Plan: Run db_stress and see it doesn't break
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29830415
      
      fbshipit-source-id: 94449a0468fb2f7eec17423724008c9c63b2445d
    • Avoid updating options if no value is updated (#8518) · 42eaa45c
      Jay Zhuang committed
      Summary:
      Try to avoid the expensive option-updating operation if
      `SetDBOptions()` does not change any option value.
      Skipping the update is not guaranteed; for example, changing `bytes_per_sync`
      to `0` may still trigger an update, as the value could be sanitized.
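      The skip-if-unchanged idea can be sketched with a hypothetical helper (the real check also accounts for sanitized values, which is why skipping is best-effort):
      ```cpp
      #include <cassert>
      #include <map>
      #include <string>

      // Compare the incoming name/value map against current values and only
      // rebuild options when something actually differs. Illustrative only.
      bool NeedsUpdate(const std::map<std::string, std::string>& current,
                       const std::map<std::string, std::string>& incoming) {
        for (const auto& kv : incoming) {
          auto it = current.find(kv.first);
          if (it == current.end() || it->second != kv.second) {
            return true;  // at least one value really changes
          }
        }
        return false;  // all requested values already in effect: skip the update
      }

      int main() {
        std::map<std::string, std::string> cur = {{"max_background_jobs", "4"}};
        assert(!NeedsUpdate(cur, {{"max_background_jobs", "4"}}));
        assert(NeedsUpdate(cur, {{"max_background_jobs", "8"}}));
        return 0;
      }
      ```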
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8518
      
      Test Plan: added unittest
      
      Reviewed By: riversand963
      
      Differential Revision: D29672639
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b7931de62ceea6f1bdff0d1209adf1197d3ed1f4
    • Add overwrite_probability for filluniquerandom benchmark in db_bench (#8569) · 6b4cdacf
      Baptiste Lemaire committed
      Summary:
      Add flags `overwrite_probability` and `overwrite_window_size` to `db_bench`.
      Add the possibility of performing a `filluniquerandom` benchmark with an overwrite probability.
      For each write operation, there is a probability _p_ that the write is an overwrite (_p_=`overwrite_probability`).
      When an overwrite is decided, the key is randomly chosen from the last _N_ keys previously inserted into the DB (with _N_=`overwrite_window_size`).
      When a pure write is decided, the key inserted into the DB is unique and therefore will not be an overwrite.
      The `overwrite_window_size` is used so that the user can decide whether the overwrites mostly target recently inserted keys (when `overwrite_window_size` is small compared to the total number of writes), or can also target keys inserted "a long time ago" (when `overwrite_window_size` is comparable to the total number of writes).
      Note that the total number of writes = # of unique insertions + # of overwrites.
      No unit test specifically added.
      Local testing shows the following **throughputs** for `filluniquerandom` with 1M total writes:
      - bypass the code inserts (no `overwrite_probability` flag specified): ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=10`: ~17.0MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=10`: ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=1M`: ~14.5MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=1M`: ~14.0MB/s
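      The key-selection rule described above can be sketched as follows (hypothetical names, not the actual db_bench code):
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <random>
      #include <vector>

      // With probability p the write overwrites one of the last `window` inserted
      // keys; otherwise it inserts a fresh unique key.
      class KeyPicker {
       public:
        KeyPicker(double overwrite_probability, size_t overwrite_window_size)
            : p_(overwrite_probability), window_(overwrite_window_size), rng_(42) {}

        uint64_t NextKey() {
          std::uniform_real_distribution<double> coin(0.0, 1.0);
          if (!recent_.empty() && coin(rng_) < p_) {
            // Overwrite: pick uniformly among the last `window_` keys.
            std::uniform_int_distribution<size_t> idx(0, recent_.size() - 1);
            return recent_[idx(rng_)];
          }
          // Pure write: a brand-new unique key.
          uint64_t key = next_unique_++;
          recent_.push_back(key);
          if (recent_.size() > window_) recent_.erase(recent_.begin());
          return key;
        }

       private:
        double p_;
        size_t window_;
        std::mt19937_64 rng_;
        uint64_t next_unique_ = 0;
        std::vector<uint64_t> recent_;
      };

      int main() {
        KeyPicker never(0.0, 10);  // p=0: every key is unique
        for (uint64_t i = 0; i < 100; i++) assert(never.NextKey() == i);

        KeyPicker often(1.0, 10);  // p=1: after the first key, always overwrite
        uint64_t first = often.NextKey();
        for (int i = 0; i < 50; i++) assert(often.NextKey() == first);
        return 0;
      }
      ```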
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8569
      
      Reviewed By: pdillinger
      
      Differential Revision: D29818631
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: d472b4ea4e457a4da7c4ee4f14b40cccd6a4587a
  4. 21 Jul, 2021 (2 commits)
  5. 20 Jul, 2021 (3 commits)
  6. 19 Jul, 2021 (1 commit)
  7. 17 Jul, 2021 (9 commits)
    • Change code so that trimmed memtable history is released outside DB mutex (#8530) · 9e885939
      sdong committed
      Summary:
      Currently, the code deletes memtables immediately after they are trimmed from history. Although this should never cause a problem, as the super version still holds the memtable, which is only switched afterwards, it seems good practice not to do it, and instead to clean up the standard way: put the memtable into the WriteContext and release it after the DB mutex is released.
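      The deferred-cleanup pattern can be sketched as follows (illustrative stand-in types, not the actual RocksDB code):
      ```cpp
      #include <cassert>
      #include <memory>
      #include <mutex>
      #include <vector>

      // Stand-in for a memtable; flips a flag when destroyed so we can observe
      // exactly when the free happens.
      struct MemTableStub {
        explicit MemTableStub(bool* freed) : freed_(freed) {}
        ~MemTableStub() { *freed_ = true; }
        bool* freed_;
      };

      // Holds references that must outlive the critical section.
      struct WriteContext {
        std::vector<std::unique_ptr<MemTableStub>> memtables_to_free;
      };

      int main() {
        std::mutex db_mutex;
        bool freed = false;
        {
          WriteContext ctx;
          {
            std::lock_guard<std::mutex> lock(db_mutex);
            // Trim history: transfer ownership into the context; do NOT delete
            // while holding the mutex.
            ctx.memtables_to_free.emplace_back(new MemTableStub(&freed));
            assert(!freed);  // still alive inside the critical section
          }
          // Mutex released; ctx destruction below frees the memtable safely.
        }
        assert(freed);
        return 0;
      }
      ```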
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8530
      
      Test Plan: Run all existing tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D29703410
      
      fbshipit-source-id: 21d8068ac6377de4b6fa7a89697195742659fde4
    • Update HISTORY.md and version.h for 6.23 release (#8552) · c04a86a0
      Jay Zhuang committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8552
      
      Reviewed By: ajkr
      
      Differential Revision: D29746828
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 17d564895ae9cb675d455e73626b9a6717db6279
    • Remove extra double quote in options.h (#8550) · 3455ab0e
      Merlin Mao committed
      Summary:
      There is an extra double quote in options.h (`"index block""`).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8550
      
      Test Plan: None
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29746077
      
      Pulled By: autopear
      
      fbshipit-source-id: 2e5117296e5414b7c7440d990926bc1e567a0b4f
    • DB Stress Reopen write failure to skip WAL (#8548) · 39a07c96
      sdong committed
      Summary:
      When DB Stress enables write failures in reopen, WAL files are also created with a wrapper writable file which buffers writes until fsync. However, the crash test currently expects all writes to the WAL to be persistent. This is at odds with the unsynced bytes being dropped. To work around it temporarily, we disable WAL write failures for now.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8548
      
      Test Plan: Run db_stress. Manual printf to make sure only WAL files are skipped.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29745095
      
      fbshipit-source-id: 1879dd2c01abad7879ca243ee94570ec47c347f3
    • Minor Makefile update to exclude microbench as dependency (#8523) · a379dae4
      Jay Zhuang committed
      Summary:
      Otherwise the build may report a warning about a missing
      `benchmark.h` for some targets, though the warning won't break the build.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8523
      
      Test Plan:
      `make blackbox_ubsan_crash_test` on a machine without
      benchmark lib installed.
      
      Reviewed By: pdillinger
      
      Differential Revision: D29682478
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e1261fbcda46bc6bd3cd39b7b03b7f78927d0430
    • Allow CreateFromString to work on complex URIs (#8547) · ac37bfde
      mrambacher committed
      Summary:
      Some URIs for creating instances (e.g. SecondaryCache) use complex forms like `cache://name;prop=value`.  These URIs were treated as name-value properties.  With this change, if the URI does not contain an `id=XX` setting, it will be treated as a single string value (and not as an ID plus a map of name-value properties).
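      The dispatch can be sketched with a hypothetical helper (the real `CreateFromString` parsing is more involved; this only illustrates the `id=` check described above):
      ```cpp
      #include <cassert>
      #include <string>

      // If the value contains an "id=" setting, it is parsed as name-value
      // properties; otherwise the whole string is passed through as one opaque
      // value. Simplified: a plain substring search, ignoring token boundaries.
      bool TreatAsSingleString(const std::string& value) {
        return value.find("id=") == std::string::npos;
      }

      int main() {
        assert(TreatAsSingleString("cache://name;prop=value"));
        assert(!TreatAsSingleString("id=LRUCache;capacity=1024"));
        return 0;
      }
      ```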
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8547
      
      Reviewed By: anand1976
      
      Differential Revision: D29741386
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 0621f62bec3a6699a7b66c7c0b5634b2856653aa
    • Don't hold DB mutex for block cache entry stat scans (#8538) · df5dc73b
      Peter Dillinger committed
      Summary:
      I previously didn't notice the DB mutex was being held during
      block cache entry stat scans, probably because I primarily checked for
      read performance regressions, since those require the block cache and
      are traditionally latency-sensitive.
      
      This change does some refactoring to avoid holding DB mutex and to
      avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
      Some tests have to be updated because now the stats collector is
      populated in the Cache aggressively on DB startup rather than lazily.
      (I hope to clean up some of this added complexity in the future.)
      
      This change also ensures proper treatment of need_out_of_mutex for
      non-int DB properties.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
      
      Test Plan:
      Added unit test logic that uses sync points to fail if the DB mutex
      is held during a scan, covering the various ways that a scan might be
      triggered.
      
      Performance test - the known impact to holding the DB mutex is on
      TransactionDB, and the easiest way to see the impact is to hack the
      scan code to almost always miss and take an artificially long time
      scanning. Here I've injected an unconditional 5s sleep at the call to
      ApplyToAllEntries.
      
      Before (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     433.219 micros/op 2308 ops/sec;    0.1 MB/s ( transactions:78999 aborts:0)
          rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     448.802 micros/op 2228 ops/sec;    0.1 MB/s ( transactions:75999 aborts:0)
          rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
      
      Notice the 5s P100 write time.
      
      After (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     303.645 micros/op 3293 ops/sec;    0.1 MB/s ( transactions:98999 aborts:0)
          rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     310.383 micros/op 3221 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
      
      P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     311.365 micros/op 3211 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     308.395 micros/op 3242 ops/sec;    0.1 MB/s ( transactions:97999 aborts:0)
          rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
      
      No substantial difference.
      
      Reviewed By: siying
      
      Differential Revision: D29738847
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
    • db_bench seekrandom with multiDB should only create iterators queried (#7818) · 1e5b631e
      sdong committed
      Summary:
      Right now, db_bench with seekrandom and a multi-DB setup creates iterators for all DBs just to query one of them. This differs from most real workloads. Fix it by only creating the iterators that will be queried.
      
      Also fix a bug where DBs are not destroyed in multi-DB mode.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818
      
      Test Plan: Run db_bench with single/multi-DB, using/not using a tailing iterator, under an ASAN build, and validate that the behavior is as expected.
      
      Reviewed By: ajkr
      
      Differential Revision: D25720226
      
      fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b
    • Crashtest mempurge (#8545) · 0229a88d
      Baptiste Lemaire committed
      Summary:
      Add `experiemental_allow_mempurge` flag support to `db_stress` and `db_crashtest.py`, with a `false` default value.
      I successfully tested both `whitebox` and `blackbox` crash tests locally with the `experiemental_allow_mempurge` flag set to true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D29734513
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae
  8. 16 Jul, 2021 (7 commits)
  9. 15 Jul, 2021 (1 commit)
  10. 14 Jul, 2021 (2 commits)
  11. 13 Jul, 2021 (5 commits)