1. 17 Jul 2021, 7 commits
    • Remove extra double quote in options.h (#8550) · 3455ab0e
      Committed by Merlin Mao
      Summary:
      There is an extra double quote in options.h (`"index block""`).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8550
      
      Test Plan: None
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29746077
      
      Pulled By: autopear
      
      fbshipit-source-id: 2e5117296e5414b7c7440d990926bc1e567a0b4f
    • DB Stress Reopen write failure to skip WAL (#8548) · 39a07c96
      Committed by sdong
      Summary:
      When DB Stress enables write failure in reopen, WAL files are also created with a wrapper writable file that buffers writes until fsync. However, the crash test currently expects all writes to the WAL to be persistent. This is at odds with the unsynced bytes being dropped. To work around it temporarily, we disable WAL write failure for now.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8548
      
      Test Plan: Run db_stress. Manual printf to make sure only WAL files are skipped.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29745095
      
      fbshipit-source-id: 1879dd2c01abad7879ca243ee94570ec47c347f3
    • Minor Makefile update to exclude microbench as dependency (#8523) · a379dae4
      Committed by Jay Zhuang
      Summary:
      Otherwise the build may report a warning about a missing
      `benchmark.h` for some targets; the warning does not break the build.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8523
      
      Test Plan:
      `make blackbox_ubsan_crash_test` on a machine without
      benchmark lib installed.
      
      Reviewed By: pdillinger
      
      Differential Revision: D29682478
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e1261fbcda46bc6bd3cd39b7b03b7f78927d0430
    • Allow CreateFromString to work on complex URIs (#8547) · ac37bfde
      Committed by mrambacher
      Summary:
      Some URIs for creating instances (such as SecondaryCache) are complex URIs like `cache://name;prop=value`.  These URIs were previously treated as name-value properties.  With this change, if the URI does not contain an "id=XX" setting, it will be treated as a single string value (and not as an ID plus a map of name-value properties).
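
      The described behavior can be illustrated with a small standalone sketch (hypothetical helper names; not the actual CreateFromString code): a URI with no `id=` setting is passed through as one opaque value instead of being split into name-value properties.

          #include <cstddef>
          #include <map>
          #include <string>

          // Illustration only: mimic the described decision for a URI such as
          // "cache://name;prop=value" that lacks an "id=" setting.
          struct ParsedUri {
            bool is_plain_value = false;
            std::string value;                         // set when is_plain_value
            std::map<std::string, std::string> props;  // set otherwise
          };

          inline ParsedUri ParseCreateFromStringUri(const std::string& uri) {
            ParsedUri result;
            if (uri.find("id=") == std::string::npos) {
              result.is_plain_value = true;  // treat the whole URI as one value
              result.value = uri;
              return result;
            }
            // Otherwise split "key=value" pairs separated by ';' (simplified).
            size_t start = 0;
            while (start < uri.size()) {
              size_t end = uri.find(';', start);
              if (end == std::string::npos) end = uri.size();
              std::string token = uri.substr(start, end - start);
              size_t eq = token.find('=');
              if (eq != std::string::npos) {
                result.props[token.substr(0, eq)] = token.substr(eq + 1);
              }
              start = end + 1;
            }
            return result;
          }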
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8547
      
      Reviewed By: anand1976
      
      Differential Revision: D29741386
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 0621f62bec3a6699a7b66c7c0b5634b2856653aa
    • Don't hold DB mutex for block cache entry stat scans (#8538) · df5dc73b
      Committed by Peter Dillinger
      Summary:
      I previously didn't notice the DB mutex was being held during
      block cache entry stat scans, probably because I primarily checked for
      read performance regressions, since reads require the block cache and
      are traditionally latency-sensitive.
      
      This change does some refactoring to avoid holding DB mutex and to
      avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
      Some tests have to be updated because now the stats collector is
      populated in the Cache aggressively on DB startup rather than lazily.
      (I hope to clean up some of this added complexity in the future.)
      
      This change also ensures proper treatment of need_out_of_mutex for
      non-int DB properties.
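
      A generic sketch of the locking pattern this refactoring moves toward (standard library only; not the actual DBImpl code): release the mutex before the expensive cache scan and re-acquire it only to publish the result.

          #include <mutex>

          // Illustrative pattern only: do the expensive work outside the lock.
          class StatsScanSketch {
           public:
            void RefreshStats() {
              std::unique_lock<std::mutex> lock(db_mutex_);
              // ... read whatever state actually needs the mutex ...
              lock.unlock();                    // drop the "DB mutex" for the scan
              long sum = ExpensiveCacheScan();  // ApplyToAllEntries-style walk
              lock.lock();                      // re-acquire only to publish
              cached_sum_ = sum;
            }

           private:
            long ExpensiveCacheScan() { return 42; }  // placeholder for the scan
            std::mutex db_mutex_;
            long cached_sum_ = 0;
          };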
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
      
      Test Plan:
      Added unit test logic that uses sync points to fail if the DB mutex
      is held during a scan, covering the various ways that a scan might be
      triggered.
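
      A rough sketch of how such a check can be wired up with RocksDB's SyncPoint facility (the sync point name and the held-mutex flag below are hypothetical stand-ins, not the ones used in the actual test):

          #include <atomic>
          #include <cassert>
          #include "test_util/sync_point.h"  // available in non-NDEBUG builds

          std::atomic<bool> db_mutex_believed_held{false};

          void InstallScanMutexCheck() {
            auto* sp = ROCKSDB_NAMESPACE::SyncPoint::GetInstance();
            // Hypothetical sync point placed at the start of the stat scan.
            sp->SetCallBack("BlockCacheEntryStatsScan:Start", [](void* /*arg*/) {
              // Fail the test if the scan starts while the DB mutex is
              // (believed to be) held, per the test's own bookkeeping.
              assert(!db_mutex_believed_held.load());
            });
            sp->EnableProcessing();
          }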
      
      Performance test - the known impact to holding the DB mutex is on
      TransactionDB, and the easiest way to see the impact is to hack the
      scan code to almost always miss and take an artificially long time
      scanning. Here I've injected an unconditional 5s sleep at the call to
      ApplyToAllEntries.
      
      Before (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     433.219 micros/op 2308 ops/sec;    0.1 MB/s ( transactions:78999 aborts:0)
          rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     448.802 micros/op 2228 ops/sec;    0.1 MB/s ( transactions:75999 aborts:0)
          rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
      
      Notice the 5s P100 write time.
      
      After (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     303.645 micros/op 3293 ops/sec;    0.1 MB/s ( transactions:98999 aborts:0)
          rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     310.383 micros/op 3221 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
      
      P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     311.365 micros/op 3211 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     308.395 micros/op 3242 ops/sec;    0.1 MB/s ( transactions:97999 aborts:0)
          rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
      
      No substantial difference.
      
      Reviewed By: siying
      
      Differential Revision: D29738847
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
    • db_bench seekrandom with multiDB should only create iterators queried (#7818) · 1e5b631e
      Committed by sdong
      Summary:
      Right now, db_bench with seekrandom and a multiple-DB setup creates iterators for all DBs just to query one of them. This is different from most real workloads. Fix it by only creating the iterators that will be queried (see the sketch below).
      
      Also fix a bug where DBs are not destroyed in multi-DB mode.
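
      A rough sketch of the fix as described, using the public rocksdb::DB API (db_bench internals elided): choose the DB to query first, then create an iterator only for that DB.

          #include <cstdint>
          #include <memory>
          #include <vector>
          #include "rocksdb/db.h"

          // Sketch only: per operation, query one randomly chosen DB and create
          // an iterator just for that DB, not one iterator per open DB.
          void SeekRandomOnce(const std::vector<rocksdb::DB*>& dbs,
                              const rocksdb::Slice& seek_key, uint64_t rnd) {
            rocksdb::DB* db = dbs[rnd % dbs.size()];  // the only DB we touch
            std::unique_ptr<rocksdb::Iterator> iter(
                db->NewIterator(rocksdb::ReadOptions()));
            iter->Seek(seek_key);
            // ... consume iter->key()/iter->value() as the benchmark requires ...
          }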
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818
      
      Test Plan: Run db_bench with single-DB and multi-DB setups, with and without a tailing iterator, under an ASAN build, and validate that the behavior is as expected.
      
      Reviewed By: ajkr
      
      Differential Revision: D25720226
      
      fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b
    • Crashtest mempurge (#8545) · 0229a88d
      Committed by Baptiste Lemaire
      Summary:
      Add `experimental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value.
      I successfully tested locally both `whitebox` and `blackbox` crash tests with the `experimental_allow_mempurge` flag set to true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D29734513
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae
  2. 16 Jul 2021, 7 commits
  3. 15 Jul 2021, 1 commit
  4. 14 Jul 2021, 2 commits
  5. 13 Jul 2021, 7 commits
  6. 12 Jul 2021, 2 commits
    • Correct CVS -> CSV typo (#8513) · 5afd1e30
      Committed by Adam Retter
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8513
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29654066
      
      Pulled By: mrambacher
      
      fbshipit-source-id: b8f492fe21edd37fe1f1c5a4a0e9153f58bbf3e2
    • Avoid passing existing BG error to WriteStatusCheck (#8511) · d1b70b05
      Committed by anand76
      Summary:
      In ```DBImpl::WriteImpl()```, we call ```PreprocessWrite()``` which, among other things, checks the BG error and returns it if set. This return status is later passed to ```WriteStatusCheck()```, which calls ```SetBGError()```. This results in a spurious call, and info logs, on every user write request. We should avoid passing the ```PreprocessWrite()``` return status to ```WriteStatusCheck()```, as the former would have called ```SetBGError()``` already if it encountered any new errors, such as an error when creating a new WAL file.
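
      A simplified sketch of the call flow described above (the function names mirror the summary; the bodies are placeholder stand-ins, not the actual DBImpl code):

          #include "rocksdb/status.h"

          using rocksdb::Status;

          // Placeholder stand-ins; the real logic lives in DBImpl.
          static Status PreprocessWrite() { return Status::OK(); }  // may surface a BG error
          static Status DoWrite() { return Status::OK(); }
          static void WriteStatusCheck(const Status& /*s*/) { /* would call SetBGError() */ }

          Status WriteImplSketch() {
            Status pre = PreprocessWrite();
            if (!pre.ok()) {
              // Do NOT feed this pre-existing BG error back into WriteStatusCheck():
              // PreprocessWrite() would already have called SetBGError() for any new
              // error, such as a failure to create a new WAL file.
              return pre;
            }
            Status w = DoWrite();
            WriteStatusCheck(w);  // only the status of this write is checked
            return w;
          }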
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8511
      
      Test Plan: Run existing tests
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29639917
      
      Pulled By: anand1976
      
      fbshipit-source-id: 19234163969e1645dbeb273712aaf5cd9ea2b182
  7. 10 Jul 2021, 2 commits
    • Make mempurge a background process (equivalent to in-memory compaction). (#8505) · 837705ad
      Committed by Baptiste Lemaire
      Summary:
      In https://github.com/facebook/rocksdb/issues/8454, I introduced a new process baptized `MemPurge` (memtable garbage collection). This new PR is built upon this past mempurge prototype.
      In this PR, I made the `mempurge` process a background task, which provides superior performance since the mempurge process no longer holds the db_mutex, and addresses severe restrictions from the past iteration (including a scenario where the past mempurge was failing when a memtable was mempurged but was still referred to by an iterator/snapshot/...).
      Now the mempurge process resembles an in-memory compaction process: the stack of immutable memtables is filtered out, and the useful payload is used to populate an output memtable. If the output memtable is filled to more than 60% capacity (an arbitrary heuristic), the mempurge process is aborted and a regular flush process takes place; otherwise, the output memtable is kept in the immutable memtable stack. Note that adding this output memtable to the `imm()` memtable stack does not trigger another flush process, so the flush thread can go to sleep at the end of a successful mempurge.
      MemPurge is activated by making the `experimental_allow_mempurge` flag `true`. When activated, the `MemPurge` process will always happen when the flush reason is `kWriteBufferFull`.
      The 3 unit tests confirm that this process supports `Put`, `Get`, `Delete`, `DeleteRange` operators and is compatible with `Iterators` and `CompactionFilters`.
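
      A high-level sketch of that decision (all names below are hypothetical placeholders, not RocksDB internals): filter the immutable memtables into an output memtable, fall back to a regular flush if it fills past the ~60% heuristic, otherwise keep it on the immutable stack without triggering another flush.

          #include <memory>
          #include <vector>

          // Hypothetical type standing in for a memtable.
          struct MemTableSketch {
            double fill_ratio = 0.0;  // fraction of capacity in use
          };

          // Returns true if the mempurge succeeded; false means the caller
          // should fall back to a regular flush of the immutable memtables.
          bool TryMemPurge(const std::vector<const MemTableSketch*>& immutables,
                           std::vector<const MemTableSketch*>* imm_stack) {
            // Filter the useful payload of `immutables` into a fresh output
            // memtable (garbage-collection details elided).
            auto output = std::make_unique<MemTableSketch>();
            output->fill_ratio = 0.4;  // pretend result of the filtering

            constexpr double kMaxFill = 0.6;  // arbitrary heuristic from the summary
            if (output->fill_ratio > kMaxFill) {
              return false;  // abort: a regular flush takes place instead
            }
            // Keep the compacted memtable on the immutable stack; this must not
            // itself trigger another flush, so the flush thread can go to sleep.
            imm_stack->push_back(output.release());
            return true;
          }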
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8505
      
      Reviewed By: pdillinger
      
      Differential Revision: D29619283
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 8a99bee76b63a8211bff1a00e0ae32360aaece95
    • Add ribbon filter to C API (#8486) · bb485e98
      Committed by qieqieplus
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8486
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29625501
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e6e2a455ae62a71f3a202278a751b9bba17ad03c
  8. 09 Jul 2021, 3 commits
  9. 08 Jul 2021, 2 commits
    • FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() to recover… (#8501) · b1a53db3
      Committed by sdong
      Summary:
      … small overwritten files.
      If a file is overwritten via rename and the parent path is not synced, FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() will delete the file. However, RocksDB relies on file renaming being atomic regardless of whether the parent directory is synced, and the current behavior breaks that assumption and caused some false positives: https://github.com/facebook/rocksdb/pull/8489
      
      Since atomic renaming is used for CURRENT files, to fix the problem, FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() now recovers the state of an overwritten file if the file is small.
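
      A schematic sketch of that recovery rule (hypothetical bookkeeping and helpers; the real logic lives inside FaultInjectionTestFS):

          #include <cstddef>
          #include <string>
          #include <vector>

          // Hypothetical record kept per file created after the last dir sync.
          struct UnsyncedFile {
            std::string path;
            bool overwrote_existing_file = false;  // created via rename over an old file
            std::vector<char> old_contents;        // snapshot of the overwritten file
          };

          constexpr size_t kMaxRecoverableSize = 4096;  // illustrative "small" threshold

          // Drop the unsynced file, unless it overwrote a small pre-existing file,
          // in which case restore the old contents so that atomic-rename users
          // (such as the CURRENT file) are not broken by the fault injection.
          void DeleteOrRecover(const UnsyncedFile& f) {
            if (f.overwrote_existing_file &&
                f.old_contents.size() <= kMaxRecoverableSize) {
              // RestoreFileContents(f.path, f.old_contents);  // hypothetical helper
              return;
            }
            // DeleteFile(f.path);  // hypothetical helper
          }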
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8501
      
      Test Plan: Run stress test for a while and see it doesn't break.
      
      Reviewed By: anand1976
      
      Differential Revision: D29594384
      
      fbshipit-source-id: 589b5c2f0a9d2aca53752d7bdb0231efa5b3ae92
    • Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) · ed8eb436
      Committed by Andrew Kryczka
      Summary:
      Various tests had disabled valgrind because it slows down, and currently
      times out, the CI runs. Where a test was disabled with no comment,
      I assumed slowness was the cause. For these tests that were slow under
      valgrind, as well as the ones identified in https://github.com/facebook/rocksdb/issues/8352, this PR moves them
      behind the compiler flag `-DROCKSDB_FULL_VALGRIND_RUN`.
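
      One plausible shape of the gating (hedged; the exact guard condition and set of gated tests are in the PR itself): a slow test still compiles in normal builds, but under a valgrind build it is included only when `-DROCKSDB_FULL_VALGRIND_RUN` is also given.

          // Fragment from a hypothetical gtest file such as db_test.cc.
          #if !defined(ROCKSDB_VALGRIND_RUN) || defined(ROCKSDB_FULL_VALGRIND_RUN)
          TEST_F(DBTest, HypotheticalSlowUnderValgrindCase) {
            // ... scenario that is too slow for the regular valgrind CI run ...
          }
          #endif  // !ROCKSDB_VALGRIND_RUN || ROCKSDB_FULL_VALGRIND_RUN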
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8475
      
      Test Plan: running `make full_valgrind_test`, `make valgrind_test`, `make check`; will verify they appear to work correctly.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29504843
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2aac90749cfbd30d5ce11cb29a07a1b9314eeea7
  10. 07 Jul 2021, 6 commits
  11. 02 Jul 2021, 1 commit
    • Memtable "MemPurge" prototype (#8454) · 9dc887ec
      Committed by Baptiste Lemaire
      Summary:
      Implement an experimental feature called "MemPurge", which consists of purging "garbage" bytes out of a memtable and reusing the memtable struct instead of making it immutable and eventually flushing its content to storage.
      The prototype is deactivated by default and is not intended for use. It is intended for correctness and validation testing. At the moment, the "MemPurge" feature can be switched on by using the `options.experimental_allow_mempurge` flag. For this early stage, when the allow_mempurge flag is set to `true`, all flush operations will be rerouted to perform a MemPurge. This is a temporary design decision that will give us time to explore meaningful heuristics for using MemPurge at the right time for relevant workloads. Moreover, the current MemPurge operation only supports `Puts`, `Deletes`, and `DeleteRange` operations, and handles `Iterators` as well as `CompactionFilter`s that are invoked at flush time.
      Three unit tests are added to `db_flush_test.cc` to test that MemPurge works correctly (and to check that the previously mentioned operations are fully supported and thoroughly tested).
      One noticeable design decision is the timing of the MemPurge operation in the memtable workflow: for this prototype, the mempurge happens when the memtable is switched (and usually made immutable). This is an inefficient process because it implies that the entirety of the MemPurge operation happens while holding the db_mutex. Future commits will make the MemPurge operation a background task (akin to the regular flush operation) and aim at drastically enhancing the performance of this operation. The MemPurge is also not fully "WAL-compatible" yet, but when the WAL is full, or when the regular MemPurge operation fails (or when the purged memtable still needs to be flushed), a regular flush operation takes place. Later commits will also correct these behaviors.
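
      A minimal usage sketch, assuming the flag is exactly the `options.experimental_allow_mempurge` boolean named above (experimental at the time of this commit and possibly renamed or replaced in later releases):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/options.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // Experimental prototype switch described in this commit: when true,
            // flush operations are rerouted to perform a MemPurge.
            options.experimental_allow_mempurge = true;

            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/mempurge_demo", &db);
            assert(s.ok());
            // ... normal Put/Delete/DeleteRange traffic; flushes attempt a MemPurge ...
            delete db;
            return 0;
          }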
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8454
      
      Reviewed By: anand1976
      
      Differential Revision: D29433971
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 6af48213554e35048a7e03816955100a80a26dc5