1. 22 Jul 2021, 3 commits
    • Complete the fix of stress open WAL drop fix (#8570) · 9b41082d
      sdong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/8548 is not complete. We should instead cover all cases where writable files are buffered, not just when failures are ingested. Extend it to any case where failures are ingested during DB open.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8570
      
      Test Plan: Run db_stress and see it doesn't break
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29830415
      
      fbshipit-source-id: 94449a0468fb2f7eec17423724008c9c63b2445d
    • Avoid updating option if there's no value updated (#8518) · 42eaa45c
      Jay Zhuang committed
      Summary:
      Avoid the expensive option-updating operation if
      `SetDBOptions()` does not change any option value.
      Skipping the update is not guaranteed; for example, changing `bytes_per_sync`
      to `0` may still trigger an update, since the value could be sanitized.
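      A minimal sketch of the idea, with hypothetical names (this is not RocksDB's actual `SetDBOptions` code): raw incoming values are compared against current values, and the expensive path is skipped only when nothing differs.

      ```python
      def sanitize(opts):
          """Hypothetical sanitizer: a zero bytes_per_sync is coerced to a default."""
          return {k: (1024 * 1024 if k == "bytes_per_sync" and v == 0 else v)
                  for k, v in opts.items()}

      def set_db_options(current, updates):
          """Skip the expensive update path when no raw option value changes.

          Skipping is best-effort: raw values are compared before sanitization,
          so an update like bytes_per_sync=0 still takes the expensive path even
          if the sanitized value ends up equal to the current one.
          """
          changed = {k: v for k, v in updates.items() if current.get(k) != v}
          if not changed:
              return False  # fast path: no value actually changed
          current.update(sanitize(changed))  # expensive path: sanitize + apply
          return True
      ```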
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8518
      
      Test Plan: added unittest
      
      Reviewed By: riversand963
      
      Differential Revision: D29672639
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b7931de62ceea6f1bdff0d1209adf1197d3ed1f4
    • Add overwrite_probability for filluniquerandom benchmark in db_bench (#8569) · 6b4cdacf
      Baptiste Lemaire committed
      Summary:
      Add `overwrite_probability` and `overwrite_window_size` flags to `db_bench`.
      Add the possibility of performing a `filluniquerandom` benchmark with an overwrite probability.
      For each write operation, there is a probability _p_ that the write is an overwrite (_p_=`overwrite_probability`).
      When an overwrite is decided, the key is randomly chosen from the last _N_ keys previously inserted into the DB (with _N_=`overwrite_window_size`).
      When a pure write is decided, the key inserted into the DB is unique and therefore will not be an overwrite.
      The `overwrite_window_size` flag lets the user decide whether overwrites mostly target recently inserted keys (when `overwrite_window_size` is small compared to the total number of writes) or can also target keys inserted "a long time ago" (when `overwrite_window_size` is comparable to the total number of writes).
      Note that the total number of writes = # of unique insertions + # of overwrites.
      No unit test specifically added.
      Local testing shows the following **throughputs** for `filluniquerandom` with 1M total writes:
      - bypass the code inserts (no `overwrite_probability` flag specified): ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=10`: ~17.0MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=10`: ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=1M`: ~14.5MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=1M`: ~14.0MB/s
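      The key-selection scheme above can be sketched as follows (a simplified model with illustrative names, not db_bench's actual implementation):

      ```python
      import random
      from collections import deque

      def workload_keys(num_writes, overwrite_probability, overwrite_window_size, seed=0):
          """Yield keys for a filluniquerandom-style workload with overwrites.

          With probability p, pick a key uniformly from the last N unique keys
          inserted (an overwrite); otherwise emit a fresh, never-seen key.
          """
          rng = random.Random(seed)
          window = deque(maxlen=overwrite_window_size)  # last N unique keys
          next_unique = 0
          for _ in range(num_writes):
              if window and rng.random() < overwrite_probability:
                  yield window[rng.randrange(len(window))]  # overwrite
              else:
                  window.append(next_unique)  # pure write: unique key
                  yield next_unique
                  next_unique += 1
      ```

      A small `overwrite_window_size` concentrates overwrites on hot, recent keys, while a window on the order of the total write count spreads them over the whole key space.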
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8569
      
      Reviewed By: pdillinger
      
      Differential Revision: D29818631
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: d472b4ea4e457a4da7c4ee4f14b40cccd6a4587a
  2. 21 Jul 2021, 2 commits
  3. 20 Jul 2021, 3 commits
  4. 19 Jul 2021, 1 commit
  5. 17 Jul 2021, 9 commits
    • Change code so trimmed memtable history is released outside DB mutex (#8530) · 9e885939
      sdong committed
      Summary:
      Currently, the code deletes a memtable immediately after it is trimmed from history. Although this should never cause a problem, since the super version still holds the memtable and is only switched afterwards, it seems better practice not to do it, and instead to clean it up the standard way: put it in WriteContext and free it after the DB mutex is released.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8530
      
      Test Plan: Run all existing tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D29703410
      
      fbshipit-source-id: 21d8068ac6377de4b6fa7a89697195742659fde4
    • Update HISTORY.md and version.h 6.23 release (#8552) · c04a86a0
      Jay Zhuang committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8552
      
      Reviewed By: ajkr
      
      Differential Revision: D29746828
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 17d564895ae9cb675d455e73626b9a6717db6279
    • Remove extra double quote in options.h (#8550) · 3455ab0e
      Merlin Mao committed
      Summary:
      There is an extra double quote in options.h (`"index block""`).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8550
      
      Test Plan: None
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29746077
      
      Pulled By: autopear
      
      fbshipit-source-id: 2e5117296e5414b7c7440d990926bc1e567a0b4f
    • DB Stress Reopen write failure to skip WAL (#8548) · 39a07c96
      sdong committed
      Summary:
      When DB Stress enables write failure in reopen, WAL files are also created with a wrapper writable file which buffers writes until fsync. However, the crash test currently expects all writes to the WAL to be persistent. This is at odds with the unsynced bytes being dropped. To work around it temporarily, we disable WAL write failure for now.
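      The conflict can be seen with a toy model of such a buffered writable file (illustrative only, not the actual db_stress wrapper):

      ```python
      class BufferedWritableFile:
          """Toy model of a writable-file wrapper that buffers until fsync.

          Writes land in an in-memory buffer; only fsync moves them to durable
          storage. A simulated crash drops everything still buffered, which is
          why a crash test that expects every WAL write to be durable conflicts
          with injecting write failures on such files.
          """

          def __init__(self):
              self.buffer = b""
              self.durable = b""

          def append(self, data: bytes):
              self.buffer += data          # buffered, not yet durable

          def fsync(self):
              self.durable += self.buffer  # persist buffered bytes
              self.buffer = b""

          def simulate_crash(self) -> bytes:
              return self.durable          # unsynced buffered bytes are lost
      ```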
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8548
      
      Test Plan: Run db_stress. Manual printf to make sure only WAL files are skipped.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29745095
      
      fbshipit-source-id: 1879dd2c01abad7879ca243ee94570ec47c347f3
    • Minor Makefile update to exclude microbench as dependency (#8523) · a379dae4
      Jay Zhuang committed
      Summary:
      Otherwise the build may report a warning about missing
      `benchmark.h` for some targets; the warning won't break the build.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8523
      
      Test Plan:
      `make blackbox_ubsan_crash_test` on a machine without
      benchmark lib installed.
      
      Reviewed By: pdillinger
      
      Differential Revision: D29682478
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e1261fbcda46bc6bd3cd39b7b03b7f78927d0430
    • Allow CreateFromString to work on complex URIs (#8547) · ac37bfde
      mrambacher committed
      Summary:
      Some URIs for creating instances (e.g. SecondaryCache) are complex URIs like (cache://name;prop=value). These URIs were previously treated as name-value properties. With this change, if the URI does not contain an "id=XX" setting, it is treated as a single string value (rather than an ID plus a map of name-value properties).
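      The dispatch rule might be sketched like this (a hypothetical helper, not the actual `CreateFromString` implementation):

      ```python
      def parse_creation_uri(uri):
          """Interpret a creation URI per the rule described above.

          If the URI contains an `id=XX` setting, treat it as an ID plus a map
          of name-value properties; otherwise pass the whole URI through as one
          opaque string value for the target object to interpret itself.
          """
          parts = [p for p in uri.split(";") if p]
          props = dict(p.split("=", 1) for p in parts if "=" in p)
          if "id" in props:
              return props.pop("id"), props  # structured: id + properties
          return uri, None                   # opaque single string value
      ```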
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8547
      
      Reviewed By: anand1976
      
      Differential Revision: D29741386
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 0621f62bec3a6699a7b66c7c0b5634b2856653aa
    • Don't hold DB mutex for block cache entry stat scans (#8538) · df5dc73b
      Peter Dillinger committed
      Summary:
      I previously didn't notice that the DB mutex was being held during
      block cache entry stat scans, probably because I primarily checked for
      read performance regressions, since reads require the block cache and
      are traditionally latency-sensitive.
      
      This change does some refactoring to avoid holding DB mutex and to
      avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
      Some tests have to be updated because now the stats collector is
      populated in the Cache aggressively on DB startup rather than lazily.
      (I hope to clean up some of this added complexity in the future.)
      
      This change also ensures proper treatment of need_out_of_mutex for
      non-int DB properties.
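      The general pattern of moving an expensive scan outside the mutex can be sketched as (a generic toy model, not the RocksDB code):

      ```python
      import threading

      class StatsCache:
          """Serve a cached scan result; run the expensive scan outside the mutex.

          Only cheap bookkeeping happens under the lock (which stands in for the
          DB mutex); the slow scan runs with the lock released, so other
          operations (e.g. writes) are not blocked behind it.
          """

          def __init__(self, scan_fn):
              self.mutex = threading.Lock()
              self.scan_fn = scan_fn
              self.cached = None

          def get_property(self):
              with self.mutex:
                  cached = self.cached      # cheap work only under the mutex
              if cached is not None:
                  return cached
              result = self.scan_fn()       # expensive scan, mutex released
              with self.mutex:
                  self.cached = result
              return result
      ```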
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
      
      Test Plan:
      Added unit test logic that uses sync points to fail if the DB mutex
      is held during a scan, covering the various ways that a scan might be
      triggered.
      
      Performance test - the known impact to holding the DB mutex is on
      TransactionDB, and the easiest way to see the impact is to hack the
      scan code to almost always miss and take an artificially long time
      scanning. Here I've injected an unconditional 5s sleep at the call to
      ApplyToAllEntries.
      
      Before (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     433.219 micros/op 2308 ops/sec;    0.1 MB/s ( transactions:78999 aborts:0)
          rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     448.802 micros/op 2228 ops/sec;    0.1 MB/s ( transactions:75999 aborts:0)
          rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
      
      Notice the 5s P100 write time.
      
      After (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     303.645 micros/op 3293 ops/sec;    0.1 MB/s ( transactions:98999 aborts:0)
          rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     310.383 micros/op 3221 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
      
      P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     311.365 micros/op 3211 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     308.395 micros/op 3242 ops/sec;    0.1 MB/s ( transactions:97999 aborts:0)
          rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
      
      No substantial difference.
      
      Reviewed By: siying
      
      Differential Revision: D29738847
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
    • db_bench seekrandom with multiDB should only create iterators queried (#7818) · 1e5b631e
      sdong committed
      Summary:
      Right now, db_bench with seekrandom and a multiple-DB setup creates iterators for all DBs just to query one of them. That differs from most real workloads. Fix it by only creating the iterators that will be queried.
      
      Also fix a bug that DBs are not destroyed in multi-DB mode.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818
      
      Test Plan: Run db_bench with single/multiDB X using/not using tailing iterator with ASAN build, and validate the behavior is expected.
      
      Reviewed By: ajkr
      
      Differential Revision: D25720226
      
      fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b
    • Crashtest mempurge (#8545) · 0229a88d
      Baptiste Lemaire committed
      Summary:
      Add `experimental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value.
      I successfully tested locally both `whitebox` and `blackbox` crash tests with the `experimental_allow_mempurge` flag set to true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D29734513
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae
  6. 16 Jul 2021, 7 commits
  7. 15 Jul 2021, 1 commit
  8. 14 Jul 2021, 2 commits
  9. 13 Jul 2021, 7 commits
  10. 12 Jul 2021, 2 commits
    • Correct CVS -> CSV typo (#8513) · 5afd1e30
      Adam Retter committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8513
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29654066
      
      Pulled By: mrambacher
      
      fbshipit-source-id: b8f492fe21edd37fe1f1c5a4a0e9153f58bbf3e2
    • Avoid passing existing BG error to WriteStatusCheck (#8511) · d1b70b05
      anand76 committed
      Summary:
      In ```DBImpl::WriteImpl()```, we call ```PreprocessWrite()```, which, among other things, checks the BG error and returns it. This return status is later passed to ```WriteStatusCheck()```, which calls ```SetBGError()```. This results in a spurious call, and info logs, on every user write request. We should avoid passing the ```PreprocessWrite()``` return status to ```WriteStatusCheck()```, since the former would have called ```SetBGError()``` already if it encountered any new error, such as a failure to create a new WAL file.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8511
      
      Test Plan: Run existing tests
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29639917
      
      Pulled By: anand1976
      
      fbshipit-source-id: 19234163969e1645dbeb273712aaf5cd9ea2b182
  11. 10 Jul 2021, 2 commits
    • Make mempurge a background process (equivalent to in-memory compaction). (#8505) · 837705ad
      Baptiste Lemaire committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/8454, I introduced a new process named `MemPurge` (memtable garbage collection). This PR builds upon that mempurge prototype.
      In this PR, I made the `mempurge` process a background task, which provides superior performance since the mempurge process no longer holds on to the db_mutex, and addresses severe restrictions from the past iteration (including a scenario where the past mempurge would fail when a memtable was mempurged but was still referred to by an iterator/snapshot/...).
      Now the mempurge process resembles an in-memory compaction: the stack of immutable memtables is filtered out, and the useful payload is used to populate an output memtable. If the output memtable is filled to more than 60% capacity (arbitrary heuristic), the mempurge process is aborted and a regular flush takes place; otherwise the output memtable is kept in the immutable memtable stack. Note that adding this output memtable to the `imm()` memtable stack does not trigger another flush, so that the flush thread can go to sleep at the end of a successful mempurge.
      MemPurge is activated by setting the `experimental_allow_mempurge` flag to `true`. When activated, the `MemPurge` process will always happen when the flush reason is `kWriteBufferFull`.
      The 3 unit tests confirm that this process supports the `Put`, `Get`, `Delete`, and `DeleteRange` operators and is compatible with `Iterators` and `CompactionFilters`.
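      The filter-then-decide step can be modeled roughly as follows (an illustrative sketch, not RocksDB's implementation; the 60% threshold is the arbitrary heuristic mentioned in the summary, and tombstones are modeled as `None` values):

      ```python
      def mempurge(immutable_memtables, capacity, threshold=0.60):
          """Filter garbage out of a stack of immutable memtables.

          Merge live entries (latest value per key wins, deletions dropped)
          into one output memtable. If the useful payload exceeds `threshold`
          of capacity, abort and signal that a regular flush should happen.
          """
          output = {}
          # Visit newest memtable first, so the latest write for a key wins.
          for memtable in reversed(immutable_memtables):
              for key, value in memtable.items():
                  output.setdefault(key, value)
          # Drop tombstones (deletions) from the useful payload.
          output = {k: v for k, v in output.items() if v is not None}
          if len(output) > threshold * capacity:
              return None   # abort: fall back to a regular flush
          return output     # keep the result in the immutable stack
      ```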
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8505
      
      Reviewed By: pdillinger
      
      Differential Revision: D29619283
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 8a99bee76b63a8211bff1a00e0ae32360aaece95
    • Add ribbon filter to C API (#8486) · bb485e98
      qieqieplus committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8486
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29625501
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e6e2a455ae62a71f3a202278a751b9bba17ad03c
  12. 09 Jul 2021, 1 commit
    • Add micro-benchmark support (#8493) · 5dd18a8d
      Jay Zhuang committed
      Summary:
      Add Google Benchmark support for microbenchmarks.
      Add ribbon_bench to benchmark the ribbon filter vs. other filters.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8493
      
      Test Plan:
      added test to CI
      To run the benchmark on devhost:
      Install benchmark: `$ sudo dnf install google-benchmark-devel`
      Build and run:
      `$ ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make microbench`
      or with cmake:
      `$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_BENCHMARK=1 && make microbench`
      
      Reviewed By: pdillinger
      
      Differential Revision: D29589649
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 8fed13b562bef4472f161ecacec1ab6b18911dff