1. 04 Aug, 2021 3 commits
  2. 03 Aug, 2021 5 commits
    • B
      Fix db stress crash mempurge (#8604) · b2781522
      Baptiste Lemaire committed
      Summary:
      The db_stress crash was caused by a call to `IsFlushPending()` made by a stats function, which triggered an `assert([false])` that I did not anticipate when I created the `trigger_flush` bool. It turns out that this bool variable is not useful: I created it because I thought the `imm_flush_needed` atomic bool would actually trigger a flush.
      It turns out that this bool is only checked in `IsFlushPending` - this is its only use - and a flush is triggered by either a background thread checking on the imm array, or by an explicit call to `SchedulePendingFlush` which creates a flush request, that is then added to a flush request queue.
      In this PR, I reverted the MemtableList::Add function to what it was before my changes.
      I tested the fix by running the exact command line that deterministically triggered the assert error (see below), which confirmed that this is where the error was coming from.
      I also ran `db_crashtest.py whitebox` and `blackbox` for a couple of hours locally before committing this PR.
      Experiment run:
      
      ```
      ./db_stress --acquire_snapshot_one_in=0 --allow_concurrent_memtable_write=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=76.90653425292307 --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=0 --compaction_ttl=2 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --enable_compaction_filter=1 --enable_pipelined_write=0 --expected_values_path=/dev/shm/rocksdb/rocksdb_crashtest_expected --experimental_allow_mempurge=1 --experimental_mempurge_policy=kAlternate --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=14 --index_type=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=1 --mark_for_compaction_one_file_in=10 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=-1 --prefixpercent=0 --progress_reports=0 --read_fault_one_in=0 --readpercent=60 --recycle_log_file_num=1 --reopen=20 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync=1 --sync_fault_injection=False --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=3 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=0 --use_ribbon_filter=1 --user_timestamp_size=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8604
      
      Reviewed By: pdillinger
      
      Differential Revision: D30047295
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: b9e379bfa3d6b9bd2b275725fb0bca4bd81a3dbe
      b2781522
    • M
      Revert checkpoint fix (#8607) · 4811115b
      Merlin Mao committed
      Summary:
      PR https://github.com/facebook/rocksdb/pull/8572 loses custom types in the options file. More API changes are needed to fix this issue, so revert that PR.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8607
      
      Reviewed By: ajkr
      
      Differential Revision: D30058289
      
      Pulled By: autopear
      
      fbshipit-source-id: 78f5a154c0bf193e8441bae4a36fa79b95277fd4
      4811115b
    • L
      Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) · 3f7e9298
      Levi Tamasi committed
      Summary:
      The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk
      unlocks the DB mutex before destroying the `ThreadLocalPtr` holding
      the per-thread `SuperVersion` pointers when the only remaining reference
      is the back reference from `super_version_`. The idea behind this was to
      break the circular dependency between `ColumnFamilyData` and `SuperVersion`:
      when the penultimate reference goes away, `ColumnFamilyData` can clean up
      the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there
      is a `SuperVersion` and it is not referenced by anything else.) However,
      unlocking the mutex throws a wrench in this plan by making it possible for another thread
      to jump in and take another reference to the `ColumnFamilyData`, keeping the
      object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like
      https://github.com/facebook/rocksdb/issues/8440 ,
      https://github.com/facebook/rocksdb/issues/8382 ,
      and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet`
      destructor we sometimes observe during close in our stress tests.
      
      Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original
      rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call
      `SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone.
      https://github.com/facebook/rocksdb/pull/3510 fixed the deadlock but left the
      unlocking in place. https://github.com/facebook/rocksdb/pull/6147 then introduced
      the circular dependency and associated cleanup logic described above (in order
      to enable iterators to keep the `ColumnFamilyData` for dropped column families alive),
      and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`.
      Finally, https://github.com/facebook/rocksdb/pull/7749 fixed a memory leak but
      apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete`
      in `SuperVersion::Cleanup`.
      
      The patch simply eliminates the unlocking and relocking, which has been unnecessary
      ever since https://github.com/facebook/rocksdb/issues/3510 made `SuperVersionUnrefHandle` lock-free.
      This closes the window during which another thread could increase the reference count,
      and hopefully fixes the issues above.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8605
      
      Test Plan: Ran `make check` and stress tests locally.
      
      Reviewed By: pdillinger
      
      Differential Revision: D30051035
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
      3f7e9298
    • Y
      Fix an issue with initializing blob header buffer (#8537) · 8e91bd90
      yangzaorang committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8537
      
      Reviewed By: ajkr
      
      Differential Revision: D29838132
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e3e78d5f85f240a1800ace417a8b634f74488e41
      8e91bd90
    • M
      Allow to use a string as a delimiter in StringAppendOperator (#8536) · 8f52972c
      Mikhail Golubev committed
      Summary:
      An arbitrary string can now be used as a delimiter in the StringAppend merge operator
      flavor. In particular, this allows using an empty string, which concatenates binary values for
      the same key byte for byte, one right after another.
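
      To make the delimiter semantics concrete, here is a minimal standalone sketch. `AppendAllSketch` is a hypothetical helper written for illustration only; it is not the RocksDB `StringAppendOperator` API, and it only models how operands for one key might be joined with a multi-character or empty delimiter.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical helper (not the RocksDB API): folds the merge operands for
      // one key into a single value, placing `delim` between consecutive pieces.
      std::string AppendAllSketch(const std::vector<std::string>& operands,
                                  const std::string& delim) {
        std::string result;
        for (std::size_t i = 0; i < operands.size(); ++i) {
          if (i > 0) {
            result += delim;  // an empty delimiter simply concatenates
          }
          result += operands[i];
        }
        return result;
      }
      ```

      With a delimiter of `", "` the operands `a`, `b`, `c` would be joined into `a, b, c`; with an empty delimiter the pieces are stored back to back.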
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536
      
      Reviewed By: mrambacher
      
      Differential Revision: D29962120
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f
      8f52972c
  3. 31 Jul, 2021 2 commits
    • M
      Allow WAL dir to change with db dir (#8582) · ab7f7c9e
      mrambacher committed
      Summary:
      Prior to this change, the "wal_dir" DBOption would always be set (defaults to dbname) when the DBOptions were sanitized. Because of this setting in the options file, it was not possible to rename/relocate a database directory after it had been created and use the existing options file.
      
      After this change, the "wal_dir" option is only set under specific circumstances.  Methods were added to the ImmutableDBOptions class to see if it is set and if it is set to something other than the dbname.  Additionally, a method was added to retrieve the effective value of the WAL dir (either the option or the dbname/path).
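
      The three queries described above can be sketched as follows. The struct and method names are assumptions made for illustration; they are not the actual `ImmutableDBOptions` members, and an empty string is assumed to mean "not set".

      ```cpp
      #include <cassert>
      #include <string>

      // Illustrative stand-in for the options object. All names here are
      // hypothetical; an empty wal_dir is assumed to mean "not set".
      struct WalDirOptionsSketch {
        std::string wal_dir;

        // Is the option set at all?
        bool IsWalDirSet() const { return !wal_dir.empty(); }

        // Is it set to something other than the database path?
        bool IsWalDirOtherThan(const std::string& db_path) const {
          return IsWalDirSet() && wal_dir != db_path;
        }

        // Effective WAL directory: the option if set, else the database path.
        std::string EffectiveWalDir(const std::string& db_path) const {
          return IsWalDirSet() ? wal_dir : db_path;
        }
      };
      ```

      Because an unset option falls back to whatever path the database is opened with, a renamed directory keeps working with the old options file in this model.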
      
      Tests were added to the core and ldb to test that a database could be created and renamed without issue.  Additional tests for various permutations of wal_dir were also added.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8582
      
      Reviewed By: pdillinger, autopear
      
      Differential Revision: D29881122
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 67d3d033dc8813d59917b0a3fba2550c0efd6dfb
      ab7f7c9e
    • Y
      Several simple local code clean-ups (#8565) · 066b5112
      Yanqin Jin committed
      Summary:
      This PR tries to remove some unnecessary checks as well as unreachable code blocks to
      improve readability. An obvious non-public API method naming typo is also corrected.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8565
      
      Test Plan: make check
      
      Reviewed By: lth
      
      Differential Revision: D29963984
      
      Pulled By: riversand963
      
      fbshipit-source-id: cc96e8f09890e5cfe9b20eadb63bdca5484c150a
      066b5112
  4. 30 Jul, 2021 3 commits
    • P
      Fix insecure internal API for GetImpl (#8590) · 1d34cd79
      Peter Dillinger committed
      Summary:
      Calling the GetImpl function could leave a reference to a local
      callback function in a field of a parameter struct. As this is
      performance-critical code, I'm not going to attempt to sanitize this
      code too much, but make the existing hack a bit cleaner by reverting
      what it overwrites in the input struct.
      
      Added SaveAndRestore utility class to make that easier.
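
      A save-and-restore guard of this kind is typically a small RAII class. The sketch below is an assumption about the general shape, not the actual RocksDB `SaveAndRestore` code; `LookupParamsSketch` and `LookupSketch` are hypothetical names standing in for the parameter struct and the function that temporarily overwrites one of its fields.

      ```cpp
      #include <cassert>
      #include <utility>

      // RAII guard sketch: remembers *target at construction and writes the old
      // value back when the guard goes out of scope.
      template <typename T>
      class SaveAndRestoreSketch {
       public:
        explicit SaveAndRestoreSketch(T* target)
            : target_(target), saved_(*target) {}
        SaveAndRestoreSketch(T* target, T temporary_value)
            : target_(target), saved_(*target) {
          *target_ = std::move(temporary_value);
        }
        ~SaveAndRestoreSketch() { *target_ = std::move(saved_); }

        SaveAndRestoreSketch(const SaveAndRestoreSketch&) = delete;
        SaveAndRestoreSketch& operator=(const SaveAndRestoreSketch&) = delete;

       private:
        T* target_;
        T saved_;
      };

      // Hypothetical callback and parameter struct, for illustration only.
      struct CallbackSketch {
        int calls = 0;
      };
      struct LookupParamsSketch {
        CallbackSketch* callback = nullptr;
      };

      // Installs a function-local callback for the duration of the call. The
      // guard restores the caller's pointer before local_callback is destroyed,
      // so no dangling pointer is left behind in *params.
      int LookupSketch(LookupParamsSketch* params) {
        CallbackSketch local_callback;
        SaveAndRestoreSketch<CallbackSketch*> guard(&params->callback,
                                                    &local_callback);
        params->callback->calls++;
        return params->callback->calls;
      }
      ```

      The guard is declared after `local_callback`, so it is destroyed first and the field is reverted while the local object is still alive.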
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8590
      
      Test Plan:
      added unit test for SaveAndRestore; existing tests for
      GetImpl
      
      Reviewed By: riversand963
      
      Differential Revision: D29947983
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2f608853f970bc06724e834cc84dcc4b8599ddeb
      1d34cd79
    • B
      Create fillanddeleteuniquerandom benchmark (db_bench), with new option flags. (#8593) · 9501279d
      Baptiste Lemaire committed
      Summary:
      Introduction of a new `fillanddeleteuniquerandom` benchmark (`db_bench`) with 5 new option flags to simulate a benchmark where the following sequence is repeated multiple times:
      "A set of keys S1 is inserted ('`disposable entries`'), then after some delay another set of keys S2 is inserted ('`persistent entries`') and the first set of keys S1 is deleted. S2 artificially represents the insertion of hypothetical results from some undefined computation done on the first set of keys S1. The next sequence can start as soon as the last disposable entry in the set S1 of this sequence is inserted, if the `delay` is non negligible."
      New flags:
      - `disposable_entries_delete_delay`: minimum delay in microseconds between insertion of the last `disposable` entry, and the start of the insertion of the first `persistent` entry.
      - `disposable_entries_batch_size`: number of `disposable` entries inserted at the beginning of each sequence.
      - `disposable_entries_value_size`: size of the random `value` string for the `disposable` entries.
      - `persistent_entries_batch_size`: number of `persistent` entries inserted at the end of each sequence, right before the deletion of the `disposable` entries starts.
      - `persistent_entries_value_size`: size of the random value string for the `persistent` entries.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8593
      
      Reviewed By: pdillinger
      
      Differential Revision: D29974436
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: f578033e5b45e8268ba6fa6f38f4770c2e6e801d
      9501279d
    • S
      DB::GetSortedWalFiles() to ensure file deletion is disabled (#8591) · e8f218cb
      sdong committed
      Summary:
      If DB::GetSortedWalFiles() runs without file deletion disabled, a file might get deleted in the middle of the call and an error is returned to the user. This makes the function hard to use. Fix it by disabling file deletion if that has not already been done.
      
      Fix another minor issue of logging within DB mutex, which should not be done unless a major failure happens.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8591
      
      Test Plan: Run all existing tests
      
      Reviewed By: pdillinger
      
      Differential Revision: D29969412
      
      fbshipit-source-id: d5f42b5271608a35b9b07687ce18157d7447b0de
      e8f218cb
  5. 29 Jul, 2021 4 commits
    • P
      Some fixes and enhancements to `ldb repair` (#8544) · 0804b44f
      Peter Dillinger committed
      Summary:
      * Basic handling of SST file with just range tombstones rather than
      failing assertion about smallest_seqno <= largest_seqno
      * Adds --verbose option so that there exists a way to see the INFO
      output from Repairer.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8544
      
      Test Plan: unit test added, manual testing for --verbose
      
      Reviewed By: ajkr
      
      Differential Revision: D29954805
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 696af25805fc36cc178b04ba6045922a22625fd9
      0804b44f
    • J
      Fix HISTORY.md for #8518 (#8594) · a4b8ac9a
      Jay Zhuang committed
      Summary:
      PR https://github.com/facebook/rocksdb/issues/8518 merged the change into the wrong section.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8594
      
      Reviewed By: riversand963
      
      Differential Revision: D29974565
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 51c930d93fbdb406fe31ff73c96548a6f88b9965
      a4b8ac9a
    • J
      Replace macros in compaction_iterator.cc with inline functions (#8592) · e0ff365a
      jimmycleary committed
      Summary:
      Internal task T96186510.
      
      Created new inline member functions in `CompactionIterator`,
      `DefinitelyInSnapshot`, `DefinitelyNotInSnapshot`, and
      `InEarliestSnapshot` to replace the macros at the top of
      `compaction_iterator.cc`.
      
      Placed the definitions in `compaction_iterator.h` in accordance with
      Google's style guide for inline functions. Separated the declarations
      and definitions, and only placed the `inline` keyword on the
      definitions, in line with ISO CPP recommendations.
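
      The pattern described above (declaration inside the class without `inline`, definition in the header with `inline`) might look like the following. `IteratorSketch` and its single check are simplified assumptions for illustration; the real `CompactionIterator` predicates are more involved.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Macro style being replaced (illustrative only):
      //   #define IN_EARLIEST_SNAPSHOT(seq) ((seq) <= earliest_snapshot_)

      // Hypothetical, simplified class; not the real CompactionIterator.
      class IteratorSketch {
       public:
        explicit IteratorSketch(std::uint64_t earliest_snapshot)
            : earliest_snapshot_(earliest_snapshot) {}

        // Declaration only: the `inline` keyword is not repeated here.
        bool InEarliestSnapshot(std::uint64_t sequence) const;

       private:
        std::uint64_t earliest_snapshot_;
      };

      // Definition kept in the header and marked `inline`, so the header can be
      // included from several translation units without violating the ODR.
      inline bool IteratorSketch::InEarliestSnapshot(std::uint64_t sequence) const {
        return sequence <= earliest_snapshot_;
      }
      ```

      Unlike the macro, the member function is type-checked, evaluates its argument exactly once, and is visible to debuggers.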
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8592
      
      Test Plan: Ran `make check`.  Successful build and all tests appeared to pass.
      
      Reviewed By: riversand963
      
      Differential Revision: D29966782
      
      Pulled By: jimmycFB
      
      fbshipit-source-id: 3584290bbbabf862e9ab58852281f46d37f58be6
      e0ff365a
    • B
      Add experimental mempurge policy flag to db_stress. (#8588) · d6006f9c
      Baptiste Lemaire committed
      Summary:
      Add `experimental_mempurge_policy` flag to `db_stress` and `db_crashtest.py`.
      This flag is only read if the `experimental_allow_mempurge` flag is set to `true`. This flag can take the following values: `kAlways`, and `kAlternate` (default).
      - `kAlways`: a flush is always redirected to a mempurge. If the mempurge aborts, then a regular flush proceeds.
      - `kAlternate`: if one or more of the flush input memtables is a mempurge output memtable, then a flush is performed, else a mempurge is carried out. Similar to `kAlways`, if a mempurge aborts, the FlushJob proceeds to a regular flush to storage.
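
      The two policies above can be summarized as a small decision function. This is a sketch under stated assumptions: the enum and function names are hypothetical, and whether a mempurge succeeds is passed in as a plain flag rather than computed.

      ```cpp
      #include <cassert>

      // Hypothetical names modeling the two policies described above.
      enum class MemPurgePolicySketch { kAlways, kAlternate };
      enum class FlushOutcomeSketch { kMempurged, kFlushedToStorage };

      // Should this flush first be attempted as a mempurge?
      bool ShouldAttemptMempurge(MemPurgePolicySketch policy,
                                 bool any_input_is_mempurge_output) {
        switch (policy) {
          case MemPurgePolicySketch::kAlways:
            return true;
          case MemPurgePolicySketch::kAlternate:
            // Alternate: skip the mempurge if an input already came from one.
            return !any_input_is_mempurge_output;
        }
        return false;
      }

      // Under both policies an aborted mempurge falls back to a regular flush.
      FlushOutcomeSketch RunFlushSketch(MemPurgePolicySketch policy,
                                        bool any_input_is_mempurge_output,
                                        bool mempurge_succeeds) {
        if (ShouldAttemptMempurge(policy, any_input_is_mempurge_output) &&
            mempurge_succeeds) {
          return FlushOutcomeSketch::kMempurged;
        }
        return FlushOutcomeSketch::kFlushedToStorage;
      }
      ```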
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8588
      
      Reviewed By: pdillinger
      
      Differential Revision: D29934251
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 90c1debed2029b9915d066914556547507c33dae
      d6006f9c
  6. 28 Jul, 2021 4 commits
    • P
      Fix use-after-free on implicit temporary FileOptions (#8571) · 74b7c0d2
      Peter Dillinger committed
      Summary:
      FileOptions has an implicit conversion from EnvOptions and some
      internal APIs take `const FileOptions&` and save the reference, which is
      counter to Google C++ guidelines,
      
      > Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.
      
      This is at least a problem for repair.cc, which passes an EnvOptions to
      TableCache(), which would save a reference to the temporary copy as
      FileOptions. This was unfortunately only caught as a side effect of
      changes in https://github.com/facebook/rocksdb/issues/8544.
      
      This change fixes the repair.cc case and updates the involved internal
      APIs that save a reference to use `const FileOptions*` instead.
      
      Unfortunately, I don't know how to get any of our sanitizers to reliably
      report bugs like this, so I can't rule out more existing in our
      codebase.
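
      The hazard and the pointer-based remedy can be illustrated with a small standalone sketch. The types below are hypothetical stand-ins (`EnvOptionsSketch`, `FileOptionsSketch`, `TableCacheSketch`), not the real RocksDB classes; only the implicit conversion and the stored-pointer pattern are modeled.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical stand-ins for the real option types.
      struct EnvOptionsSketch {
        std::size_t buffer_size = 0;
      };

      struct FileOptionsSketch {
        std::size_t buffer_size = 0;
        FileOptionsSketch() = default;
        // Implicit conversion, mirroring the situation described above.
        FileOptionsSketch(const EnvOptionsSketch& env)
            : buffer_size(env.buffer_size) {}
      };

      // Risky pattern (shown only as a comment): a constructor taking
      // `const FileOptionsSketch&` and storing the reference would accept an
      // EnvOptionsSketch argument, bind to the temporary created by the implicit
      // conversion, and dangle as soon as the constructor call ends.

      class TableCacheSketch {
       public:
        // A pointer parameter blocks that accident: an EnvOptionsSketch* does not
        // convert to a FileOptionsSketch*. The caller must keep *file_options
        // alive for the lifetime of this object, and it must not be null.
        explicit TableCacheSketch(const FileOptionsSketch* file_options)
            : file_options_(file_options) {}

        std::size_t buffer_size() const { return file_options_->buffer_size; }

       private:
        const FileOptionsSketch* file_options_;
      };
      ```

      A caller that only has an `EnvOptionsSketch` now has to create a named `FileOptionsSketch` first, which makes the lifetime of the converted object explicit.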
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8571
      
      Test Plan:
      Test that issues seen with https://github.com/facebook/rocksdb/issues/8544 are fixed (can reproduce on
      AWS EC2)
      
      Reviewed By: ajkr
      
      Differential Revision: D29943890
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 95f9c5251548777b4dc994c1a083dd2add5799c9
      74b7c0d2
    • P
      Fix missing Handle release in TableCache::GetRangeTombstoneIterator (#8589) · e352bd57
      Peter Dillinger committed
      Summary:
      This appears to be little used code so not a major bug, but is
      blocking https://github.com/facebook/rocksdb/issues/8544
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8589
      
      Test Plan:
      Added regression test to the end of
      DBRangeDelTest::TableEvictedDuringScan. Without this fix, ASAN reports
      memory leak.
      
      Reviewed By: ajkr
      
      Differential Revision: D29943623
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f7115fa6d4440aef83888ff609aa03d09216463b
      e352bd57
    • Z
      Add MultiGet to replay (#8577) · eec79b39
      Zhichao Cao committed
      Summary:
      With this PR, replay can handle MultiGet records contained in the trace.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8577
      
      Test Plan: make check and replay the real trace.
      
      Reviewed By: anand1976
      
      Differential Revision: D29864060
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 5288d4fc9b6a3cb331de1e0c635d4e044dcb534a
      eec79b39
    • A
      Pass extra db_stress args to fbcode crash tests (#8587) · ddf439c5
      anand76 committed
      Summary:
      Allow extra arguments to be passed to db_stress in fbcode crash tests by the `rocksdb-lego-determinator` invoker.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8587
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29940217
      
      Pulled By: anand1976
      
      fbshipit-source-id: 17cbcd2def60eff2a895553f917694496c4742aa
      ddf439c5
  7. 27 Jul, 2021 3 commits
    • M
      Make EventListener into a Customizable Class (#8473) · 3aee4fbd
      mrambacher committed
      Summary:
      - Added Type/CreateFromString
      - Added ability to load EventListeners to DBOptions
      - Since EventListeners did not previously have a Name(), defaulted to "".  If there is no name, the listener cannot be loaded from the ObjectRegistry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8473
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29901488
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 2d3a4aa6db1562ac03e7ad41b360e3521d486254
      3aee4fbd
    • A
      Add periodic_compaction_seconds option to RocksJava (#8579) · 9ddb55a8
      Anatolii Zhmaiev committed
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/8578
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8579
      
      Reviewed By: ajkr
      
      Differential Revision: D29895081
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 3e4120e26a3e8252f8301d657c0aaa0b8550cddf
      9ddb55a8
    • B
      Add simple heuristics for experimental mempurge. (#8583) · 4361d6d1
      Baptiste Lemaire committed
      Summary:
      Add `experimental_mempurge_policy` option flag and introduce two new `MemPurge` (Memtable Garbage Collection) policies: 'ALWAYS' and 'ALTERNATE'. Default value: ALTERNATE.
      `ALWAYS`: every flush will first go through a `MemPurge` process. If the output is too big to fit into a single memtable, then the mempurge is aborted and a regular flush process carries on. `ALWAYS` is designed for users who need to reduce the number of L0 SST files created to a strict minimum, and can afford a small dent in performance (possibly hits to CPU usage, read efficiency, and maximum burst write throughput).
      `ALTERNATE`: a flush is transformed into a `MemPurge` except if one of the memtables being flushed is the product of a previous `MemPurge`. `ALTERNATE` is a good tradeoff between reduction in number of L0 SST files created and performance. `ALTERNATE` performs particularly well for completely random garbage ratios, or garbage ratios anywhere in (0%,50%], and even higher when there is a wild variability in garbage ratios.
      This PR also includes support for `experimental_mempurge_policy` in `db_bench`.
      Testing was done locally by replacing all the `MemPurge` policies of the unit tests with `ALTERNATE`, as well as local testing with `db_crashtest.py` `whitebox` and `blackbox`. Overall, if an `ALWAYS` mempurge policy passes the tests, there is no reason why an `ALTERNATE` policy would fail, and therefore the mempurge policy was set to `ALWAYS` for all mempurge unit tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8583
      
      Reviewed By: pdillinger
      
      Differential Revision: D29888050
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: e2cf26646d66679f6f5fb29842624615610759c1
      4361d6d1
  8. 24 Jul, 2021 4 commits
    • J
      Disable DistributedMutex test by default (#8584) · daf7e77a
      Jay Zhuang committed
      Summary:
      DistributedMutex hasn't been used in the code base and enabling
      `USE_FOLLY_DISTRIBUTED_MUTEX` only runs the mutex tests from third-party
      lib. So disabling it for now.
      The implementation may also be out of date and should be re-synced with folly before
      use.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8584
      
      Test Plan: CI
      
      Reviewed By: ajkr
      
      Differential Revision: D29888960
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 3e75f73386c6ed03efb96a1400258d602a724f17
      daf7e77a
    • L
      db_bench_tool.cc: fix copy - paste (#8553) · 2febf1c4
      leipeng committed
      Summary:
      PR https://github.com/facebook/rocksdb/issues/8519 fixed MSVC build errors in db_bench_tool.cc by simple copy-paste; this PR removes the copy-pasted code while still building with MSVC.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8553
      
      Reviewed By: ajkr
      
      Differential Revision: D29838056
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0cd60c146b87a355c3dc1061dfe813169d75cea4
      2febf1c4
    • L
      CompactionJob::Install(): fix log truncation (#8563) · 4171e3db
      leipeng committed
      Summary:
      Event log info may be truncated because the default buffer size is 512. This PR changes the buffer size to 8192.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8563
      
      Reviewed By: ajkr
      
      Differential Revision: D29838229
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 00c5dea3caff0641a209f02c972e92d65b505f50
      4171e3db
    • M
      Checkpoint dir options fix (#8572) · 55f7ded8
      Merlin Mao committed
      Summary:
      Originally the 2 options `db_log_dir` and `wal_dir` will be reused in a snapshot db since the options files are just copied. By default, if `wal_dir` was not set when a db was created, it is set to the db's dir. Therefore, the snapshot db will use the same WAL dir. If both the original db and the snapshot db write to or delete from the WAL dir, one may modify or delete files which belong to the other. The same applies to `db_log_dir` as well, but as info log files are not copied or linked, it is simpler for this option.
      
      2 arguments are added to `Checkpoint::CreateCheckpoint()`, allowing to override these 2 options.
      
      `wal_dir`: If the function argument `wal_dir` is empty, or set to the original db location, or the checkpoint location, the snapshot's `wal_dir` option will be updated to the checkpoint location. Otherwise, the absolute path specified in the argument will be used. During checkpointing, live WAL files will be copied or linked to the new location, instead of the current WAL dir specified in the original db.
      
      `db_log_dir`: Same as `wal_dir`, but no files will be copied or linked.
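
      The resolution rule for the `wal_dir` argument can be written down as a small function. This is only a sketch of the rule as stated above; `ResolveCheckpointDirSketch` is a hypothetical helper and is not part of the `Checkpoint` API.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical helper: decides which directory the checkpoint's option
      // should point to, given the caller's argument.
      std::string ResolveCheckpointDirSketch(const std::string& dir_arg,
                                             const std::string& db_dir,
                                             const std::string& checkpoint_dir) {
        // Empty, same as the original db, or same as the checkpoint:
        // use the checkpoint location.
        if (dir_arg.empty() || dir_arg == db_dir || dir_arg == checkpoint_dir) {
          return checkpoint_dir;
        }
        // Otherwise use the path given by the caller.
        return dir_arg;
      }
      ```

      According to the description, the same rule applies to `db_log_dir`; the difference is only that no files are copied or linked for it.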
      
      A new unit test was added: `CheckpointTest.CheckpointWithOptionsDirsTest`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8572
      
      Test Plan:
      New unit test
      ```
      checkpoint_test --gtest_filter="CheckpointTest.CheckpointWithOptionsDirsTest"
      ```
      
      Output
      ```
      Note: Google Test filter = CheckpointTest.CheckpointWithOptionsDirsTest
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from CheckpointTest
      [ RUN      ] CheckpointTest.CheckpointWithOptionsDirsTest
      [       OK ] CheckpointTest.CheckpointWithOptionsDirsTest (11712 ms)
      [----------] 1 test from CheckpointTest (11712 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1 test from 1 test case ran. (11713 ms total)
      [  PASSED  ] 1 test.
      ```
      This test will fail without this patch. Just modify the code to remove the 2 arguments introduced in this patch in `CreateCheckpoint()`.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29832761
      
      Pulled By: autopear
      
      fbshipit-source-id: e6a639b4d674380df82998c0839e79cab695fe29
      55f7ded8
  9. 23 Jul, 2021 5 commits
    • D
      Fix a minor issue with initializing the test path (#8555) · 3b277252
      Drewryz committed
      Summary:
      The PerThreadDBPath has already specified a slash. It does not need to be specified when initializing the test path.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8555
      
      Reviewed By: ajkr
      
      Differential Revision: D29758399
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6d2b878523e3e8580536e2829cb25489844d9011
      3b277252
    • B
      Retire superfluous functions introduced in earlier mempurge PRs. (#8558) · c521a9ab
      Baptiste Lemaire committed
      Summary:
      The main challenge in building the memtable garbage collection prototype (nicknamed `mempurge`) was to avoid getting rid of WAL files that contain unflushed (but mempurged) data. That was successfully guaranteed by not writing the VersionEdit to the MANIFEST file after a successful mempurge.
      By not writing VersionEdits to the `MANIFEST` file after a successful mempurge operation, we do not change the earliest log file number that contains unflushed data: `cfd->GetLogNumber()` (`cfd->SetLogNumber()` is only called in `VersionSet::ProcessManifestWrites`). As a result, a number of functions introduced earlier just for the mempurge operation are now obsolete/redundant (e.g. `FlushJob::ExtractEarliestLogFileNumber`), and this PR aims at cleaning up all these now-unnecessary functions. In particular, we no longer need to store the earliest log file number in the `MemTable` struct itself. This PR therefore also reverts the `MemTable` struct to its original form.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8558
      
      Test Plan: Already included in `db_flush_test.cc`.
      
      Reviewed By: anand1976
      
      Differential Revision: D29764351
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 0f43b260fa270251862512f397d3f24ee62e8437
      c521a9ab
    • Z
      Analyze MultiGet in trace_analyzer (#8575) · 61c9bd49
      Zhichao Cao committed
      Summary:
      Now we can analyze the MultiGet queries in the trace file and generate a set of statistics and analysis files. Note that when one MultiGet accesses N keys, we count each sub-get-query individually. However, the overall query count still counts MultiGet queries, not sub-get-queries.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8575
      
      Test Plan: added new unit test and make check
      
      Reviewed By: anand1976
      
      Differential Revision: D29860633
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a132128527f36828d266df8e36e3ec626c2170be
      61c9bd49
    • Y
      Return error if trying to open secondary on missing or inaccessible primary (#8200) · 2e538817
      Yanqin Jin committed
      Summary:
      If the primary's CURRENT file is missing or inaccessible, the secondary should not hang
      trying repeatedly to switch to the next MANIFEST.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8200
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27840627
      
      Pulled By: riversand963
      
      fbshipit-source-id: 071fed97cbab1bc5cdefd1dc235e5cd406c174e1
      2e538817
    • J
      Fix an race condition during multiple DB opening (#8574) · c4a503f3
      Jay Zhuang 提交于
      Summary:
      ObjectLibrary is shared between multiple DB instances, so
      Register() could have a race condition.
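
As an illustration of the kind of race described above, here is a minimal sketch of a shared registry whose `Register()` is guarded by a mutex. This is an assumption about one common remedy, not a description of the actual change in this PR; `SharedRegistry` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry shared by several DB instances. This is not
// RocksDB's ObjectLibrary, only an illustration of the locking pattern.
class SharedRegistry {
 public:
  // Returns true if `name` was newly registered, false if it already existed.
  bool Register(const std::string& name, int factory_id) {
    std::lock_guard<std::mutex> lock(mu_);  // serialize concurrent writers
    return entries_.emplace(name, factory_id).second;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return entries_.size();
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, int> entries_;
};
```

Without the lock, two DB instances opening at the same time could modify the underlying container concurrently, which is undefined behavior for standard containers.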
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8574
      
      Test Plan: pass the failed test
      
      Reviewed By: ajkr
      
      Differential Revision: D29855096
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 541eed0bd495d2c963d858d81e7eabf1ba16153c
      c4a503f3
  10. 22 Jul, 2021 4 commits
    • P
      Remove TaskLimiterToken::ReleaseOnce for fix (#8567) · 84eef260
      Peter Dillinger committed
      Summary:
      Rare TSAN and valgrind failures are caused by an unnecessary
      read of a field on the TaskLimiterToken::limiter_ for an assertion
      after the token has been released and the limiter destroyed. To simplify,
      we can destroy the token before triggering DB shutdown
      (which potentially destroys the limiter). This makes the ReleaseOnce logic
      unnecessary.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8567
      
      Test Plan: watch for more failures in CI
      
      Reviewed By: ajkr
      
      Differential Revision: D29811795
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 135549ebb98fe4f176d1542ed85d5bd6350a40b3
      84eef260
    • S
      Complete the fix of stress open WAL drop fix (#8570) · 9b41082d
      sdong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/8548 is not complete. We should instead cover all cases where writable files are buffered, not just when failures are injected. Extend it to any case where failures are injected in DB open.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8570
      
      Test Plan: Run db_stress and see it doesn't break
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29830415
      
      fbshipit-source-id: 94449a0468fb2f7eec17423724008c9c63b2445d
      9b41082d
    • J
      Avoid updating option if there's no value updated (#8518) · 42eaa45c
      Jay Zhuang committed
      Summary:
      Try to avoid the expensive option-update operation if
      `SetDBOptions()` does not change any option value.
      Skipping the update is not guaranteed: for example, changing `bytes_per_sync`
      to `0` may still trigger an update, as the value could be sanitized.
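
The idea of skipping a no-op update can be sketched as a comparison of requested values against current ones. This is a simplified illustration, not RocksDB's implementation: `OptionMap` and `HasEffectiveChange` are hypothetical, and options are modeled as plain strings.

```cpp
#include <map>
#include <string>

// Hypothetical model: option name -> serialized value.
using OptionMap = std::map<std::string, std::string>;

// Returns true if applying `requested` would change at least one value
// stored in `current` (or would set an option that `current` lacks).
inline bool HasEffectiveChange(const OptionMap& current,
                               const OptionMap& requested) {
  for (const auto& kv : requested) {
    auto it = current.find(kv.first);
    if (it == current.end() || it->second != kv.second) {
      return true;  // at least one value differs, so the update is needed
    }
  }
  return false;  // every requested value already matches
}
```

Because a sketch like this compares the raw requested value with the stored one, a request such as `bytes_per_sync=0` would plausibly still be reported as a change when the stored value has been sanitized to something else, which is consistent with the caveat above.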
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8518
      
      Test Plan: added unittest
      
      Reviewed By: riversand963
      
      Differential Revision: D29672639
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b7931de62ceea6f1bdff0d1209adf1197d3ed1f4
      42eaa45c
    • B
      Add overwrite_probability for filluniquerandom benchmark in db_bench (#8569) · 6b4cdacf
      Baptiste Lemaire committed
      Summary:
      Add the `overwrite_probability` and `overwrite_window_size` flags to `db_bench`.
      Add the possibility of performing a `filluniquerandom` benchmark with an overwrite probability.
      For each write operation, there is a probability _p_ that the write is an overwrite (_p_=`overwrite_probability`).
      When an overwrite is decided, the key is randomly chosen from the last _N_ keys previously inserted into the DB (with _N_=`overwrite_window_size`).
      When a pure write is decided, the key inserted into the DB is unique and therefore will not be an overwrite.
      The `overwrite_window_size` flag lets the user decide whether overwrites mostly target recently inserted keys (when `overwrite_window_size` is small compared to the total number of writes), or can also target keys inserted "a long time ago" (when `overwrite_window_size` is comparable to the total number of writes).
      Note that total number of writes = # of unique insertions + # of overwrites.
      No unit test specifically added.
      Local testing shows the following **throughputs** for `filluniquerandom` with 1M total writes:
      - new code path bypassed (no `overwrite_probability` flag specified): ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=10`: ~17.0MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=10`: ~14.0MB/s
      - `overwrite_probability=0.99`, `overwrite_window_size=1M`: ~14.5MB/s
      - `overwrite_probability=0.10`, `overwrite_window_size=1M`: ~14.0MB/s
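
The key selection described above can be sketched as follows. This is a simplified illustration and not the `db_bench` code: `OverwriteKeyPicker` is a hypothetical name, unique keys are generated sequentially rather than in random order, and the window is assumed to hold only the last _N_ unique insertions.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <random>

// Hypothetical key generator; names are illustrative only.
class OverwriteKeyPicker {
 public:
  OverwriteKeyPicker(double overwrite_probability,
                     std::size_t overwrite_window_size, uint64_t seed)
      : p_(overwrite_probability),
        window_size_(overwrite_window_size),
        rng_(seed) {}

  // Returns the key for the next write and reports whether it is an overwrite.
  uint64_t Next(bool* is_overwrite) {
    std::bernoulli_distribution overwrite(p_);
    // An overwrite is only possible once at least one key is in the window.
    if (!window_.empty() && overwrite(rng_)) {
      std::uniform_int_distribution<std::size_t> pick(0, window_.size() - 1);
      *is_overwrite = true;
      return window_[pick(rng_)];  // one of the last N inserted keys
    }
    *is_overwrite = false;
    uint64_t key = next_unique_++;  // assumption: sequential unique keys
    window_.push_back(key);
    if (window_.size() > window_size_) window_.pop_front();
    return key;
  }

 private:
  double p_;
  std::size_t window_size_;
  std::mt19937_64 rng_;
  std::deque<uint64_t> window_;  // last N unique keys, oldest first
  uint64_t next_unique_ = 0;
};
```

With a small window, overwrites concentrate on the most recently inserted keys. With a window comparable to the total number of writes, older keys remain eligible.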
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8569
      
      Reviewed By: pdillinger
      
      Differential Revision: D29818631
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: d472b4ea4e457a4da7c4ee4f14b40cccd6a4587a
      6b4cdacf
  11. 21 Jul, 2021 2 commits
  12. 20 Jul, 2021 1 commit