1. 29 Mar 2021, 1 commit
  2. 24 Mar 2021, 1 commit
    • Fix an error while running db_crashtest for non-user-ts tests (#8091) · e1aa8c16
      Committed by Yanqin Jin
      Summary:
      Fix the following error while running `make crash_test`
      ```
      Traceback (most recent call last):
        File "tools/db_crashtest.py", line 705, in <module>
          main()
        File "tools/db_crashtest.py", line 696, in main
          blackbox_crash_main(args, unknown_args)
        File "tools/db_crashtest.py", line 479, in blackbox_crash_main
          + list({'db': dbname}.items())), unknown_args)
        File "tools/db_crashtest.py", line 414, in gen_cmd
          finalzied_params = finalize_and_sanitize(params)
        File "tools/db_crashtest.py", line 331, in finalize_and_sanitize
          dest_params.get("user_timestamp_size") > 0):
      TypeError: '>' not supported between instances of 'NoneType' and 'int'
      ```
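
      The failure comes from a missing None guard: in non-user-ts runs `user_timestamp_size` is absent, so `dest_params.get("user_timestamp_size")` returns None and comparing it with 0 raises `TypeError` on Python 3. A minimal sketch of the kind of guard that avoids this (parameter handling mirrors the traceback above; the flags touched inside the branch are only illustrative, and the actual fix in the PR may differ):

      ```
      def finalize_and_sanitize(src_params):
          dest_params = dict(src_params)  # leave the caller's dict untouched
          # get() returns None when this is not a user-timestamp test, and
          # `None > 0` raises TypeError on Python 3, so default to 0.
          if (dest_params.get("user_timestamp_size") or 0) > 0:
              # Timestamp-incompatible features would be disabled here.
              dest_params["use_merge"] = 0
              dest_params["use_txn"] = 0
          return dest_params

      print(finalize_and_sanitize({"db": "/tmp/rocksdb_crashtest"}))
      ```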
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8091
      
      Test Plan: make crash_test
      
      Reviewed By: ltamasi
      
      Differential Revision: D27268276
      
      Pulled By: riversand963
      
      fbshipit-source-id: ed2873b9587ecc51e24abc35ef2bd3d91fb1ed1b
  3. 23 Mar 2021, 2 commits
    • Add user-defined timestamps to db_stress (#8061) · 08144bc2
      Committed by Yanqin Jin
      Summary:
      Add some basic tests for user-defined timestamps to db_stress. Currently,
      reads with timestamp always use the current timestamp.
      Due to the per-key timestamp-sequence ordering constraint, we only add timestamp-
      related tests to `NonBatchedOpsStressTest`, since this test serializes accesses
      to the same key and uses a file to cross-check data correctness.
      The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete,
      DeleteRange, CompactionFilter, read-only instances, secondary instances, SST file ingestion, transactions,
      etc. Therefore, db_stress should exit if the user enables timestamps together with any of these
      features. The (currently) incompatible features are listed in
      `CheckAndSetOptionsForUserTimestamp`.
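
      A hedged sketch of how the crash test script can force-disable the incompatible features whenever a non-zero `user_timestamp_size` is in effect; the flag names follow db_stress/db_crashtest.py conventions, but the exact set adjusted by this PR may differ:

      ```
      def sanitize_for_user_timestamp(params):
          if (params.get("user_timestamp_size") or 0) == 0:
              return params  # timestamps disabled: nothing to adjust
          # Features that do not (yet) support user-defined timestamps.
          params.update({
              "use_merge": 0,
              "use_full_merge_v1": 0,
              "enable_compaction_filter": 0,
              "ingest_external_file_one_in": 0,
              "use_txn": 0,
              "test_batches_snapshots": 0,  # stay on NonBatchedOpsStressTest
          })
          return params

      print(sanitize_for_user_timestamp({"user_timestamp_size": 8, "use_txn": 1}))
      ```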
      
      This PR also fixes a bug triggered when timestamps are enabled together with
      `index_type=kBinarySearchWithFirstKey`. The bug fix will also be part of a separate PR
      with more unit test coverage; it is included here so that the index type does not have
      to be excluded from the crash test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061
      
      Test Plan: make crash_test_with_ts
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27056282
      
      Pulled By: riversand963
      
      fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9
    • Adjust the set of potential min_blob_size values in stress/crash tests (#8085) · 0d800dad
      Committed by Levi Tamasi
      Summary:
      Since our stress/crash tests by default generate values of size 8, 16, or 24,
      it does not make much sense to set `min_blob_size` to 256. The patch
      updates the set of potential `min_blob_size` values in the crash test
      script and in `db_stress` where it might be set dynamically using
      `SetOptions`.
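
      A hedged sketch of the adjusted candidate set on the db_crashtest.py side, assuming generated values of 8/16/24 bytes as stated above (the exact thresholds picked by the patch may differ):

      ```
      import random

      blob_params = {
          # 0 sends every value to a blob file; 8 and 16 split the generated
          # value sizes, which is more useful than an unreachable 256.
          "min_blob_size": lambda: random.choice([0, 8, 16]),
      }

      print(blob_params["min_blob_size"]())
      ```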
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8085
      
      Test Plan: Ran `make check` and tried the crash test script.
      
      Reviewed By: riversand963
      
      Differential Revision: D27238620
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 4a96f9944b1ed9220d3045c5ab0b34c49009aeee
  4. 02 Mar 2021, 1 commit
    • Enable compact filter for blob in dbstress and dbbench (#8011) · 1f11d07f
      Committed by Yanqin Jin
      Summary:
      As title.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011
      
      Test Plan:
      ```
      ./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1
      ./db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D26736061
      
      Pulled By: riversand963
      
      fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2
  5. 20 Feb 2021, 1 commit
    • Limit buffering for collecting samples for compression dictionary (#7970) · d904233d
      Committed by Andrew Kryczka
      Summary:
      For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
      
      However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data is buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). The limit is not strict, since we currently buffer more than just data blocks -- keys are buffered as well -- but it is a step toward giving users predictable memory usage.
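
      A hedged, heavily simplified sketch of the buffered-to-unbuffered switch described above; the real logic lives in the C++ table builder, so this Python version only illustrates the control flow and its names are illustrative:

      ```
      def build_with_dict_limit(blocks, max_dict_buffer_bytes):
          """Buffer data blocks for dictionary sampling until the byte limit is
          reached, then build the dictionary and switch to unbuffered writing."""
          buffered, buffered_bytes, dictionary, written = [], 0, None, []
          for block in blocks:
              if dictionary is None:
                  buffered.append(block)
                  buffered_bytes += len(block)
                  if buffered_bytes >= max_dict_buffer_bytes:
                      # Limit reached: create the per-SST dictionary from the
                      # samples, write out what was held back, stop buffering.
                      dictionary = b"".join(buffered)[:1024]
                      written.extend(buffered)
                      buffered.clear()
              else:
                  written.append(block)  # unbuffered: write blocks as built
          if dictionary is None:  # small output: the limit was never reached
              dictionary = b"".join(buffered)[:1024]
              written.extend(buffered)
          return dictionary, written

      d, w = build_with_dict_limit([b"ab" * 64] * 40, max_dict_buffer_bytes=512)
      print(len(d), len(w))
      ```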
      
      Related changes include:
      
      - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
      - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
      - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
      
      Test Plan:
      - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
      - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
      
      Reviewed By: pdillinger
      
      Differential Revision: D26467994
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
  6. 18 Feb 2021, 1 commit
    • Add checkpoint support to BlobDB (#7959) · dab4fe5b
      Committed by Levi Tamasi
      Summary:
      The patch adds checkpoint support to BlobDB. Blob files are hard linked or
      copied, depending on whether the checkpoint directory is on the same filesystem
      or not, similarly to table files.
      
      TODO: Add support for blob files to `ExportColumnFamily` and to the checksum
      verification logic used by backup/restore.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959
      
      Test Plan: Ran `make check` and the crash test for a while.
      
      Reviewed By: riversand963
      
      Differential Revision: D26434768
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5
  7. 03 Feb 2021, 1 commit
    • Add the integrated BlobDB to the stress/crash tests (#7900) · 0288bdbc
      Committed by Levi Tamasi
      Summary:
      The patch adds support for the options related to the new BlobDB implementation
      to `db_stress`, including support for dynamically adjusting them using `SetOptions`
      when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
      are specified. (The latter is used to prevent the options from being enabled when
      incompatible features are in use.)
      
      The patch also updates the `db_stress` help messages of the existing stacked BlobDB
      related options to clarify that they pertain to the old implementation. In addition, it
      adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
      of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
      and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
      the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.
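
      A hedged sketch of how the crash test script might enable the integrated BlobDB in roughly 10% of runs; `allow_setting_blob_options_dynamically` is named in this PR, while the other randomized options shown are assumptions about the set being exercised:

      ```
      import random

      def maybe_enable_blob_options(params):
          # Enable the integrated BlobDB in ~10% of crash test runs to avoid a
          # combinatorial explosion of dedicated jobs.
          if random.random() >= 0.1:
              return params
          params.update({
              "allow_setting_blob_options_dynamically": 1,
              "enable_blob_files": lambda: random.randint(0, 1),
              "min_blob_size": lambda: random.choice([0, 8, 16]),
              "blob_file_size": lambda: random.choice([1048576, 16777216]),
              "enable_blob_garbage_collection": lambda: random.randint(0, 1),
          })
          return params

      print("blob run" if "enable_blob_files" in maybe_enable_blob_options({}) else "normal run")
      ```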
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900
      
      Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26094913
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
  8. 30 Jan 2021, 1 commit
    • Integrity protection for live updates to WriteBatch (#7748) · 78ee8564
      Committed by Andrew Kryczka
      Summary:
      This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
      
      The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
      
      When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
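
      A hedged, conceptual sketch of the scheme in Python (the real `ProtectionInfo.*` classes are C++ templates; per-field seeds are emulated here with a hash personalization tag, which is an assumption for illustration rather than the actual mechanics):

      ```
      import hashlib

      def field_hash(tag, data, width=8):
          # One independent hash per covered field, truncated to `width` bytes.
          digest = hashlib.blake2b(data, digest_size=width, person=tag).digest()
          return int.from_bytes(digest, "little")

      def protect_kvotc(key, value, op_type, ts, cf_id):
          # XOR of hashes covering key, value, optype, timestamp and CF ID.
          return (field_hash(b"K", key) ^ field_hash(b"V", value)
                  ^ field_hash(b"O", bytes([op_type])) ^ field_hash(b"T", ts)
                  ^ field_hash(b"C", cf_id.to_bytes(4, "little")))

      def kvotc_to_kvots(info, cf_id, seqno):
          # Drop coverage of the CF ID, add coverage of the sequence number.
          return (info ^ field_hash(b"C", cf_id.to_bytes(4, "little"))
                       ^ field_hash(b"S", seqno.to_bytes(8, "little")))

      info = protect_kvotc(b"key", b"value", 1, b"", cf_id=0)
      print(hex(kvotc_to_kvots(info, cf_id=0, seqno=42)))
      ```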
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
      
      Test Plan:
      - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
      - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
      - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
      
      Reviewed By: pdillinger
      
      Differential Revision: D25754492
      
      Pulled By: ajkr
      
      fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
  9. 18 Dec 2020, 1 commit
  10. 13 Nov 2020, 1 commit
    • Experimental (production candidate) SST schema for Ribbon filter (#7658) · 60af9643
      Committed by Peter Dillinger
      Summary:
      Added experimental public API for Ribbon filter:
      NewExperimentalRibbonFilterPolicy(). This experimental API will
      take a "Bloom equivalent" bits per key, and configure the Ribbon
      filter for the same FP rate as Bloom would have but ~30% space
      savings. (Note: optimize_filters_for_memory is not yet implemented
      for Ribbon filter. That can be added with no effect on schema.)
      
      Internally, the Ribbon filter is configured using a "one_in_fp_rate"
      value, which is 1 over desired FP rate. For example, use 100 for 1%
      FP rate. I'm expecting this will be used in the future for configuring
      Bloom-like filters, as I expect people to more commonly hold constant
      the filter accuracy and change the space vs. time trade-off, rather than
      hold constant the space (per key) and change the accuracy vs. time
      trade-off, though we might make that available.
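
      As a worked example of the relationship between the two settings, the textbook Bloom approximation FP ≈ 0.6185^(bits/key) can be used to turn a "Bloom equivalent" bits-per-key value into a one_in_fp_rate; this is only the standard formula, not necessarily the exact conversion performed internally:

      ```
      def bloom_equivalent_one_in_fp_rate(bits_per_key):
          # Optimal-k Bloom approximation: fp_rate ~= 0.6185 ** bits_per_key.
          fp_rate = 0.6185 ** bits_per_key
          return 1.0 / fp_rate

      # 10 bits/key corresponds to roughly a 1% FP rate, i.e. a value near 100.
      print(round(bloom_equivalent_one_in_fp_rate(10)))
      ```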
      
      ### Benchmarking
      
      ```
      $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
      Building...
      Build avg ns/key: 34.1341
      Number of filters: 1993
      Total size (MB): 238.488
      Reported total allocated memory (MB): 262.875
      Reported internal fragmentation: 10.2255%
      Bits/key stored: 10.0029
      ----------------------------
      Mixed inside/outside queries...
        Single filter net ns/op: 18.7508
        Random filter net ns/op: 258.246
          Average FP rate %: 0.968672
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
      Building...
      Build avg ns/key: 130.851
      Number of filters: 1993
      Total size (MB): 168.166
      Reported total allocated memory (MB): 183.211
      Reported internal fragmentation: 8.94626%
      Bits/key stored: 7.05341
      ----------------------------
      Mixed inside/outside queries...
        Single filter net ns/op: 58.4523
        Random filter net ns/op: 363.717
          Average FP rate %: 0.952978
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      168.166 / 238.488 = 0.705  -> 29.5% space reduction
      
      130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. latest Bloom filter (could make that as little as about 2.5x for less space reduction)
      
      ### Working around a hashing "flaw"
      
      bloom_test discovered a flaw in the simple hashing applied in
      StandardHasher when num_starts == 1 (num_slots == 128), showing an
      excessively high FP rate.  The problem is that when many entries, on the
      order of number of hash bits or kCoeffBits, are associated with the same
      start location, the correlation between the CoeffRow and ResultRow (for
      efficiency) can lead to a solution that is "universal," or nearly so, for
      entries mapping to that start location. (Normally, variance in start
      location breaks the effective association between CoeffRow and
      ResultRow; the same value for CoeffRow is effectively different if start
      locations are different.) Without kUseSmash and with num_starts > 1 (thus
      num_starts ~= num_slots), this flaw should be completely irrelevant. Even
      with 10M slots, the chance that any single slot has 16 (or more) entries
      mapped to it--not enough to cause an FP problem, and one that would be local
      to that slot if it happened--is on the order of 1 in millions. This
      spreadsheet formula shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))
      
      As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is
      intended for CPU efficiency of filters with many more entries/slots than
      kCoeffBits, a very reasonable work-around is to disallow num_starts==1
      when !kUseSmash, by making the minimum non-zero number of slots
      2*kCoeffBits. This is the work-around I've applied. This also means that
      the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not
      space-efficient for less than a few hundred entries. Because of this, I
      have made it fall back on constructing a Bloom filter, under existing
      schema, when that is more space efficient for small filters. (We can
      change this in the future if we want.)
      
      TODO: better unit tests for this case in ribbon_test, and probably
      update StandardHasher for kUseSmash case so that it can scale nicely to
      small filters.
      
      ### Other related changes
      
      * Add Ribbon filter to stress/crash test
      * Add Ribbon filter to filter_bench as -impl=3
      * Add option string support, as in "filter_policy=experimental_ribbon:5.678;"
      where 5.678 is the Bloom equivalent bits per key.
      * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
      * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on
      binary searching CalculateSpace (inefficient), so that subclasses
      (especially experimental ones) don't have to provide an efficient
      implementation inverting CalculateSpace.
      * Minor refactor FastLocalBloomBitsBuilder for new base class
      XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder,
      which allows the latter to fall back on Bloom construction in some
      extreme cases.
      * Mostly updated bloom_test for Ribbon filter, though a test like
      FullBloomTest::Schema is a next TODO to ensure schema stability
      (in case this becomes production-ready schema as it is).
      * Add some APIs to ribbon_impl.h for configuring Ribbon filters.
      Although these are reasonably covered by bloom_test, TODO more unit
      tests in ribbon_test
      * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data
      for constructing the linear approximations in GetNumSlotsFor95PctSuccess.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658
      
      Test Plan:
      Some unit tests updated but other testing is left TODO. This
      is considered experimental but laying down schema compatibility as early
      as possible in case it proves production-quality. Also tested in
      stress/crash test.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24899349
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
  11. 11 Nov 2020, 1 commit
    • Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643) · 20260514
      Committed by Akanksha Mahajan
      Summary:
      Crash tests do not run in DEBUG_LEVEL=0 mode on tmpfs when
      use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly, because
      direct I/O is not supported on tmpfs and the tests exit.

      Fix: sanitize the direct I/O read options in DEBUG_LEVEL=0 so that crash
      tests can run on tmpfs. Since direct I/O read options are already unset when
      mmap reads are enabled, we can sanitize them for the tmpfs case the same way.
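
      A hedged sketch of this kind of sanitization in db_crashtest.py; the tmpfs detection and the exact flags cleared here are illustrative, and the actual patch may do it differently:

      ```
      import os

      def sanitize_direct_io(params, db_dir):
          # tmpfs (e.g. /dev/shm) and /tmp-backed dirs don't support O_DIRECT,
          # so drop the direct I/O options when the test database lives there.
          if db_dir.startswith("/dev/shm") or db_dir.startswith("/tmp"):
              params["use_direct_reads"] = 0
              params["use_direct_io_for_flush_and_compaction"] = 0
          return params

      print(sanitize_direct_io({"use_direct_reads": 1},
                               os.environ.get("TEST_TMPDIR", "/tmp")))
      ```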
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643
      
      Test Plan:
      1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
         export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
         make crash_test -j64
      2. In DEBUG_LEVEL=1 mode: make crash_test -j64
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24766550
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
  12. 04 Nov 2020, 1 commit
  13. 13 Oct 2020, 1 commit
  14. 12 Oct 2020, 1 commit
    • Redesign block cache pinning API (#7520) · 75d3b6fd
      Committed by Andrew Kryczka
      Summary:
      The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520
      
      Test Plan:
      - new unit test
      - added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0`
      
      Reviewed By: pdillinger
      
      Differential Revision: D24200034
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e
  15. 09 Oct 2020, 1 commit
  16. 01 Oct 2020, 1 commit
    • Stress test to support paranoid_file_checks (#7473) · aedcaaef
      Committed by sdong
      Summary:
      It's important to make sure no false positives are reported when options.paranoid_file_checks is used. Add it to the stress test, with a placeholder in the crash test. It is disabled in the crash test for now, as there appears to be a bug causing false positives.
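
      A hedged sketch of what such a placeholder might look like in db_crashtest.py (the exact wiring in the PR may differ):

      ```
      stress_params = {
          # Exercised directly by db_stress; kept as a disabled placeholder in
          # the crash test until the false-positive bug is resolved, after
          # which it can be randomized like other options.
          "paranoid_file_checks": 0,
      }

      print(stress_params)
      ```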
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473
      
      Test Plan: Run crash test
      
      Reviewed By: ajkr
      
      Differential Revision: D24026939
      
      fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c
  17. 09 Sep 2020, 1 commit
    • Fix backup/restore in stress/crash test (#7357) · 4e258d3e
      Committed by Peter Dillinger
      Summary:
      (1) Skip the check on a specific key if restoring an old backup (a small
      minority of cases) because it can fail in those cases. (2) Remove an old
      assertion about the number of column families and the number of keys
      passed in, which is broken by the atomic flush (cf_consistency) test.
      Like other code (for better or worse), assume a single key and iterate
      over column families. (3) Apply mock_direct_io to NewSequentialFile so
      that db_stress backup works on /dev/shm.

      Also add more context to the output in case of a backup/restore db_stress
      failure.

      Also a minor fix to BackupEngine to report the first failure status when
      creating a new backup, and to drop another clue about the potential
      source of a "Backup failed" status.
      
      Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357
      
      Test Plan:
      Using backup_one_in=10000,
      "USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
      "USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
      And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb
      
      Reviewed By: riversand963
      
      Differential Revision: D23567244
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
  18. 05 Sep 2020, 1 commit
  19. 04 Sep 2020, 2 commits
    • Add file checksum to stress/crash test (#7343) · 06ad5dd2
      Committed by Peter Dillinger
      Summary:
      This change has the crash test randomly select from a few file
      checksum implementations, or nullptr, for the DB file_checksum_gen_factory.
      For compatibility across runs on the same DB, each non-null factory can
      understand all the other checksum functions, but the default changes.
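
      A hedged sketch of the random selection on the db_crashtest.py side; the flag name and value set here are assumptions about how the choice is surfaced to db_stress:

      ```
      import random

      checksum_params = {
          # "none" corresponds to a nullptr factory; every non-null factory can
          # verify files written with any of the other checksum functions, so
          # runs on the same DB can safely mix defaults.
          "file_checksum_impl": lambda: random.choice(
              ["none", "crc32c", "xxh64", "big"]),
      }

      print(checksum_params["file_checksum_impl"]())
      ```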
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343
      
      Test Plan:
      'make blackbox_crash_test' for a while, including with some
      debug output to ensure code is being exercised.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23494580
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a
    • Fix, enable, and enhance backup/restore in db_stress (#7348) · 499c9448
      Committed by Peter Dillinger
      Summary:
      Although added to db_stress, testing of backup/restore was never
      integrated into the crash test, originally out of concern about
      performance. I've enabled it now and, to address the performance concern,
      backup/restore testing is always skipped once the db exceeds a certain
      size threshold, 100MB by default. This should provide sufficient
      opportunity for testing BackupEngine without bogging down everything
      else with heavier and heavier operations.
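
      A hedged, conceptual sketch of the size-threshold guard; db_stress itself is C++, so the names here (including backup_max_size for the 100MB default) are illustrative assumptions:

      ```
      def should_test_backup(db_size_bytes, backup_one_in, op_counter,
                             backup_max_size=100 * 1024 * 1024):
          # Exercise BackupEngine roughly once every `backup_one_in` operations,
          # but only while the database is small enough to back up cheaply.
          if backup_one_in == 0 or op_counter % backup_one_in != 0:
              return False
          return db_size_bytes <= backup_max_size

      print(should_test_backup(50 * 1024 * 1024, 10000, 20000))  # True
      ```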
      
      Also fixed backup/restore in db_stress by making sure PurgeOldBackups
      can remove manifest files, which are normally kept around for db_stress.
      
      Added more coverage of backup options, and up to three backups being
      saved in one backup directory (in some cases).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348
      
      Test Plan:
      ran 'make blackbox_crash_test' for a while, with heightened
      probability of taking backups (1/10k). Also confirmed with some debug
      output that the code is being covered, that TestBackupRestore only takes
      a few seconds to complete when triggered, and that even at 1/10k and a
      ~50MB database, there is at most ~1 thread testing backups at any time.
      
      Reviewed By: ajkr
      
      Differential Revision: D23510835
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0
  20. 11 Aug 2020, 1 commit
    • Mark files for compaction in stress/crash tests (#7231) · 7eebe6d3
      Committed by Andrew Kryczka
      Summary:
      The mechanism to mark files for compaction is most commonly used in
      delete-triggered compaction. This PR adds an option to exercise the
      marking mechanism on random files created by db_stress. This PR also
      enables that option in db_crashtest.py on its db_stress runs at random.
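
      A hedged sketch of how db_crashtest.py might randomly enable the marking mechanism; the flag name mark_for_compaction_one_file_in is an assumption about how db_stress exposes it:

      ```
      import random

      marking_params = {
          # 0 disables marking; otherwise roughly one in N newly created files
          # is marked, exercising the delete-triggered-compaction machinery.
          "mark_for_compaction_one_file_in": lambda: 10 * random.randint(0, 1),
      }

      print(marking_params["mark_for_compaction_one_file_in"]())
      ```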
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231
      
      Test Plan:
      - ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs.
      
      ```
      $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33
      $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D23025156
      
      Pulled By: ajkr
      
      fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96
  21. 15 Jul 2020, 1 commit
  22. 24 Jun 2020, 1 commit
  23. 23 Jun 2020, 1 commit
    • Minimize memory internal fragmentation for Bloom filters (#6427) · 5b2bbacb
      Committed by Peter Dillinger
      Summary:
      New experimental option BBTO::optimize_filters_for_memory builds
      filters that maximize their use of "usable size" from malloc_usable_size,
      which is also used to compute block cache charges.
      
      Rather than always "rounding up," we track state in the
      BloomFilterPolicy object to mix essentially "rounding down" and
      "rounding up" so that the average FP rate of all generated filters is
      the same as without the option. (YMMV as heavily accessed filters might
      be unluckily lower accuracy.)
      
      Thus, the option near-minimizes what the block cache considers as
      "memory used" for a given target Bloom filter false positive rate and
      Bloom filter implementation. There are no forward or backward
      compatibility issues with this change, though it only works on the
      format_version=5 Bloom filter.
      
      With Jemalloc, we see about 10% reduction in memory footprint (and block
      cache charge) for Bloom filters, but 1-2% increase in storage footprint,
      due to encoding efficiency losses (FP rate is non-linear with bits/key).
      
      Why not weighted random round up/down rather than state tracking? By
      only requiring malloc_usable_size, we don't actually know what the next
      larger and next smaller usable sizes for the allocator are. We pick a
      requested size, accept and use whatever usable size it has, and use the
      difference to inform our next choice. This allows us to narrow in on the
      right balance without tracking/predicting usable sizes.
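
      A hedged, conceptual sketch of that state tracking; the real logic lives in the C++ BloomFilterPolicy and balances FP rates, whereas this simplified Python stand-in balances allocated size, purely to illustrate the idea:

      ```
      class FilterSizeChooser:
          """Mix rounding down and rounding up so the usable sizes handed to
          filters average out close to the ideal (un-rounded) size."""

          def __init__(self, usable_size_of):
              self.usable_size_of = usable_size_of  # malloc_usable_size stand-in
              self.surplus = 0                      # accumulated (usable - ideal)

          def choose(self, ideal_bytes):
              if self.surplus > 0:
                  request = max(1, ideal_bytes - self.surplus)  # ahead: round down
              else:
                  request = ideal_bytes                         # behind: round up
              usable = self.usable_size_of(request)
              self.surplus += usable - ideal_bytes
              return usable

      def toy_size_class(n):  # toy allocator with 4 KiB size classes
          return ((n + 4095) // 4096) * 4096

      chooser = FilterSizeChooser(toy_size_class)
      sizes = [chooser.choose(10000) for _ in range(50)]
      print(round(sum(sizes) / len(sizes)))  # lands close to the ideal 10000
      ```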
      
      Why not weight history of generated filter false positive rates by
      number of keys? This could lead to excess skew in small filters after
      generating a large filter.
      
      Results from filter_bench with jemalloc (irrelevant details omitted):
      
          (normal keys/filter, but high variance)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.6278
          Number of filters: 5516
          Total size (MB): 200.046
          Reported total allocated memory (MB): 220.597
          Reported internal fragmentation: 10.2732%
          Bits/key stored: 10.0097
          Average FP rate %: 0.965228
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.5104
          Number of filters: 5464
          Total size (MB): 200.015
          Reported total allocated memory (MB): 200.322
          Reported internal fragmentation: 0.153709%
          Bits/key stored: 10.1011
          Average FP rate %: 0.966313
      
          (very few keys / filter, optimization not as effective due to ~59 byte
           internal fragmentation in blocked Bloom filter representation)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.5649
          Number of filters: 162950
          Total size (MB): 200.001
          Reported total allocated memory (MB): 224.624
          Reported internal fragmentation: 12.3117%
          Bits/key stored: 10.2951
          Average FP rate %: 0.821534
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 31.8057
          Number of filters: 159849
          Total size (MB): 200
          Reported total allocated memory (MB): 208.846
          Reported internal fragmentation: 4.42297%
          Bits/key stored: 10.4948
          Average FP rate %: 0.811006
      
          (high keys/filter)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.7017
          Number of filters: 164
          Total size (MB): 200.352
          Reported total allocated memory (MB): 221.5
          Reported internal fragmentation: 10.5552%
          Bits/key stored: 10.0003
          Average FP rate %: 0.969358
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.7131
          Number of filters: 160
          Total size (MB): 200.928
          Reported total allocated memory (MB): 200.938
          Reported internal fragmentation: 0.00448054%
          Bits/key stored: 10.1852
          Average FP rate %: 0.963387
      
      And from db_bench (block cache) with jemalloc:
      
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17063835
          $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17430747
          $ #^ 2.1% additional filter storage
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8440400
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
          rocksdb.bloom.filter.useful COUNT : 4963889
          rocksdb.bloom.filter.full.positive COUNT : 1214081
          rocksdb.bloom.filter.full.true.positive COUNT : 1161999
          $ #^ 1.04 % observed FP rate
          $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8448592
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
          rocksdb.bloom.filter.useful COUNT : 5360933
          rocksdb.bloom.filter.full.positive COUNT : 1321315
          rocksdb.bloom.filter.full.true.positive COUNT : 1262999
          $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters
      
      (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427
      
      Test Plan: unit test added, 'make check' with gcc, clang and valgrind
      
      Reviewed By: siying
      
      Differential Revision: D22124374
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
  24. 20 Jun 2020, 2 commits
  25. 19 Jun 2020, 1 commit
    • add `CompactionFilter` to stress/crash tests (#6988) · 775dc623
      Committed by Andrew Kryczka
      Summary:
      Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
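
      A hedged, conceptual sketch of the drop decision; the real filter is a C++ `CompactionFilter` subclass inside db_stress, so the names and the expected-state representation here are illustrative:

      ```
      def should_drop(key, entry_seqno, expected_state):
          """Drop a key version only if the expected state already covers it,
          i.e. a newer write or delete for the same key has been recorded.
          Snapshots are disabled whenever this filter runs, so no reader can
          still need the dropped version."""
          latest = expected_state.get(key)  # (seqno, is_delete) or None
          return latest is not None and entry_seqno < latest[0]

      print(should_drop("k1", 5, {"k1": (9, False)}))  # True: version 5 covered
      ```
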
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988
      
      Test Plan:
      running a minified blackbox crash test
      
      ```
      $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D22072888
      
      Pulled By: ajkr
      
      fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
  26. 13 Jun 2020, 1 commit
    • Add stress test for best-efforts recovery (#6819) · 15d9f28d
      Committed by Yanqin Jin
      Summary:
      Add a crash test for the case of best-efforts recovery.
      After a certain amount of time, we kill the db_stress process, randomly delete some table files, and restart db_stress. Given the randomness of the file deletion, it is difficult to verify data correctness against a reference. Therefore, we just check that the db can restart successfully.
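
      A hedged sketch of that blackbox loop as it might appear in db_crashtest.py; process handling and file selection are simplified, and the actual script logic may differ:

      ```
      import glob, os, random, signal, subprocess, time

      def best_efforts_recovery_round(dbname, stress_cmd, run_seconds=120):
          proc = subprocess.Popen(stress_cmd)   # db_stress with BER enabled
          time.sleep(run_seconds)
          proc.send_signal(signal.SIGKILL)      # simulate a crash
          proc.wait()
          sst_files = glob.glob(os.path.join(dbname, "*.sst"))
          for f in random.sample(sst_files, k=min(2, len(sst_files))):
              os.remove(f)                      # lose some table files
          # Data can't be checked against a reference after random deletion;
          # the next run only has to open the DB successfully.
          return subprocess.call(stress_cmd) == 0
      ```
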
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819
      
      Test Plan:
      ```
      ./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0
      ./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0
      make crash_test_with_best_efforts_recovery
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D21436753
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926
  27. 30 May 2020, 1 commit
    • Allow missing "unversioned" python, as in CentOS 8 (#6883) · 0c56fc4d
      Committed by Peter Dillinger
      Summary:
      The RocksDB Makefile was assuming the existence of the 'python' command,
      which is not present in CentOS 8. We now avoid using 'python' when 'python3' is available.

      Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)

      Also, if no Python is found, PYTHON now defaults to 'python3' so that an informative
      "command not found" error will result rather than something weird.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883
      
      Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.
      
      Reviewed By: gg814
      
      Differential Revision: D21767029
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
  28. 08 May 2020, 1 commit
  29. 07 May 2020, 1 commit
    • cover single level universal in crash test (#6818) · 1f20df2f
      Committed by Andrew Kryczka
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818
      
      Test Plan:
      fast whitebox test and verify there are some single-level universal and
      some multi-level universal runs.
      
      ```
      $ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D21432138
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7
  30. 01 May 2020, 1 commit
  31. 25 Apr 2020, 1 commit
    • Disable O_DIRECT in stress test when db directory does not support direct IO (#6727) · 0a776178
      Committed by Cheng Chang
      Summary:
      In the crash test, the db directory might be set to /dev/shm or /tmp in certain environments such as internal testing infrastructure. Neither of these directories supports direct IO, so direct IO is never enabled in the crash test.

      This PR sets up SyncPoints in direct-IO-related code paths to disable the O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed and all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727
      
      Test Plan:
      export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"
      make -j24 crash_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21139250
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee
  32. 21 Apr 2020, 1 commit
  33. 18 Apr 2020, 1 commit
  34. 17 Apr 2020, 2 commits
    • crash_test to cover options.avoid_flush_during_recovery (#6712) · 73523bae
      Committed by sdong
      Summary:
      Options.avoid_flush_during_recovery is not covered in crash_test. Add coverage with a chance of 1/8, as it is a less frequently used option.
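
      A hedged sketch of the 1/8 randomization in db_crashtest.py (the exact expression used in the patch may differ):

      ```
      import random

      recovery_params = {
          # Enabled in roughly 1 out of 8 runs, since it is a rarely used option.
          "avoid_flush_during_recovery": lambda: random.choice([1] + [0] * 7),
      }

      print(recovery_params["avoid_flush_during_recovery"]())
      ```
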
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712
      
      Test Plan: Run crash_test and see that the option is sometimes used and sometimes not, by chance.
      
      Reviewed By: ltamasi
      
      Differential Revision: D21056566
      
      fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793
    • Add env_fault_injection argument to db_stress (#6687) · 5801af46
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add an env_fault_injection argument to db_stress. When enabled,
      FaultInjectionTestEnv will be used instead. Currently this
      option does not support running with other env settings.

      This will allow
      us to later manually produce errors when running db_crashtest.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687
      
      Test Plan:
      make db_stress -j32
      ./db_stress --env_fault_injection
      ./db_stress --env_fault_injection --hdfs   // expect error message
      
      Reviewed By: ajkr
      
      Differential Revision: D21014683
      
      Pulled By: yhchiang
      
      fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a
  35. 16 Apr 2020, 1 commit
  36. 14 Apr 2020, 1 commit