1. 27 Jun 2020 (3 commits)
    • Fix for TSAN failure in DeleteScheduler (#7029) · b9d51b86
      Akanksha Mahajan committed
      Summary:
      Fixes a TSAN failure caused by setting statistics in SstFileManager and DeleteScheduler.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7029
      
      Test Plan:
      1. make check -j64
      2. COMPILE_WITH_TSAN=1 make check -j64
      
      Reviewed By: siying, zhichao-cao
      
      Differential Revision: D22223418
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: c5bf336d711b787908dfeb6166cab4aa2e494d61
    • `BackupEngine::VerifyBackup` verifies checksum by default (#7014) · 1569dc48
      Zitan Chen committed
      Summary:
      A parameter `verify_with_checksum` is added to `BackupEngine::VerifyBackup`; it is true by default, so `BackupEngine::VerifyBackup` now verifies backup files with both checksum AND file size. When `verify_with_checksum` is false, `BackupEngine::VerifyBackup` only compares file sizes to verify backup files.
      
      Also add a test for the case when corruption does not change the file size.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7014
      
      Test Plan: Passed backupable_db_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22165590
      
      Pulled By: gg814
      
      fbshipit-source-id: 606a7450714e868bceb38598c89fd356c6004f4f
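      A minimal usage sketch of the new parameter (the engine and backup ID are assumed to come from the application; the argument is passed explicitly rather than relying on the default):

          #include "rocksdb/utilities/backupable_db.h"

          rocksdb::Status VerifyLatest(rocksdb::BackupEngine* engine,
                                       rocksdb::BackupID id) {
            // Checksum + file size verification (the new default behavior).
            rocksdb::Status s =
                engine->VerifyBackup(id, /*verify_with_checksum=*/true);
            if (!s.ok()) {
              // Size-only comparison, the pre-#7014 behavior.
              s = engine->VerifyBackup(id, /*verify_with_checksum=*/false);
            }
            return s;
          }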
    • Add unity build to CircleCI (#7026) · f9817201
      sdong committed
      Summary:
      We still keep the unity build working, so it is a good idea to cover it in pre-commit CI. The job uses a recent GCC Docker image to get a little more coverage. Three small issues are fixed to make it pass.
      Also, unity_test now runs db_basic_test rather than db_test to cut the test time; there is no point running expensive tests here. It was set to run db_test before db_basic_test was separated out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026
      
      Test Plan: Watch the tests pass.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22223197
      
      fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4
  2. 26 Jun 2020 (9 commits)
  3. 25 Jun 2020 (6 commits)
    • Update HISTORY.md to include the Public API Change for DB::OpenForReadonly introduced earlier (#7023) · 95fbb62c
      Zitan Chen committed
      
      Summary:
      `DB::OpenForReadOnly()` now returns `Status::NotFound` when the specified DB directory does not exist. Previously the error returned depended on the underlying `Env`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7023
      
      Reviewed By: ajkr
      
      Differential Revision: D22207845
      
      Pulled By: gg814
      
      fbshipit-source-id: f35830811a0e67efb0ee82eda3a9739bc526baba
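      A minimal sketch of relying on the new contract (the path is hypothetical):

          #include "rocksdb/db.h"

          rocksdb::DB* db = nullptr;
          rocksdb::Options options;
          rocksdb::Status s =
              rocksdb::DB::OpenForReadOnly(options, "/path/to/missing_db", &db);
          if (s.IsNotFound()) {
            // The DB directory does not exist; previously the error returned
            // here depended on the underlying Env.
          }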
    • Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997) · be41c61f
      Zitan Chen committed
      
      Summary:
      `BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.
      
      Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
      
      Three new test cases are added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997
      
      Test Plan: Passed make check.
      
      Reviewed By: ajkr
      
      Differential Revision: D22098895
      
      Pulled By: gg814
      
      fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
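      A configuration sketch under the stated preconditions (the backup directory is hypothetical):

          #include "rocksdb/utilities/backupable_db.h"

          rocksdb::BackupableDBOptions backup_opts("/path/to/backups");
          backup_opts.share_table_files = true;          // precondition
          backup_opts.share_files_with_checksum = true;  // precondition
          backup_opts.new_naming_for_backup_files = true;
          // Table files under shared_checksum are now named
          // <file_number>_<crc32c>_<db_session_id>.sst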
    • First step towards handling MANIFEST write error (#6949) · e66199d8
      Yanqin Jin committed
      Summary:
      This PR provides preliminary support for handling IO error during MANIFEST write.
      File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) actually are persistent in the MANIFEST, then the next recovery attempt will process the version edit(s) and then fail since the SST files have already been deleted.
      One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
      If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (set by underlying file system), either RocksDB or application can call `DB::Resume()`, or simply shutdown and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage in the new file. If this succeeds, then RocksDB may proceed. If all recovery is completed, then file deletions will be re-enabled.
      Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will be performing the MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST-writing threads in this batch will see the same IOError. They will all call `ErrorHandler::SetBGError()`, in which file deletion will be disabled.
      
      Possible future directions:
      - Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added.
      
      Test plan (dev server):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949
      
      Reviewed By: anand1976
      
      Differential Revision: D22026020
      
      Pulled By: riversand963
      
      fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
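      A minimal sketch of the application-side recovery path described above, assuming the background error was observed (e.g., through an EventListener) and the IOError is retryable:

          // db is an open rocksdb::DB* whose background error disabled
          // file deletions after a failed MANIFEST write/sync.
          rocksdb::Status s = db->Resume();
          if (s.ok()) {
            // RocksDB switched to a new MANIFEST, wrote out the in-memory
            // version state, and re-enabled file deletions.
          } else {
            // Recovery failed; shutting down and restarting remains an option.
          }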
    • Test CircleCI with CLANG-10 (#7025) · 9cc25190
      sdong committed
      Summary:
      It's useful to build RocksDB using a more recent clang version in CI. Add a CircleCI build and fix some issues with it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7025
      
      Test Plan: See all tests pass.
      
      Reviewed By: pdillinger
      
      Differential Revision: D22215700
      
      fbshipit-source-id: 914a729c2cd3f3ac4a627cc0ac58d4691dca2168
    • Fix unity build broken by #7007 (#7024) · 50d69698
      sdong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/7007 broke the unity build. Fix it by moving the const inside the function.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7024
      
      Test Plan: Run `make unity` and see it build.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22212028
      
      fbshipit-source-id: 5daff7383b691808164d4745ab543238502d946b
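      A sketch of the pattern (the function name is illustrative): in a unity build all .cc files are concatenated into one translation unit, so two file-scope constants with the same name collide even inside anonymous namespaces; moving the constant into the function that uses it keeps each definition local.

          // Before: redefinition error once io_tracer.h and
          // block_cache_tracer.cc land in the same translation unit.
          const unsigned int kCharSize = 1;

          // After: the constant is scoped to the one function that needs it.
          void EncodeChar(std::string* dst, char c) {
            const unsigned int kCharSize = 1;
            dst->append(kCharSize, c);
          }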
    • Fix the memory leak in Env_basic_test (#7017) · 83a4dd1a
      Zhichao Cao committed
      Summary:
      Fix the memory leak that broke ASAN and other tests, introduced by https://github.com/facebook/rocksdb/issues/6830
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7017
      
      Test Plan: Passes asan_check.
      
      Reviewed By: siying
      
      Differential Revision: D22190289
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 03a095f698b4f9d72fd9374191b17c890d7c2b56
  4. 24 Jun 2020 (2 commits)
  5. 23 Jun 2020 (3 commits)
    • Minimize memory internal fragmentation for Bloom filters (#6427) · 5b2bbacb
      Peter Dillinger committed
      Summary:
      New experimental option BBTO::optimize_filters_for_memory builds
      filters that maximize their use of "usable size" from malloc_usable_size,
      which is also used to compute block cache charges.
      
      Rather than always "rounding up," we track state in the
      BloomFilterPolicy object to mix essentially "rounding down" and
      "rounding up" so that the average FP rate of all generated filters is
      the same as without the option. (YMMV, as heavily accessed filters
      might unluckily have lower accuracy.)
      
      Thus, the option near-minimizes what the block cache considers as
      "memory used" for a given target Bloom filter false positive rate and
      Bloom filter implementation. There are no forward or backward
      compatibility issues with this change, though it only works on the
      format_version=5 Bloom filter.
      
      With Jemalloc, we see about 10% reduction in memory footprint (and block
      cache charge) for Bloom filters, but 1-2% increase in storage footprint,
      due to encoding efficiency losses (FP rate is non-linear with bits/key).
      
      Why not weighted random round up/down rather than state tracking? By
      only requiring malloc_usable_size, we don't actually know what the next
      larger and next smaller usable sizes for the allocator are. We pick a
      requested size, accept and use whatever usable size it has, and use the
      difference to inform our next choice. This allows us to narrow in on the
      right balance without tracking/predicting usable sizes.
      
      Why not weight history of generated filter false positive rates by
      number of keys? This could lead to excess skew in small filters after
      generating a large filter.
      
      Results from filter_bench with jemalloc (irrelevant details omitted):
      
          (normal keys/filter, but high variance)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.6278
          Number of filters: 5516
          Total size (MB): 200.046
          Reported total allocated memory (MB): 220.597
          Reported internal fragmentation: 10.2732%
          Bits/key stored: 10.0097
          Average FP rate %: 0.965228
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.5104
          Number of filters: 5464
          Total size (MB): 200.015
          Reported total allocated memory (MB): 200.322
          Reported internal fragmentation: 0.153709%
          Bits/key stored: 10.1011
          Average FP rate %: 0.966313
      
          (very few keys / filter, optimization not as effective due to ~59 byte
           internal fragmentation in blocked Bloom filter representation)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.5649
          Number of filters: 162950
          Total size (MB): 200.001
          Reported total allocated memory (MB): 224.624
          Reported internal fragmentation: 12.3117%
          Bits/key stored: 10.2951
          Average FP rate %: 0.821534
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 31.8057
          Number of filters: 159849
          Total size (MB): 200
          Reported total allocated memory (MB): 208.846
          Reported internal fragmentation: 4.42297%
          Bits/key stored: 10.4948
          Average FP rate %: 0.811006
      
          (high keys/filter)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.7017
          Number of filters: 164
          Total size (MB): 200.352
          Reported total allocated memory (MB): 221.5
          Reported internal fragmentation: 10.5552%
          Bits/key stored: 10.0003
          Average FP rate %: 0.969358
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.7131
          Number of filters: 160
          Total size (MB): 200.928
          Reported total allocated memory (MB): 200.938
          Reported internal fragmentation: 0.00448054%
          Bits/key stored: 10.1852
          Average FP rate %: 0.963387
      
      And from db_bench (block cache) with jemalloc:
      
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17063835
          $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17430747
          $ #^ 2.1% additional filter storage
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8440400
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
          rocksdb.bloom.filter.useful COUNT : 4963889
          rocksdb.bloom.filter.full.positive COUNT : 1214081
          rocksdb.bloom.filter.full.true.positive COUNT : 1161999
          $ #^ 1.04 % observed FP rate
          $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8448592
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
          rocksdb.bloom.filter.useful COUNT : 5360933
          rocksdb.bloom.filter.full.positive COUNT : 1321315
          rocksdb.bloom.filter.full.true.positive COUNT : 1262999
          $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters
      
      (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427
      
      Test Plan: unit test added, 'make check' with gcc, clang and valgrind
      
      Reviewed By: siying
      
      Differential Revision: D22124374
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
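      A configuration sketch for trying the option (block cache setup omitted):

          #include "rocksdb/filter_policy.h"
          #include "rocksdb/table.h"

          rocksdb::BlockBasedTableOptions bbto;
          bbto.filter_policy.reset(
              rocksdb::NewBloomFilterPolicy(10 /* bits_per_key */));
          bbto.format_version = 5;  // the only filter format this works on
          bbto.optimize_filters_for_memory = true;  // new experimental option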
    • Make EncryptEnv inheritable (#6830) · 1092f19d
      Matthew Von-Maszewski committed
      Summary:
      EncryptEnv class is both declared and defined within env_encryption.cc.  This makes it really tough to derive new classes from that base.
      
      This branch moves declaration of the class to rocksdb/env_encryption.h.  The change facilitates making new encryption modules (such as an upcoming openssl AES CTR pull request) possible / easy.
      
      The only coding change was to add the EncryptEnv object to env_basic_test.cc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6830
      
      Reviewed By: riversand963
      
      Differential Revision: D21706593
      
      Pulled By: ajkr
      
      fbshipit-source-id: 64d2da95a1569ceeb9b1549c3bec5404cf4c89f0
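      A minimal sketch of building an encrypted Env from the declarations now exposed in rocksdb/env_encryption.h, using the built-in test cipher (a real module, such as the openssl AES CTR work mentioned above, would derive its own classes):

          #include "rocksdb/env_encryption.h"

          // ROT13 is for testing only; it demonstrates the plug-in points.
          rocksdb::ROT13BlockCipher cipher(/*blockSize=*/32);
          rocksdb::CTREncryptionProvider provider(cipher);
          rocksdb::Env* encrypted_env =
              rocksdb::NewEncryptedEnv(rocksdb::Env::Default(), &provider);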
    • Fix double define in IO_tracer (#7007) · d739318b
      Zhichao Cao committed
      Summary:
      Fix the following unity build error:

          ./trace_replay/io_tracer.h:20:20: error: redefinition of ‘const unsigned int rocksdb::{anonymous}::kCharSize’
           const unsigned int kCharSize = 1;
                              ^~~~~~~~~
          In file included from unity.cc:177:
          trace_replay/block_cache_tracer.cc:22:20: note: ‘const unsigned int rocksdb::{anonymous}::kCharSize’ previously defined here
           const unsigned int kCharSize = 1;
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7007
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D22142618
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: e6dcd51ccc21d1f58df52cdc7a1c88e54cf4f6e8
  6. 20 Jun 2020 (5 commits)
    • Remove CircleCI clang build's verbose output (#7000) · 096beb78
      sdong committed
      Summary:
      As the CircleCI clang build is stable, the verbose flag is less useful. On the other hand, the long output might create other problems; a non-reproducible failure, "make: write error: stdout", might be related to it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7000
      
      Test Plan: Watch the run
      
      Reviewed By: pdillinger
      
      Differential Revision: D22118870
      
      fbshipit-source-id: a4157a4282adddcb0c55c0e9e53b2d9ce18bda66
    • Remove an assertion in FlushAfterIntraL0CompactionCheckConsistencyFail (#7003) · dea4063b
      sdong committed
      Summary:
      FlushAfterIntraL0CompactionCheckConsistencyFail is flaky. It sometimes fails with:

          db/db_compaction_test.cc:5186: Failure
          Expected equality of these values:
            10
            NumTableFilesAtLevel(0)
              Which is: 3

      I don't see a clear reason why the assertion would always be true. The necessity of the assertion is not clear either. Remove it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7003
      
      Test Plan: See that the test still builds.
      
      Reviewed By: riversand963
      
      Differential Revision: D22129753
      
      fbshipit-source-id: 42f0bb05e32b369e8d726bfd3e35c29cf52fe008
    • Fix block checksum for >=4GB, refactor (#6978) · 25a0d0ca
      Peter Dillinger committed
      Summary:
      Although RocksDB falls over in various other ways with KVs
      around 4GB or more, this change fixes how XXH32 and XXH64 were being
      called by the block checksum code, to support sizes >= 4GB in case that
      should ever happen or the code is copied for other uses.
      
      This change is not a schema compatibility issue because the checksum
      verification code would checksum the first (block_size + 1) mod 2^32
      bytes while the checksum construction code would checksum the first
      block_size mod 2^32 plus the compression type byte, meaning the
      XXH32/64 checksums for >=4GB block would not match about 255/256 times.
      
      While touching this code, I refactored to consolidate redundant
      implementations, improving diagnostics and performance tracking in some
      cases. Also used less confusing language in those diagnostics.
      
      Makes https://github.com/facebook/rocksdb/issues/6875 obsolete.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6978
      
      Test Plan:
      I was able to write a test for this using an SST file writer
      and VerifyChecksum in a reader. The test fails before the fix, though
      I'm leaving the test disabled because I don't think it's worth the
      expense of running regularly.
      
      Reviewed By: gg814
      
      Differential Revision: D22143260
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 982993d16134e8c50bea2269047f901c1783726e
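      A simplified sketch of the mismatch described above (the function names and the 32-bit length parameter illustrate the old call pattern, not the exact RocksDB code):

          #include <cstdint>
          #include "xxhash.h"

          // Old pattern: a 32-bit length means the checksum covers only
          // size mod 2^32 bytes of a >=4GB block, and construction and
          // verification disagree on which bytes those are.
          uint32_t BuggyBlockChecksum(const char* data, uint64_t size) {
            return XXH32(data, static_cast<uint32_t>(size), /*seed=*/0);
          }

          // Fixed pattern: pass the full length (size_t is 64-bit on the
          // platforms where >=4GB blocks are possible).
          uint32_t FixedBlockChecksum(const char* data, size_t size) {
            return XXH32(data, size, /*seed=*/0);
          }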
    • minor fixes for stress/crash contruns (#7006) · d76eed48
      Andrew Kryczka committed
      Summary:
      Avoid using `cf_consistency` together with `enable_compaction_filter` as
      the former heavily uses snapshots while the latter is incompatible with
      snapshots.
      
      Also fix a clang-analyze error for a write to a variable that is never
      read.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22141679
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447
    • Remove racially charged terms "whitelist" and "blacklist" (#7008) · 88b42107
      Peter Dillinger committed
      Summary:
      We don't need them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008
      
      Test Plan: "make check" and ensure "make crash_test" starts
      
      Reviewed By: ajkr
      
      Differential Revision: D22143838
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e
  7. 19 Jun 2020 (7 commits)
  8. 18 Jun 2020 (2 commits)
    • Fix the bug that compressed cache is disabled in read-only DBs (#6990) · 223b57ee
      sdong committed
      Summary:
      The compressed block cache was disabled in https://github.com/facebook/rocksdb/pull/4650 for no good reason. Re-enable it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6990
      
      Test Plan: Add a unit test to make sure a general function works with read-only DB + compressed block cache.
      
      Reviewed By: ltamasi
      
      Differential Revision: D22072755
      
      fbshipit-source-id: 2a55df6363de23a78979cf6c747526359e5dc7a1
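      A configuration sketch showing the path this fix re-enables (cache size and DB path are illustrative):

          #include "rocksdb/cache.h"
          #include "rocksdb/db.h"
          #include "rocksdb/table.h"

          rocksdb::BlockBasedTableOptions bbto;
          bbto.block_cache_compressed = rocksdb::NewLRUCache(64 << 20);
          rocksdb::Options options;
          options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(bbto));
          rocksdb::DB* db = nullptr;
          // With #6990 the compressed block cache is honored here too.
          rocksdb::Status s =
              rocksdb::DB::OpenForReadOnly(options, "/path/to/db", &db);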
    • Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Zitan Chen committed
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
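      A sketch of reading the new identifiers back from an open DB (error handling omitted; files written before this change report empty strings):

          #include <cstdio>
          #include "rocksdb/db.h"
          #include "rocksdb/table_properties.h"

          rocksdb::TablePropertiesCollection props;
          rocksdb::Status s = db->GetPropertiesOfAllTables(&props);
          for (const auto& entry : props) {
            const rocksdb::TableProperties& tp = *entry.second;
            std::printf("%s: db_id=%s db_session_id=%s\n", entry.first.c_str(),
                        tp.db_id.c_str(), tp.db_session_id.c_str());
          }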
  9. 17 Jun 2020 (2 commits)
  10. 16 Jun 2020 (1 commit)
    • Let best-efforts recovery ignore CURRENT file (#6970) · 9bfd46d0
      Yanqin Jin committed
      Summary:
      Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all.
      
      Test plan (dev server):
      make check
      ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970
      
      Reviewed By: anand1976
      
      Differential Revision: D22013990
      
      Pulled By: riversand963
      
      fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
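      A configuration sketch of opening with best-efforts recovery (DB path is illustrative):

          #include "rocksdb/db.h"

          rocksdb::Options options;
          options.best_efforts_recovery = true;  // do not rely on CURRENT
          rocksdb::DB* db = nullptr;
          // With this change, a missing CURRENT file no longer causes a new
          // DB to be created during open when best-efforts recovery is on.
          rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/db", &db);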