1. 09 Sep 2020, 2 commits
    • J
      Fix compile error for old gcc-4.8 (#7358) · 8a8a01c6
      Committed by Jay Zhuang
      Summary:
      gcc-4.8 reports an error when using the constructor; it is unclear whether this is a compiler bug/limitation or a code issue:
      ```
      table/block_based/block_based_table_reader.cc:3183:67: error: use of deleted function ‘rocksdb::WritableFileStringStreamAdapter::WritableFileStringStreamAdapter(rocksdb::WritableFileStringStreamAdapter&&)’
      ```
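
      The error comes from a pre-C++17 rule: declaring (even deleting) a copy constructor suppresses the implicit move constructor, and without guaranteed copy elision gcc-4.8 needs a usable move constructor to return such a type by value (old libstdc++ streams were not movable either). A minimal sketch of the portable pattern, with hypothetical names rather than the actual RocksDB fix:

      ```cpp
      #include <sstream>

      // Hypothetical stand-in for the adapter: the deleted copy constructor
      // suppresses the implicit move constructor, and the stream member was
      // non-movable in gcc-4.8's libstdc++ anyway.
      struct AdapterSketch {
        AdapterSketch() = default;
        AdapterSketch(const AdapterSketch&) = delete;
        std::ostringstream out;
      };

      // gcc-4.8 rejects a factory like
      //   AdapterSketch Make() { return AdapterSketch(); }
      // because copy elision is not guaranteed before C++17. Constructing the
      // object directly at the use site sidesteps the copy/move entirely.
      int UseAdapter() {
        AdapterSketch adapter;  // direct construction: no copy or move needed
        adapter.out << "hello";
        return static_cast<int>(adapter.out.str().size());
      }
      ```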
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7358
      
      Reviewed By: pdillinger
      
      Differential Revision: D23577651
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b0197e3d3538da61a6f3866410d88d2047fb9695
      8a8a01c6
    • A
      Store FSWritableFilePtr object in WritableFileWriter (#7193) · b175eceb
      Committed by Akanksha Mahajan
      Summary:
      Replace the FSWritableFile pointer with an FSWritableFilePtr
      object in WritableFileWriter. This new object wraps the
      FSWritableFile pointer.

      Objective: if tracing is enabled, FSWritableFilePtr returns a
      FSWritableFileTracingWrapper pointer that includes all necessary
      information in IORecord, calls the underlying FileSystem, and
      invokes IOTracer to dump that record to a binary file. If tracing
      is disabled, the underlying FileSystem pointer is returned
      directly. The FSWritableFilePtr wrapper class is added to bypass
      the FSWritableFileWrapper when tracing is disabled.

      Test Plan: make check -j64
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193
      
      Reviewed By: anand1976
      
      Differential Revision: D23355915
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
      b175eceb
  2. 05 Sep 2020, 1 commit
  3. 04 Sep 2020, 1 commit
    • A
      fix SstFileWriter with dictionary compression (#7323) · af54c409
      Committed by Andrew Kryczka
      Summary:
      In block-based table builder, the cut-over from buffered to unbuffered
      mode involves sampling the buffered blocks and generating a dictionary.
      There was a bug where `SstFileWriter` passed zero as the `target_file_size`
      causing the cutover to happen immediately, so there were no samples
      available for generating the dictionary.
      
      This PR changes the meaning of `target_file_size == 0` to mean buffer
      the whole file before cutting over. It also adds dictionary compression
      support to `sst_dump --command=recompress` for easy evaluation.
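
      The redefined `target_file_size == 0` behavior can be sketched as a cut-over predicate (hypothetical helper, not the actual builder code):

      ```cpp
      #include <cstdint>

      // Hypothetical predicate: decide whether the block-based table builder
      // should cut over from buffered to unbuffered mode. With
      // target_file_size == 0 the whole file is buffered, so size alone never
      // forces the cut-over and dictionary sampling always has data to use.
      bool ShouldCutOverSketch(uint64_t buffered_data_bytes,
                               uint64_t target_file_size) {
        if (target_file_size == 0) {
          return false;  // buffer the entire file before sampling
        }
        return buffered_data_bytes >= target_file_size;
      }
      ```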
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323
      
      Reviewed By: cheng-chang
      
      Differential Revision: D23412158
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873
      af54c409
  4. 03 Sep 2020, 1 commit
  5. 28 Aug 2020, 1 commit
  6. 26 Aug 2020, 1 commit
    • S
      Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297) · 722814e3
      Committed by sdong
      Summary:
      Right now all I/O failures under PartitionIndexReader::CacheDependencies() are swallowed. This doesn't impact correctness, but we have decided that any I/O error on the read path should now be returned to users for awareness. Return errors in those cases instead.
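
      The change amounts to propagating the partition read status instead of dropping it; a sketch with a toy status type (not the real rocksdb::Status or signatures):

      ```cpp
      #include <string>

      // Toy stand-in for rocksdb::Status, just for illustration.
      struct StatusSketch {
        bool ok_;
        std::string msg_;
        StatusSketch() : ok_(true) {}
        StatusSketch(bool ok, std::string m) : ok_(ok), msg_(std::move(m)) {}
        static StatusSketch OK() { return StatusSketch(); }
        static StatusSketch IOError(const std::string& m) {
          return StatusSketch(false, m);
        }
        bool ok() const { return ok_; }
      };

      // Before: the partition read status was swallowed and OK was returned.
      // After: the first failing read is surfaced so Get() can fail.
      StatusSketch CacheDependenciesSketch(
          const StatusSketch& partition_read_status) {
        if (!partition_read_status.ok()) {
          return partition_read_status;  // surface the I/O error, don't hide it
        }
        return StatusSketch::OK();
      }
      ```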
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7297
      
      Test Plan: Add a new unit test that ingests errors in this code path and verifies that Get() fails. Only one I/O path is hit in PartitionIndexReader::CacheDependencies(). Several option changes were attempted but could not trigger other pread paths; it is not clear whether other failure cases are even possible. We will rely on the continuous stress test to validate it.
      
      Reviewed By: anand1976
      
      Differential Revision: D23257950
      
      fbshipit-source-id: 859dbc92fa239996e1bb378329344d3d54168c03
      722814e3
  7. 25 Aug 2020, 1 commit
  8. 21 Aug 2020, 1 commit
  9. 13 Aug 2020, 1 commit
    • L
      Clean up CompressBlock/CompressBlockInternal a bit (#7249) · 9d6f48ec
      Committed by Levi Tamasi
      Summary:
      The patch cleans up and refactors `CompressBlock` and `CompressBlockInternal` a bit.
      In particular, it does the following:
      * It renames `CompressBlockInternal` to `CompressData` and moves it to `util/compression.h`,
      where other general compression-related utilities are located. This will facilitate reuse in the
      BlobDB write path.
      * The signature of the method is changed so it now takes `compression_format_version`
      (similarly to the compression library specific methods) instead of `format_version` (which is
      specific to the block based table).
      * `GetCompressionFormatForVersion` no longer takes `compression_type` as a parameter.
      This parameter was only used in a (not entirely up-to-date) assertion; also, removing it
      eliminates the need to ensure this precondition holds at all call sites.
      * It does some minor cleanup in `CompressBlock`; for instance, it is now possible to pass
      only one of `sampled_output_fast` and `sampled_output_slow`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7249
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23087278
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e6316e45baed8b4e7de7c1780c90501c2a3439b3
      9d6f48ec
  10. 11 Aug 2020, 1 commit
    • Y
      Fix cmake build on MacOS (#7205) · 5444942f
      Committed by Yuhong Guo
      Summary:
      1. `std::random_shuffle` is deprecated and now we can use `std::shuffle`
      ```
      /rocksdb/db/prefix_test.cc:590:12: error: 'random_shuffle<std::__1::__wrap_iter<unsigned long long *> >'
            is deprecated [-Werror,-Wdeprecated-declarations]
            std::random_shuffle(prefixes.begin(), prefixes.end());
                 ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/algorithm:2982:1: note:
            'random_shuffle<std::__1::__wrap_iter<unsigned long long *> >' has been explicitly marked deprecated here
      _LIBCPP_DEPRECATED_IN_CXX14 void
      ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1107:39: note: expanded from macro
            '_LIBCPP_DEPRECATED_IN_CXX14'
      #  define _LIBCPP_DEPRECATED_IN_CXX14 _LIBCPP_DEPRECATED
                                            ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1090:48: note: expanded from macro
            '_LIBCPP_DEPRECATED'
      #    define _LIBCPP_DEPRECATED __attribute__ ((deprecated))
      ```
      2. `c_test` link error with `-DROCKSDB_BUILD_SHARED=OFF`:
      ```
      [  7%] Linking CXX executable c_test
      ld: library not found for -lrocksdb-shared
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      make[5]: *** [c_test] Error 1
      make[4]: *** [CMakeFiles/c_test.dir/all] Error 2
      make[4]: *** Waiting for unfinished jobs....
      ```
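
      The first fix boils down to replacing the deprecated `std::random_shuffle` with `std::shuffle` and an explicit random engine, roughly:

      ```cpp
      #include <algorithm>
      #include <random>
      #include <vector>

      // std::random_shuffle was deprecated in C++14 and removed in C++17;
      // std::shuffle requires the random engine to be passed explicitly.
      void ShufflePrefixes(std::vector<unsigned long long>& prefixes) {
        std::mt19937_64 rng(0xdeadbeef);  // fixed seed, only for reproducibility
        std::shuffle(prefixes.begin(), prefixes.end(), rng);
      }
      ```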
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7205
      
      Reviewed By: ajkr
      
      Differential Revision: D23030641
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f270e50fc0b824ca1a0876ec5c65d33f55a72dd0
      5444942f
  11. 08 Aug 2020, 1 commit
  12. 06 Aug 2020, 1 commit
    • S
      Clean up InternalIterator upper bound logic a little bit (#7200) · 5c1a5441
      Committed by sdong
      Summary:
      InternalIterator::IsOutOfBound() and InternalIterator::MayBeOutOfUpperBound() are two functions related to the upper bound check. It is hard for users to reason about this complexity. Consolidate the two functions into one that returns an enum to improve readability.
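
      A sketch of the consolidation (hypothetical enum and signature; the real code compares keys through the comparator):

      ```cpp
      // One function returning a three-state result replaces the pair of
      // IsOutOfBound()/MayBeOutOfUpperBound() booleans.
      enum class UpperBoundCheckResultSketch {
        kUnknown,     // bound not checked yet for this position
        kInbound,     // definitely within the upper bound
        kOutOfBound   // definitely past the upper bound
      };

      UpperBoundCheckResultSketch CheckUpperBoundSketch(int key,
                                                        const int* upper_bound) {
        if (upper_bound == nullptr) {
          return UpperBoundCheckResultSketch::kUnknown;
        }
        return key < *upper_bound ? UpperBoundCheckResultSketch::kInbound
                                  : UpperBoundCheckResultSketch::kOutOfBound;
      }
      ```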
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7200
      
      Test Plan: Run all existing tests. Will run the crash test with atomic flush for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D22833181
      
      fbshipit-source-id: a0c724267056adbd0476bde74650e6c7226077e6
      5c1a5441
  13. 05 Aug 2020, 1 commit
    • S
      Fix a perf regression that caused every key to go through upper bound check (#7209) · 41c328fe
      Committed by sdong
      Summary:
      https://github.com/facebook/rocksdb/pull/5289 introduced a performance regression that caused an upper bound check within every BlockBasedTableIterator::Next(). This is unnecessary if we have already checked the boundary key for the current block and it is within the upper bound.
      
      Fix the bug. Also rename the boolean to an enum so that the code is slightly more readable. The original regression was probably introduced to fix a bug where the block upper-bound check status was not reset after a new block was created. Fix that bug too, so the regression can be avoided without reintroducing it.
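
      The idea can be sketched as: remember the block-level answer and consult the upper bound per key only when the block's boundary key did not already settle it (hypothetical helper):

      ```cpp
      // If the current block's boundary key is known to be within the upper
      // bound, every key in the block is too, so Next() can skip the per-key
      // comparison entirely.
      bool KeyOutOfBoundSketch(int key, int upper_bound,
                               bool block_within_bound) {
        if (block_within_bound) {
          return false;  // settled at block granularity; no per-key check
        }
        return key >= upper_bound;
      }
      ```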
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7209
      
      Test Plan: Run all existing tests. Will run atomic black box crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D22859246
      
      fbshipit-source-id: cbdad1f5e656c55fd8b71726d5a4f6cb53ff9140
      41c328fe
  14. 04 Aug 2020, 1 commit
    • A
      dedup ReadOptions in iterator hierarchy (#7210) · a4a4a2da
      Committed by Andrew Kryczka
      Summary:
      Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator`
      and every `LevelIterator`. This redundancy consumes extra memory,
      resulting in the `Arena` making more allocations, and iteration
      observing worse cache performance.
      
      This PR migrates callers of `NewInternalIterator()` and
      `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to
      outlive the returned iterator. When the iterator's lifetime will be managed by the
      user, this lifetime guarantee is achieved by storing the `ReadOptions`
      value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and
      `MakeInputIterator()` can hold a reference-to-const `ReadOptions`.
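
      The lifetime scheme can be sketched with illustrative types: the owner stores `ReadOptions` by value and each sub-iterator holds only a reference-to-const:

      ```cpp
      #include <cstddef>

      struct ReadOptionsSketch {
        bool verify_checksums = true;
        bool fill_cache = true;
        char padding[64] = {};  // stand-in for the many other ReadOptions fields
      };

      // Sub-iterators hold a reference-to-const, so N iterators share one copy
      // instead of each embedding sizeof(ReadOptionsSketch) bytes in the arena.
      class LevelIteratorSketch {
       public:
        explicit LevelIteratorSketch(const ReadOptionsSketch& ro)
            : read_options_(ro) {}
        bool fill_cache() const { return read_options_.fill_cache; }
       private:
        const ReadOptionsSketch& read_options_;  // outlives the iterator
      };

      // The owner (ArenaWrappedDBIter in the PR) keeps the value alive for as
      // long as any sub-iterator may reference it.
      struct DBIterOwnerSketch {
        ReadOptionsSketch read_options;  // stored by value: lifetime guarantee
      };
      ```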
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210
      
      Test Plan:
      - `make check` under ASAN and valgrind
      - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes.
      
      Reviewed By: anand1976
      
      Differential Revision: D22861323
      
      Pulled By: ajkr
      
      fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
      a4a4a2da
  15. 30 Jul 2020, 1 commit
    • S
      Implement NextAndGetResult() in memtable and level iterator (#7179) · 692f6a31
      Committed by sdong
      Summary:
      NextAndGetResult() is not implemented in the memtable iterator and is implemented very simply in the level iterator. The result is that, for a normal leveled iterator, a performance regression is observed because PrepareValue() is called on most iterator Next() calls. Mitigate the problem by implementing the function for both iterators. In the level iterator the implementation cannot be perfect: when calling the file iterator's SeekToFirst() we don't have information about whether the value is prepared. Fortunately, the first key should not account for a large portion of the CPU.
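
      The shape of the change can be sketched with a toy iterator: fold Next() + Valid() + key() into a single call so one dispatch does the work of three (hypothetical types, not the real InternalIterator API):

      ```cpp
      struct IterateResultSketch {
        int key = 0;
        bool valid = false;
      };

      class SimpleIterSketch {
       public:
        explicit SimpleIterSketch(int n) : n_(n) {}
        void Next() { ++pos_; }
        bool Valid() const { return pos_ < n_; }
        int key() const { return pos_; }
        // One call instead of Next(); Valid(); key(); -- implementing this in
        // the memtable and level iterators is what keeps PrepareValue() cheap.
        bool NextAndGetResult(IterateResultSketch* result) {
          Next();
          result->valid = Valid();
          if (result->valid) result->key = key();
          return result->valid;
        }
       private:
        int n_;
        int pos_ = 0;
      };
      ```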
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7179
      
      Test Plan: Run normal crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D22783840
      
      fbshipit-source-id: c19f45cdf21b756190adef97a3b66ccde3936e05
      692f6a31
  16. 23 Jul 2020, 1 commit
  17. 21 Jul 2020, 1 commit
    • A
      minimize BlockIter comparator scope (#7149) · 643c863b
      Committed by Andrew Kryczka
      Summary:
      PR https://github.com/facebook/rocksdb/issues/6944 transitioned `BlockIter` from using `Comparator*` to using
      concrete `UserComparatorWrapper` and `InternalKeyComparator`. However,
      adding them as instance variables to `BlockIter` was not optimal.
      Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator to do more heap
      allocations (in certain cases) which harmed performance of `DB::NewIterator()`. This PR
      pushes down the concrete comparator objects to the point of usage, which
      forces them to be on the stack. As a result, the `BlockIter` is back to
      its original size prior to https://github.com/facebook/rocksdb/issues/6944 (actually a bit smaller since there
      were two `Comparator*` before).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7149
      
      Test Plan:
      verified our internal `DB::NewIterator()`-heavy regression
      test no longer reports regression.
      
      Reviewed By: riversand963
      
      Differential Revision: D22623189
      
      Pulled By: ajkr
      
      fbshipit-source-id: f6d69accfe5de51e0bd9874a480b32b29909bab6
      643c863b
  18. 10 Jul 2020, 2 commits
    • M
      More Makefile Cleanup (#7097) · c7c7b07f
      Committed by mrambacher
      Summary:
      Cleans up some of the dependencies on test code in the Makefile while building tools:
      - Moves the test::RandomString, DBBaseTest::RandomString into Random
      - Moves the test::RandomHumanReadableString into Random
      - Moves the DestroyDir method into file_utils
      - Moves the SetupSyncPointsToMockDirectIO into sync_point.
      - Moves the FaultInjection Env and FS classes under env
      
      These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
      
      Tested both release and debug builds via Make and CMake for both static and shared libraries.
      
      More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
      
      Reviewed By: riversand963
      
      Differential Revision: D22463160
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
      c7c7b07f
    • A
      save key comparisons in BlockIter::BinarySeek (#7068) · 82611ee2
      Committed by Andrew Kryczka
      Summary:
      This is a followup to https://github.com/facebook/rocksdb/issues/6646. In that PR, for simplicity I just appended a comparison against the 0th restart key in case `BinarySeek()`'s binary search landed at index 0. As a result there were `2/(N+1) + log_2(N)` key comparisons. This PR does it differently. Now we expand the binary search range by one so it also covers the case where target is at or before the restart key at index 0. As a result, it involves `log_2(N+1)` key comparisons.
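
      The widened search can be illustrated over a sorted array of integers standing in for restart keys: starting the range at index -1 folds the old "target before restart key 0" special case into the loop, giving `log_2(N+1)` comparisons:

      ```cpp
      #include <vector>

      // Returns the index of the last restart key <= target, or -1 if the
      // target sorts before every restart key. The range starts at left = -1,
      // which is the one-element widening described above.
      int BinarySeekSketch(const std::vector<int>& restart_keys, int target) {
        int left = -1;
        int right = static_cast<int>(restart_keys.size()) - 1;
        while (left < right) {
          int mid = left + (right - left + 1) / 2;  // round up to terminate
          if (restart_keys[mid] <= target) {
            left = mid;        // target is at or after restart_keys[mid]
          } else {
            right = mid - 1;   // target is before restart_keys[mid]
          }
        }
        return left;
      }
      ```

      Integers stand in for keys here; the real code compares key slices with the block's comparator.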
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7068
      
      Test Plan:
      ran readrandom with mostly default settings and counted key comparisons
      using `PerfContext`.
      
      before: `user_key_comparison_count = 28881965`
      after: `user_key_comparison_count = 27823245`
      
      setup command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000
      ```
      
      benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D22357032
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8b01e9c1c2a4e9d02fc9dfe16c1cc0327f8bdf24
      82611ee2
  19. 09 Jul 2020, 3 commits
    • Z
      Fix GetFileDbIdentities (#7104) · b35a2f91
      Committed by Zitan Chen
      Summary:
      Although PR https://github.com/facebook/rocksdb/issues/7032 fixes the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` in the `Options` passed to the constructor, the file path was not corrected accordingly. This actually prevents the backup engine from using db session ids in the file names, since the `db_session_id` is always empty.
      
      Now it is fixed by setting the correct path in the construction of `SstFileDumper`. Furthermore, to preserve the Direct IO property that backup engine already has, parameter `EnvOptions` is added to `GetFileDbIdentities` and `SstFileDumper`.
      
      The `BackupUsingDirectIO` test is updated accordingly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7104
      
      Test Plan: backupable_db_test and some manual tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D22443245
      
      Pulled By: gg814
      
      fbshipit-source-id: 056a9bb8b82947c5e73d7c3fbb62bfe23af5e562
      b35a2f91
    • A
      Update Flush policy in PartitionedIndexBuilder on switching from user-key to... · 54f171fe
      Committed by Akanksha Mahajan
      Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096)
      
      Summary:
      When format_version is high enough to support user-key mode and there
      are index entries for the same user key that span multiple data
      blocks, the builder changes from user-key mode to internal-key mode.
      But the flush policy is not reset to point to the Block Builder of the
      internal-key index partition. After this switch, no entries are added
      to the user-key index partition result, so it never triggers flushing
      the block.
      
      Fix: 1. After adding the entry in sub_builder_index_, if there is a switch
      from user-key to internal-key mode, the flush policy is updated to point
      to the Block Builder of the internal-key index partition.
      2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if
      seperator_is_key_plus_seq_ is set to true, so that subsequent partitions
      can also use internal-key mode.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096
      
      Test Plan: make check -j64
      
      Reviewed By: ajkr
      
      Differential Revision: D22416598
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac
      54f171fe
    • R
      Fixed Factory construct just for calling .Name() (#7080) · b649d8cb
      Committed by rockeet
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7080
      
      Reviewed By: riversand963
      
      Differential Revision: D22412352
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1d7f4c1621040a0130245139b52c3f4d3deac865
      b649d8cb
  20. 08 Jul 2020, 1 commit
    • A
      Separate internal and user key comparators in `BlockIter` (#6944) · dd29ad42
      Committed by Andrew Kryczka
      Summary:
      Replace `BlockIter::comparator_` and `IndexBlockIter::user_comparator_wrapper_` with a concrete `UserComparatorWrapper` and `InternalKeyComparator`. The motivation for this change was the inconvenience of not knowing the concrete type of `BlockIter::comparator_`, which prevented calling specialized internal key comparison functions to optimize comparison of keys with global seqno applied.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6944
      
      Test Plan:
      benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
      created by regular flush; "ingestion_db" created by ingesting a file. Both
      DBs have same contents.
      
      ```
      $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
      $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
      $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
      ```
      
      benchmark run command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=$SEEK_NEXT -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=0 -threads=1 -reads=200000000 -mmap_read=1 -verify_checksum=false
      ```
      
      results: perf improved marginally for ingestion_db and did not change significantly for normal_db:
      
      SEEK_NEXT | DB | code | ops/sec | % change
      -- | -- | -- | -- | --
      0 | normal_db | master | 350880 |  
      0 | normal_db | PR6944 | 351040 | 0.0
      0 | ingestion_db | master | 343255 |  
      0 | ingestion_db | PR6944 | 349424 | 1.8
      10 | normal_db | master | 218711 |  
      10 | normal_db | PR6944 | 217892 | -0.4
      10 | ingestion_db | master | 220334 |  
      10 | ingestion_db | PR6944 | 226437 | 2.8
      
      Reviewed By: pdillinger
      
      Differential Revision: D21924676
      
      Pulled By: ajkr
      
      fbshipit-source-id: ea4288a2eefa8112eb6c651a671c1de18c12e538
      dd29ad42
  21. 03 Jul 2020, 1 commit
  22. 02 Jul 2020, 1 commit
    • A
      Update Flush policy in PartitionedIndexBuilder on switching from user-key to... · 5edfe3a3
      Committed by Akanksha Mahajan
      Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7022)
      
      Summary:
      When format_version is high enough to support user-key mode and there are index entries for the same user key that span multiple data blocks, the builder changes from user-key mode to internal-key mode. But the flush policy is not reset to point to the Block Builder of the internal-key index partition. After this switch, no entries are added to the user-key index partition result, so it never triggers flushing the block.
      
      Fix: After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key mode, the flush policy is updated to point to the Block Builder of the internal-key index partition.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7022
      
      Test Plan:
      1. make check -j64
      2. Added one unit test case
      
      Reviewed By: ajkr
      
      Differential Revision: D22197734
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: d87e9e46bccab8e896ee6979d6b79c51f73d479e
      5edfe3a3
  23. 01 Jul 2020, 1 commit
    • A
      Skip unnecessary allocation for mmap reads under 5000 bytes (#7043) · 8458532d
      Committed by Andrew Kryczka
      Summary:
      With mmap enabled on an uncompressed file, we were previously always doing a heap allocation to obtain the scratch buffer for `RandomAccessFileReader::Read()`. However, that allocation was unnecessary, as the underlying file reader returned a pointer into its mapped memory, not the provided scratch buffer. This PR passes the `BlockFetcher`'s inline buffer as the scratch buffer if the data block is small enough (less than `kDefaultStackBufferSize` bytes, currently 5000). Ideally we would not pass a scratch buffer at all for an mmap read; however, the `RandomAccessFile::Read()` API guarantees such a buffer is provided, and non-standard implementations may be relying on it even when `Options::allow_mmap_reads == true`. In that case, this PR still works but introduces an extra copy from the inline buffer to a heap buffer.
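
      The buffer selection can be sketched as follows (hypothetical helper; `kDefaultStackBufferSize` and its value of 5000 come from the description above):

      ```cpp
      #include <cstddef>
      #include <memory>

      const size_t kDefaultStackBufferSizeSketch = 5000;  // value from the PR

      // Sketch: pick a scratch pointer for RandomAccessFileReader::Read().
      // Small blocks reuse the caller's inline buffer; larger blocks get a
      // heap buffer whose ownership is handed back through heap_buf.
      char* PickScratchSketch(size_t block_size, char* inline_buf,
                              std::unique_ptr<char[]>* heap_buf) {
        if (block_size < kDefaultStackBufferSizeSketch) {
          return inline_buf;  // no heap allocation for small reads
        }
        heap_buf->reset(new char[block_size]);
        return heap_buf->get();
      }
      ```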
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7043
      
      Reviewed By: cheng-chang
      
      Differential Revision: D22320606
      
      Pulled By: ajkr
      
      fbshipit-source-id: ad964dd23df34e07d979c6032c2dfe5454c98b52
      8458532d
  24. 30 Jun 2020, 1 commit
    • A
      Extend Get/MultiGet deadline support to table open (#6982) · 9a5886bd
      Committed by Anand Ananthabhotla
      Summary:
      The current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open.
      
      The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.
      
      Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982
      
      Test Plan: Add new unit tests in db_basic_test
      
      Reviewed By: riversand963
      
      Differential Revision: D22219515
      
      Pulled By: anand1976
      
      fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
      9a5886bd
  25. 27 Jun 2020, 1 commit
    • S
      Add unity build to CircleCI (#7026) · f9817201
      Committed by sdong
      Summary:
      We are still keeping the unity build working, so it's a good idea to add it to a pre-commit CI, using a recent GCC docker image just to get a little more coverage. Fix three small issues to make it pass.
      Also make unity_test run db_basic_test rather than db_test to cut the test time; there is no point running expensive tests here. It was set to run db_test before db_basic_test was separated out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026
      
      Test Plan: watch tests to pass.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22223197
      
      fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4
      f9817201
  26. 25 Jun 2020, 2 commits
    • Z
      Add a new option for BackupEngine to store table files under shared_checksum... · be41c61f
      Committed by Zitan Chen
      Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997)
      
      Summary:
      `BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.
      
      Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
      
      Three new test cases are added.
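
      The new shared_checksum filename format can be sketched as a simple formatter (hypothetical helper name):

      ```cpp
      #include <cstdint>
      #include <string>

      // Build the shared_checksum filename described above:
      //   <file_number>_<crc32c>_<db_session_id>.sst
      std::string ShareFileNameSketch(uint64_t file_number, uint32_t crc32c,
                                      const std::string& db_session_id) {
        return std::to_string(file_number) + "_" + std::to_string(crc32c) +
               "_" + db_session_id + ".sst";
      }
      ```
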
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997
      
      Test Plan: Passed make check.
      
      Reviewed By: ajkr
      
      Differential Revision: D22098895
      
      Pulled By: gg814
      
      fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
      be41c61f
    • S
      Test CircleCI with CLANG-10 (#7025) · 9cc25190
      Committed by sdong
      Summary:
      It's useful to build RocksDB using a more recent clang version in CI. Add a CircleCI build and fix some issues with it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7025
      
      Test Plan: See all tests pass.
      
      Reviewed By: pdillinger
      
      Differential Revision: D22215700
      
      fbshipit-source-id: 914a729c2cd3f3ac4a627cc0ac58d4691dca2168
      9cc25190
  27. 23 Jun 2020, 1 commit
    • P
      Minimize memory internal fragmentation for Bloom filters (#6427) · 5b2bbacb
      Committed by Peter Dillinger
      Summary:
      New experimental option BBTO::optimize_filters_for_memory builds
      filters that maximize their use of "usable size" from malloc_usable_size,
      which is also used to compute block cache charges.
      
      Rather than always "rounding up," we track state in the
      BloomFilterPolicy object to mix essentially "rounding down" and
      "rounding up" so that the average FP rate of all generated filters is
      the same as without the option. (YMMV, as heavily accessed filters might
      unluckily have lower accuracy.)
      
      Thus, the option near-minimizes what the block cache considers as
      "memory used" for a given target Bloom filter false positive rate and
      Bloom filter implementation. There are no forward or backward
      compatibility issues with this change, though it only works on the
      format_version=5 Bloom filter.
      
      With Jemalloc, we see about 10% reduction in memory footprint (and block
      cache charge) for Bloom filters, but 1-2% increase in storage footprint,
      due to encoding efficiency losses (FP rate is non-linear with bits/key).
      
      Why not weighted random round up/down rather than state tracking? By
      only requiring malloc_usable_size, we don't actually know what the next
      larger and next smaller usable sizes for the allocator are. We pick a
      requested size, accept and use whatever usable size it has, and use the
      difference to inform our next choice. This allows us to narrow in on the
      right balance without tracking/predicting usable sizes.
      
      Why not weight history of generated filter false positive rates by
      number of keys? This could lead to excess skew in small filters after
      generating a large filter.
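
      The state-tracking idea can be modeled as a balance counter: remember how far past allocations have drifted above the requested sizes, and round the next one down when the balance is positive (a toy model, not the BloomFilterPolicy code):

      ```cpp
      #include <cstdint>
      #include <cstddef>

      // Toy model of the mixing strategy: if previous filters came out larger
      // than requested on average, round the next one down, and vice versa.
      // Over many filters the average size (and hence FP rate) converges to
      // the target.
      class RoundingBalancerSketch {
       public:
        // usable_below/usable_above: the nearest malloc-usable sizes
        // bracketing the requested size (in reality only the usable size of
        // the size actually requested is known).
        size_t Choose(size_t requested, size_t usable_below,
                      size_t usable_above) {
          size_t pick = (surplus_ > 0) ? usable_below : usable_above;
          surplus_ += static_cast<int64_t>(pick) -
                      static_cast<int64_t>(requested);
          return pick;
        }
        int64_t surplus() const { return surplus_; }
       private:
        int64_t surplus_ = 0;  // bytes allocated beyond requests so far
      };
      ```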
      
      Results from filter_bench with jemalloc (irrelevant details omitted):
      
          (normal keys/filter, but high variance)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.6278
          Number of filters: 5516
          Total size (MB): 200.046
          Reported total allocated memory (MB): 220.597
          Reported internal fragmentation: 10.2732%
          Bits/key stored: 10.0097
          Average FP rate %: 0.965228
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.5104
          Number of filters: 5464
          Total size (MB): 200.015
          Reported total allocated memory (MB): 200.322
          Reported internal fragmentation: 0.153709%
          Bits/key stored: 10.1011
          Average FP rate %: 0.966313
      
          (very few keys / filter, optimization not as effective due to ~59 byte
           internal fragmentation in blocked Bloom filter representation)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.5649
          Number of filters: 162950
          Total size (MB): 200.001
          Reported total allocated memory (MB): 224.624
          Reported internal fragmentation: 12.3117%
          Bits/key stored: 10.2951
          Average FP rate %: 0.821534
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 31.8057
          Number of filters: 159849
          Total size (MB): 200
          Reported total allocated memory (MB): 208.846
          Reported internal fragmentation: 4.42297%
          Bits/key stored: 10.4948
          Average FP rate %: 0.811006
      
          (high keys/filter)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.7017
          Number of filters: 164
          Total size (MB): 200.352
          Reported total allocated memory (MB): 221.5
          Reported internal fragmentation: 10.5552%
          Bits/key stored: 10.0003
          Average FP rate %: 0.969358
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.7131
          Number of filters: 160
          Total size (MB): 200.928
          Reported total allocated memory (MB): 200.938
          Reported internal fragmentation: 0.00448054%
          Bits/key stored: 10.1852
          Average FP rate %: 0.963387
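      As a sanity check on the fragmentation figures above: assuming the reported internal fragmentation is computed as (allocated - stored) / stored, it can be reproduced from the other two numbers (a sketch, not filter_bench code; figures copied from the first pair of runs):

      ```python
      # Internal fragmentation, assuming it is (allocated - stored) / stored.
      # The figures are copied from the first pair of filter_bench runs above.
      def internal_fragmentation(allocated_mb, stored_mb):
          return allocated_mb / stored_mb - 1.0

      # Without -optimize_filters_for_memory: ~10.27%
      print(f"{internal_fragmentation(220.597, 200.046):.4%}")
      # With the option: ~0.15%
      print(f"{internal_fragmentation(200.322, 200.015):.4%}")
      ```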
      
      And from db_bench (block cache) with jemalloc:
      
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17063835
          $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17430747
          $ #^ 2.1% additional filter storage
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8440400
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
          rocksdb.bloom.filter.useful COUNT : 4963889
          rocksdb.bloom.filter.full.positive COUNT : 1214081
          rocksdb.bloom.filter.full.true.positive COUNT : 1161999
          $ #^ 1.04 % observed FP rate
          $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8448592
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
          rocksdb.bloom.filter.useful COUNT : 5360933
          rocksdb.bloom.filter.full.positive COUNT : 1321315
          rocksdb.bloom.filter.full.true.positive COUNT : 1262999
          $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters
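      The derived percentages quoted in this db_bench example can be reproduced from the raw counter values; a sketch, assuming the observed FP rate is false positives / (false positives + true negatives), with "useful" counting true negatives and full.positive - full.true.positive counting false positives:

      ```python
      # Reproducing the derived percentages from the raw db_bench numbers above.
      # Assumption: observed FP rate = fp / (fp + tn), where "useful" counts
      # true negatives and full.positive - full.true.positive counts false
      # positives.
      def observed_fp_rate(useful, full_positive, full_true_positive):
          fp = full_positive - full_true_positive
          return fp / (fp + useful)

      # Without -optimize_filters_for_memory: ~1.04%
      fp_no_opt = observed_fp_rate(4963889, 1214081, 1161999)
      # With the option: ~1.08%
      fp_opt = observed_fp_rate(5360933, 1321315, 1262999)
      # Filter block cache memory saving: ~13.6%
      mem_saving = 1 - 18220328 / 21087528
      # Extra filter storage on disk (quoted above as 2.1%): ~2.15%
      extra_storage = 17430747 / 17063835 - 1
      print(f"{fp_no_opt:.2%} {fp_opt:.2%} {mem_saving:.1%} {extra_storage:.2%}")
      ```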
      
      (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427
      
      Test Plan: unit test added, 'make check' with gcc, clang and valgrind
      
      Reviewed By: siying
      
      Differential Revision: D22124374
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
      5b2bbacb
  28. 20 Jun, 2020 1 commit
    • P
      Fix block checksum for >=4GB, refactor (#6978) · 25a0d0ca
      Peter Dillinger authored
      Summary:
      Although RocksDB falls over in various other ways with KVs
      around 4GB or more, this change fixes how XXH32 and XXH64 were being
      called by the block checksum code to support >= 4GB in case that should
      ever happen, or the code copied for other uses.
      
      This change is not a schema compatibility issue because the checksum
      verification code would checksum the first (block_size + 1) mod 2^32
      bytes while the checksum construction code would checksum the first
      block_size mod 2^32 bytes plus the compression type byte, meaning the
      XXH32/64 checksums for >=4GB blocks would not match about 255/256 times.
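      To illustrate the mismatch mechanism, here is a toy model (not RocksDB code) with the 32-bit length truncation scaled down to 8 bits, so a ">= 4GB" block becomes a >= 256-byte buffer; the "hash" is a plain byte sum standing in for XXH32/64, since only the call sites' length handling is being modeled:

      ```python
      # Toy model of the pre-fix mismatch (not RocksDB code): the 32-bit length
      # truncation is scaled down to 8 bits, and the "hash" is a byte sum.
      def toy_hash(data, truncated_len):
          return sum(data[:truncated_len])

      def construct_checksum(block, type_byte):
          # Construction hashed the first block_size mod 2^8 bytes, then folded
          # in the compression type byte separately.
          return toy_hash(block, len(block) % 256) + type_byte

      def verify_checksum(block, type_byte):
          # Verification hashed the first (block_size + 1) mod 2^8 bytes of the
          # block with the type byte appended.
          data = bytes(block) + bytes([type_byte])
          return toy_hash(data, len(data) % 256)

      small = bytes(i % 7 for i in range(100))   # < 256 bytes: no truncation
      big = bytes(i % 7 for i in range(300))     # >= 256 bytes: lengths truncate
      print(construct_checksum(small, 1) == verify_checksum(small, 1))  # True
      print(construct_checksum(big, 1) == verify_checksum(big, 1))      # False
      ```

      In the truncated case, verification ends up hashing one extra data byte in place of the type byte, so the checksums agree only when that data byte happens to equal the type byte, hence the roughly 255/256 mismatch rate.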
      
      While touching this code, I refactored to consolidate redundant
      implementations, improving diagnostics and performance tracking in some
      cases. Also used less confusing language in those diagnostics.
      
      Makes https://github.com/facebook/rocksdb/issues/6875 obsolete.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6978
      
      Test Plan:
      I was able to write a test for this using an SST file writer
      and VerifyChecksum in a reader. The test fails before the fix, though
      I'm leaving the test disabled because I don't think it's worth the
      expense of running regularly.
      
      Reviewed By: gg814
      
      Differential Revision: D22143260
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 982993d16134e8c50bea2269047f901c1783726e
      25a0d0ca
  29. 18 Jun, 2020 2 commits
    • S
      Fix the bug that compressed cache is disabled in read-only DBs (#6990) · 223b57ee
      sdong authored
      Summary:
      Compressed block cache is disabled in https://github.com/facebook/rocksdb/pull/4650 for no good reason. Re-enable it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6990
      
      Test Plan: Add a unit test to make sure a general function works with read-only DB + compressed block cache.
      
      Reviewed By: ltamasi
      
      Differential Revision: D22072755
      
      fbshipit-source-id: 2a55df6363de23a78979cf6c747526359e5dc7a1
      223b57ee
    • Z
      Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Zitan Chen authored
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
      94d04529
  30. 16 Jun, 2020 1 commit
    • L
      Fix uninitialized memory read in table_test (#6980) · aa8f1331
      Levi Tamasi authored
      Summary:
      When using parameterized tests, `gtest` sometimes prints the test
      parameters. If no other printing method is available, it essentially
      produces a hex dump of the object. This can cause issues with valgrind
      with types like `TestArgs` in `table_test`, where the object layout has
      gaps (with uninitialized contents) due to the members' alignment
      requirements. The patch fixes the uninitialized reads by providing an
      `operator<<` for `TestArgs` and also makes sure all members are
      initialized (in a consistent order) on all code paths.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6980
      
      Test Plan: `valgrind --leak-check=full ./table_test`
      
      Reviewed By: siying
      
      Differential Revision: D22045536
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 6f5920ac28c712d0aa88162fffb80172ed769c32
      aa8f1331
  31. 14 Jun, 2020 1 commit
    • Z
      Fix persistent cache on windows (#6932) · 9c24a5cb
      Zhen Li authored
      Summary:
      The persistent cache feature caused a RocksDB crash on Windows. I posted an issue for it: https://github.com/facebook/rocksdb/issues/6919. I found this is because no "persistent_cache_key_prefix" is generated for the persistent cache. Looking at the repo history, "GetUniqueIdFromFile" is not implemented on Windows, so my fix adds a "NewId()" function to "persistent_cache" and uses it to generate the prefix for the persistent cache. In this PR, I also re-enable the related test cases defined in "db_test2" and "persistent_cache_test" for Windows.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6932
      
      Test Plan:
      1. Ran the related test cases in "db_test2" and "persistent_cache_test" on Windows and saw them pass.
      2. Manually ran db_bench.exe with "read_cache_path" and verified it.
      
      Reviewed By: riversand963
      
      Differential Revision: D21911608
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: cdfd938d54a385edbb2836b13aaa1d39b0a6f1c2
      9c24a5cb
  32. 13 Jun, 2020 1 commit
    • L
      Turn HarnessTest in table_test into a parameterized test (#6974) · bacd6edc
      Levi Tamasi authored
      Summary:
      `HarnessTest` in `table_test.cc` currently tests many parameter
      combinations sequentially in a loop. This is problematic from
      a testing perspective, since if the test fails, we have no way of
      knowing how many/which combinations have failed. It can also cause timeouts on
      our test system due to the sheer number of combinations tested.
      (Specifically, the parallel compression threads parameter added by
      https://github.com/facebook/rocksdb/pull/6262 seems to have been the last straw.)
      There is some DIY code there that splits the load among eight test cases
      but that does not appear to be sufficient anymore.
      
      Instead, the patch turns `HarnessTest` into a parameterized test, so all the
      parameter combinations can be tested separately and potentially
      concurrently. It also cleans up the tests a little, fixes
      `RandomizedLongDB`, which did not get updated when the parallel
      compression threads parameter was added, and turns `FooterTests` into a
      standalone test case (since it does not actually need a fixture class).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6974
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D22029572
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 51baea670771c33928f2eb3902bd69dcf540aa41
      bacd6edc
  33. 11 Jun, 2020 1 commit
    • A
      save a key comparison in block seeks (#6646) · e6be168a
      Andrew Kryczka authored
      Summary:
      This saves up to two key comparisons in block seeks. The first key
      comparison saved is a redundant key comparison against the restart key
      where the linear scan starts. This comparison is saved in all cases
      except when the found key is in the first restart interval. The
      second key comparison saved is a redundant key comparison against the
      restart key where the linear scan ends. This is only saved in cases
      where all keys in the restart interval are less than the target
      (probability roughly `1/restart_interval`).
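      A sketch of the first saving (an illustrative model, not the actual BlockIter code): a block seek binary-searches the restart points and then scans linearly, and when the binary search has already established that the restart key where the scan starts is less than the target, the scan can skip re-comparing it:

      ```python
      # Illustrative model of the first saved comparison (not RocksDB's
      # BlockIter code): binary search over restart points, then linear scan.
      def seek(keys, restart_interval, target, skip_redundant):
          """Return (index of first key >= target, number of key comparisons)."""
          restarts = list(range(0, len(keys), restart_interval))
          comparisons = 0
          # Binary search: largest restart point whose key is < target.
          lo, hi = 0, len(restarts) - 1
          while lo < hi:
              mid = (lo + hi + 1) // 2
              comparisons += 1
              if keys[restarts[mid]] < target:
                  lo = mid
              else:
                  hi = mid - 1
          i = restarts[lo]
          if skip_redundant and lo > 0:
              # Restart key already known to be < target; skip re-comparing it.
              # (No saving when the scan starts in the first restart interval,
              # where the binary search never compared restarts[0] vs target.)
              i += 1
          while i < len(keys):
              comparisons += 1
              if keys[i] >= target:
                  break
              i += 1
          return i, comparisons

      keys = [f"key{n:04d}" for n in range(0, 64, 2)]  # 32 sorted keys
      naive = seek(keys, 8, "key0031", skip_redundant=False)
      saved = seek(keys, 8, "key0031", skip_redundant=True)
      print(naive, saved)  # -> (16, 11) (16, 10): same index, one fewer compare
      ```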
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6646
      
      Test Plan:
      ran a benchmark with mostly default settings and counted key comparisons
      
      before: `user_key_comparison_count = 19399529`
      after: `user_key_comparison_count = 18431498`
      
      setup command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000
      ```
      
      benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D20849707
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1f01c5cd99ea771fd27974046e37b194f1cdcfac
      e6be168a
  34. 10 Jun, 2020 1 commit