1. 09 Sep 2020 (1 commit)
    • Fix compile error for old gcc-4.8 (#7358) · 8a8a01c6
      Jay Zhuang authored
      Summary:
      gcc-4.8 returns an error when using the constructor. It is not clear whether this is a compiler bug/limitation or a code issue:
      ```
      table/block_based/block_based_table_reader.cc:3183:67: error: use of deleted function ‘rocksdb::WritableFileStringStreamAdapter::WritableFileStringStreamAdapter(rocksdb::WritableFileStringStreamAdapter&&)’
      ```
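
      The failure generalizes beyond this class: before C++17's guaranteed copy elision, initializing from a temporary formally requires an accessible move constructor even when the move is elided, and gcc-4.8's libstdc++ never implemented movable streams, so stream-derived adapters get an implicitly deleted move constructor. A minimal hedged sketch of the pattern and the usual workaround (the `Adapter` type here is hypothetical, not the RocksDB class):
      
      ```
      #include <sstream>
      
      // Hypothetical stand-in for a stream-buffer adapter. std::stringbuf is
      // not movable in gcc-4.8's libstdc++, so Adapter(Adapter&&) is deleted.
      struct Adapter : public std::stringbuf {
        explicit Adapter(int fd) : fd_(fd) {}
        int fd_;
      };
      
      void Use() {
        // Adapter a = Adapter(42);  // gcc-4.8: use of deleted Adapter(Adapter&&)
        Adapter a(42);               // workaround: direct-initialize a named object
        (void)a;
      }
      ```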
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7358
      
      Reviewed By: pdillinger
      
      Differential Revision: D23577651
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b0197e3d3538da61a6f3866410d88d2047fb9695
  2. 05 Sep 2020 (1 commit)
  3. 04 Sep 2020 (1 commit)
    • fix SstFileWriter with dictionary compression (#7323) · af54c409
      Andrew Kryczka authored
      Summary:
      In block-based table builder, the cut-over from buffered to unbuffered
      mode involves sampling the buffered blocks and generating a dictionary.
      There was a bug where `SstFileWriter` passed zero as the `target_file_size`
      causing the cutover to happen immediately, so there were no samples
      available for generating the dictionary.
      
      This PR changes the meaning of `target_file_size == 0` to mean buffer
      the whole file before cutting over. It also adds dictionary compression
      support to `sst_dump --command=recompress` for easy evaluation.
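
      As a hedged usage sketch (public RocksDB API; keys, values, and sizes are made up), dictionary compression for an `SstFileWriter`-produced file is enabled via `CompressionOptions`, and with this fix the builder buffers the entire file before sampling:
      
      ```
      #include <cstdio>
      #include <string>
      
      #include <rocksdb/env.h>
      #include <rocksdb/options.h>
      #include <rocksdb/sst_file_writer.h>
      
      rocksdb::Status WriteDictCompressedSst(const std::string& path) {
        rocksdb::Options options;
        options.compression = rocksdb::kZSTD;
        options.compression_opts.max_dict_bytes = 16 * 1024;  // enable dictionary
        rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
        rocksdb::Status s = writer.Open(path);
        // Keys must be added in ascending order; zero-pad so the order holds.
        for (int i = 0; s.ok() && i < 1000; i++) {
          char key[16];
          std::snprintf(key, sizeof(key), "key%06d", i);
          s = writer.Put(key, std::string(100, 'a'));
        }
        return s.ok() ? writer.Finish() : s;
      }
      ```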
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323
      
      Reviewed By: cheng-chang
      
      Differential Revision: D23412158
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873
  4. 03 Sep 2020 (1 commit)
  5. 28 Aug 2020 (1 commit)
  6. 26 Aug 2020 (1 commit)
    • Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297) · 722814e3
      sdong authored
      Summary:
      Right now, all I/O failures under PartitionIndexReader::CacheDependencies() are swallowed. This doesn't impact correctness, but we've made a decision that any I/O error in the read path should now be returned to users for awareness. Return errors in those cases instead.
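
      The shape of the change, as an illustrative sketch (the helper below is hypothetical, not the actual RocksDB code):
      
      ```
      #include <rocksdb/status.h>
      
      rocksdb::Status ReadPartition(int i);  // hypothetical partition read
      
      rocksdb::Status CacheDependencies(int num_partitions) {
        for (int i = 0; i < num_partitions; ++i) {
          rocksdb::Status s = ReadPartition(i);
          if (!s.ok()) {
            return s;  // previously the error was swallowed and iteration continued
          }
        }
        return rocksdb::Status::OK();
      }
      ```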
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7297
      
      Test Plan: Add a new unit test that injects errors in this code path and verifies that Get() fails. Only one I/O path is hit in PartitionIndexReader::CacheDependencies(); several option changes were attempted but could not trigger other pread paths, and it is not clear whether other failure cases are even possible. We will rely on the continuous stress test to validate it.
      
      Reviewed By: anand1976
      
      Differential Revision: D23257950
      
      fbshipit-source-id: 859dbc92fa239996e1bb378329344d3d54168c03
  7. 21 Aug 2020 (1 commit)
  8. 13 Aug 2020 (1 commit)
    • Clean up CompressBlock/CompressBlockInternal a bit (#7249) · 9d6f48ec
      Levi Tamasi authored
      Summary:
      The patch cleans up and refactors `CompressBlock` and `CompressBlockInternal` a bit.
      In particular, it does the following:
      * It renames `CompressBlockInternal` to `CompressData` and moves it to `util/compression.h`,
      where other general compression-related utilities are located. This will facilitate reuse in the
      BlobDB write path.
      * The signature of the method is changed so it now takes `compression_format_version`
      (similarly to the compression library specific methods) instead of `format_version` (which is
      specific to the block based table); see the sketch after this list.
      * `GetCompressionFormatForVersion` no longer takes `compression_type` as a parameter.
      This parameter was only used in a (not entirely up-to-date) assertion; also, removing it
      eliminates the need to ensure this precondition holds at all call sites.
      * Does some minor cleanup in `CompressBlock`, for instance, it is now possible to pass
      only one of `sampled_output_fast` and `sampled_output_slow`.
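
      A hedged sketch of the resulting interface (signatures approximated from the description above; the authoritative declarations live in `util/compression.h`):
      
      ```
      // Approximated, not copied from the source:
      bool CompressData(const Slice& raw, const CompressionInfo& info,
                        uint32_t compression_format_version,
                        std::string* compressed_output);
      
      // No longer takes a CompressionType parameter:
      uint32_t GetCompressionFormatForVersion(uint32_t format_version);
      ```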
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7249
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23087278
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e6316e45baed8b4e7de7c1780c90501c2a3439b3
  9. 08 Aug 2020 (1 commit)
  10. 06 Aug 2020 (1 commit)
    • Clean up InternalIterator upper bound logic a little bit (#7200) · 5c1a5441
      sdong authored
      Summary:
      InternalIterator::IsOutOfBound() and InternalIterator::MayBeOutOfUpperBound() are two functions related to the upper bound check. It is hard for users to reason about this complexity. Consolidate the two functions into one that returns an enum, to improve readability.
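
      A hedged sketch of the consolidated interface (enum and method names approximated):
      
      ```
      // One query with an explicit three-way answer, replacing the pair of
      // booleans IsOutOfBound() / MayBeOutOfUpperBound():
      enum class IterBoundCheck : char {
        kUnknown = 0,  // caller must still check the upper bound per key
        kOutOfBound,   // iterator is past the upper bound
        kInbound,      // everything remaining is within the upper bound
      };
      
      class InternalIteratorSketch {
       public:
        virtual IterBoundCheck UpperBoundCheckResult() {
          return IterBoundCheck::kUnknown;
        }
        virtual ~InternalIteratorSketch() {}
      };
      ```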
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7200
      
      Test Plan: Run all existing tests. Will run the atomic crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D22833181
      
      fbshipit-source-id: a0c724267056adbd0476bde74650e6c7226077e6
  11. 05 Aug 2020 (1 commit)
    • Fix a perf regression that caused every key to go through upper bound check (#7209) · 41c328fe
      sdong authored
      Summary:
      https://github.com/facebook/rocksdb/pull/5289 introduced a performance regression that caused an upper bound check within every BlockBasedTableIterator::Next(). This is unnecessary if we have already checked the boundary key for the current block and it is within the upper bound.
      
      Fix the regression. Also rename the boolean to an enum so that the code is slightly more readable. The original regression was probably introduced to fix a bug where the block upper bound check status was not reset after a new block was created. Fix that bug too, so that the regression can be removed without reintroducing it.
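
      A hedged sketch of the renamed state (names approximate): the per-block result is tracked explicitly and reset whenever a new block is loaded, so `Next()` can skip the per-key check when the block's boundary key is already known to be in bound.
      
      ```
      enum class BlockUpperBound {
        kUnknown = 0,               // not checked for this block; check per key
        kUpperBoundInCurBlock,      // bound falls inside this block; check per key
        kUpperBoundBeyondCurBlock,  // whole block is in bound; skip per-key checks
      };
      // Reset to kUnknown each time a new data block is loaded -- the missing
      // reset was the bug the original regression worked around.
      ```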
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7209
      
      Test Plan: Run all existing tests. Will run the atomic black-box crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D22859246
      
      fbshipit-source-id: cbdad1f5e656c55fd8b71726d5a4f6cb53ff9140
  12. 04 Aug 2020 (1 commit)
    • dedup ReadOptions in iterator hierarchy (#7210) · a4a4a2da
      Andrew Kryczka authored
      Summary:
      Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator`
      and every `LevelIterator`. This redundancy consumes extra memory,
      resulting in the `Arena` making more allocations, and iteration
      observing worse cache performance.
      
      This PR migrates callers of `NewInternalIterator()` and
      `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to
      outlive the returned iterator. When the iterator's lifetime will be managed by the
      user, this lifetime guarantee is achieved by storing the `ReadOptions`
      value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and
      `MakeInputIterator()` can hold a reference-to-const `ReadOptions`.
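
      An illustrative before/after sketch of the layout (struct and member names approximate):
      
      ```
      #include <rocksdb/options.h>
      
      // Before: each sub-iterator carried its own copy of ReadOptions.
      struct LevelIteratorBefore {
        rocksdb::ReadOptions read_options_;  // value member bloats every iterator
      };
      
      // After: one long-lived copy in the owner; children hold a reference.
      struct ArenaWrappedDBIterSketch {
        rocksdb::ReadOptions read_options_;  // outlives all sub-iterators
      };
      struct LevelIteratorAfter {
        explicit LevelIteratorAfter(const rocksdb::ReadOptions& ro)
            : read_options_(ro) {}
        const rocksdb::ReadOptions& read_options_;  // aliases the owner's copy
      };
      ```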
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210
      
      Test Plan:
      - `make check` under ASAN and valgrind
      - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes.
      
      Reviewed By: anand1976
      
      Differential Revision: D22861323
      
      Pulled By: ajkr
      
      fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
  13. 21 Jul 2020 (1 commit)
    • minimize BlockIter comparator scope (#7149) · 643c863b
      Andrew Kryczka authored
      Summary:
      PR https://github.com/facebook/rocksdb/issues/6944 transitioned `BlockIter` from using `Comparator*` to using
      concrete `UserComparatorWrapper` and `InternalKeyComparator`. However,
      adding them as instance variables to `BlockIter` was not optimal.
      Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator to do more heap
      allocations (in certain cases) which harmed performance of `DB::NewIterator()`. This PR
      pushes down the concrete comparator objects to the point of usage, which
      forces them to be on the stack. As a result, the `BlockIter` is back to
      its original size prior to https://github.com/facebook/rocksdb/issues/6944 (actually a bit smaller since there
      were two `Comparator*` before).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7149
      
      Test Plan:
      verified our internal `DB::NewIterator()`-heavy regression
      test no longer reports regression.
      
      Reviewed By: riversand963
      
      Differential Revision: D22623189
      
      Pulled By: ajkr
      
      fbshipit-source-id: f6d69accfe5de51e0bd9874a480b32b29909bab6
  14. 10 Jul 2020 (2 commits)
    • More Makefile Cleanup (#7097) · c7c7b07f
      mrambacher authored
      Summary:
      Cleans up some of the dependencies on test code in the Makefile while building tools:
      - Moves the test::RandomString, DBBaseTest::RandomString into Random
      - Moves the test::RandomHumanReadableString into Random
      - Moves the DestroyDir method into file_utils
      - Moves the SetupSyncPointsToMockDirectIO into sync_point.
      - Moves the FaultInjection Env and FS classes under env
      
      These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
      
      Tested both release and debug builds via Make and CMake for both static and shared libraries.
      
      More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
      
      Reviewed By: riversand963
      
      Differential Revision: D22463160
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
    • save key comparisons in BlockIter::BinarySeek (#7068) · 82611ee2
      Andrew Kryczka authored
      Summary:
      This is a followup to https://github.com/facebook/rocksdb/issues/6646. In that PR, for simplicity I just appended a comparison against the 0th restart key in case `BinarySeek()`'s binary search landed at index 0. As a result there were `2/(N+1) + log_2(N)` key comparisons. This PR does it differently. Now we expand the binary search range by one so it also covers the case where target is at or before the restart key at index 0. As a result, it involves `log_2(N+1)` key comparisons.
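
      A hedged sketch of the idea, simplified to an array of restart keys (the real code works over the block's restart array): the search domain is extended by one slot representing "before the first restart key", so the index-0 case needs no special-case comparison.
      
      ```
      #include <cstdint>
      #include <string>
      #include <vector>
      
      // Returns the index of the last restart key <= target, or -1 if the
      // target sorts before restart_keys[0]. log2(N+1) key comparisons.
      int64_t BinarySeekIndex(const std::vector<std::string>& restart_keys,
                              const std::string& target) {
        int64_t lo = -1, hi = static_cast<int64_t>(restart_keys.size()) - 1;
        while (lo < hi) {
          int64_t mid = lo + (hi - lo + 1) / 2;  // round up so mid > lo
          if (restart_keys[mid] <= target) {
            lo = mid;  // target is at or after restart_keys[mid]
          } else {
            hi = mid - 1;
          }
        }
        return lo;
      }
      ```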
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7068
      
      Test Plan:
      ran readrandom with mostly default settings and counted key comparisons
      using `PerfContext`.
      
      before: `user_key_comparison_count = 28881965`
      after: `user_key_comparison_count = 27823245`
      
      setup command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000
      ```
      
      benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D22357032
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8b01e9c1c2a4e9d02fc9dfe16c1cc0327f8bdf24
  15. 09 Jul 2020 (1 commit)
    • Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096) · 54f171fe
      Akanksha Mahajan authored
      
      Summary:
      When format_version is high enough to support user keys and there are
      index entries for the same user key that span multiple data blocks, the
      builder switches from user-key mode to internal-key mode. But the flush
      policy was not reset to point to the block builder of the internal-key
      index partition. After this switch, no entries were added to the
      user-key index partition, so it never triggered flushing the block.
      
      Fix: 1. After adding the entry in sub_builder_index_, if there is a
      switch from user-key to internal-key mode, the flush policy is updated
      to point to the block builder of the internal-key index partition (see
      the sketch below).
      2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if
      seperator_is_key_plus_seq_ is set to true, so that subsequent partitions
      also use internal-key mode.
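
      A hedged sketch of the fix (member and method names approximated from the description; not the actual code):
      
      ```
      // Inside PartitionedIndexBuilder::AddIndexEntry(), after delegating to
      // the current partition's sub-builder:
      //
      //   if (!seperator_is_key_plus_seq_ &&
      //       sub_index_builder_->seperator_is_key_plus_seq_) {
      //     // The sub-builder switched from user-key to internal-key mode:
      //     // re-point the flush policy at the internal-key block builder,
      //     // otherwise the partition is never flushed.
      //     flush_policy_.reset(
      //         FlushBlockBySizePolicyFactory::NewFlushBlockPolicy(
      //             table_opt_.metadata_block_size,
      //             table_opt_.block_size_deviation,
      //             sub_index_builder_->index_block_builder_));
      //     seperator_is_key_plus_seq_ = true;  // later partitions stay internal-key
      //   }
      ```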
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096
      
      Test Plan: make check -j64
      
      Reviewed By: ajkr
      
      Differential Revision: D22416598
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac
  16. 08 Jul 2020 (1 commit)
    • Separate internal and user key comparators in `BlockIter` (#6944) · dd29ad42
      Andrew Kryczka authored
      Summary:
      Replace `BlockIter::comparator_` and `IndexBlockIter::user_comparator_wrapper_` with a concrete `UserComparatorWrapper` and `InternalKeyComparator`. The motivation for this change was the inconvenience of not knowing the concrete type of `BlockIter::comparator_`, which prevented calling specialized internal key comparison functions to optimize comparison of keys with global seqno applied.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6944
      
      Test Plan:
      benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
      created by regular flush; "ingestion_db" created by ingesting a file. Both
      DBs have same contents.
      
      ```
      $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
      $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
      $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
      ```
      
      benchmark run command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=$SEEK_NEXT -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=0 -threads=1 -reads=200000000 -mmap_read=1 -verify_checksum=false
      ```
      
      results: perf improved marginally for ingestion_db and did not change significantly for normal_db:
      
      SEEK_NEXT | DB | code | ops/sec | % change
      -- | -- | -- | -- | --
      0 | normal_db | master | 350880 |  
      0 | normal_db | PR6944 | 351040 | 0.0
      0 | ingestion_db | master | 343255 |  
      0 | ingestion_db | PR6944 | 349424 | 1.8
      10 | normal_db | master | 218711 |  
      10 | normal_db | PR6944 | 217892 | -0.4
      10 | ingestion_db | master | 220334 |  
      10 | ingestion_db | PR6944 | 226437 | 2.8
      
      Reviewed By: pdillinger
      
      Differential Revision: D21924676
      
      Pulled By: ajkr
      
      fbshipit-source-id: ea4288a2eefa8112eb6c651a671c1de18c12e538
  17. 03 Jul 2020 (1 commit)
  18. 02 Jul 2020 (1 commit)
    • Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7022) · 5edfe3a3
      Akanksha Mahajan authored
      
      Summary:
      When format_version is high enough to support user keys and there are index entries for the same user key that span multiple data blocks, the builder switches from user-key mode to internal-key mode. But the flush policy is not reset to point to the block builder of the internal-key index partition. After this switch, no entries are added to the user-key index partition, so it never triggers flushing the block.
      
      Fix: After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key mode, the flush policy is updated to point to the block builder of the internal-key index partition.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7022
      
      Test Plan:
      1. make check -j64
      2. Added one unit test case
      
      Reviewed By: ajkr
      
      Differential Revision: D22197734
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: d87e9e46bccab8e896ee6979d6b79c51f73d479e
  19. 30 Jun 2020 (1 commit)
    • Extend Get/MultiGet deadline support to table open (#6982) · 9a5886bd
      Anand Ananthabhotla authored
      Summary:
      The current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches, and preloads as part of table open.
      
      The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.
      
      Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982
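
      A hedged usage sketch (public API; the deadline value is arbitrary, and the default Env's clock is assumed to align with the system clock):
      
      ```
      #include <chrono>
      #include <string>
      
      #include <rocksdb/db.h>
      
      rocksdb::Status GetWithDeadline(rocksdb::DB* db, const rocksdb::Slice& key,
                                      std::string* value) {
        rocksdb::ReadOptions ro;
        // Absolute expiry in microseconds; with this PR the deadline also
        // bounds any table-open work (opens, prefetches, preloads) the
        // lookup triggers, not just random data-block reads.
        ro.deadline = std::chrono::duration_cast<std::chrono::microseconds>(
                          std::chrono::system_clock::now().time_since_epoch()) +
                      std::chrono::milliseconds(10);
        return db->Get(ro, key, value);
      }
      ```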
      
      Test Plan: Add new unit tests in db_basic_test
      
      Reviewed By: riversand963
      
      Differential Revision: D22219515
      
      Pulled By: anand1976
      
      fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
  20. 27 Jun 2020 (1 commit)
    • Add unity build to CircleCI (#7026) · f9817201
      sdong authored
      Summary:
      We are still keeping the unity build working, so it's a good idea to add it to pre-commit CI.
      Use the latest GCC docker image just to get a little more coverage. Fix three small issues to make it pass.
      Also make unity_test run db_basic_test rather than db_test to cut the test time. There is no point running expensive tests here; it was set to run db_test before db_basic_test was separated out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026
      
      Test Plan: Watch the tests pass.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22223197
      
      fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4
  21. 23 Jun 2020 (1 commit)
    • Minimize memory internal fragmentation for Bloom filters (#6427) · 5b2bbacb
      Peter Dillinger authored
      Summary:
      New experimental option BBTO::optimize_filters_for_memory builds
      filters that maximize their use of "usable size" from malloc_usable_size,
      which is also used to compute block cache charges.
      
      Rather than always "rounding up," we track state in the
      BloomFilterPolicy object to mix essentially "rounding down" and
      "rounding up" so that the average FP rate of all generated filters is
      the same as without the option. (YMMV as heavily accessed filters might
      be unluckily lower accuracy.)
      
      Thus, the option near-minimizes what the block cache considers as
      "memory used" for a given target Bloom filter false positive rate and
      Bloom filter implementation. There are no forward or backward
      compatibility issues with this change, though it only works on the
      format_version=5 Bloom filter.
      
      With Jemalloc, we see about 10% reduction in memory footprint (and block
      cache charge) for Bloom filters, but 1-2% increase in storage footprint,
      due to encoding efficiency losses (FP rate is non-linear with bits/key).
      
      Why not weighted random round up/down rather than state tracking? By
      only requiring malloc_usable_size, we don't actually know what the next
      larger and next smaller usable sizes for the allocator are. We pick a
      requested size, accept and use whatever usable size it has, and use the
      difference to inform our next choice. This allows us to narrow in on the
      right balance without tracking/predicting usable sizes.
      
      Why not weight history of generated filter false positive rates by
      number of keys? This could lead to excess skew in small filters after
      generating a large filter.
      
      Results from filter_bench with jemalloc (irrelevant details omitted):
      
          (normal keys/filter, but high variance)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.6278
          Number of filters: 5516
          Total size (MB): 200.046
          Reported total allocated memory (MB): 220.597
          Reported internal fragmentation: 10.2732%
          Bits/key stored: 10.0097
          Average FP rate %: 0.965228
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.5104
          Number of filters: 5464
          Total size (MB): 200.015
          Reported total allocated memory (MB): 200.322
          Reported internal fragmentation: 0.153709%
          Bits/key stored: 10.1011
          Average FP rate %: 0.966313
      
          (very few keys / filter, optimization not as effective due to ~59 byte
           internal fragmentation in blocked Bloom filter representation)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.5649
          Number of filters: 162950
          Total size (MB): 200.001
          Reported total allocated memory (MB): 224.624
          Reported internal fragmentation: 12.3117%
          Bits/key stored: 10.2951
          Average FP rate %: 0.821534
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 31.8057
          Number of filters: 159849
          Total size (MB): 200
          Reported total allocated memory (MB): 208.846
          Reported internal fragmentation: 4.42297%
          Bits/key stored: 10.4948
          Average FP rate %: 0.811006
      
          (high keys/filter)
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
          Build avg ns/key: 29.7017
          Number of filters: 164
          Total size (MB): 200.352
          Reported total allocated memory (MB): 221.5
          Reported internal fragmentation: 10.5552%
          Bits/key stored: 10.0003
          Average FP rate %: 0.969358
          $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
          Build avg ns/key: 30.7131
          Number of filters: 160
          Total size (MB): 200.928
          Reported total allocated memory (MB): 200.938
          Reported internal fragmentation: 0.00448054%
          Bits/key stored: 10.1852
          Average FP rate %: 0.963387
      
      And from db_bench (block cache) with jemalloc:
      
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
          $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17063835
          $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
          17430747
          $ #^ 2.1% additional filter storage
          $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8440400
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
          rocksdb.bloom.filter.useful COUNT : 4963889
          rocksdb.bloom.filter.full.positive COUNT : 1214081
          rocksdb.bloom.filter.full.true.positive COUNT : 1161999
          $ #^ 1.04 % observed FP rate
          $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
          rocksdb.block.cache.index.add COUNT : 33
          rocksdb.block.cache.index.bytes.insert COUNT : 8448592
          rocksdb.block.cache.filter.add COUNT : 33
          rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
          rocksdb.bloom.filter.useful COUNT : 5360933
          rocksdb.bloom.filter.full.positive COUNT : 1321315
          rocksdb.bloom.filter.full.true.positive COUNT : 1262999
          $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters
      
      (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427
      
      Test Plan: unit test added, 'make check' with gcc, clang and valgrind
      
      Reviewed By: siying
      
      Differential Revision: D22124374
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
  22. 20 Jun 2020 (1 commit)
    • Fix block checksum for >=4GB, refactor (#6978) · 25a0d0ca
      Peter Dillinger authored
      Summary:
      Although RocksDB falls over in various other ways with KVs
      around 4GB or more, this change fixes how XXH32 and XXH64 were being
      called by the block checksum code so that blocks of 4GB or more are
      supported, in case that should ever happen or the code is copied for other uses.
      
      This change is not a schema compatibility issue because the checksum
      verification code would checksum the first (block_size + 1) mod 2^32
      bytes while the checksum construction code would checksum the first
      block_size mod 2^32 bytes plus the compression type byte, meaning the
      XXH32/64 checksums for a >=4GB block would not match about 255/256 times.
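
      For reference, a hedged sketch of driving XXH64 through its public streaming API so that lengths are never truncated to 32 bits (chunk size is arbitrary):
      
      ```
      #include <stddef.h>
      #include <stdint.h>
      
      #include "xxhash.h"
      
      uint64_t HashLargeBlock(const char* data, size_t len, uint64_t seed) {
        XXH64_state_t* state = XXH64_createState();
        XXH64_reset(state, seed);
        const size_t kChunk = static_cast<size_t>(1) << 30;  // 1GB updates
        while (len > 0) {
          size_t n = len < kChunk ? len : kChunk;
          XXH64_update(state, data, n);
          data += n;
          len -= n;
        }
        uint64_t h = XXH64_digest(state);
        XXH64_freeState(state);
        return h;
      }
      ```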
      
      While touching this code, I refactored to consolidate redundant
      implementations, improving diagnostics and performance tracking in some
      cases. Also used less confusing language in those diagnostics.
      
      Makes https://github.com/facebook/rocksdb/issues/6875 obsolete.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6978
      
      Test Plan:
      I was able to write a test for this using an SST file writer
      and VerifyChecksum in a reader. The test fails before the fix, though
      I'm leaving the test disabled because I don't think it's worth the
      expense of running regularly.
      
      Reviewed By: gg814
      
      Differential Revision: D22143260
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 982993d16134e8c50bea2269047f901c1783726e
  23. 18 Jun 2020 (2 commits)
    • Fix the bug that compressed cache is disabled in read-only DBs (#6990) · 223b57ee
      sdong authored
      Summary:
      The compressed block cache was disabled for read-only DBs in https://github.com/facebook/rocksdb/pull/4650 for no good reason. Re-enable it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6990
      
      Test Plan: Add a unit test to make sure a general function works with read-only DB + compressed block cache.
      
      Reviewed By: ltamasi
      
      Differential Revision: D22072755
      
      fbshipit-source-id: 2a55df6363de23a78979cf6c747526359e5dc7a1
    • Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Zitan Chen authored
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
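
      A hedged sketch of reading the identifiers back (assuming they are surfaced as fields on `TableProperties`, per this PR):
      
      ```
      #include <cstdio>
      
      #include <rocksdb/db.h>
      #include <rocksdb/table_properties.h>
      
      void PrintTableIdentities(rocksdb::DB* db) {
        rocksdb::TablePropertiesCollection props;
        db->GetPropertiesOfAllTables(&props);
        for (const auto& file_and_props : props) {
          const rocksdb::TableProperties& tp = *file_and_props.second;
          std::printf("%s: db_id=%s db_session_id=%s\n",
                      file_and_props.first.c_str(), tp.db_id.c_str(),
                      tp.db_session_id.c_str());
        }
      }
      ```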
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
  24. 14 Jun 2020 (1 commit)
    • Fix persistent cache on windows (#6932) · 9c24a5cb
      Zhen Li authored
      Summary:
      The persistent cache feature caused a RocksDB crash on Windows. I posted an issue for it: https://github.com/facebook/rocksdb/issues/6919. I found this is because no "persistent_cache_key_prefix" is generated for the persistent cache. Looking at the repo history, "GetUniqueIdFromFile" is not implemented on Windows, so my fix adds a "NewId()" function to "persistent_cache" and uses it to generate a prefix for the persistent cache. In this PR, I also re-enable the related test cases defined in "db_test2" and "persistent_cache_test" for Windows.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6932
      
      Test Plan:
      1. Run the related test cases in "db_test2" and "persistent_cache_test" on Windows and see that they pass.
      2. Manually run db_bench.exe with "read_cache_path" and verify.
      
      Reviewed By: riversand963
      
      Differential Revision: D21911608
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: cdfd938d54a385edbb2836b13aaa1d39b0a6f1c2
  25. 11 Jun 2020 (1 commit)
    • save a key comparison in block seeks (#6646) · e6be168a
      Andrew Kryczka authored
      Summary:
      This saves up to two key comparisons in block seeks. The first key
      comparison saved is a redundant key comparison against the restart key
      where the linear scan starts. This comparison is saved in all cases
      except when the found key is in the first restart interval. The
      second key comparison saved is a redundant key comparison against the
      restart key where the linear scan ends. This is only saved in cases
      where all keys in the restart interval are less than the target
      (probability roughly `1/restart_interval`).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6646
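
      A hedged sketch of where the comparisons are saved in the post-seek linear scan (simplified to plain strings; the real code scans delta-encoded entries between restart points):
      
      ```
      #include <string>
      #include <vector>
      
      // Find the first key >= target in [start, limit), where the caller's
      // binary search guarantees keys[start] <= target (first saved
      // comparison) and keys[limit], if it exists, was already compared
      // (second saved comparison).
      size_t LinearScan(const std::vector<std::string>& keys, size_t start,
                        size_t limit, const std::string& target) {
        size_t i = start + 1;  // skip re-comparing keys[start]: known <= target
        while (i < limit && keys[i] < target) {
          ++i;
        }
        return i;  // i == limit needs no comparison against keys[limit]
      }
      ```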
      
      Test Plan:
      ran a benchmark with mostly default settings and counted key comparisons
      
      before: `user_key_comparison_count = 19399529`
      after: `user_key_comparison_count = 18431498`
      
      setup command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000
      ```
      
      benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D20849707
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1f01c5cd99ea771fd27974046e37b194f1cdcfac
  26. 10 Jun 2020 (1 commit)
  27. 08 Jun 2020 (1 commit)
    • Remove unnecessary inclusion of version_edit.h in env (#6952) · 3020df9d
      Yanqin Jin authored
      Summary:
      In db_options.cc, we should avoid including header files from the `db` directory, to avoid introducing unnecessary dependencies. The reason `version_edit.h` has been included in `db_options.cc` is that we need two constants, `kUnknownChecksum` and `kUnknownChecksumFuncName`. We can put these two constants as `constexpr` in the public header `file_checksum.h`.
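
      A hedged sketch of the relocated declarations (constant values assumed, not copied from the source):
      
      ```
      // include/rocksdb/file_checksum.h
      constexpr char kUnknownChecksum[] = "";
      constexpr char kUnknownChecksumFuncName[] = "Unknown";
      // db/version_edit.h and options/db_options.cc can now both pull these
      // from the public header, removing the db/ include from db_options.cc.
      ```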
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6952
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21925341
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2902f3b74c97f0cf16c58ad24c095c787c3a40e2
  28. 06 Jun 2020 (1 commit)
    • Check iterator status BlockBasedTableReader::VerifyChecksumInBlocks() (#6909) · 98b0cbea
      anand76 authored
      Summary:
      The ```for``` loop in ```VerifyChecksumInBlocks``` only checks ```index_iter->Valid()```, which could be ```false``` either due to reaching the end of the index or, in the case of a partitioned index, due to a checksum mismatch error when reading a 2nd-level index block. Instead of throwing away the index iterator status, we need to return any errors back to the caller.
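
      The shape of the fix, as a hedged sketch (Iter is any InternalIterator-style type with SeekToFirst/Valid/Next/status):
      
      ```
      #include <rocksdb/status.h>
      
      template <typename Iter>
      rocksdb::Status VerifyChecksumInBlocksSketch(Iter* index_iter) {
        rocksdb::Status s;
        for (index_iter->SeekToFirst(); index_iter->Valid() && s.ok();
             index_iter->Next()) {
          // ... read the pointed-to block and verify its checksum into s ...
        }
        if (s.ok()) {
          s = index_iter->status();  // a 2nd-level index read error lands here
        }
        return s;
      }
      ```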
      
      Tests:
      Add a test in block_based_table_reader_test.cc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6909
      
      Reviewed By: pdillinger
      
      Differential Revision: D21833922
      
      Pulled By: anand1976
      
      fbshipit-source-id: bc778ebf1121dbbdd768689de5183f07a9f0beae
  29. 04 Jun 2020 (2 commits)
    • Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) · afa35188
      sdong authored
      Summary:
      This reverts commit 8d87e9ce.
      
      Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate to some other systems. Revert it for now.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923
      
      Reviewed By: pdillinger
      
      Differential Revision: D21864799
      
      fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
    • Fix handling of too-small filter partition size (#6905) · 9360776c
      Peter Dillinger authored
      Summary:
      Because ARM and some other platforms have a larger cache line
      size, they have a larger minimum filter size, which causes the recently
      added PartitionedMultiGet test in db_bloom_filter_test to fail on those
      platforms. The code would actually end up using larger partitions,
      because keys_per_partition_ would be 0 and would never equal the number
      of keys added.
      
      The code now attempts to get as close as possible to the small target
      size, while fully utilizing that filter size, if the target partition
      size is smaller than the minimum filter size.
      
      Also updated the test to break more uniformly across platforms.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6905
      
      Test Plan: updated test, tested on ARM
      
      Reviewed By: anand1976
      
      Differential Revision: D21840639
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 11684b6d35f43d2e98b85ddb2c8dcfd59d670817
  30. 03 Jun 2020 (2 commits)
    • For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) · 14eca6bf
      Peter Dillinger authored
      Summary:
      The implementation of GetApproximateSizes was inconsistent in
      its treatment of the size of non-data blocks of SST files, sometimes
      including them and sometimes not. This was at its worst when a large
      portion of a table file was used by filters and a small range crossing
      a table boundary was queried: the size estimate would include the large
      filter size.
      
      It's conceivable that someone might want only to know the size in terms
      of data blocks, but I believe that's unlikely enough to ignore for now.
      Similarly, there's no evidence the internal function ApproximateOffsetOf
      is used for anything other than a one-sided ApproximateSize, so I intend
      to refactor to remove redundancy in a follow-up commit.
      
      So to fix this, GetApproximateSizes (and implementation details
      ApproximateSize and ApproximateOffsetOf) now consistently include in
      their returned sizes a portion of table file metadata (incl filters
      and indexes) based on the size portion of the data blocks in range. In
      other words, if a key range covers data blocks that are X% by size of all
      the table's data blocks, returned approximate size is X% of the total
      file size. It would technically be more accurate to attribute metadata
      based on number of keys, but that's not computationally efficient with
      data available and rarely a meaningful difference.
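
      A small worked example of the pro-rating rule (numbers are illustrative): for a 100 MB file with 80 MB of data blocks and 20 MB of metadata, a range covering 8 MB of data blocks is 10% of data-block bytes, so the estimate is 10% of the whole file, i.e. 10 MB rather than 8 MB.
      
      ```
      double ApproximateSizeInRange(double data_bytes_in_range,
                                    double total_data_bytes,
                                    double total_file_bytes) {
        return total_file_bytes * (data_bytes_in_range / total_data_bytes);
      }
      // ApproximateSizeInRange(8.0, 80.0, 100.0) == 10.0 (MB)
      ```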
      
      Also includes miscellaneous comment improvements / clarifications.
      
      Also included is a new approximatesizerandom benchmark for db_bench.
      No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784
      
      Test Plan:
      Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
      Old code running new test...
      
          [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
          db/db_test.cc:1562: Failure
          Expected: (size) <= (11 * 100), actual: 9478 vs 1100
      
      Other tests updated to reflect consistent accounting of metadata.
      
      Reviewed By: siying
      
      Differential Revision: D21334706
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
    • Reduce dependency on gtest dependency in release code (#6907) · 298b00a3
      sdong authored
      Summary:
      Release code now depends on gtest, indirectly by including "test_util/testharness.h". This creates multiple problems; one important cause is the definition of IGNORE_STATUS_IF_ERROR() in test_util/testharness.h. Move it to sync_point.h instead.
      Note that utilities/cassandra/format.h still depends on "test_util/testharness.h". This will be resolved in a separate diff.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6907
      
      Test Plan: Run all existing tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D21829884
      
      fbshipit-source-id: 9253c19ffde2936f3ae68998210f8e54f645a6e6
  31. 02 Jun 2020 (3 commits)
  32. 29 May 2020 (1 commit)
    • avoid `IterKey::UpdateInternalKey()` in `BlockIter` (#6843) · c5abf78b
      Andrew Kryczka authored
      Summary:
      `IterKey::UpdateInternalKey()` is an error-prone API as it's
      incompatible with `IterKey::TrimAppend()`, which is used for
      decoding delta-encoded internal keys. This PR stops using it in
      `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s
      buffer when needed. The logic for safely getting a Slice with global
      seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`.
      `BinarySeek()` is also migrated to use this API (previously it ignored
      global seqno entirely).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6843
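
      A hedged sketch of the encapsulation (simplified; the real class also manages the interplay with the delta-decoding buffer, and the trailer layout comment is an assumption):
      
      ```
      #include <cstdint>
      #include <string>
      
      #include <rocksdb/slice.h>
      
      // Returns the parsed key untouched in the common no-global-seqno case;
      // otherwise rewrites the trailer into a separate scratch buffer that
      // never aliases the TrimAppend()-managed decode buffer.
      class GlobalSeqnoAppliedKeySketch {
       public:
        explicit GlobalSeqnoAppliedKeySketch(uint64_t global_seqno)
            : global_seqno_(global_seqno) {}
      
        rocksdb::Slice Apply(const rocksdb::Slice& internal_key) {
          if (global_seqno_ == kNoGlobalSeqno) {
            return internal_key;  // zero copies
          }
          scratch_.assign(internal_key.data(), internal_key.size());
          // Rewrite the 8-byte trailer, keeping the low type bits and
          // replacing the seqno bits (layout assumed: (seqno << 8) | type):
          // EncodeFixed64(&scratch_[scratch_.size() - 8], ...);
          return rocksdb::Slice(scratch_);
        }
      
       private:
        static constexpr uint64_t kNoGlobalSeqno = ~uint64_t{0};
        uint64_t global_seqno_;
        std::string scratch_;
      };
      ```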
      
      Test Plan:
      benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
      created by regular flush; "ingestion_db" created by ingesting a file. Both
      DBs have same contents.
      
      ```
      $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
      $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
      $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
      ```
      
      benchmark run command:
      ```
      TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000
      ```
      
      results:
      
      | DB | code | throughput |
      |---|---|---|
      | normal_db | master |  267.9 |
      | normal_db   |    PR6843 | 254.2 (-5.1%) |
      | ingestion_db |   master |  259.6 |
      | ingestion_db |   PR6843 | 250.5 (-3.5%) |
      
      Reviewed By: pdillinger
      
      Differential Revision: D21562604
      
      Pulled By: ajkr
      
      fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264
  33. 27 May 2020 (1 commit)
  34. 22 May 2020 (1 commit)