1. 27 Mar 2020 (1 commit)
    • Fix iterator reading filter block despite read_tier == kBlockCacheTier (#6562) · 963af52f
      Mike Kolupaev committed
      Summary:
      We're seeing iterators with `ReadOptions::read_tier == kBlockCacheTier` sometimes doing file reads. Stack trace:
      
      ```
      rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*, bool) const
      rocksdb::BlockFetcher::ReadBlockContents()
      rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*) const
      rocksdb::Status rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, bool, bool) const
      rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::ReadFilterBlock(rocksdb::BlockBasedTable const*, rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*)
      rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::GetOrReadFilterBlock(bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*) const
      rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*) const
      rocksdb::FullFilterBlockReader::RangeMayExist(rocksdb::Slice const*, rocksdb::Slice const&, rocksdb::SliceTransform const*, rocksdb::Comparator const*, rocksdb::Slice const*, bool*, bool, rocksdb::BlockCacheLookupContext*)
      rocksdb::BlockBasedTable::PrefixMayMatch(rocksdb::Slice const&, rocksdb::ReadOptions const&, rocksdb::SliceTransform const*, bool, rocksdb::BlockCacheLookupContext*) const
      rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::SeekImpl(rocksdb::Slice const*)
      rocksdb::ForwardIterator::SeekInternal(rocksdb::Slice const&, bool)
      rocksdb::DBIter::Seek(rocksdb::Slice const&)
      ```
      
      `BlockBasedTableIterator::CheckPrefixMayMatch` was missing a check for `kBlockCacheTier`. This PR adds it.
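      For illustration, here is a minimal self-contained sketch of the kind of guard the fix adds; the types and the probe callback are simplified stand-ins, not the literal patch:
      
      ```
      #include <functional>
      
      enum ReadTier { kReadAllTier, kBlockCacheTier };
      
      struct ReadOptions {
        ReadTier read_tier = kReadAllTier;
      };
      
      // Sketch: only run the prefix-filter probe (which may need to read the
      // filter block from the file on a cache miss) when the read tier allows
      // file I/O; otherwise conservatively report "may match".
      bool CheckPrefixMayMatch(const ReadOptions& read_options,
                               const std::function<bool()>& probe_prefix_filter) {
        if (read_options.read_tier == kBlockCacheTier) {
          return true;  // skip the probe entirely; no file read can happen
        }
        return probe_prefix_filter();
      }
      ```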
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6562
      
      Test Plan: deployed it to a logdevice test cluster and looked at logdevice's IO tracing.
      
      Reviewed By: siying
      
      Differential Revision: D20529368
      
      Pulled By: al13n321
      
      fbshipit-source-id: 65bf33964b1951464415c900336635fb20919611
  2. 21 Feb 2020 (1 commit)
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause symbol conflicts and errors. To give users a tool to solve the problem, the RocksDB namespace is changed to a macro that can be overridden at build time.
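      A condensed sketch of the scheme (see the PR for the exact header file):
      
      ```
      // The namespace defaults to "rocksdb" but can be overridden at build
      // time, e.g. -DROCKSDB_NAMESPACE=myrocks.
      #ifndef ROCKSDB_NAMESPACE
      #define ROCKSDB_NAMESPACE rocksdb
      #endif
      
      namespace ROCKSDB_NAMESPACE {
      class DB;  // all library symbols live under the configurable namespace
      }  // namespace ROCKSDB_NAMESPACE
      ```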
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
Test Plan: Build the release, all, and jtest targets. Also try building with ROCKSDB_NAMESPACE overridden to a different name.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  3. 19 Oct 2019 (1 commit)
    • Store the filter bits reader alongside the filter block contents (#5936) · 29ccf207
      Levi Tamasi committed
      Summary:
      Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that
      only the filter block contents are stored in the block cache (as opposed to the
      earlier design where the cache stored the filter block reader itself, leading to
      potentially dangling pointers and concurrency bugs). However, this change
      introduced a performance hit since with the new code, the metadata fields are
      re-parsed upon every access. This patch reunites the block contents with the
      filter bits reader to eliminate this overhead; since this is still a self-contained
      pure data object, it is safe to store it in the cache. (Note: this is similar to how
      the zstd digest is handled.)
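      A sketch of the resulting cache entry; the shapes below are stand-ins for the real BlockContents/FilterBitsReader types:
      
      ```
      #include <memory>
      #include <string>
      #include <utility>
      
      struct BlockContents { std::string data; };  // owns the raw filter bytes
      class FilterBitsReader {};                   // parsed metadata + probe logic
      
      // Sketch: pair the raw block with a reader built once at load time. The
      // pair is still pure data (no back pointers), so it is safe to cache, and
      // the metadata is no longer re-parsed on every access.
      class ParsedFullFilterBlock {
       public:
        ParsedFullFilterBlock(BlockContents&& contents,
                              std::unique_ptr<FilterBitsReader> reader)
            : contents_(std::move(contents)), reader_(std::move(reader)) {}
      
        FilterBitsReader* filter_bits_reader() const { return reader_.get(); }
      
       private:
        BlockContents contents_;
        std::unique_ptr<FilterBitsReader> reader_;
      };
      ```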
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936
      
      Test Plan:
      make asan_check
      
      filter_bench results for the old code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.7153
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.4258
        Single filter ns/op: 42.5974
        Random filter ns/op: 217.861
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.4217
        Single filter ns/op: 50.9855
        Random filter ns/op: 219.167
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5172
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 32.3556
        Single filter ns/op: 83.2239
        Random filter ns/op: 370.676
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.2265
        Single filter ns/op: 93.5651
        Random filter ns/op: 408.393
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      With the new code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 25.4285
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 31.0594
        Single filter ns/op: 43.8974
        Random filter ns/op: 226.075
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 31.0295
        Single filter ns/op: 50.3824
        Random filter ns/op: 226.805
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5308
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.2968
        Single filter ns/op: 58.6163
        Random filter ns/op: 291.434
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.1839
        Single filter ns/op: 66.9039
        Random filter ns/op: 292.828
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      Differential Revision: D17991712
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29
  4. 25 Sep 2019 (1 commit)
    • Fix a bug in format_version 3 + partition filters + prefix search (#5835) · 6652c94f
      Maysam Yabandeh committed
      Summary:
Partitioned filters make use of a top-level index to find the partition in which the filter resides. The top-level index has a key per partition. The key is guaranteed to be greater than or equal to any key in that partition. When used with format_version 3, which excludes the sequence number from index keys, the separator key in the index could be equal to the prefix of the keys in the next partition. As a result, when searching for a key, the top-level index can lead us to the previous partition, which has no key with that prefix. The prefix bloom test thus returns false, although the prefix exists in the bloom of the next partition.
The patch fixes this with a hack: it always adds the prefix of the first key of the next partition to the bloom of the current partition. That way, in the corner cases where the index leads us to the previous partition, we can still find the prefix in the bloom filter there.
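      A toy illustration of the workaround, with a std::set standing in for each partition's bloom and a hypothetical cut function:
      
      ```
      #include <cstddef>
      #include <set>
      #include <string>
      #include <vector>
      
      using Partition = std::set<std::string>;  // stand-in for a partition's bloom
      
      // Sketch: when a new partition starts, replicate the prefix of its first
      // key into the partition that was just finished, so a top-level-index
      // lookup that lands one partition too early still sees the prefix.
      void StartNewPartition(std::vector<Partition>& partitions,
                             const std::string& first_key_of_new_partition,
                             size_t prefix_len) {
        const std::string prefix = first_key_of_new_partition.substr(0, prefix_len);
        if (!partitions.empty()) {
          partitions.back().insert(prefix);  // the "hack" described above
        }
        partitions.emplace_back();
        partitions.back().insert(prefix);
      }
      ```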
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5835
      
      Differential Revision: D17513585
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e2d1ff26c759e6e03875c4d57f4228316ecf50e9
  5. 12 Sep 2019 (1 commit)
  6. 22 Aug 2019 (1 commit)
    • Fix MultiGet() bug when whole_key_filtering is disabled (#5665) · 9046bdc5
      anand76 committed
      Summary:
The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling Transform() for keys in the domain. This PR fixes both problems by moving the domain check and the transformation into the FilterBlockReader.
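      A sketch of the per-key handling that moves into the filter reader; the bloom probe is a hypothetical stand-in declared elsewhere:
      
      ```
      #include "rocksdb/slice.h"
      #include "rocksdb/slice_transform.h"
      
      bool BloomMayMatch(const rocksdb::Slice& entry);  // hypothetical probe
      
      // Sketch: keys outside the prefix_extractor domain have no prefix in the
      // bloom and cannot be ruled out; in-domain keys must be transformed to
      // their prefix before probing.
      bool PrefixMayMatch(const rocksdb::SliceTransform* prefix_extractor,
                          const rocksdb::Slice& user_key) {
        if (prefix_extractor == nullptr || !prefix_extractor->InDomain(user_key)) {
          return true;
        }
        return BloomMayMatch(prefix_extractor->Transform(user_key));
      }
      ```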
      
      Tests:
      Unit test (confirmed failed before the fix)
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665
      
      Differential Revision: D16902380
      
      Pulled By: anand1976
      
      fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00
  7. 17 Jul 2019 (1 commit)
    • Move the filter readers out of the block cache (#5504) · 3bde41b5
      Levi Tamasi committed
      Summary:
      Currently, when the block cache is used for the filter block, it is not
      really the block itself that is stored in the cache but a FilterBlockReader
      object. Since this object is not pure data (it has, for instance, pointers that
      might dangle, including in one case a back pointer to the TableReader), it's not
      really sharable. To avoid the issues around this, the current code erases the
      cache entries when the TableReader is closed (which, BTW, is not sufficient
      since a concurrent TableReader might have picked up the object in the meantime).
      Instead of doing this, the patch moves the FilterBlockReader out of the cache
      altogether, and decouples the filter reader object from the filter block.
      In particular, instead of the TableReader owning, or caching/pinning the
      FilterBlockReader (based on the customer's settings), with the change the
      TableReader unconditionally owns the FilterBlockReader, which in turn
      owns/caches/pins the filter block. This change also enables us to reuse the code
      paths historically used for data blocks for filters as well.
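      A sketch of the resulting ownership (shapes simplified; not the literal class layout):
      
      ```
      #include <memory>
      
      struct FilterBlock {};  // pure data; this is what now goes in the block cache
      
      class FilterBlockReader {
       public:
        // On each query, look up the filter block in the block cache and read
        // it from the file on a miss, exactly like a data block.
        const FilterBlock* GetOrReadFilterBlock();
      };
      
      class BlockBasedTable {
        // Before: the reader itself could live in the block cache (and dangle).
        // After: the table reader unconditionally owns the filter reader.
        std::unique_ptr<FilterBlockReader> filter_reader_;
      };
      ```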
      
      Note:
      Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a
      separate phase.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504
      
      Test Plan: make asan_check
      
      Differential Revision: D16036974
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091
  8. 11 Jun 2019 (1 commit)
    • Create a BlockCacheLookupContext to enable fine-grained block cache tracing. (#5421) · 5efa0d6b
      haoyuhuang committed
      Summary:
      BlockCacheLookupContext only contains the caller for now.
      We will trace block accesses at five places:
      1. BlockBasedTable::GetFilter.
      2. BlockBasedTable::GetUncompressedDict.
      3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.)
      4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      
      We create the context at:
      1. BlockBasedTable::Get. (kUserGet)
      2. BlockBasedTable::MultiGet. (kUserMGet)
      3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.)
      4. BlockBasedTable::Open. (kPrefetch)
      5. Index/Filter::CacheDependencies. (kPrefetch)
      6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).
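      As noted above, the context only carries the caller for now; a minimal sketch, with enum values taken from the lists above (the exact identifiers in the patch may differ):
      
      ```
      enum class BlockCacheLookupCaller {
        kUserGet,
        kUserMGet,
        kUserIterator,
        kCompaction,
        kPrefetch,
        kUserApproximateSize,
      };
      
      struct BlockCacheLookupContext {
        explicit BlockCacheLookupContext(BlockCacheLookupCaller c) : caller(c) {}
        BlockCacheLookupCaller caller;  // the only field, for now
      };
      ```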
      
      I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.
      Throughput of this PR: 231334 ops/s.
      Throughput of the master branch: 238428 ops/s.
      
      Experiment setup:
      RocksDB:    version 6.2
      Date:       Mon Jun 10 10:42:51 2019
      CPU:        24 * Intel Core Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       20 bytes each
      Values:     100 bytes each (100 bytes after compression)
      Entries:    1000000
      Prefix:    20 bytes
      Keys per prefix:    0
      RawSize:    114.4 MB (estimated)
      FileSize:   114.4 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: NoCompression
      Compression sampling rate: 0
      Memtablerep: skip_list
      Perf Level: 1
      
      Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000
      
      Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120
      
      TODOs:
      1. Create a caller for external SST file ingestion and differentiate the callers for iterator.
      2. Integrate tracer to trace block cache accesses.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5421
      
      Differential Revision: D15704258
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
  9. 31 May 2019 (2 commits)
  10. 12 Apr 2019 (1 commit)
    • Introduce a new MultiGet batching implementation (#5011) · fefd4b98
      anand76 committed
      Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups within a file. The reason for the new API is twofold: the definition allows callers to allocate storage for statuses and values on the stack instead of in a std::vector, and to return values as PinnableSlices in order to avoid copying; and it keeps the original MultiGet() implementation intact while we experiment with batching.
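      A usage sketch of the batched overload as described above (signature per this summary; check the rocksdb/db.h of your version):
      
      ```
      #include <array>
      
      #include "rocksdb/db.h"
      
      void BatchedLookup(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
        constexpr size_t kNumKeys = 4;
        std::array<rocksdb::Slice, kNumKeys> keys = {
            rocksdb::Slice("k1"), rocksdb::Slice("k2"),
            rocksdb::Slice("k3"), rocksdb::Slice("k4")};
        // Stack-allocated outputs: no std::vector, and PinnableSlice avoids
        // copying the values out of the block cache.
        std::array<rocksdb::PinnableSlice, kNumKeys> values;
        std::array<rocksdb::Status, kNumKeys> statuses;
        db->MultiGet(rocksdb::ReadOptions(), cf, kNumKeys, keys.data(),
                     values.data(), statuses.data());
      }
      ```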
      
Batching is useful when there is some spatial locality to the keys being queried, as well as with larger batch sizes. The main benefits come from:
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Prefetching bloom filter cachelines, hiding the cache miss latency
      
      The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.
      
      Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).
      
Latency (micros/op) by batch size:

                         1     | 2     | 4     | 8     | 16    | 32

Random pattern (stride length 0)
Get                      4.158 | 4.109 | 4.026 | 4.05  | 4.1   | 4.074
MultiGet (no batching)   4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075
MultiGet (w/ batching)   4.461 | 4.256 | 4.277 | 4.11  | 4.182 | 4.14

Good locality (stride length 16)
Get                      4.048 | 3.659 | 3.248 | 2.99  | 2.84  | 2.753
MultiGet (no batching)   4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
MultiGet (w/ batching)   4.452 | 3.45  | 2.833 | 2.451 | 2.233 | 2.135

Good locality (stride length 256)
Get                      4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
MultiGet (no batching)   4.406 | 4.005 | 3.644 | 3.49  | 3.381 | 3.268
MultiGet (w/ batching)   4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (stride length 4096)
Get                      4.012 | 3.922 | 3.768 | 3.61  | 3.582 | 3.555
MultiGet (no batching)   4.364 | 4.057 | 3.791 | 3.65  | 3.57  | 3.465
MultiGet (w/ batching)   4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891
      
db_bench command used (on a DB with 4 levels and 12 million keys):
      TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011
      
      Differential Revision: D14348703
      
      Pulled By: anand1976
      
      fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
  11. 29 Jun 2018 (1 commit)
    • Charging block cache more accurately (#4073) · 29ffbb8a
      Maysam Yabandeh committed
      Summary:
Currently the block cache is charged only by the size of the raw data block and excludes the overhead of the C++ objects that contain the raw data block. The patch improves the accuracy of the charge by including the C++ object overhead in it.
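      A sketch of the accounting idea (illustrative only; the patch's real helpers compute the object sizes internally):
      
      ```
      #include <cstddef>
      
      // Sketch: charge the block cache for the payload plus the wrapper object,
      // instead of the payload alone.
      template <typename BlockObject>
      size_t CacheCharge(size_t raw_block_bytes, const BlockObject& obj) {
        return raw_block_bytes + sizeof(obj);  // include the C++ object overhead
      }
      ```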
      Closes https://github.com/facebook/rocksdb/pull/4073
      
      Differential Revision: D8686552
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8472f7fc163c0644533bc6942e20cdd5725f520f
  12. 27 Jun 2018 (1 commit)
  13. 20 Jun 2018 (1 commit)
  14. 22 May 2018 (1 commit)
    • Move prefix_extractor to MutableCFOptions · c3ebc758
      Zhongyi Xie committed
      Summary:
Currently it is not possible to change the bloom filter configuration without restarting the DB, which causes a lot of operational complexity for users.
This PR makes it possible to change the bloom filter configuration dynamically.
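      With prefix_extractor in MutableCFOptions, it becomes reachable through DB::SetOptions(). A hedged usage sketch (the string form of the option value is an assumption; check the options documentation for your version):
      
      ```
      #include <string>
      #include <unordered_map>
      
      #include "rocksdb/db.h"
      
      // Sketch: switch the prefix extractor (and hence the prefix bloom config)
      // at runtime, without reopening the DB.
      rocksdb::Status SwitchPrefixExtractor(rocksdb::DB* db) {
        return db->SetOptions({{"prefix_extractor", "rocksdb.FixedPrefix.4"}});
      }
      ```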
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
  15. 27 Apr 2018 (1 commit)
    • Fix the bloom filter skipping empty prefixes · 7e4e3814
      Maysam Yabandeh committed
      Summary:
bc0da4b5 optimized bloom filters by skipping duplicate entries when the whole key and prefixes are both added to the bloom. However, it used the empty string as the initial value of the last entry added to the bloom. This is incorrect, since an empty key/prefix is a valid entry by itself. This patch fixes that.
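      A sketch of the corrected bookkeeping (member names assumed): track explicitly whether anything has been added yet, instead of letting "" double as "no previous entry":
      
      ```
      #include <string>
      
      class LastEntryTracker {
       public:
        // Returns true if `entry` is not a duplicate of the previous entry.
        // An empty entry is a valid, distinct value.
        bool ShouldAdd(const std::string& entry) {
          if (has_last_ && entry == last_) return false;
          has_last_ = true;  // the fix: "" no longer means "nothing added yet"
          last_ = entry;
          return true;
        }
      
       private:
        bool has_last_ = false;
        std::string last_;
      };
      ```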
      Closes https://github.com/facebook/rocksdb/pull/3776
      
      Differential Revision: D7778803
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d5a065daebee17f9403cac51e9d5626aac87bfbc
  16. 25 Apr 2018 (1 commit)
    • Skip duplicate bloom keys when whole_key and prefix are mixed · bc0da4b5
      Maysam Yabandeh committed
      Summary:
Currently we rely on FilterBitsBuilder to skip duplicate keys. It does that by comparing the hash of the key to the hash of the last added entry. This logic breaks, however, when whole_key_filtering is mixed with prefix blooms, since their additions to FilterBitsBuilder are interleaved. The patch fixes that by comparing the last whole key and last prefix with the whole key and prefix of the new key, respectively, and skipping the call to FilterBitsBuilder if it is a duplicate.
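      A sketch of the dedup scheme (shape assumed): keep separate "last seen" state for whole keys and for prefixes, since their additions interleave:
      
      ```
      #include <string>
      
      // Sketch: whole keys and prefixes are deduplicated independently, so an
      // interleaved sequence key1, prefix1, key1, prefix1 adds each only once.
      class InterleavedDedup {
       public:
        bool IsNewKey(const std::string& key) {
          if (key == last_key_) return false;
          last_key_ = key;
          return true;
        }
        bool IsNewPrefix(const std::string& prefix) {
          if (prefix == last_prefix_) return false;
          last_prefix_ = prefix;
          return true;
        }
      
       private:
        std::string last_key_;     // "" as the initial value here is the bug
        std::string last_prefix_;  // fixed by 7e4e3814 in the entry above
      };
      ```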
      Closes https://github.com/facebook/rocksdb/pull/3764
      
      Differential Revision: D7744413
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 15df73bbbafdfd754d4e1f42ea07f47b03bc5eb8
  17. 22 Mar 2018 (1 commit)
  18. 06 Mar 2018 (1 commit)
  19. 23 Feb 2018 (2 commits)
  20. 22 Jul 2017 (2 commits)
  21. 16 Jul 2017 (1 commit)
  22. 28 Apr 2017 (1 commit)
  23. 23 Mar 2017 (1 commit)
  24. 08 Mar 2017 (1 commit)
  25. 11 Jun 2016 (1 commit)
  26. 04 Jun 2016 (1 commit)
    • Add statistics field to show total size of index and filter blocks in block cache · e5328779
      Aaron Gao committed
Summary: With `table_options.cache_index_and_filter_blocks = true`, index and filter blocks are stored in the block cache. People are then curious how much of the block cache's total size is used by indexes and bloom filters, and it would be nice to have a way to report that. It can help people tune performance and plan optimized hardware settings. We add several enum values to the DB Statistics: BLOCK_CACHE_INDEX/FILTER_BYTES_INSERT - BLOCK_CACHE_INDEX/FILTER_BYTES_ERASE = current INDEX/FILTER total block size in bytes.
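      A sketch of the arithmetic via the statistics API; the ticker names are copied from the summary above, so verify the exact enum identifiers in statistics.h:
      
      ```
      #include <cstdint>
      
      #include "rocksdb/statistics.h"
      
      // Current filter footprint in the block cache = bytes inserted minus bytes
      // erased; the same pattern applies to the index tickers.
      uint64_t CurrentFilterBytesInCache(rocksdb::Statistics* stats) {
        return stats->getTickerCount(rocksdb::BLOCK_CACHE_FILTER_BYTES_INSERT) -
               stats->getTickerCount(rocksdb::BLOCK_CACHE_FILTER_BYTES_ERASE);
      }
      ```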
      
      Test Plan:
Wrote a test case called `DBBlockCacheTest.IndexAndFilterBlocksStats`. The result is:
      ```
      [gzh@dev9927.prn1 ~/local/rocksdb]  make db_block_cache_test -j64 && ./db_block_cache_test --gtest_filter=DBBlockCacheTest.IndexAndFilterBlocksStats
      Makefile:101: Warning: Compiling in debug mode. Don't use the resulting binary in production
        GEN      util/build_version.cc
        make: `db_block_cache_test' is up to date.
        Note: Google Test filter = DBBlockCacheTest.IndexAndFilterBlocksStats
        [==========] Running 1 test from 1 test case.
        [----------] Global test environment set-up.
        [----------] 1 test from DBBlockCacheTest
        [ RUN      ] DBBlockCacheTest.IndexAndFilterBlocksStats
        [       OK ] DBBlockCacheTest.IndexAndFilterBlocksStats (689 ms)
        [----------] 1 test from DBBlockCacheTest (689 ms total)
      
        [----------] Global test environment tear-down
        [==========] 1 test from 1 test case ran. (689 ms total)
        [  PASSED  ] 1 test.
      ```
      
      Reviewers: IslamAbdelRahman, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D58677
  27. 10 Feb 2016 (1 commit)
  28. 12 Feb 2015 (1 commit)
    • Remember whole key/prefix filtering on/off in SST file · 68af7811
      sdong committed
Summary: Remember whole key or prefix filtering on/off in SST files. If the user opens the DB with a different setting that cannot be satisfied when reading an SST file, ignore that file's bloom filter.
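      A sketch of the read-time decision (hypothetical helper; the real check reads the recorded setting from the table properties):
      
      ```
      // Sketch: a bloom can only answer the kind of query it was built for,
      // as recorded in the SST file when it was written.
      bool CanUseBloom(bool sst_whole_key_filtering, bool sst_prefix_filtering,
                       bool query_is_whole_key) {
        return query_is_whole_key ? sst_whole_key_filtering : sst_prefix_filtering;
      }
      ```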
      
      Test Plan: Add a unit test for it
      
      Reviewers: yhchiang, igor, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D32889
  29. 18 Sep 2014 (1 commit)
    • Replace naked calls to operator new and delete (Fixes #222) · fb6456b0
      Torrie Fischer committed
      This replaces a mishmash of pointers in the Block and BlockContents classes with
      std::unique_ptr. It also changes the semantics of BlockContents to be limited to
      use as a constructor parameter for Block objects, as it owns any block buffers
      handed to it.
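      A condensed sketch of the new semantics (shapes simplified):
      
      ```
      #include <cstddef>
      #include <memory>
      #include <utility>
      
      // BlockContents owns the buffer handed to it and exists to be moved into
      // a Block's constructor.
      struct BlockContents {
        std::unique_ptr<char[]> data;
        size_t size = 0;
      };
      
      class Block {
       public:
        explicit Block(BlockContents&& contents)
            : contents_(std::move(contents)) {}  // ownership transfers here
      
       private:
        BlockContents contents_;  // no naked new/delete anywhere
      };
      ```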
  30. 17 Sep 2014 (1 commit)
  31. 09 Sep 2014 (1 commit)
    • Implement full filter for block based table. · 0af157f9
      Feng Zhu committed
      Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The former is the traditional filter block; full_filter_block is newly added and generates a filter block that contains all the keys in the SST file.
      
2. When querying a key, the table first checks whether a full_filter is available. If not, it goes to the exact data block and checks using the block_based filter.
      
3. Users can choose full_filter or the traditional block_based_filter. They are stored in the SST file under different meta index names, "filter.filter_policy" or "full_filter.filter_policy", so the table reader is able to tell the filter block type (see the sketch below).
      
      4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
      
      5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
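      A sketch of the meta-index dispatch from item 3 above; the block name strings are taken from the summary, and the probe is a hypothetical helper:
      
      ```
      #include <string>
      
      bool MetaIndexContains(const std::string& block_name);  // hypothetical probe
      
      enum class FilterKind { kFullFilter, kBlockBasedFilter, kNone };
      
      // Sketch: prefer the full filter when its meta entry exists, otherwise
      // fall back to the traditional block-based filter.
      FilterKind DetectFilterKind() {
        if (MetaIndexContains("full_filter.filter_policy")) {
          return FilterKind::kFullFilter;
        }
        if (MetaIndexContains("filter.filter_policy")) {
          return FilterKind::kBlockBasedFilter;
        }
        return FilterKind::kNone;
      }
      ```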
      
      Benchmark: base commit 1d23b5c4
      Command:
      db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increased by about 30%, from 2230002 to 2991411.
      
      Test Plan:
      make all check
      valgrind db_test
      db_stress --use_block_based_filter = 0
      ./auto_sanity_test.sh
      
      Reviewers: igor, yhchiang, ljin, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D20979