1. Mar 17, 2020 (1 commit)
    • De-template block based table iterator (#6531) · d6690809
      sdong committed
      Summary:
      Right now, the block based table iterator is used both for iterating over data in block based tables and as the index iterator for the partitioned index. This was initially convenient for introducing a new iterator and block type for the new index format while reducing code change, but the two usages don't fit together well. For example, Prev() is never called for the partitioned index iterator, and some other complexity is maintained in block based iterators that is not needed for the index iterator, yet maintainers always need to reason about it. Furthermore, the template usage does not follow the Google C++ Style Guide, which we follow, and it tangles a large chunk of code together. This commit separates the two iterators. Here is what is done:
      1. Copy the block based iterator code into partitioned index iterator, and de-template them.
      2. Remove some code not needed for partitioned index. The upper bound check and tricks are removed. We never tested performance for those tricks when partitioned index is enabled in the first place. It is unlikely to cause a performance regression, as creating a new partitioned index block is much rarer than creating data blocks.
      3. Separate out the prefetch logic into a helper class that both classes use.
      
      This commit will enable future follow-ups. One direction is that we might separate the iterator interfaces for data blocks and index blocks, as they are quite different.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6531
      
      Test Plan: build using make and cmake, and build release.
      
      Differential Revision: D20473108
      
      fbshipit-source-id: e48011783b339a4257c204cc07507b171b834b0f
  2. Mar 13, 2020 (1 commit)
    • Divide block_based_table_reader.cc (#6527) · 674cf417
      sdong committed
      Summary:
      block_based_table_reader.cc is a giant file, which makes it hard for users to navigate the code. Divide it into multiple files.
      Some class templates cannot be moved to a .cc file; they are moved to .h files instead. This is still better than including them all in block_based_table_reader.cc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6527
      
      Test Plan: "make all check" and "make release". Also build using cmake.
      
      Differential Revision: D20428455
      
      fbshipit-source-id: ca713c698469f07f35bc0c271358c0874ed4eb28
  3. Mar 7, 2020 (1 commit)
    • Iterator with timestamp (#6255) · d93812c9
      Yanqin Jin committed
      Summary:
      Preliminary support for iterators with user timestamp. The current implementation does not consider the merge operator or reverse iterators. Auto compaction is also disabled in unit tests.
      
      Create an iterator with timestamp.
      ```
      ...
      read_opts.timestamp = &ts;
      auto* iter = db->NewIterator(read_opts);
      // target is key without timestamp.
      for (iter->Seek(target); iter->Valid(); iter->Next()) {}
      for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
      delete iter;
      read_opts.timestamp = &ts1;
      // lower_bound and upper_bound are without timestamp.
      read_opts.iterate_lower_bound = &lower_bound;
      read_opts.iterate_upper_bound = &upper_bound;
      auto* iter1 = db->NewIterator(read_opts);
      // Do Seek or SeekToFirst()
      delete iter1;
      ```
      
      Test plan (dev server)
      ```
      $ make check
      ```
      
      Simple benchmarking (dev server)
      1. The overhead introduced by this PR even when timestamp is disabled.
      key size: 16 bytes
      value size: 100 bytes
      Entries: 1000000
      Data resides in main memory; the workload tries to stress the iterator.
      Repeated three times on master and this PR.
      - Seek without next
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
      ```
      master: 159047.0 ops/sec
      this PR: 158922.3 ops/sec (~0.1% drop in throughput)
      - Seek and next 10 times
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
      ```
      master: 109539.3 ops/sec
      this PR: 107519.7 ops/sec (2% drop in throughput)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255
      
      Differential Revision: D19438227
      
      Pulled By: riversand963
      
      fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
  4. Feb 26, 2020 (1 commit)
    • Fix range deletion tombstone ingestion with global seqno (#6429) · 69679e73
      Andrew Kryczka committed
      Summary:
      Original author: jeffrey-xiao
      
      If we are writing a global seqno for an ingested file, the range
      tombstone metablock gets accessed and put into the cache during
      ingestion preparation. At the time, the global seqno of the ingested
      file has not yet been determined, so the cached block will not have a
      global seqno. When the file is ingested and we read its range tombstone
      metablock, it will be returned from the cache with no global seqno. In
      that case, we use the actual seqnos stored in the range tombstones,
      which are all zero, so the tombstones cover nothing.
      
      This commit removes global_seqno_ variable from Block. When iterating
      over a block, the global seqno for the block is determined by the
      iterator instead of storing this mutable attribute in Block.
      Additionally, this commit adds a regression test to check that keys are
      deleted when ingesting a file with a global seqno and range deletion
      tombstones.
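      The mechanics of the fix can be sketched in a self-contained miniature (the types below are illustrative, not RocksDB's actual classes): the iterator, not the cached block, carries the global sequence number, so the shared block stays immutable.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      struct Block {            // immutable once cached; no global_seqno_ member
        std::vector<uint64_t> stored_seqnos;
      };

      struct BlockIter {
        const Block* block;
        uint64_t global_seqno;  // determined per-read, e.g. at ingestion time
        uint64_t SeqnoAt(std::size_t i) const {
          // A non-zero global seqno overrides the (all-zero) stored seqnos.
          return global_seqno != 0 ? global_seqno : block->stored_seqnos[i];
        }
      };

      int main() {
        Block b{{0, 0, 0}};             // ingested file: stored seqnos are zero
        BlockIter stale{&b, 0};         // old behavior: cached block without a seqno
        BlockIter fixed{&b, 42};        // fix: the iterator carries the global seqno
        assert(stale.SeqnoAt(0) == 0);  // tombstones cover nothing (the bug)
        assert(fixed.SeqnoAt(0) == 42); // tombstones get the ingestion seqno
      }
      ```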
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6429
      
      Differential Revision: D19961563
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5cf777397fa3e452401f0bf0364b0750492487b7
  5. Feb 22, 2020 (1 commit)
  6. Feb 21, 2020 (1 commit)
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide users with a tool to solve the problem, the RocksDB namespace is changed to a macro that can be overridden at build time.
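      The mechanism can be sketched in a few lines (a minimal, self-contained sketch; MajorVersion is an illustrative symbol, not RocksDB's API):

      ```cpp
      // The namespace name becomes a macro with a default, so a build can
      // override it (e.g. -DROCKSDB_NAMESPACE=my_rocksdb) to avoid symbol
      // clashes when two different RocksDB builds are linked together.
      #ifndef ROCKSDB_NAMESPACE
      #define ROCKSDB_NAMESPACE rocksdb
      #endif

      namespace ROCKSDB_NAMESPACE {
      inline int MajorVersion() { return 6; }  // illustrative symbol
      }  // namespace ROCKSDB_NAMESPACE

      int main() {
        // Code refers to the macro, never the literal name "rocksdb".
        return ROCKSDB_NAMESPACE::MajorVersion() == 6 ? 0 : 1;
      }
      ```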
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Also try building with ROCKSDB_NAMESPACE overridden to another value.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  7. Feb 12, 2020 (1 commit)
  8. Feb 11, 2020 (1 commit)
  9. Feb 8, 2020 (1 commit)
    • Check KeyContext status in MultiGet (#6387) · d70011bc
      anand76 committed
      Summary:
      Currently, any IO errors and checksum mismatches encountered while reading data
      blocks are ignored by the batched MultiGet, which only looks at the
      GetContext state. Fix that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6387
      
      Test Plan: Add unit tests
      
      Differential Revision: D19799819
      
      Pulled By: anand1976
      
      fbshipit-source-id: 46133dccbb04e64067b9fe6cda73e282203db969
  10. Jan 29, 2020 (1 commit)
    • Add ReadOptions.auto_prefix_mode (#6314) · 8f2bee67
      sdong committed
      Summary:
      Add a new option, ReadOptions.auto_prefix_mode. When set to true, the iterator should return the same result as a total order seek, but may choose to do a prefix seek internally, based on the iterator upper bound. Also fix two previous bugs in handling prefix extractor changes: (1) a reverse iterator should not rely on the upper bound to determine the prefix; fix it by skipping the prefix check. (2) block-based filters were not handled properly.
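      A hedged sketch of the decision described above (the helper is illustrative, not RocksDB's internals): prefix seek is only chosen when it cannot change the result of a total order seek.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      bool CanUsePrefixSeek(const std::string& target,
                            const std::string& upper_bound,
                            std::size_t prefix_len) {
        if (target.size() < prefix_len || upper_bound.size() < prefix_len) {
          return false;  // outside the extractor's domain: total order seek
        }
        // Same prefix on both ends means no key outside the prefix can fall
        // inside [target, upper_bound), so a prefix seek is safe.
        return target.compare(0, prefix_len, upper_bound, 0, prefix_len) == 0;
      }

      int main() {
        assert(CanUsePrefixSeek("abc1", "abc9", 3));   // same prefix: safe
        assert(!CanUsePrefixSeek("abc1", "abd0", 3));  // crosses prefixes: not safe
      }
      ```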
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314
      
      Test Plan: (1) add a unit test; (2) add the check to the stress test and run it to see whether it can pass at least one run.
      
      Differential Revision: D19458717
      
      fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce
  11. Jan 28, 2020 (1 commit)
  12. Jan 17, 2020 (1 commit)
    • Fix another bug caused by recent hash index fix (#6305) · d87cffae
      sdong committed
      Summary:
      A recent bug fix related to the hash index introduced a new bug: the hash index can return NotFound, but this is not handled by BlockBasedTable::Get(). The end result is that Get() stops executing too early. Fix it by ignoring the NotFound code in Get().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6305
      
      Test Plan: A problematic DB used to return NotFound incorrectly, and is now able to return the correct result. Will try to construct a unit test too.
      
      Differential Revision: D19438925
      
      fbshipit-source-id: e751afa8c13728d56511cfeb1bc811ecb99f3217
  13. Jan 16, 2020 (1 commit)
    • Fix kHashSearch bug with SeekForPrev (#6297) · d2b4d42d
      sdong committed
      Summary:
      When prefix is enabled, the expected behavior when the prefix of the target does not exist is for Seek to land on any key larger than the target and for SeekForPrev to land on any key smaller than the target.
      Currently, the prefix index (kHashSearch) returns OK status but sets Invalid() to indicate two cases: (i) a prefix of the searched key does not exist, and (ii) the key is beyond the range of the keys in the SST file. The SeekForPrev implementation in BlockBasedTable thus does not have enough information to know when it should set the index key to the first entry (to return a key smaller than the target). The patch fixes that by returning NotFound status for cases where the prefix does not exist. SeekForPrev in BlockBasedTable accordingly does SeekToFirst instead of SeekToLast on the index iterator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6297
      
      Test Plan: SeekForPrev of a non-existing prefix is added to block_test.cc, and a test case is added in db_test2, which fails without the fix.
      
      Differential Revision: D19404695
      
      fbshipit-source-id: cafbbf95f8f60ff9ede9ccc99d25bfa1cf6fcdc3
  14. Dec 20, 2019 (1 commit)
  15. Dec 19, 2019 (1 commit)
    • Prevent file prefetch when mmap is enabled. (#6206) · 02193ce4
      sdong committed
      Summary:
      Right now, file prefetching is sometimes still enabled when mmap is enabled, which causes a bug of reading wrong data. This commit removes all such possible paths.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6206
      
      Test Plan: make crash_test with compaction_readahead_size, which used to fail. Run all existing tests.
      
      Differential Revision: D19149429
      
      fbshipit-source-id: 9e18ea8c566e416aac9647bdd05afe596634791b
  16. Dec 18, 2019 (1 commit)
  17. Dec 17, 2019 (1 commit)
    • Merge adjacent file block reads in RocksDB MultiGet() and Add uncompressed block to cache (#6089) · cddd6379
      Zhichao Cao committed
      Summary:
      In the current MultiGet, if the KV pairs are not found in data blocks in the block cache, multiple blocks are read from an SST file. This triggers one read per requested block, issued in parallel. In some cases, if some of those data blocks are adjacent in the SST, the reads can be combined into a single large read, which can reduce system calls and reduce read latency where possible.
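      The coalescing idea can be sketched in a self-contained way (offsets and structs are illustrative, not RocksDB's MultiGet code): consecutive block requests whose file ranges touch are merged into one larger read.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      struct ReadReq { uint64_t offset, len; };

      std::vector<ReadReq> CoalesceAdjacent(std::vector<ReadReq> reqs) {
        std::vector<ReadReq> out;
        for (const ReadReq& r : reqs) {
          if (!out.empty() && out.back().offset + out.back().len == r.offset) {
            out.back().len += r.len;  // adjacent in the SST: extend previous read
          } else {
            out.push_back(r);         // gap in the file: start a new read
          }
        }
        return out;
      }

      int main() {
        std::vector<ReadReq> merged =
            CoalesceAdjacent({{0, 4096}, {4096, 4096}, {16384, 4096}});
        assert(merged.size() == 2);     // first two reads were combined
        assert(merged[0].len == 8192);
      }
      ```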
      
      Considering filling the block cache: if multiple data blocks are in the same memory buffer, we need to copy them to the heap separately. Therefore, only in the case that 1) data block compression is enabled and 2) the compressed block cache is null can we do the combined read. Otherwise, an extra memory copy is needed, which may cause extra overhead. In the current case, data blocks will be uncompressed into new memory space.
      
      Also, in the case that 1) data block compression is enabled and 2) the compressed block cache is null, it is possible that a data block is actually not compressed. In the current logic, these data blocks will not be added to the uncompressed cache. So if the memory buffer is shared and the data block is not compressed, the data block is copied to the heap to fill the cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6089
      
      Test Plan: Added test case to ParallelIO.MultiGet. Pass make asan_check
      
      Differential Revision: D18734668
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 67c5615ed373e51e42635fd74b36f8f3a66d5da4
  18. Dec 14, 2019 (1 commit)
    • Introduce a new storage specific Env API (#5761) · afa2420c
      anand76 committed
      Summary:
      The current Env API encompasses both storage/file operations and OS related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, and hinting about placement and redundancy.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
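      The wrapper pattern described above, in a self-contained miniature (all names and signatures are illustrative, not RocksDB's actual API): one Env-shaped object forwards storage calls to a FileSystem and keeps OS-level calls itself.

      ```cpp
      #include <cassert>
      #include <string>

      struct FileSystem {
        virtual std::string ReadFile(const std::string& fname) {
          return "data:" + fname;  // stand-in for a real storage read
        }
        virtual ~FileSystem() = default;
      };

      struct Env {
        virtual std::string ReadFile(const std::string& fname) { return "legacy"; }
        virtual int NowMicros() { return 123; }  // OS-level op stays on Env
        virtual ~Env() = default;
      };

      struct CompositeEnvWrapper : Env {
        FileSystem* fs;
        explicit CompositeEnvWrapper(FileSystem* f) : fs(f) {}
        // Storage op redirected to the FileSystem implementation.
        std::string ReadFile(const std::string& fname) override {
          return fs->ReadFile(fname);
        }
      };

      int main() {
        FileSystem fs;
        CompositeEnvWrapper env(&fs);
        assert(env.ReadFile("sst1") == "data:sst1");  // goes to FileSystem
        assert(env.NowMicros() == 123);               // stays with Env
      }
      ```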
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
  19. Nov 12, 2019 (1 commit)
    • Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014) · 03ce7fb2
      anand76 committed
      Summary:
      The calculation in BlockBasedTable::MultiGet for the required buffer length for reading in compressed blocks is incorrect. It needs to take the 5-byte block trailer into account.
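      The corrected calculation is simple arithmetic; a minimal sketch (the 5-byte constant matches the trailer mentioned above: 1 byte compression type + 4 bytes checksum; the helper name is illustrative):

      ```cpp
      #include <cassert>
      #include <cstddef>

      constexpr std::size_t kBlockTrailerSize = 5;

      // A compressed block on disk is followed by its trailer, so the read
      // buffer must cover both.
      std::size_t RequiredReadBufferLen(std::size_t compressed_block_size) {
        return compressed_block_size + kBlockTrailerSize;
      }

      int main() {
        assert(RequiredReadBufferLen(4096) == 4101);  // block + trailer
      }
      ```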
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6014
      
      Test Plan: Add a unit test DBBasicTest.MultiGetBufferOverrun that fails in asan_check before the fix, and passes after.
      
      Differential Revision: D18412753
      
      Pulled By: anand1976
      
      fbshipit-source-id: 754dfb66be1d5f161a7efdf87be872198c7e3b72
  20. Nov 8, 2019 (1 commit)
  21. Nov 6, 2019 (1 commit)
  22. Oct 22, 2019 (1 commit)
    • Fix VerifyChecksum readahead with mmap mode (#5945) · 30e2dc02
      sdong committed
      Summary:
      A recent change introduced readahead inside VerifyChecksum(). However, it is not compatible with mmap mode and generated spurious checksum verification failures. Fix it by not enabling readahead in mmap mode.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945
      
      Test Plan: Add a unit test that used to fail.
      
      Differential Revision: D18021443
      
      fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff
  23. Oct 19, 2019 (1 commit)
    • Store the filter bits reader alongside the filter block contents (#5936) · 29ccf207
      Levi Tamasi committed
      Summary:
      Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that
      only the filter block contents are stored in the block cache (as opposed to the
      earlier design where the cache stored the filter block reader itself, leading to
      potentially dangling pointers and concurrency bugs). However, this change
      introduced a performance hit since with the new code, the metadata fields are
      re-parsed upon every access. This patch reunites the block contents with the
      filter bits reader to eliminate this overhead; since this is still a self-contained
      pure data object, it is safe to store it in the cache. (Note: this is similar to how
      the zstd digest is handled.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936
      
      Test Plan:
      make asan_check
      
      filter_bench results for the old code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.7153
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.4258
        Single filter ns/op: 42.5974
        Random filter ns/op: 217.861
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.4217
        Single filter ns/op: 50.9855
        Random filter ns/op: 219.167
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5172
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 32.3556
        Single filter ns/op: 83.2239
        Random filter ns/op: 370.676
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.2265
        Single filter ns/op: 93.5651
        Random filter ns/op: 408.393
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      With the new code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 25.4285
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 31.0594
        Single filter ns/op: 43.8974
        Random filter ns/op: 226.075
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 31.0295
        Single filter ns/op: 50.3824
        Random filter ns/op: 226.805
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5308
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.2968
        Single filter ns/op: 58.6163
        Random filter ns/op: 291.434
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.1839
        Single filter ns/op: 66.9039
        Random filter ns/op: 292.828
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      Differential Revision: D17991712
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29
  24. Oct 12, 2019 (1 commit)
    • Fix block cache ID uniqueness for Windows builds (#5844) · b00761ee
      Andrew Kryczka committed
      Summary:
      Since we do not evict a file's blocks from block cache before that file
      is deleted, we require a file's cache ID prefix is both unique and
      non-reusable. However, the Windows functionality we were relying on only
      guaranteed uniqueness. That meant a newly created file could be assigned
      the same cache ID prefix as a deleted file. If the newly created file
      had block offsets matching the deleted file, full cache keys could be
      exactly the same, resulting in obsolete data blocks returned from cache
      when trying to read from the new file.
      
      We noticed this when running on FAT32 where compaction was writing out
      of order keys due to reading obsolete blocks from its input files. The
      functionality is documented as behaving the same on NTFS, although I
      wasn't able to repro it there.
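      A minimal sketch of the property the fix relies on (a process-wide monotonic counter is one way to get IDs that are both unique and non-reusable; this is an illustration of the requirement, not the actual Windows-specific fix):

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>

      uint64_t NewCacheIdPrefix() {
        static std::atomic<uint64_t> next{1};
        return next.fetch_add(1);  // never reused within the process lifetime
      }

      int main() {
        uint64_t a = NewCacheIdPrefix();
        uint64_t b = NewCacheIdPrefix();  // "new file" after deleting the first
        assert(a != b);                   // no reuse, so no stale cache hits
      }
      ```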
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844
      
      Test Plan:
      we had a reliable repro of out-of-order keys on FAT32 that
      was fixed by this change
      
      Differential Revision: D17752442
      
      fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
  25. Oct 4, 2019 (1 commit)
    • Fix data block upper bound checking for iterator reseek case (#5883) · 19a97dd1
      anand76 committed
      Summary:
      When an iterator reseek happens with the user specifying a new iterate_upper_bound in ReadOptions, and the new seek position is at the end of the same data block, the Seek() ends up using a stale value of data_block_within_upper_bound_ and may return incorrect results.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5883
      
      Test Plan: Added a new test case DBIteratorTest.IterReseekNewUpperBound. Verified that it failed due to the assertion failure without the fix, and passes with the fix.
      
      Differential Revision: D17752740
      
      Pulled By: anand1976
      
      fbshipit-source-id: f9b635ff5d6aeb0e1bef102cf8b2f900efd378e3
  26. Sep 21, 2019 (1 commit)
  27. Sep 17, 2019 (1 commit)
    • Divide file_reader_writer.h and .cc (#5803) · b931f84e
      sdong committed
      Summary:
      file_reader_writer.h and .cc contain several classes and helper functions, and they are hard to navigate. Separate them into multiple files and put them under file/.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
      
      Test Plan: Build whole project using make and cmake.
      
      Differential Revision: D17374550
      
      fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
  28. Aug 23, 2019 (1 commit)
    • Revert to storing UncompressionDicts in the cache (#5645) · df8c307d
      Levi Tamasi committed
      Summary:
      PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point
      of the digest is to create it once and reuse it over and over again. This patch goes back to
      storing the uncompression dictionary itself in the cache (which should be now safe to do,
      since it no longer includes a Statistics pointer), while preserving the rest of the refactoring.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645
      
      Test Plan: make asan_check
      
      Differential Revision: D16551864
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800
  29. Aug 22, 2019 (2 commits)
    • Refactor MultiGet names in BlockBasedTable (#5726) · 244e6f20
      Maysam Yabandeh committed
      Summary:
      To improve code readability: since RetrieveBlock already calls MaybeReadBlockAndLoadToCache, we avoid name similarity between the functions that call RetrieveBlock and MaybeReadBlockAndLoadToCache. The patch thus renames MaybeLoadBlocksToCache to RetrieveMultipleBlock and deletes GetDataBlockFromCache, which contained only two lines.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5726
      
      Differential Revision: D16962535
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 99e8946808ce4eb7857592b9003812e3004f92d6
      244e6f20
    • Fix MultiGet() bug when whole_key_filtering is disabled (#5665) · 9046bdc5
      anand76 committed
      Summary:
      The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling Transform for keys in the domain. This PR fixes both problems by moving the domain check and transformation to the FilterBlockReader.
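      The rule the fix enforces can be sketched as follows (the extractor and helper below are illustrative, not the actual FilterBlockReader API): out-of-domain keys must bypass the filter entirely, and in-domain keys must be transformed before probing it.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <optional>
      #include <string>

      struct FixedPrefixExtractor {
        std::size_t len;
        bool InDomain(const std::string& key) const { return key.size() >= len; }
        std::string Transform(const std::string& key) const {
          return key.substr(0, len);
        }
      };

      // Returns the probe key for the filter, or nullopt to skip the filter
      // (and fall through to the full lookup) for out-of-domain keys.
      std::optional<std::string> FilterProbeKey(const FixedPrefixExtractor& pe,
                                                const std::string& user_key) {
        if (!pe.InDomain(user_key)) return std::nullopt;
        return pe.Transform(user_key);
      }

      int main() {
        FixedPrefixExtractor pe{3};
        assert(*FilterProbeKey(pe, "abcdef") == "abc");  // in domain: transformed
        assert(!FilterProbeKey(pe, "ab").has_value());   // out of domain: no probe
      }
      ```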
      
      Tests:
      Unit test (confirmed failed before the fix)
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665
      
      Differential Revision: D16902380
      
      Pulled By: anand1976
      
      fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00
  30. Aug 17, 2019 (3 commits)
  31. Aug 15, 2019 (1 commit)
    • Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) · d92a59b6
      Levi Tamasi committed
      
      Summary:
      PR https://github.com/facebook/rocksdb/issues/5298 (and subsequent related patches) unintentionally changed the
      semantics of cache_index_and_filter_blocks: historically, this option
      only affected the main index/filter block; with the changes, it affects
      index/filter partitions as well. This can cause performance issues when
      cache_index_and_filter_blocks is false since in this case, partitions are
      neither cached nor preloaded (i.e. they are loaded on demand upon each
      access). The patch reverts to the earlier behavior, that is, partitions
      are cached similarly to data blocks regardless of the value of the above
      option.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5705
      
      Test Plan:
      make check
      ./db_bench -benchmarks=fillrandom --statistics --stats_interval_seconds=1 --duration=30 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false
      ./db_bench -benchmarks=readrandom --use_existing_db --statistics --stats_interval_seconds=1 --duration=10 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false --cache_size=8000000000
      
      Relevant statistics from the readrandom benchmark with the old code:
      
      rocksdb.block.cache.index.miss COUNT : 0
      rocksdb.block.cache.index.hit COUNT : 0
      rocksdb.block.cache.index.add COUNT : 0
      rocksdb.block.cache.index.bytes.insert COUNT : 0
      rocksdb.block.cache.index.bytes.evict COUNT : 0
      rocksdb.block.cache.filter.miss COUNT : 0
      rocksdb.block.cache.filter.hit COUNT : 0
      rocksdb.block.cache.filter.add COUNT : 0
      rocksdb.block.cache.filter.bytes.insert COUNT : 0
      rocksdb.block.cache.filter.bytes.evict COUNT : 0
      
      With the new code:
      
      rocksdb.block.cache.index.miss COUNT : 2500
      rocksdb.block.cache.index.hit COUNT : 42696
      rocksdb.block.cache.index.add COUNT : 2500
      rocksdb.block.cache.index.bytes.insert COUNT : 4050048
      rocksdb.block.cache.index.bytes.evict COUNT : 0
      rocksdb.block.cache.filter.miss COUNT : 2500
      rocksdb.block.cache.filter.hit COUNT : 4550493
      rocksdb.block.cache.filter.add COUNT : 2500
      rocksdb.block.cache.filter.bytes.insert COUNT : 10331040
      rocksdb.block.cache.filter.bytes.evict COUNT : 0
      
      Differential Revision: D16817382
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 28a516b0da1f041a03313e0b70b28cf5cf205d00
  32. Jul 24, 2019 (2 commits)
    • Move the uncompression dictionary object out of the block cache (#5584) · 092f4170
      Levi Tamasi committed
      Summary:
      RocksDB has historically stored uncompression dictionary objects in the block
      cache as opposed to storing just the block contents. This necessitated
      evicting the object upon table close. With the new code, only the raw blocks
      are stored in the cache, eliminating the need for eviction.
      
      In addition, the patch makes the following improvements:
      
      1) Compression dictionary blocks are now prefetched/pinned similarly to
      index/filter blocks.
      2) A copy operation got eliminated when the uncompression dictionary is
      retrieved.
      3) Errors related to retrieving the uncompression dictionary are propagated as
      opposed to silently ignored.
      
      Note: the patch temporarily breaks the compression dictionary eviction stats.
      They will be fixed in a separate phase.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5584
      
      Test Plan: make asan_check
      
      Differential Revision: D16344151
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2962b295f5b19628f9da88a3fcebbce5a5017a7b
    • Improve CPU Efficiency of ApproximateSize (part 1) (#5613) · 6b7fcc0d
      Eli Pozniansky committed
      Summary:
      1. Avoid creating an iterator just to call BlockBasedTable::ApproximateOffsetOf(); instead, call into it directly.
      2. Optimize BlockBasedTable::ApproximateOffsetOf() to keep the index block iterator on the stack.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5613
      
      Differential Revision: D16442660
      
      Pulled By: elipoz
      
      fbshipit-source-id: 9320be3e918c139b10e758cbbb684706d172e516
  33. Jul 18, 2019 (1 commit)
  34. Jul 17, 2019 (1 commit)
    • Move the filter readers out of the block cache (#5504) · 3bde41b5
      Levi Tamasi committed
      Summary:
      Currently, when the block cache is used for the filter block, it is not
      really the block itself that is stored in the cache but a FilterBlockReader
      object. Since this object is not pure data (it has, for instance, pointers that
      might dangle, including in one case a back pointer to the TableReader), it's not
      really sharable. To avoid the issues around this, the current code erases the
      cache entries when the TableReader is closed (which, BTW, is not sufficient
      since a concurrent TableReader might have picked up the object in the meantime).
      Instead of doing this, the patch moves the FilterBlockReader out of the cache
      altogether, and decouples the filter reader object from the filter block.
      In particular, instead of the TableReader owning, or caching/pinning the
      FilterBlockReader (based on the customer's settings), with the change the
      TableReader unconditionally owns the FilterBlockReader, which in turn
      owns/caches/pins the filter block. This change also enables us to reuse the code
      paths historically used for data blocks for filters as well.
      
      Note:
      Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a
      separate phase.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504
      
      Test Plan: make asan_check
      
      Differential Revision: D16036974
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091
  35. Jul 8, 2019 (1 commit)
    • Fix -Werror=shadow (#5546) · 6ca3feed
      haoyuhuang committed
      Summary:
      This PR fixes shadow errors.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5546
      
      Test Plan: make clean && make check -j32 && make clean && USE_CLANG=1 make check -j32 && make clean && COMPILE_WITH_ASAN=1 make check -j32
      
      Differential Revision: D16147841
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 1043500d70c134185f537ab4c3900452752f1534
  36. Jul 6, 2019 (1 commit)
    • Assert get_context not null in BlockBasedTable::Get() (#5542) · 2de61d91
      sdong committed
      Summary:
      clang analyze fails after https://github.com/facebook/rocksdb/pull/5514 with this failure:
      ```
      table/block_based/block_based_table_reader.cc:3450:16: warning: Called C++ object pointer is null
                if (!get_context->SaveValue(
                     ^~~~~~~~~~~~~~~~~~~~~~~
      1 warning generated.
      ```
      The reason is that a branch on whether get_context is null was added earlier in the function, so clang analyze thinks it can be null at the call site, which is made without a null check.
      Fix the issue by removing the branch and adding an assert.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5542
      
      Test Plan: "make all check" passes and CLANG analyze failure goes away.
      
      Differential Revision: D16133988
      
      fbshipit-source-id: d4627d03c4746254cc11926c523931086ccebcda