1. 25 Apr 2019, 3 commits
    • update history.md (#5245) · 66d8360b
      Committed by Zhongyi Xie
      Summary:
      update history.md for `BottommostLevelCompaction::kForceOptimized` to mention possible user impact.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5245
      
      Differential Revision: D15073712
      
      Pulled By: miasantreble
      
      fbshipit-source-id: d40f698c42e8a6368be4eac0a00d02279615edea
    • Don't call FindObsoleteFiles() in ~ColumnFamilyHandleImpl() if CF is not dropped (#5238) · cd77d3c5
      Committed by Mike Kolupaev
      Summary:
      We have a DB with ~4k column families and ~70k files. On shutdown, destroying the 4k ColumnFamilyHandle-s takes over 2 minutes. Most of this time is spent in VersionSet::AddLiveFiles() called from FindObsoleteFiles() from ~ColumnFamilyHandleImpl(). It's just iterating over the list of files in memory. This seems completely unnecessary as no obsolete files are actually found since the CFs are not even dropped. This PR fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5238
      
      Differential Revision: D15056342
      
      Pulled By: siying
      
      fbshipit-source-id: 2aa342ef3770b4aa384ce81f8768e485480e4f08
    • secondary instance: add support for WAL tailing on `OpenAsSecondary` · aa56b7e7
      Committed by Zhongyi Xie
      Summary: PR https://github.com/facebook/rocksdb/pull/4899 implemented the general framework for RocksDB secondary instances. This PR adds support for WAL tailing in `OpenAsSecondary`, which means that after the `OpenAsSecondary` call, the secondary is able to see the primary's writes that are yet to be flushed. The secondary can see the primary's writes in the WAL up to the moment the `OpenAsSecondary` call starts.
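
      A hedged usage sketch of the secondary-instance API (paths here are placeholders; the exact signatures live in include/rocksdb/db.h):
      ```
      #include <cassert>
      #include "rocksdb/db.h"

      int main() {
        rocksdb::Options options;
        options.max_open_files = -1;  // secondary mode needed this at the time
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::OpenAsSecondary(
            options, "/path/to/primary_db", "/path/to/secondary_dir", &db);
        assert(s.ok());
        // After Open, the secondary sees flushed data plus WAL writes made
        // before the OpenAsSecondary call started; newer primary writes can
        // be picked up later with TryCatchUpWithPrimary().
        s = db->TryCatchUpWithPrimary();
        assert(s.ok());
        delete db;
        return 0;
      }
      ```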
      
      Differential Revision: D15059905
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 44f71f548a30b38179a7940165e138f622de1f10
  2. 24 Apr 2019, 4 commits
  3. 23 Apr 2019, 3 commits
    • Fix compilation errors for 32bits/LITE/iOS build. (#5220) · 78a6e07c
      Committed by Yuchi Chen
      Summary:
      When building RocksDB for a 32-bit/LITE/iOS environment, errors like the following occur:
      
      ```
      table/block_based_table_reader.cc:971:44: error: implicit conversion loses integer precision: 'uint64_t'
            (aka 'unsigned long long') to 'size_t' (aka 'unsigned long') [-Werror,-Wshorten-64-to-32]
          size_t block_size = props_block_handle.size();
                 ~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~^~~~~~
      
      ./util/file_reader_writer.h:177:8: error: private field 'env_' is not used [-Werror,-Wunused-private-field]
        Env* env_;
             ^
      ```
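
      A plausible shape of the fix for the first error, as a minimal sketch (the actual patch may differ):
      ```
      #include <cstddef>
      #include <cstdint>

      // Making the narrowing explicit satisfies -Wshorten-64-to-32 on 32-bit
      // targets; handle_size stands in for props_block_handle.size().
      size_t NarrowBlockSize(uint64_t handle_size) {
        return static_cast<size_t>(handle_size);
      }
      ```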
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5220
      
      Differential Revision: D15023481
      
      Pulled By: siying
      
      fbshipit-source-id: 1b5d121d3016f2b0a8a9a2cc1bd638479357f9f7
    • Log file_creation_time table property (#5232) · 47fd5748
      Committed by Sagar Vemuri
      Summary:
      Log file_creation_time table property when a new table file is created.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5232
      
      Differential Revision: D15033069
      
      Pulled By: sagar0
      
      fbshipit-source-id: aaac56a4c03a8f96c338cad1b0cdb7fbfb887647
    • Optionally wait on bytes_per_sync to smooth I/O (#5183) · 8272a6de
      Committed by Andrew Kryczka
      Summary:
      The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyway.
      
      My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.
      
      Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.
      
      There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).
      
      The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
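
      A minimal sketch of the writeback pattern described above, assuming a Linux target with `strict_bytes_per_sync` enabled (fd_, offset and nbytes mirror the quoted call; this is not the actual RocksDB code):
      ```
      #ifndef _GNU_SOURCE
      #define _GNU_SOURCE  // sync_file_range is Linux-specific
      #endif
      #include <fcntl.h>
      #include <unistd.h>

      void RangeSyncStrict(int fd_, off_t offset, off_t nbytes) {
        // SYNC_FILE_RANGE_WAIT_BEFORE blocks on writeback submitted by earlier
        // calls, so processing can run at most one window ahead of the I/O;
        // the range [0, offset + nbytes) is then (re)submitted asynchronously.
        if (sync_file_range(fd_, 0, offset + nbytes,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE) < 0) {
          fdatasync(fd_);  // fallback when sync_file_range is unavailable/broken
        }
      }
      ```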
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183
      
      Differential Revision: D14953553
      
      Pulled By: siying
      
      fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
  4. 22 Apr 2019, 1 commit
    • Add BlockBasedTableOptions::index_shortening (#5174) · df38c1ce
      Committed by Mike Kolupaev
      Summary:
      Introduce BlockBasedTableOptions::index_shortening to give users control over which key-shortening techniques are used in building index blocks. Before this patch, both separators and successor keys were shortened in indexes. With this patch, the default is set to kShortenSeparators to shorten only the separators. Since each index block has many separators and only one successor (last key), the change should not have a negative impact on index block size. However, it should prevent many unnecessary block loads where, due to the approximation introduced by the shortened successor, a seek would land in the previous block and then have to fix itself by moving to the next one.
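
      A hedged sketch of selecting the new option (kShortenSeparators is the default described above; the enum lives in include/rocksdb/table.h):
      ```
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options MakeOptions() {
        rocksdb::BlockBasedTableOptions table_options;
        // Shorten separators between blocks, but keep the successor (last
        // key) unshortened so seeks near the end of a file land correctly.
        table_options.index_shortening = rocksdb::BlockBasedTableOptions::
            IndexShorteningMode::kShortenSeparators;
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```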
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5174
      
      Differential Revision: D14884185
      
      Pulled By: al13n321
      
      fbshipit-source-id: 1b08bc8c03edcf09b6b8c16e9a7eea08ad4dd534
  5. 20 Apr 2019, 5 commits
    • refactor SavePoints (#5192) · de769094
      Committed by jsteemann
      Summary:
      Savepoints are assumed to be used in a stack-wise fashion (only
      the top element should be used), so they were stored by `WriteBatch`
      in a member variable `save_points_` using a std::stack.
      
      Conceptually this is fine, but the implementation had a few issues:
      - the `save_points_` instance variable was a plain pointer to a heap-
        allocated `SavePoints` struct. The destructor of `WriteBatch` simply
        deletes this pointer. However, the copy constructor of WriteBatch
        just copied that pointer, meaning that copying a WriteBatch with
        active savepoints would very likely have crashed. Now a proper
        copy of the savepoints is made in the copy constructor, and not just
        a copy of the pointer.
      - `save_points_` was an std::stack, which defaults to `std::deque` for
        the underlying container. A deque is a bit over the top here, as we
        only need access to the most recent savepoint (i.e. stack.top()) but
        never any elements at the front. std::deque is rather expensive to
        initialize in common environments. For example, the STL implementation
        shipped with GNU g++ will perform a heap allocation of more than 500
        bytes to create an empty deque object. Although the `save_points_`
        container is created lazily by RocksDB, moving from a deque to a plain
        `std::vector` is much more memory-efficient. So `save_points_` is now
        a vector.
      - `save_points_` was changed from a plain pointer to an `std::unique_ptr`,
        making ownership more explicit (see the sketch below).
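
      A simplified sketch of the resulting layout (SavePoint is reduced to two fields here; the real struct also records content flags):
      ```
      #include <cstddef>
      #include <memory>
      #include <vector>

      struct SavePoint {
        size_t size;  // batch data size when the savepoint was set
        int count;    // entry count when the savepoint was set
      };

      struct SavePoints {
        std::vector<SavePoint> stack;  // vector instead of deque: no large
                                       // upfront allocation when empty
      };

      class WriteBatch {
       public:
        WriteBatch() = default;
        WriteBatch(const WriteBatch& src)
            : save_points_(src.save_points_
                               ? std::unique_ptr<SavePoints>(
                                     new SavePoints(*src.save_points_))
                               : nullptr) {}  // deep copy, not pointer copy

       private:
        std::unique_ptr<SavePoints> save_points_;  // explicit ownership
      };
      ```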
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5192
      
      Differential Revision: D15024074
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5b128786d3789cde94e46465c9e91badd07a25d7
    • Fix history to not include some features in 6.1 (#5224) · dc64c2f5
      Committed by Sagar Vemuri
      Summary:
      Fix HISTORY.md by removing a few items from the 6.1.1 history, as they did not make it into the 6.1.fb branch.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5224
      
      Differential Revision: D15017030
      
      Pulled By: sagar0
      
      fbshipit-source-id: 090724d326d29168952e06dc1a5090c03fdd739e
    • Force read existing data during db repair (#5209) · c77aab58
      Committed by Yanqin Jin
      Summary:
      Setting read_opts.total_order_seek achieves this, even with a different prefix
      extractor.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5209
      
      Differential Revision: D14980388
      
      Pulled By: riversand963
      
      fbshipit-source-id: 16527989a3d6b3e3ae8241c894d011326429d66e
    • Remove a couple of non-public includes from public header file (#5219) · 5265c570
      Committed by anand76
      Summary:
      Clean up a couple of stray includes left by #5011.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5219
      
      Differential Revision: D15007244
      
      Pulled By: anand1976
      
      fbshipit-source-id: 15ca1d4f977b5b60e99df3bfb8fc3db217d19bdd
    • Add some "inline" annotation to DBIter functions (#5217) · 7a73adda
      Committed by Siying Dong
      Summary:
      My compiler doesn't inline DBIter::Next() into the arena-wrapped iterator, even though it is a direct forwarding call. Adding this annotation makes it inlined. It might not always work, but inlining this function into the arena-wrapped iterator always feels like the right decision.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5217
      
      Differential Revision: D15004086
      
      Pulled By: siying
      
      fbshipit-source-id: a4cffd79c6fb092669a3a90633c9aa5e494f8a66
  6. 19 Apr 2019, 7 commits
    • Use creation_time or mtime when file_creation_time=0 (#5184) · efa94874
      Committed by Sagar Vemuri
      Summary:
      We found an issue in Periodic Compactions (introduced in #5166) where files were not being picked up for compactions as all the SST files created with older versions of RocksDB have `file_creation_time` as 0. (Note that `file_creation_time` is a new table property introduced in #5166).
      
      To address this, Periodic compactions now fall back to looking at the `creation_time` table property or the file's modification time (as given by the Env) when `file_creation_time` table property is found to be 0.
      
      Here is how the file's modification time (and, in turn, the file age) is computed now:
      1. Use the `file_creation_time` table property if it is > 0.
      2. If not, then use the `creation_time` table property if it is > 0.
      3. If not, then use the file's mtime stat metadata given by the underlying Env.
      Don't consider the file at all for compaction if the modification time cannot be correctly determined by the above conditions (a sketch of the fallback follows).
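
      A hedged sketch of the fallback order above (the helper name is hypothetical; the real logic lives in the compaction-picking code):
      ```
      #include <cstdint>
      #include <string>
      #include "rocksdb/env.h"
      #include "rocksdb/table_properties.h"

      uint64_t FileBaseTime(const rocksdb::TableProperties& props,
                            rocksdb::Env* env, const std::string& fname) {
        if (props.file_creation_time > 0) return props.file_creation_time;  // 1.
        if (props.creation_time > 0) return props.creation_time;            // 2.
        uint64_t mtime = 0;                                                 // 3.
        if (env->GetFileModificationTime(fname, &mtime).ok()) return mtime;
        return 0;  // unknown: skip this file for periodic compaction
      }
      ```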
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5184
      
      Differential Revision: D14907795
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4bb2f3631f9a3e04470c674a1d13544584e1e56c
    • reorganize history.md to list unreleased changes separately · 3bdce20e
      Committed by Zhongyi Xie
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5216
      
      Differential Revision: D15003749
      
      Pulled By: miasantreble
      
      fbshipit-source-id: a52c264e694cd7c55813be33ee22b4f3046b545a
    • Make ReadRangeDelAggregator::ShouldDelete() more inline friendly (#5202) · d6862b3f
      Committed by Siying Dong
      Summary:
      Reorganize the code so that no function call into ReadRangeDelAggregator is needed if there is no range tombstone.
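
      A hedged sketch of the reorganization (types simplified; the real class is in db/range_del_aggregator.h):
      ```
      struct ParsedKey {};  // stand-in for ParsedInternalKey

      class ReadRangeDelAggregator {
       public:
        // Inline fast path: with no range tombstones, callers pay only for
        // an inlined integer test, never a call into the aggregator.
        bool ShouldDelete(const ParsedKey& key) {
          if (tombstone_count_ == 0) return false;
          return ShouldDeleteImpl(key);
        }

       private:
        // Slow path, stubbed here; the real one consults the fragmented
        // tombstone list.
        bool ShouldDeleteImpl(const ParsedKey& /*key*/) { return false; }
        int tombstone_count_ = 0;
      };
      ```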
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5202
      
      Differential Revision: D14968155
      
      Pulled By: siying
      
      fbshipit-source-id: 0bd61911293c7a27b4e1b8d57c66d0c4ad6a6a5f
    • Some small code changes to improve Next() (#5200) · 01cfea66
      Committed by Siying Dong
      Summary:
      Several small changes for Next():
      1. Reduce branching by always updating local_stats_.next_count_, even if statistics is null. This should be faster than a branch.
      2. Replace ResetInternalKeysSkippedCounter() in Next(), because the valid_ check is not needed in this case.
      3. iter_->Valid() should always be true in the non-merge case. Remove this check.
      4. Add an inline annotation. It ends up not being picked up by my compiler, but it shouldn't hurt.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5200
      
      Differential Revision: D15000391
      
      Pulled By: siying
      
      fbshipit-source-id: be97f61c708968234fb8e5cf272b5c2ac07dc4dd
    • Introduce InternalIteratorBase::NextAndGetResult() (#5197) · 992dfc78
      Committed by Siying Dong
      Summary:
      In long scans, the virtual function calls to Next(), Valid(), key() and value() are not trivial. By introducing NextAndGetResult(), some of the Next(), Valid() and key() calls are consolidated into one virtual function call to reduce CPU.
      Also did some inlining tricks and added "final" annotations to some functions. Even without the "final" annotation, most Next() calls are inlined with -O3, but sometimes with "final" they are inlined at -O2 too. It doesn't hurt to add these annotations.
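
      A sketch of the consolidated virtual call (the signature is approximated from the description above, not copied from the real declaration in table/internal_iterator.h):
      ```
      #include "rocksdb/slice.h"

      class InternalIteratorBase {
       public:
        virtual ~InternalIteratorBase() {}
        virtual void Next() = 0;
        virtual bool Valid() const = 0;
        virtual rocksdb::Slice key() const = 0;

        // One virtual dispatch instead of three (Next + Valid + key) per
        // step of a long scan; overriders may mark it `final` to help the
        // compiler inline it.
        virtual bool NextAndGetResult(rocksdb::Slice* ret_key) {
          Next();
          bool is_valid = Valid();
          if (is_valid) *ret_key = key();
          return is_valid;
        }
      };
      ```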
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5197
      
      Differential Revision: D14945977
      
      Pulled By: siying
      
      fbshipit-source-id: 7003969f9a5f1d5717f0bda503b91d19ba75ed88
    • Add copyright headers per FB open-source checkup tool. (#5199) · 6c2bf9e9
      Committed by Fosco Marotto
      Summary:
      internal task: T35568575
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199
      
      Differential Revision: D14962794
      
      Pulled By: gfosco
      
      fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
    • Fix a bug in GetOverlappingInputsRangeBinarySearch (#5211) · 392f6d49
      Committed by Yanqin Jin
      Summary:
      As title.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5211
      
      Differential Revision: D14992018
      
      Pulled By: riversand963
      
      fbshipit-source-id: b5720ea4742029e2fb47ff6d9f8d9de006db4ed4
  7. 18 Apr 2019, 2 commits
    • VersionSet: optimize GetOverlappingInputsRangeBinarySearch (#4987) · 5b7e09bd
      Committed by JiYou
      Summary:
      `GetOverlappingInputsRangeBinarySearch` first uses binary search
      to find an index in the given range `[begin, end]`. But after finding
      the index, it used linear search to find the `start_index` and
      `end_index`, so the search process degraded to linear time.

      This PR optimizes the search process with the changes below (a sketch follows the list):
      
      - use `std::lower_bound` and `std::upper_bound` to get
        `lg(n)` search complexity.
      - use a uniform lambda for the search process.
      - simplify the process for `within_interval` true or false.
      - remove the functions `ExtendFileRangeWithinInterval`
        and `ExtendFileRangeOverlappingInterval`.
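
      A simplified sketch of the binary-search idea (FileMeta and string keys stand in for the real version_set types):
      ```
      #include <algorithm>
      #include <string>
      #include <utility>
      #include <vector>

      struct FileMeta {
        std::string smallest;
        std::string largest;
      };

      // Files within a level are sorted and non-overlapping, so both range
      // endpoints can be found in O(log n) instead of by linear extension.
      std::pair<size_t, size_t> OverlapRange(const std::vector<FileMeta>& files,
                                             const std::string& begin,
                                             const std::string& end) {
        auto first = std::lower_bound(
            files.begin(), files.end(), begin,
            [](const FileMeta& f, const std::string& k) { return f.largest < k; });
        auto last = std::upper_bound(
            files.begin(), files.end(), end,
            [](const std::string& k, const FileMeta& f) { return k < f.smallest; });
        return {static_cast<size_t>(first - files.begin()),
                static_cast<size_t>(last - files.begin())};
      }
      ```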
      Signed-off-by: JiYou <jiyou09@gmail.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4987
      
      Differential Revision: D14984192
      
      Pulled By: riversand963
      
      fbshipit-source-id: fae4b8e59a21b7e350718d60cdc94dd55ac81e89
    • rename variable to avoid shadowing (#5204) · 248b6b55
      Committed by Zhongyi Xie
      Summary:
      This PR fixes the following compile warning:
      ```
      db/memtable.cc: In member function ‘virtual void rocksdb::MemTableIterator::Seek(const rocksdb::Slice&)’:
      db/memtable.cc:321:22: error: declaration of ‘user_key’ shadows a member of 'this' [-Werror=shadow]
             Slice user_key(ExtractUserKey(k));
                            ^
      db/memtable.cc: In member function ‘virtual void rocksdb::MemTableIterator::SeekForPrev(const rocksdb::Slice&)’:
      db/memtable.cc:338:22: error: declaration of ‘user_key’ shadows a member of 'this' [-Werror=shadow]
             Slice user_key(ExtractUserKey(k));
                            ^
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5204
      
      Differential Revision: D14970160
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 388eb089f90c4528cc6d615dd4607fb53ceac705
  8. 17 Apr 2019, 4 commits
    • Avoid double-compacting data in bottom level in manual compactions (#5138) · baa53024
      Committed by Zhongyi Xie
      Summary:
      Depending on the config, manual compaction (leveled compaction style) performs the following compactions:
      L0->L1
      L1->L2
      ...
      Ln-1 -> Ln
      Ln -> Ln
      The final Ln -> Ln compaction is partly unnecessary, as it recompacts all the files that were just generated by the Ln-1 -> Ln step. We should avoid recompacting such files. This rule should be applied to Lmax only (see the usage sketch below).
      Resolves issue https://github.com/facebook/rocksdb/issues/4995
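
      A hedged usage sketch of the kForceOptimized mode tied to this change (see include/rocksdb/options.h for the enum):
      ```
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      rocksdb::Status CompactAll(rocksdb::DB* db) {
        rocksdb::CompactRangeOptions cro;
        // Recompact Lmax files only when they were not just produced by the
        // Ln-1 -> Ln step of this same manual compaction.
        cro.bottommost_level_compaction =
            rocksdb::BottommostLevelCompaction::kForceOptimized;
        return db->CompactRange(cro, nullptr, nullptr);  // full key range
      }
      ```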
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138
      
      Differential Revision: D14940106
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
    • Add back NewEmptyIterator (#5203) · d9280ff2
      Committed by Yanqin Jin
      Summary:
      #4905 removed the implementation of `NewEmptyIterator` but kept its
      declaration in the public header. This breaks systems that depend on
      RocksDB and use `NewEmptyIterator`. Therefore, add it back to fix the breakage. cc maysamyabandeh, please remind me if I missed anything here. Thanks
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5203
      
      Differential Revision: D14968382
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5fb86e99c8cfaf9f7a9473cdb1355d7558ff6e01
    • WriteBufferManager's dummy entry size to block cache 1MB -> 256KB (#5175) · beb44ec3
      Committed by Siying Dong
      Summary:
      The dummy cache size of 1MB is too large for small block sizes. Our GetDefaultCacheShardBits() uses min_shard_size = 512L * 1024L to determine the number of shards, so 1MB exceeds the size of a whole shard and makes the cache exceed its budget.
      Change it to 256KB accordingly.
      There shouldn't be an obvious performance impact, since inserting a cache entry every 256KB of memtable inserts is still infrequent enough.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5175
      
      Differential Revision: D14954289
      
      Pulled By: siying
      
      fbshipit-source-id: 2c275255c1ac3992174e06529e44c55538325c94
    • Avoid per-key upper bound check in BlockBasedTableIterator (#5142) · f1239d5f
      Committed by yiwu-arbug
      Summary:
      This is a second attempt at #5101. Original commit message:
      `BlockBasedTableIterator` avoids reading the next block on `Next()` if it detects that the iterator will be out of bound, by checking against the index key. The optimization was added in #2239, and at the time it checked the bound only once per block. It seems a later change made it a per-key check, which introduced unnecessary key comparisons.

      This patch comes with two fixes:

      Fix 1: To optimize the bound check, we need to compare the bounds with the index key as well. However, BlockBasedTableIterator doesn't know whether its index iterator is internally using user keys or internal keys. The patch fixes that by extending InternalIterator with a user_key() function that is overridden by IndexBlockIter.

      Fix 2: In #5101 we returned `IsOutOfBound()=true` when the block index key is out of bound. But the index key can be larger than the smallest key of the next file on the level. That file can be within the upper bound and should not be filtered out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5142
      
      Differential Revision: D14907113
      
      Pulled By: siying
      
      fbshipit-source-id: ac95775c5b4e7b700f76ab43e39f45402c98fbfb
  9. 16 Apr 2019, 5 commits
    • Consolidating WAL creation which currently has duplicate logic in... · 71a82a0a
      Committed by Vijay Nadimpalli
      Consolidating WAL creation which currently has duplicate logic in db_impl_write.cc and db_impl_open.cc (#5188)
      
      Summary:
      Right now, two separate pieces of code are used to create WAL files: the DBImpl::Open function in db_impl_open.cc and the DBImpl::SwitchMemtable function in db_impl_write.cc. This change creates a single function, DBImpl::CreateWAL, in db_impl_open.cc, which replaces the existing WAL creation logic in DBImpl::Open and DBImpl::SwitchMemtable.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5188
      
      Differential Revision: D14942832
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: d49230e04c36176015c8c1b422575872f92157fb
    • Fix MultiGet ASSERT bug when passing unsorted result (#5195) · 3e63e553
      Committed by Yi Zhang
      Summary:
      Found this when test-driving the new MultiGet. If you pass an unsorted result with sorted_result = false, you'll incorrectly trigger the assert even though we sort further down (a usage sketch follows below).
      
      I've also added a simple test covering the sorted_result=true/false scenarios, copied from MultiGetSimple.
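
      A hedged usage sketch of the batched MultiGet (the overload and the name of the final flag are approximated from the API of this era; see include/rocksdb/db.h):
      ```
      #include "rocksdb/db.h"

      void LookupTwo(rocksdb::DB* db) {
        rocksdb::Slice keys[2] = {"zebra", "apple"};  // deliberately unsorted
        rocksdb::PinnableSlice values[2];
        rocksdb::Status statuses[2];
        // Passing false for the final flag tells MultiGet the keys are not
        // pre-sorted, so it sorts them internally instead of asserting.
        db->MultiGet(rocksdb::ReadOptions(), db->DefaultColumnFamily(),
                     /*num_keys=*/2, keys, values, statuses,
                     /*sorted_input=*/false);
      }
      ```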
      
      anand1976
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5195
      
      Differential Revision: D14935475
      
      Pulled By: yizhang82
      
      fbshipit-source-id: 1d2af5e3a003847d965066a16e3b19da68acf170
    • db_bench: support seek to non-existent prefix (#5163) · b70967aa
      Committed by Yi Wu
      Summary:
      Add a `--seek_missing_prefix` flag to db_bench to allow benchmarking seeks to a non-existent prefix. Usage example:
      ```
      ./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10
      ./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true
      ```
      Also adding `--total_order_seek` and `--prefix_same_as_start` flags.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163
      
      Differential Revision: D14935724
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a
    • Update history and version to 6.1.1 (#5171) · b5cad5c9
      Committed by Fosco Marotto
      Summary:
      Including latest fixes.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5171
      
      Differential Revision: D14875157
      
      Pulled By: gfosco
      
      fbshipit-source-id: 86ec7ee3553a9b25ab71ed98966ce08a16322e2c
    • Improve transaction lock details (#5193) · 8295d364
      Committed by jsteemann
      Summary:
      This branch contains two small improvements:
      * Create `LockMap` entries using `std::make_shared`. This saves one heap allocation per LockMap entry and also places the control block and the LockMap object close together in memory, which can help with caching (see the sketch below).
      * Reorder the members of `TrackedTrxInfo` so that the resulting struct uses less memory (at least on 64-bit systems).
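
      A small illustration of the single-allocation point (LockMap here is a stand-in for the real type in utilities/transactions):
      ```
      #include <cstddef>
      #include <memory>

      struct LockMap {
        explicit LockMap(size_t num_stripes) : num_stripes_(num_stripes) {}
        size_t num_stripes_;
      };

      std::shared_ptr<LockMap> MakeLockMap(size_t num_stripes) {
        // One heap allocation holds both the object and its control block,
        // and the two end up adjacent in memory, unlike
        // std::shared_ptr<LockMap>(new LockMap(num_stripes)).
        return std::make_shared<LockMap>(num_stripes);
      }
      ```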
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5193
      
      Differential Revision: D14934536
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f7b49812bb4b6029eef9d131e7cd56260df5b28e
  10. 13 Apr 2019, 6 commits
    • Add bounds check in FilePickerMultiGet::PrepareNextLevel() (#5189) · 29111e92
      Committed by anand76
      Summary:
      Add bounds check when looping through empty levels in FilePickerMultiGet
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5189
      
      Differential Revision: D14925334
      
      Pulled By: anand1976
      
      fbshipit-source-id: 65d53247cf443153e28ce2b8b753fa51c6ae4566
    • Fix crash with memtable prefix bloom and key out of prefix extractor domain (#5190) · cca141ec
      Committed by yiwu-arbug
      Summary:
      Before using the prefix extractor, `InDomain()` should be checked. None of the uses in memtable.cc checked `InDomain()` (a sketch of the guard follows).
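
      A minimal sketch of the guard being added, as a stand-in for the call sites in memtable.cc:
      ```
      #include "rocksdb/slice.h"
      #include "rocksdb/slice_transform.h"

      // Transform() is only defined for keys in the extractor's domain (e.g.
      // keys at least as long as a fixed prefix), so check InDomain() first;
      // out-of-domain keys must bypass the prefix bloom filter entirely.
      bool ShouldCheckPrefixBloom(const rocksdb::SliceTransform* pe,
                                  const rocksdb::Slice& user_key) {
        return pe != nullptr && pe->InDomain(user_key);
      }
      ```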
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5190
      
      Differential Revision: D14923773
      
      Pulled By: miasantreble
      
      fbshipit-source-id: b3ad60bcca5f3a1a2b929a6eb34b0b7ba6326f04
    • Remove extraneous call to TrackKey (#5173) · d655a3aa
      Committed by Manuel Ung
      Summary:
      In `PessimisticTransaction::TryLock`, we were calling `TrackKey` even when assume_tracked=true, which defeats the purpose of assume_tracked. Remove this.
      
      For keys that are already tracked, TrackKey will actually bump some counters (num_reads/num_writes) which are consumed in `TransactionBaseImpl::GetTrackedKeysSinceSavePoint`, and this is used to determine which keys were tracked since the last savepoint. I believe this functionality should still work, since I think the user should not call GetForUpdate/Put(assume_tracked=true) across savepoints, and if they do, they should not expect the Put(assume_tracked=true) to show up as a tracked key in the second savepoint.
      
      This is another 2-3% cpu improvement.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5173
      
      Differential Revision: D14883809
      
      Pulled By: lth
      
      fbshipit-source-id: 7d09f0772da422384af0519773e310c22b0cbca3
    • WritePrepared: fix race condition in reading batch with duplicate keys (#5147) · fe642cbe
      Committed by Maysam Yabandeh
      Summary:
      When ReadOptions doesn't specify a snapshot, WritePrepared::Get used kMaxSequenceNumber to avoid the cost of creating a new snapshot object (which requires synchronization over db_mutex). This creates a race condition if it is reading from the writes of a transaction that had duplicate keys: each instance of a duplicate key is inserted with a different sequence number, and depending on the ordering, ::Get might skip the newer one and read the older one, which is obsolete.
      The patch fixes that by using the last published seq as the snapshot sequence number. It also adds a check after the read is done to ensure that max_evicted_seq has not advanced past the aforementioned seq, which is a very unlikely event. If it did, the read is not valid, since the seq is not backed by an actual snapshot that would let IsInSnapshot handle it properly when an overlapping commit is evicted from the commit cache.
      A unit test is added to reproduce the race condition with duplicate keys.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5147
      
      Differential Revision: D14758815
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a56915657132cf6ba5e3f5ea1b5d78c803407719
    • Expose JavaAPI for getting the filter policy of a BlockBasedTableConfig (#5186) · 1966a7c0
      Committed by ableegoldman
      Summary:
      I would like to be able to read out the current Filter that has been set (or not) for a BlockBasedTableConfig. Added one public method to BlockBasedTableConfig:
      
      ```
      public Filter filterPolicy() {
          return filterPolicy;
      }
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5186
      
      Differential Revision: D14921415
      
      Pulled By: siying
      
      fbshipit-source-id: 2a63c8685480197862b49fc48916c757cd6daf95
    • Still implement StatisticsImpl::measureTime() (#5181) · 85b2bde3
      Committed by Siying Dong
      Summary:
      Since Statistics::measureTime() is deprecated, StatisticsImpl::measureTime() was not implemented. We realized that users might have a wrapped Statistics implementation in which measureTime() is implemented by forwarding to StatisticsImpl, causing an assert failure. In order to make the change less intrusive, we implement StatisticsImpl::measureTime(). We will revisit whether we need to remove it after several releases.
      
      Also, add a test to make sure that a Statistics implementation using the old interface still works.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5181
      
      Differential Revision: D14907089
      
      Pulled By: siying
      
      fbshipit-source-id: 29b6202fd04e30ed6f6adcaeb1000e87f10d1e1a