1. 01 Dec, 2018: 1 commit
    • Make NewBloomFilterPolicy() use full filter by default (#4735) · 6e938c90
      Committed by Siying Dong
      Summary:
      The full Bloom filter (use_block_based_builder=false) has clear CPU-saving benefits, but it has one limitation: while an SST file is being built, it uses temporary memory proportional to the SST file size. We have reduced the chance of having large SST files with multi-level universal compaction, so we now change the default to the better-performing option.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4735
      
      Differential Revision: D13266674
      
      Pulled By: siying
      
      fbshipit-source-id: 7594a4c3e32568a5a2adce22bb0e46553e55c602
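As a rough illustration of the memory trade-off above, here is a minimal C++ sketch of a full (whole-file) Bloom filter builder. It is not RocksDB's implementation: the class name `FullFilterSketch`, the use of `std::hash`, and the double-hashing probe scheme are assumptions for illustration. It shows why key hashes are buffered until the file is finished, which is the temporary memory that grows with the SST file size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical full-filter sketch: one filter covers the whole file, so the
// bit array can only be sized once the total key count is known.
class FullFilterSketch {
 public:
  FullFilterSketch(int bits_per_key, int num_probes)
      : bits_per_key_(bits_per_key), num_probes_(num_probes) {
    assert(bits_per_key > 0 && num_probes > 0);
  }

  // Buffer the key's hash; no filter bits are set yet.
  void AddKey(const std::string& key) { hashes_.push_back(Hash(key)); }

  size_t BufferedHashes() const { return hashes_.size(); }

  // Size the bit array from the final key count, set bits, free the buffer.
  void Finish() {
    size_t num_bits = std::max<size_t>(64, hashes_.size() * bits_per_key_);
    bits_.assign(num_bits, false);
    for (uint32_t h : hashes_) {
      const uint32_t delta = (h >> 17) | (h << 15);  // double hashing
      for (int i = 0; i < num_probes_; ++i) {
        bits_[h % num_bits] = true;
        h += delta;
      }
    }
    std::vector<uint32_t>().swap(hashes_);  // release the temporary memory
  }

  // May return false positives, never false negatives.
  bool KeyMayMatch(const std::string& key) const {
    assert(!bits_.empty());  // Finish() must have been called
    uint32_t h = Hash(key);
    const uint32_t delta = (h >> 17) | (h << 15);
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits_[h % bits_.size()]) return false;
      h += delta;
    }
    return true;
  }

 private:
  static uint32_t Hash(const std::string& key) {
    return static_cast<uint32_t>(std::hash<std::string>{}(key));
  }

  int bits_per_key_;
  int num_probes_;
  std::vector<uint32_t> hashes_;
  std::vector<bool> bits_;
};
```

In RocksDB itself the choice is made through the `use_block_based_builder` argument named above; this sketch only models the full-filter side.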
  2. 29 Nov, 2018: 1 commit
  3. 28 Nov, 2018: 1 commit
  4. 22 Nov, 2018: 2 commits
  5. 21 Nov, 2018: 1 commit
    • Fix range tombstone covering short-circuit logic (#4698) · ed5aec5b
      Committed by Abhishek Madan
      Summary:
      Since a range tombstone seen at one level covers all keys in its
      range at lower levels, Get had a short-circuiting check that reported
      a key as not found at most one file after the range tombstone was
      discovered. However, this was incorrect for merge operands: a deletion
      might cover only some of a key's merge operands, in which case the key
      should be found. This PR fixes this logic in the Version portion of
      Get, and removes the logic from the MemTable portion of Get, since the
      performance benefit it provided there is minimal.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4698
      
      Differential Revision: D13142484
      
      Pulled By: abhimadan
      
      fbshipit-source-id: cbd74537c806032f2bfa564724d01a80df7c8f10
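A minimal sketch of the corrected lookup rule, assuming a simplified model in which one key's entries from the memtable and all levels are flattened into a single newest-first list. `Kind`, `Entry`, `Merge`, and `Get` are hypothetical names, and the stand-in merge operator simply joins operands with commas.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum class Kind { kMerge, kPut, kRangeDelete };

// One entry for the looked-up key; kRangeDelete means a covering tombstone.
struct Entry {
  Kind kind;
  uint64_t seq;       // larger is newer
  std::string value;  // merge operand or Put value; unused for kRangeDelete
};

// Stand-in merge operator: base value, then operands oldest first, joined by ','.
std::string Merge(const std::optional<std::string>& base,
                  const std::vector<std::string>& operands_newest_first) {
  std::string result = base.value_or("");
  for (auto it = operands_newest_first.rbegin();
       it != operands_newest_first.rend(); ++it) {
    if (!result.empty()) result += ',';
    result += *it;
  }
  return result;
}

// `history` must be ordered newest first.
std::optional<std::string> Get(const std::vector<Entry>& history) {
  std::vector<std::string> operands;  // collected newest first
  uint64_t prev_seq = UINT64_MAX;
  for (const Entry& e : history) {
    assert(e.seq < prev_seq);
    prev_seq = e.seq;
    if (e.kind == Kind::kMerge) {
      operands.push_back(e.value);
      continue;
    }
    // A Put supplies the base value; a range tombstone covers everything
    // older, so there is no base value and the search can stop here.
    std::optional<std::string> base;
    if (e.kind == Kind::kPut) base = e.value;
    // The old short-circuit answered "not found" at a tombstone even when
    // newer merge operands had already been collected.
    if (!base && operands.empty()) return std::nullopt;
    return Merge(base, operands);
  }
  if (operands.empty()) return std::nullopt;
  return Merge(std::nullopt, operands);
}
```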
  6. 14 Nov, 2018: 4 commits
    • Move MemoryAllocator option from Cache to BlockBasedTableOptions (#4676) · b32d087d
      Committed by Yi Wu
      Summary:
      Per offline discussion with siying, `MemoryAllocator` and `Cache` should be decoupled. The idea is that the memory allocator handles memory allocation, while the cache handles cache policy.
      
      It is common for external cache libraries to couple the two components for better optimization. If we want to integrate with such a library in the future, we can write a wrapper around it that implements both the `Cache` and `MemoryAllocator` interfaces.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4676
      
      Differential Revision: D13047662
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: cd42e246d80ab600b4de47d073f7d2db308ce6dd
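The decoupling can be sketched with two independent interfaces and a wrapper that implements both, as the summary suggests for external libraries. All names here (`AllocatorSketch`, `CacheSketch`, `TableOptionsSketch`, and so on) are hypothetical stand-ins, not RocksDB's real `MemoryAllocator` or `Cache` declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <unordered_map>

// Allocation only: no knowledge of eviction or cache policy.
class AllocatorSketch {
 public:
  virtual ~AllocatorSketch() = default;
  virtual void* Allocate(size_t size) = 0;
  virtual void Deallocate(void* p) = 0;
};

// Cache policy only: no knowledge of how the blocks were allocated.
class CacheSketch {
 public:
  virtual ~CacheSketch() = default;
  virtual void Insert(const std::string& key, void* block) = 0;
  virtual void* Lookup(const std::string& key) = 0;
};

class MallocAllocator : public AllocatorSketch {
 public:
  void* Allocate(size_t size) override { return std::malloc(size); }
  void Deallocate(void* p) override { std::free(p); }
};

// Wrapper for a library that packs both concerns into one object.
class ExternalLibraryWrapper : public AllocatorSketch, public CacheSketch {
 public:
  void* Allocate(size_t size) override {
    ++live_;
    return std::malloc(size);
  }
  void Deallocate(void* p) override {
    assert(live_ > 0);
    --live_;
    std::free(p);
  }
  void Insert(const std::string& key, void* block) override { map_[key] = block; }
  void* Lookup(const std::string& key) override {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : it->second;
  }
  size_t LiveAllocations() const { return live_; }

 private:
  size_t live_ = 0;
  std::unordered_map<std::string, void*> map_;
};

// The allocator is configured on the table options, independently of the cache.
struct TableOptionsSketch {
  std::shared_ptr<AllocatorSketch> memory_allocator;
  std::shared_ptr<CacheSketch> block_cache;
};
```

The same wrapper object can then fill both slots, while a plain allocator such as `MallocAllocator` can be paired with any cache.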
    • Divide `NO_ITERATORS` into two counters `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE` (#4498) · 5945e16d
      Committed by Soli Como
      Summary:
      Currently, `Statistics` records ticks with `recordTick()`, whose second parameter is a `uint64_t`.
      That means a tick count can only increase.
      To decrease one, we have to use a workaround such as `RecordTick(statistics_, NO_ITERATORS, uint64_t(-1));`.
      That is a hack.
      
      So this PR divides `NO_ITERATORS` into two counters, `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE`, both of which only increase.
      
      Fixes #3013.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4498
      
      Differential Revision: D10395010
      
      Pulled By: sagar0
      
      fbshipit-source-id: cfb523b22a37411c794b4e9da090f1ae30293db2
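A small sketch of the increase-only approach: two monotonic counters, with the number of live iterators derived as their difference. `StatisticsSketch`, `LiveIterators`, and the enum are hypothetical stand-ins for RocksDB's `Statistics` and ticker enum.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum Ticker : std::size_t { NO_ITERATOR_CREATED = 0, NO_ITERATOR_DELETE, TICKER_MAX };

class StatisticsSketch {
 public:
  // Counts only ever increase, matching the unsigned count parameter.
  void RecordTick(Ticker t, uint64_t count = 1) {
    assert(t < TICKER_MAX);
    tickers_[t].fetch_add(count, std::memory_order_relaxed);
  }
  uint64_t GetTickerCount(Ticker t) const {
    return tickers_[t].load(std::memory_order_relaxed);
  }

 private:
  std::array<std::atomic<uint64_t>, TICKER_MAX> tickers_{};
};

// Live iterators are derived rather than stored, so no decrement is needed.
uint64_t LiveIterators(const StatisticsSketch& s) {
  return s.GetTickerCount(NO_ITERATOR_CREATED) - s.GetTickerCount(NO_ITERATOR_DELETE);
}
```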
    • Backup engine support for direct I/O reads (#4640) · ea945470
      Committed by Andrew Kryczka
      Summary:
      Use the `DBOptions` that the backup engine already holds to figure out the right `EnvOptions` to use when reading the DB files. This means that, if a user opened a DB instance with `use_direct_reads=true`, then using `BackupEngine` to back up that DB instance will use direct I/O to read files when calculating checksums and copying. WALs and manifests are still read with buffered I/O, to avoid mixing direct I/O reads with concurrent buffered I/O writes.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4640
      
      Differential Revision: D13015268
      
      Pulled By: ajkr
      
      fbshipit-source-id: 77006ad6f3e00ce58374ca4793b785eea0db6269
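The file-type rule described above might look like the following. `DBOptionsSketch`, `EnvOptionsSketch`, `FileKind`, and `OptionsForBackupRead` are hypothetical names; only the rule itself (DB files follow `use_direct_reads`, WALs and manifests stay buffered) comes from the summary.

```cpp
#include <cassert>

struct DBOptionsSketch {
  bool use_direct_reads = false;
};

struct EnvOptionsSketch {
  bool use_direct_reads = false;
};

enum class FileKind { kTable, kWal, kManifest };

// DB files follow the DB's use_direct_reads setting, except WALs and
// manifests, which stay on buffered I/O so that backup reads do not mix
// direct I/O with the live DB's concurrent buffered writes to those files.
EnvOptionsSketch OptionsForBackupRead(const DBOptionsSketch& db, FileKind kind) {
  EnvOptionsSketch env;
  env.use_direct_reads = db.use_direct_reads && kind != FileKind::kWal &&
                         kind != FileKind::kManifest;
  return env;
}
```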
    • use per-level perfcontext for DB::Get calls (#4617) · b3130193
      Committed by Zhongyi Xie
      Summary:
      This PR adds two more per-level perf context counters to track:
      * the number of keys returned by Get calls, broken down by level
      * the total processing time at each level during Get calls
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4617
      
      Differential Revision: D12898024
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 6b84ef1c8097c0d9e97bee1a774958f56ab4a6c4
  7. 13 Nov, 2018: 1 commit
  8. 07 Nov, 2018: 1 commit
  9. 06 Nov, 2018: 2 commits
    • WritePrepared: Fix bug in searching in non-cached snapshots (#4639) · 2b5b7bc7
      Committed by Maysam Yabandeh
      Summary:
      When an entry is evicted from the commit_cache, it is checked against the list of old snapshots to see whether it overlaps with any of them. The list of old snapshots is split into two lists: an efficient concurrent cache and a slow vector protected by a lock. The patch fixes a bug where the search stopped in the cache once it found any match, so the larger snapshots in the slower list were never included.
      An extra info log entry is also removed: the condition that triggers it, although very rare, is still feasible, and it should not spam the LOG when it happens.
      Fixes #4621
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4639
      
      Differential Revision: D12934989
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4e0fe8147ba292b554ae78e94c21c2ef31e03e2d
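A simplified, single-process sketch of the corrected search. It assumes that an evicted commit entry overlaps a snapshot `s` when `prep_seq <= s < commit_seq`, and that the cache holds the smaller old snapshots while the lock-protected vector holds the larger ones; `OldSnapshotsSketch` and `OverlappingSnapshots` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

struct OldSnapshotsSketch {
  std::vector<uint64_t> cache;     // fast path: smaller old snapshots, ascending
  std::vector<uint64_t> overflow;  // slow path: larger ones, ascending
  std::mutex overflow_mu;          // guards `overflow`
};

// Returns every old snapshot s with prep_seq <= s < commit_seq.
std::vector<uint64_t> OverlappingSnapshots(OldSnapshotsSketch& snaps,
                                           uint64_t prep_seq,
                                           uint64_t commit_seq) {
  assert(prep_seq <= commit_seq);
  std::vector<uint64_t> out;
  for (uint64_t s : snaps.cache) {
    if (s >= commit_seq) return out;  // sorted: nothing later can overlap
    if (s >= prep_seq) out.push_back(s);
  }
  // Every cached snapshot was below commit_seq, so larger snapshots in the
  // overflow list may still overlap. The bug skipped this part once the
  // cache had produced a match.
  std::lock_guard<std::mutex> lock(snaps.overflow_mu);
  for (uint64_t s : snaps.overflow) {
    if (s >= commit_seq) break;
    if (s >= prep_seq) out.push_back(s);
  }
  return out;
}
```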
    • Add DB property for SST files kept from deletion (#4618) · fffac43c
      Committed by Andrew Kryczka
      Summary:
      This property can help debug why SST files aren't being deleted. Previously we only had the property "rocksdb.is-file-deletions-enabled". However, even when that returned true, obsolete SSTs may still not be deleted due to the coarse-grained mechanism we use to prevent newly created SSTs from being accidentally deleted. That coarse-grained mechanism uses a lower bound file number for SSTs that should not be deleted, and this property exposes that lower bound.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4618
      
      Differential Revision: D12898179
      
      Pulled By: ajkr
      
      fbshipit-source-id: fe68acc041ddbcc9276bbd48976524d95aafc776
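The coarse-grained guard can be modeled as a single comparison against the exposed lower bound: obsolete SSTs numbered at or above it are kept. This is a hypothetical sketch; `DeletableObsoleteFiles` is not a RocksDB function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Any SST whose file number is >= min_number_to_keep might belong to an
// in-flight flush or compaction, so it is kept even if it is obsolete.
std::vector<uint64_t> DeletableObsoleteFiles(const std::vector<uint64_t>& obsolete,
                                             uint64_t min_number_to_keep) {
  std::vector<uint64_t> out;
  for (uint64_t number : obsolete) {
    if (number < min_number_to_keep) out.push_back(number);
  }
  return out;
}
```

When files linger even though file deletions are enabled, comparing their numbers against the property's value shows whether this guard is the reason.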
  10. 03 Nov, 2018: 1 commit
  11. 31 Oct, 2018: 1 commit
    • Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594) · eaaf1a6f
      Committed by Abhishek Madan
      Summary:
      Since the number of range deletions is reported in
      TableProperties, it is confusing not to report the number of merge
      operands and point deletions as top-level properties. They are
      accessible through the public API, but because they are not "main"
      properties, they appear neither in aggregated table properties nor in
      the string representation of table properties.
      
      This change promotes those two property keys to
      `rocksdb/table_properties.h`, adds corresponding uint64 members for
      them, deprecates the old access methods `GetDeletedKeys()` and
      `GetMergeOperands()` (though they are still usable for now), and removes
      `InternalKeyPropertiesCollector`. The property key strings are the same
      as before this change, so DBs written by older versions should still be
      readable (though I haven't tested this yet).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594
      
      Differential Revision: D12826893
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
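A hypothetical sketch of why top-level members matter: once the counts are plain fields, they take part in aggregation and in the string form. The struct and field names are assumptions, not the exact members added by this change.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct TablePropertiesSketch {
  uint64_t num_entries = 0;
  uint64_t num_deletions = 0;  // point deletions
  uint64_t num_merge_operands = 0;
  uint64_t num_range_deletions = 0;

  // Aggregation across files sums every top-level counter.
  void Add(const TablePropertiesSketch& other) {
    num_entries += other.num_entries;
    num_deletions += other.num_deletions;
    num_merge_operands += other.num_merge_operands;
    num_range_deletions += other.num_range_deletions;
  }

  // Top-level counters also show up in the string representation.
  std::string ToString() const {
    std::ostringstream os;
    os << "entries=" << num_entries << "; deletions=" << num_deletions
       << "; merge operands=" << num_merge_operands
       << "; range deletions=" << num_range_deletions;
    return os.str();
  }
};
```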
  12. 27 Oct, 2018: 2 commits
  13. 23 Oct, 2018: 2 commits
    • Fix user comparator receiving internal key (#4575) · c34cc404
      Committed by Maysam Yabandeh
      Summary:
      There was a bug where the user comparator would receive the internal key instead of the user key. It occurred because RangeMightExistAfterSortedRun expects a user key but received an internal key when called from GenerateBottommostFiles. The patch augments an existing unit test to reproduce the bug and fixes it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4575
      
      Differential Revision: D10500434
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 858346d2fd102cce9e20516d77338c112bdfe366
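To see why the distinction matters, here is a sketch of the internal-key layout as we understand RocksDB's format: the user key followed by an 8-byte little-endian trailer packing `(sequence << 8) | value_type`. `MakeInternalKey`, `ExtractUserKey`, and `CompareUserKeys` are illustrative helpers written for this example, not RocksDB's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append the 8-byte trailer to the user key.
std::string MakeInternalKey(const std::string& user_key, uint64_t seq, uint8_t type) {
  assert(seq < (uint64_t{1} << 56));  // sequence must fit in 56 bits
  std::string ikey = user_key;
  uint64_t packed = (seq << 8) | type;
  for (int i = 0; i < 8; ++i) {
    ikey.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return ikey;
}

// Strip the trailer; a user comparator must only ever see this part.
std::string ExtractUserKey(const std::string& internal_key) {
  assert(internal_key.size() >= 8);
  return internal_key.substr(0, internal_key.size() - 8);
}

// Example user comparator: plain bytewise order on user keys.
int CompareUserKeys(const std::string& a, const std::string& b) {
  return a.compare(b);
}
```

Passing two internal keys for the same user key to such a comparator makes them compare unequal, because the trailers leak into the comparison.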
    • Dynamic level to adjust level multiplier when write is too heavy (#4338) · 70242636
      Committed by Siying Dong
      Summary:
      Level compaction usually performs poorly when writes are so heavy that the level targets can't be guaranteed. This change improves level_compaction_dynamic_level_bytes = true so that, in write-heavy cases, the level multiplier can be slightly adjusted based on the size of L0.
      
      We keep the behavior the same if the number of L0 files is under 2X the compaction trigger and the total size is less than options.max_bytes_for_level_base, so the behavior changes only when writes are so heavy that compaction cannot keep up.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4338
      
      Differential Revision: D9636782
      
      Pulled By: siying
      
      fbshipit-source-id: e27fc17a7c29c84b00064cc17536a01dacef7595
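The summary gives the condition under which behavior stays the same but not the adjustment formula, so this sketch covers only the guard. The function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true only when the adjusted-multiplier path may apply, i.e. when
// L0 has grown past either threshold and compaction is falling behind.
bool MayAdjustLevelMultiplier(int l0_files, int l0_compaction_trigger,
                              uint64_t l0_bytes,
                              uint64_t max_bytes_for_level_base) {
  assert(l0_compaction_trigger > 0);
  const bool behavior_unchanged = l0_files < 2 * l0_compaction_trigger &&
                                  l0_bytes < max_bytes_for_level_base;
  return !behavior_unchanged;
}
```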
  14. 20 Oct, 2018: 1 commit
    • Fix WriteBatchWithIndex's SeekForPrev() (#4559) · c17383f9
      Committed by Siying Dong
      Summary:
      WriteBatchWithIndex's SeekForPrev() had a bug: internally, we placed the position just before the seek key rather than after it. This made the iterator miss an entry equal to the seek key. The fix positions the iterator at the last entry equal to or smaller than the seek key.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4559
      
      Differential Revision: D10468534
      
      Pulled By: siying
      
      fbshipit-source-id: 2fb371ae809c561b60a1c11cef71e1c66fea1f19
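The difference between the buggy and fixed positioning can be shown on a plain sorted vector. This is a sketch with hypothetical function names, not WriteBatchWithIndex's code; an index of -1 stands for an invalid iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Fixed: position at the last key <= target.
int SeekForPrev(const std::vector<std::string>& sorted, const std::string& target) {
  assert(std::is_sorted(sorted.begin(), sorted.end()));
  auto it = std::upper_bound(sorted.begin(), sorted.end(), target);  // first key > target
  return static_cast<int>(it - sorted.begin()) - 1;
}

// Buggy: position just before the first key >= target, so an exact match is skipped.
int BuggySeekForPrev(const std::vector<std::string>& sorted, const std::string& target) {
  auto it = std::lower_bound(sorted.begin(), sorted.end(), target);  // first key >= target
  return static_cast<int>(it - sorted.begin()) - 1;
}
```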
  15. 18 Oct, 2018: 1 commit
    • Add PerfContextByLevel to provide per level perf context information (#4226) · d6ec2887
      Committed by Zhongyi Xie
      Summary:
      The current implementation of perf context is level-agnostic, which makes it hard to evaluate performance across the LSM tree. This PR adds `PerfContextByLevel` to break the counters down by level.
      This will be helpful when analyzing point and range query performance, as well as when tuning Bloom filters.
      It also replaces __thread with the thread_local keyword for perf_context.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226
      
      Differential Revision: D10369509
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
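A minimal sketch of per-level counters kept in a `thread_local` context, covering the kinds of counters mentioned in this and the #4617 summary (filter usefulness, keys returned, time spent per level). The struct, field, and function names are assumptions for illustration. One practical difference between standard `thread_local` and GCC's `__thread` is that the former allows types with non-trivial constructors and destructors, such as the `std::map` used here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct LevelCountersSketch {
  uint64_t bloom_filter_useful = 0;    // lookups the filter ruled out
  uint64_t user_key_return_count = 0;  // keys returned from this level
  uint64_t get_from_table_nanos = 0;   // time spent at this level
};

struct PerfContextSketch {
  std::map<uint32_t, LevelCountersSketch> by_level;
  void Reset() { by_level.clear(); }
};

// One context per thread, so recording needs no synchronization.
thread_local PerfContextSketch perf_context_sketch;

void RecordGetAtLevel(uint32_t level, bool filter_excluded, bool found,
                      uint64_t nanos) {
  LevelCountersSketch& c = perf_context_sketch.by_level[level];
  if (filter_excluded) ++c.bloom_filter_useful;
  if (found) ++c.user_key_return_count;
  c.get_from_table_nanos += nanos;
}
```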
  16. 16 Oct, 2018: 2 commits
    • Properly determine a truncated CompactRange stop key (#4496) · 1e384580
      Committed by anand1976
      Summary:
      When a CompactRange() call for a level is truncated before the end key
      is reached, because it exceeds max_compaction_bytes, we need to properly
      set the compaction_end parameter to indicate the stop key. The next
      CompactRange will use that as the begin key. We set it to the smallest
      key of the next file in the level after expanding inputs to get a clean
      cut.
      
      Previously, we were setting it before expanding inputs. So we could end
      up recompacting some files. In a pathological case, where a single key
      has many entries spanning all the files in the level (possibly due to
      merge operands without a partial merge operator, thus resulting in
      compaction output identical to the input), this would result in
      an endless loop over the same set of files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4496
      
      Differential Revision: D10395026
      
      Pulled By: anand1976
      
      fbshipit-source-id: f0c2f89fee29b4b3be53b6467b53abba8e9146a9
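A hypothetical sketch of the fixed order of operations: pick files within the byte budget, expand to a clean cut, and only then take the next file's smallest key as the stop key. `FileSketch`, `PickResult`, and `PickRangeInputs` are illustrative names, and files are reduced to user-key ranges and sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct FileSketch {
  std::string smallest;  // smallest user key
  std::string largest;   // largest user key
  uint64_t size;
};

struct PickResult {
  size_t first, last;                         // inclusive input range
  std::optional<std::string> compaction_end;  // next begin key, if truncated
};

PickResult PickRangeInputs(const std::vector<FileSketch>& level, size_t start,
                           uint64_t max_compaction_bytes) {
  assert(start < level.size());
  size_t last = start;
  uint64_t total = level[start].size;
  // Stop adding files once the byte budget would be exceeded.
  while (last + 1 < level.size() &&
         total + level[last + 1].size <= max_compaction_bytes) {
    ++last;
    total += level[last].size;
  }
  // Expand to a clean cut: files sharing a boundary user key go together.
  while (last + 1 < level.size() && level[last + 1].smallest == level[last].largest) {
    ++last;
  }
  PickResult r{start, last, std::nullopt};
  // Set the stop key only after expansion; setting it earlier would make the
  // next CompactRange start inside files that were just compacted.
  if (last + 1 < level.size()) r.compaction_end = level[last + 1].smallest;
  return r;
}
```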
    • Avoid per-key linear scan over snapshots in compaction (#4495) · 32b4d4ad
      Committed by Andrew Kryczka
      Summary:
      `CompactionIterator::snapshots_` is ordered by ascending seqnum, just like `DBImpl`'s linked list of snapshots from which it was copied. This PR exploits this ordering to make `findEarliestVisibleSnapshot` do binary search rather than linear scan. This can make flush/compaction significantly faster when many snapshots exist since that function is called on every single key.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4495
      
      Differential Revision: D10386470
      
      Pulled By: ajkr
      
      fbshipit-source-id: 29734991631227b6b7b677e156ac567690118a8b
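The binary search can be sketched with `std::lower_bound` over the ascending snapshot list: the earliest snapshot that can see a key written at `seq` is the smallest snapshot that is `>= seq`. The signature and the `kMaxSequenceNumber` stand-in below are assumptions; the real `findEarliestVisibleSnapshot` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in sentinel meaning "no snapshot sees this key".
constexpr uint64_t kMaxSequenceNumber = std::numeric_limits<uint64_t>::max();

// `snapshots` must be sorted ascending; lookup is O(log n) per key.
uint64_t FindEarliestVisibleSnapshot(const std::vector<uint64_t>& snapshots,
                                     uint64_t seq) {
  // Debug-only sanity check; it is linear, so release builds skip it.
  assert(std::is_sorted(snapshots.begin(), snapshots.end()));
  auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
  return it == snapshots.end() ? kMaxSequenceNumber : *it;
}
```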
  17. 11 Oct, 2018: 1 commit
  18. 10 Oct, 2018: 1 commit
    • Handle mixed slowdown/no_slowdown writer properly (#4475) · 854a4be0
      Committed by Anand Ananthabhotla
      Summary:
      There is a bug when the write queue leader is blocked on a write
      delay/stop, and the queue has writers with WriteOptions::no_slowdown set
      to true. They are not woken up until the write stall is cleared.
      
      The fix introduces a dummy writer inserted at the tail to indicate a
      write stall and prevent further inserts into the queue, and a condition
      variable that writers who can tolerate slowdown wait on before adding
      themselves to the queue. The leader calls WriteThread::BeginWriteStall()
      to add the dummy writer and then walk the queue to fail any writers with
      no_slowdown set. Once the stall clears, the leader calls
      WriteThread::EndWriteStall() to remove the dummy writer and signal the
      condition variable.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4475
      
      Differential Revision: D10285827
      
      Pulled By: anand1976
      
      fbshipit-source-id: 747465e5e7f07a829b1fb0bc1afcd7b93f4ab1a9
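A single-threaded sketch of the protocol described above. Here a `stalled_` flag stands in for the dummy writer at the tail, and writers that would wait on the condition variable are simply parked in a list; all names are hypothetical, and real code needs the mutex and condition variable the summary mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct WriterSketch {
  bool no_slowdown = false;
  bool failed = false;  // stands in for a non-OK status returned to the caller
};

class WriteQueueSketch {
 public:
  // Returns false if the writer is rejected immediately.
  bool Join(WriterSketch* w) {
    if (stalled_) {
      if (w->no_slowdown) {
        w->failed = true;
        return false;
      }
      parked_.push_back(w);  // real code: wait on the condition variable
      return true;
    }
    queue_.push_back(w);
    return true;
  }

  // Leader hits a delay/stop: fail queued no_slowdown writers right away.
  void BeginWriteStall() {
    assert(!stalled_);
    stalled_ = true;
    for (auto it = queue_.begin(); it != queue_.end();) {
      if ((*it)->no_slowdown) {
        (*it)->failed = true;
        it = queue_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Stall cleared: admit the writers that were waiting.
  void EndWriteStall() {
    assert(stalled_);
    stalled_ = false;
    for (WriterSketch* w : parked_) queue_.push_back(w);
    parked_.clear();
  }

  size_t QueueSize() const { return queue_.size(); }

 private:
  bool stalled_ = false;
  std::deque<WriterSketch*> queue_;
  std::deque<WriterSketch*> parked_;
};
```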
  19. 09 Oct, 2018: 2 commits
  20. 03 Oct, 2018: 1 commit
  21. 01 Oct, 2018: 1 commit
  22. 19 Sep, 2018: 1 commit
  23. 11 Sep, 2018: 1 commit
    • Skip concurrency control during recovery of pessimistic txn (#4346) · 3f528226
      Committed by Maysam Yabandeh
      Summary:
      TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This can be used as an optimization if the application knows that the transaction will not conflict with concurrent transactions. It is currently used during recovery, assuming that (i) the application guarantees no conflicts between prepared transactions in the WAL, and (ii) the application guarantees that recovered transactions will be rolled back or committed before new transactions start.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346
      
      Differential Revision: D9759149
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
  24. 30 Aug, 2018: 1 commit
    • Avoiding write stall caused by manual flushes (#4297) · 927f2749
      Committed by Mikhail Antonov
      Summary:
      At the moment, it seems possible to cause a write stall by calling flush (either manually via DB::Flush(), or from the Backup Engine directly calling FlushMemTable()) while a background flush may already be happening.
      
      One way to fix this: DBImpl::CompactRange() already checks for a possible stall and delays the flush if needed before it actually calls FlushMemTable(). We can simply move this delay logic into a separate method and call it from FlushMemTable.
      
      This is a draft patch for a first look; tests and SyncPoints still need to be checked/updated, and we will almost certainly need to add allow_write_stall to FlushOptions().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297
      
      Differential Revision: D9420705
      
      Pulled By: mikhail-antonov
      
      fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc
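The check being moved can be sketched as a predicate. It assumes writes stop once the number of unflushed memtables reaches `max_write_buffer_number`, and it treats `allow_write_stall` as the proposed opt-out; both function names are hypothetical, and the real code waits while the predicate holds rather than just returning it.

```cpp
#include <cassert>

// Stand-in stall condition: writes stop once this many memtables are unflushed.
bool WouldStallWrites(int unflushed, int max_write_buffer_number) {
  return unflushed >= max_write_buffer_number;
}

// A manual flush first switches the mutable memtable, adding one more
// unflushed memtable, so check that outcome before switching.
bool ManualFlushMustWait(int unflushed, int max_write_buffer_number,
                         bool allow_write_stall) {
  assert(max_write_buffer_number > 0);
  if (allow_write_stall) return false;
  return WouldStallWrites(unflushed + 1, max_write_buffer_number);
}
```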
  25. 29 Aug, 2018: 1 commit
    • Sync CURRENT file during checkpoint (#4322) · 42733637
      Committed by Andrew Kryczka
      Summary: For the CURRENT file created during checkpoint, we forgot to `fsync` or `fdatasync` it after creating it. This PR fixes that.
      
      Differential Revision: D9525939
      
      Pulled By: ajkr
      
      fbshipit-source-id: a505483644026ee3f501cfc0dcbe74832165b2e3
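For reference, a POSIX-only sketch of writing a small file such as CURRENT durably, including the `fsync` step that was missing. `WriteFileDurably` is a hypothetical helper, not RocksDB's `Env` API.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Writes `contents` to `path` and fsyncs it before closing.
// Returns 0 on success or an errno value on failure.
int WriteFileDurably(const std::string& path, const std::string& contents) {
  assert(!path.empty());
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return errno;
  size_t off = 0;
  while (off < contents.size()) {
    ssize_t n = ::write(fd, contents.data() + off, contents.size() - off);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the write
      int err = errno;
      ::close(fd);
      return err;
    }
    off += static_cast<size_t>(n);
  }
  // Flush file data to stable storage before reporting success.
  if (::fsync(fd) != 0) {
    int err = errno;
    ::close(fd);
    return err;
  }
  if (::close(fd) != 0) return errno;
  return 0;
}
```

To make a newly created file's directory entry durable as well, POSIX systems typically also require an `fsync` on the containing directory.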
  26. 25 Aug, 2018: 1 commit
    • Reduce empty SST creation/deletion during compaction (#4311) · 17f9a181
      Committed by Andrew Kryczka
      Summary:
      I have a PR to start calling `OnTableFileCreated` for empty SSTs: #4307. However, it is a behavior change, so it should not go into a patch release.
      
      This PR adds back a check that range deletions actually exist before starting file creation. This PR should be safe to backport to earlier versions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4311
      
      Differential Revision: D9493734
      
      Pulled By: ajkr
      
      fbshipit-source-id: f0d43cda4cfd904f133cfe3a6eb622f52a9ccbe8
  27. 24 Aug, 2018: 1 commit
  28. 23 Aug, 2018: 1 commit
  29. 22 Aug, 2018: 1 commit
  30. 17 Aug, 2018: 2 commits