1. 17 Feb, 2018 2 commits
    • M
      WritePrepared Txn: optimizations for sysbench update_noindex · c178da05
      Maysam Yabandeh committed
      Summary:
      These are optimizations that we applied to improve sysbench's update_noindex performance.
      1. Make use of the LIKELY compiler hint
      2. Move std::atomic to the subclass
      3. Make use of skip_prepared in non-2pc transactions.
      Closes https://github.com/facebook/rocksdb/pull/3512
      
      Differential Revision: D7000075
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ab8292584df1f6305a4992973fb1b7933632181
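A branch-prediction hint of the kind named in item 1 can be sketched as follows. This is a minimal illustration, assuming a GCC or Clang toolchain that provides `__builtin_expect`; the helper `SumNonNegative` is hypothetical and is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: GCC/Clang expose __builtin_expect; other compilers get a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

// Hypothetical hot-path helper: sums the values and treats a negative
// value as a rare error, so the compiler can lay out the common path first.
int64_t SumNonNegative(const std::vector<int64_t>& values, bool* ok) {
  int64_t sum = 0;
  *ok = true;
  for (int64_t v : values) {
    if (UNLIKELY(v < 0)) {  // rare error path
      *ok = false;
      return 0;
    }
    sum += v;
  }
  return sum;
}
```

The hint changes only code layout and static prediction, never the result, so it is safe to apply wherever one branch clearly dominates.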
    • M
      Fix deadlock in ColumnFamilyData::InstallSuperVersion() · 97307d88
      Mike Kolupaev committed
      Summary:
      Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
      
      This deadlock is hit all the time on our workload. It blocks our release.
      
      In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
      
      So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple and lock-free things in them.
      
      This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
      
      I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
      Closes https://github.com/facebook/rocksdb/pull/3510
      
      Reviewed By: sagar0
      
      Differential Revision: D7005346
      
      Pulled By: al13n321
      
      fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
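The idea behind the fix can be sketched in isolation: if the owner always keeps its own reference, a callback that runs under a global lock only ever performs an atomic decrement and never reaches the cleanup path. All names below (`RefCounted`, `CallbackRegistry`, `CallbackDroppedLastRef`) are hypothetical stand-ins, not RocksDB classes.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted SuperVersion.
struct RefCounted {
  std::atomic<int> refs{0};
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  // Returns true if the caller dropped the last reference.
  bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Hypothetical registry that, like the ThreadLocalPtr described above,
// invokes callbacks while holding a global mutex.
class CallbackRegistry {
 public:
  void Add(std::function<void()> cb) {
    std::lock_guard<std::mutex> guard(mu_);
    callbacks_.push_back(std::move(cb));
  }
  void RunAll() {
    std::lock_guard<std::mutex> guard(mu_);
    for (auto& cb : callbacks_) cb();  // callbacks must not take other locks
    callbacks_.clear();
  }

 private:
  std::mutex mu_;
  std::vector<std::function<void()>> callbacks_;
};

// Runs the scenario; returns true if the callback dropped the last reference.
bool CallbackDroppedLastRef() {
  RefCounted version;
  version.Ref();  // owner's reference, released outside the registry lock
  version.Ref();  // per-thread cached reference
  CallbackRegistry registry;
  bool dropped_last = false;
  registry.Add([&] { dropped_last = version.Unref(); });
  registry.RunAll();  // lock-free work only: a single atomic decrement
  bool owner_dropped_last = version.Unref();
  assert(owner_dropped_last);
  (void)owner_dropped_last;
  return dropped_last;
}
```

Because the owner's reference outlives the cached one, the nontrivial cleanup always happens on the owner's side, where taking further locks is safe.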
  2. 16 Feb, 2018 3 commits
    • M
      Unbreak MemTableRep API change · 8eb1d445
      Maysam Yabandeh committed
      Summary:
      The MemTableRep API was broken by this commit: 813719e9
      This patch reverts the changes and instead adds InsertKey (etc.) overloads to extend the MemTableRep API without breaking the existing classes that inherit from it.
      Closes https://github.com/facebook/rocksdb/pull/3513
      
      Differential Revision: D7004134
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e568d91fe1e17dd76c0c1f6c7dd51a18633b1c4f
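Extending a base class without breaking existing subclasses usually means adding a new virtual method whose default implementation delegates to the old one. The sketch below illustrates that pattern with hypothetical classes (`Rep`, `LegacyRep`, `DedupRep`) and a simplified string key; it is not the actual MemTableRep interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

class Rep {
 public:
  virtual ~Rep() = default;
  // Original API: subclasses written earlier override only this.
  virtual void Insert(const std::string& key) = 0;
  // Added later: reports whether the key was inserted. The default
  // delegates to Insert(), so older subclasses keep compiling and working.
  virtual bool InsertKey(const std::string& key) {
    Insert(key);
    return true;
  }
};

// A subclass written before InsertKey existed; it needs no changes.
class LegacyRep : public Rep {
 public:
  void Insert(const std::string& key) override { keys.push_back(key); }
  std::vector<std::string> keys;
};

// A newer subclass that opts in to the extended behavior.
class DedupRep : public Rep {
 public:
  void Insert(const std::string& key) override { InsertKey(key); }
  bool InsertKey(const std::string& key) override {
    if (std::find(keys.begin(), keys.end(), key) != keys.end()) return false;
    keys.push_back(key);
    return true;
  }
  std::vector<std::string> keys;
};
```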
    • J
      Several small "fixes" · 4e7a182d
      jsteemann committed
      Summary:
      - removed a few unneeded variables
      - fused some variable declarations and their assignments
      - fixed right-trimming code in string_util.cc to not underflow
      - simplified an assertion
      - moved non-nullptr check assertion before dereferencing of that pointer
      - pass an std::string function parameter by const reference instead of by value (avoiding potential copy)
      Closes https://github.com/facebook/rocksdb/pull/3507
      
      Differential Revision: D7004679
      
      Pulled By: sagar0
      
      fbshipit-source-id: 52944952d9b56dfcac3bea3cd7878e315bb563c4
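The underflow mentioned in the right-trimming item typically arises when an unsigned index is decremented past zero on an empty or all-whitespace string. A minimal sketch of a trim that avoids it is shown below; `RightTrim` is a hypothetical name and this is not the code from string_util.cc.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Tracks the end position (one past the last kept character) instead of the
// last index, so the counter never has to go below zero.
std::string RightTrim(const std::string& s) {
  std::size_t end = s.size();
  while (end > 0 && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
    --end;
  }
  return s.substr(0, end);
}
```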
    • Z
      Tweak external file ingestion seqno logic under universal compaction · c88c57cd
      Zhongyi Xie committed
      Summary:
      Right now it is possible for a file to be assigned to L0 but also be assigned the seqno from a higher level that it doesn't fit.
      Under the current implementation, it is possible that the seqno in lower levels (Ln) can be equal to the smallest seqno of higher levels (Ln-1), which is undesirable from universal compaction's point of view.
      This should fix the intermittent failure of `ExternalSSTFileBasicTest.IngestFileWithGlobalSeqnoPickedSeqno`
      Closes https://github.com/facebook/rocksdb/pull/3411
      
      Differential Revision: D6813802
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 693d0462fa94725ccfb9d8858743e6d2d9992d14
  3. 15 Feb, 2018 1 commit
  4. 14 Feb, 2018 2 commits
  5. 13 Feb, 2018 3 commits
    • S
      Customized BlockBasedTableIterator and LevelIterator · b555ed30
      Siying Dong committed
      Summary:
      Use a customized BlockBasedTableIterator and LevelIterator to replace the current implementations that leverage the two-level iterator. Hopefully the customized logic will make the code easier to understand. As a side effect, BlockBasedTableIterator reduces the allocation for the data block iterator object and avoids the virtual function call to it, because we can directly reference BlockIter, a final class. Similarly, LevelIterator reduces the virtual function calls to the dummy iterator iterating the file metadata. It also enables further optimization.
      
      The upper bound check is also moved from the index block to the data block. This implementation fits this iterator better. After the change, the forward iterator is slightly optimized to ensure we trim those iterators.
      
      The two-level iterator is now only used by the partitioned index, so it is simplified.
      Closes https://github.com/facebook/rocksdb/pull/3406
      
      Differential Revision: D6809041
      
      Pulled By: siying
      
      fbshipit-source-id: 7da3b9b1d3c8e9d9405302c15920af1fcaf50ffa
    • A
      Add delay before flush in CompactRange to avoid write stalling · ee1c8026
      Andrew Kryczka committed
      Summary:
      - Refactored logic for checking write stall condition to a helper function: `GetWriteStallConditionAndCause`. Now it is decoupled from the logic for updating WriteController / stats in `RecalculateWriteStallConditions`, so we can reuse it for predicting whether write stall will occur.
      - Updated `CompactRange` to first check whether the one additional immutable memtable / L0 file would cause stalling before it flushes. If so, it waits until that is no longer true.
      - Updated `bg_cv_` to be signaled on `SetOptions` calls. The stall conditions `CompactRange` cares about can change when (1) flush finishes, (2) compaction finishes, or (3) options dynamically change. The cv was already signaled for (1) and (2) but not yet for (3).
      Closes https://github.com/facebook/rocksdb/pull/3381
      
      Differential Revision: D6754983
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5613e03f1524df7192dc6ae885d40fd8f091d972
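Separating the stall check from its side effects makes it possible to ask "would one more memtable or L0 file cause a stall?" before flushing. The sketch below is a heavily simplified illustration: the struct `StallOptions` and the functions `GetStallCondition` and `FlushWouldStall` are hypothetical, and the thresholds only loosely mirror the real options of the same names.

```cpp
#include <cassert>

enum class StallCondition { kNormal, kDelayed, kStopped };

// Hypothetical subset of the options that influence write stalls.
struct StallOptions {
  int max_write_buffer_number;
  int level0_slowdown_writes_trigger;
  int level0_stop_writes_trigger;
};

// Pure helper: computes the condition without touching any controller or
// statistics, so it can be reused for prediction.
StallCondition GetStallCondition(int num_unflushed_memtables, int num_l0_files,
                                 const StallOptions& opts) {
  if (num_unflushed_memtables >= opts.max_write_buffer_number) {
    return StallCondition::kStopped;
  }
  if (num_l0_files >= opts.level0_stop_writes_trigger) {
    return StallCondition::kStopped;
  }
  if (num_l0_files >= opts.level0_slowdown_writes_trigger) {
    return StallCondition::kDelayed;
  }
  return StallCondition::kNormal;
}

// Predicts whether adding one immutable memtable and one L0 file would stall.
bool FlushWouldStall(int num_unflushed_memtables, int num_l0_files,
                     const StallOptions& opts) {
  return GetStallCondition(num_unflushed_memtables + 1, num_l0_files + 1,
                           opts) != StallCondition::kNormal;
}
```

A caller in the position of `CompactRange` would wait on a condition variable and re-evaluate `FlushWouldStall` each time it is signaled.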
    • Z
      make flush_reason_ atomic to keep TSAN happy · 3f1bb073
      Zhongyi Xie committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3487
      
      Differential Revision: D6967098
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 48e0accf2e3b3f589ddb797ff8083c8520269bf0
  6. 10 Feb, 2018 3 commits
    • S
      Explicitly fail writes if key or value is not smaller than 4GB · ef29d2a2
      Siying Dong committed
      Summary:
      Right now, users will encounter unexpected behavior if they use a key or value larger than 4GB. We should explicitly fail the queries.
      Closes https://github.com/facebook/rocksdb/pull/3484
      
      Differential Revision: D6953895
      
      Pulled By: siying
      
      fbshipit-source-id: b60491e1af064fc5d52971956661f6c18ceac24f
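A size guard of this kind can be sketched as a check against the largest value a 32-bit length field can hold. The function `CheckEntrySize` and its error strings are assumptions for illustration; the actual patch returns a RocksDB status instead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Returns an empty string if both sizes fit in 32 bits, otherwise a message
// naming the offending part. Sizes are passed as plain integers so the check
// can be exercised without allocating multi-gigabyte buffers.
std::string CheckEntrySize(uint64_t key_size, uint64_t value_size) {
  constexpr uint64_t kMaxSize = std::numeric_limits<uint32_t>::max();
  if (key_size > kMaxSize) return "key is too large";
  if (value_size > kMaxSize) return "value is too large";
  return "";
}
```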
    • Y
      WritePrepared Txn: Support merge operator · fe228da0
      Yi Wu committed
      Summary:
      CompactionIterator invokes MergeHelper::MergeUntil() to do partial merges between snapshot boundaries. Previously it depended only on the sequence number to tell the snapshot boundary, but we also need to make use of snapshot_checker to verify visibility of the merge operands to the snapshots. For example, say there is a snapshot with seq = 2 that can only see data with seq <= 1. There are three merges, with seq = 1, 2, 3. A correct compaction output would be (1),(2+3). Without taking snapshot_checker into account when generating the merge result, compaction would generate the output (1+2),(3).
      
      By filtering uncommitted keys with the read callback, the read path already takes care of merges well and doesn't need additional updates.
      Closes https://github.com/facebook/rocksdb/pull/3475
      
      Differential Revision: D6926087
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8f539d6f897cfe29b6dc27a8992f68c2a629d40a
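The example in the summary can be reproduced with a small grouping routine: operands may be merged together only if the same earliest snapshot sees them. This is a simplified sketch with hypothetical names (`Seq`, `VisibleFn`, `GroupMergeOperands`); the visibility callback stands in for the snapshot checker and is not the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Seq = uint64_t;
// Answers: can the snapshot with `snapshot_seq` see the operand `operand_seq`?
using VisibleFn = std::function<bool(Seq operand_seq, Seq snapshot_seq)>;

// Index of the earliest snapshot that sees the operand, or snapshots.size()
// if no snapshot sees it.
std::size_t EarliestVisibleSnapshot(Seq operand,
                                    const std::vector<Seq>& snapshots,
                                    const VisibleFn& visible) {
  for (std::size_t i = 0; i < snapshots.size(); ++i) {
    if (visible(operand, snapshots[i])) return i;
  }
  return snapshots.size();
}

// Groups operands (ascending by seq) so that each group falls between the
// same pair of snapshot boundaries; snapshots are ascending as well.
std::vector<std::vector<Seq>> GroupMergeOperands(
    const std::vector<Seq>& operands, const std::vector<Seq>& snapshots,
    const VisibleFn& visible) {
  std::vector<std::vector<Seq>> groups;
  std::size_t prev = 0;
  for (Seq op : operands) {
    std::size_t cur = EarliestVisibleSnapshot(op, snapshots, visible);
    if (groups.empty() || cur != prev) groups.emplace_back();
    groups.back().push_back(op);
    prev = cur;
  }
  return groups;
}
```

With operands 1, 2, 3 and one snapshot at seq = 2, a callback that compares sequence numbers alone groups them as (1, 2), (3), while a callback that also reports seq = 2 as invisible to that snapshot groups them as (1), (2, 3).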
    • Z
      log flush reason for better debugging experience · 945f618b
      Zhongyi Xie committed
      Summary:
      It's always a mystery from the logs why flush was triggered -- user triggered it manually, WriteBufferManager triggered it, logs were full, write buffer was full, etc.
      This PR logs Flush reason whenever a flush is scheduled.
      Closes https://github.com/facebook/rocksdb/pull/3401
      
      Differential Revision: D6788142
      
      Pulled By: miasantreble
      
      fbshipit-source-id: a867e54d493c06adf5172bd36a180fb3faae3511
  7. 08 Feb, 2018 1 commit
  8. 07 Feb, 2018 1 commit
    • Y
      WritePrepared Txn: update compaction_iterator_test and db_iterator_test · 81736d8a
      Yi Wu committed
      Summary:
      Update compaction_iterator_test with write-prepared transaction DB related tests. Transaction-related tests are grouped in CompactionIteratorWithSnapshotCheckerTest. The existing tests are duplicated so that they also run with a dummy SnapshotChecker that says every key is visible to every snapshot (this is okay, since we still compare sequence numbers to verify visibility). Merge-related tests are disabled and will be revisited in another PR.
      
      Existing db_iterator_tests are also duplicated to test with a dummy read_callback that says every key is committed.
      Closes https://github.com/facebook/rocksdb/pull/3466
      
      Differential Revision: D6909253
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2ae4656b843a55e2e9ff8beecf21f2832f96cd25
  9. 06 Feb, 2018 2 commits
    • M
      WritePrepared Txn: Duplicate Keys, Txn Part · 88d8b2a2
      Maysam Yabandeh committed
      Summary:
      This patch takes advantage of the memtable being able to detect a duplicate <key,seq> and return TryAgain to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index, it detects the existence of at least one duplicate key in the write batch. If a duplicate key is reported, it then pays the cost of counting the number of sub-batches by iterating over the write batch, and passes the count to DBImpl::Write. The DB makes use of the provided batch_count to assign proper sequence numbers before sending them to the WAL. Later, when inserting the batch into the memtable, it increases the seq each time the memtable reports a duplicate (a sub-batch in our counting) and tries again.
      Closes https://github.com/facebook/rocksdb/pull/3455
      
      Differential Revision: D6873699
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
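Counting the duplicate-free pieces of a write batch, as the summary describes, can be sketched as a single pass that starts a new piece whenever a key repeats within the current one. The function `CountDuplicateFreeRuns` is a hypothetical illustration over plain string keys and ignores column families and the real write batch format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns how many sequence numbers a batch would need if every run of keys
// without duplicates shares one sequence number. A batch always counts as at
// least one run.
std::size_t CountDuplicateFreeRuns(const std::vector<std::string>& keys) {
  std::size_t count = 1;
  std::unordered_set<std::string> seen;  // keys in the current run
  for (const auto& key : keys) {
    if (!seen.insert(key).second) {
      // Duplicate within the current run: start a new run with this key.
      ++count;
      seen.clear();
      seen.insert(key);
    }
  }
  return count;
}
```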
    • A
      Handle error return from WriteBuffer() · 4b124fb9
      Anand Ananthabhotla committed
      Summary:
      There are a couple of places where we swallow any error from
      WriteBuffer() - in SwitchMemtable() and DBImpl::CloseImpl(). Propagate
      the error up in those cases rather than ignoring it.
      Closes https://github.com/facebook/rocksdb/pull/3404
      
      Differential Revision: D6879954
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2ef88b554be5286b0a8bad7384ba17a105395bdb
  10. 03 Feb, 2018 2 commits
  11. 02 Feb, 2018 1 commit
  12. 01 Feb, 2018 2 commits
  13. 31 Jan, 2018 2 commits
  14. 30 Jan, 2018 4 commits
    • Y
      Fix DBFlushTest::ManualFlushWithMinWriteBufferNumberToMerge deadlock · 4bdf06e7
      Yi Wu committed
      Summary:
      In the test, there can be a deadlock between the background flush thread and the foreground main thread, as follows:
      * background flush thread:
        - holding db mutex, while
        - waiting on "DBImpl::FlushMemTableToOutputFile:BeforeInstallSV" sync point.
      * foreground thread:
        - waiting for db mutex to write "key2"
      
      Fixed by letting the background flush thread wait without holding the db mutex.
      Closes https://github.com/facebook/rocksdb/pull/3436
      
      Differential Revision: D6841334
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: b020768ac94e166e40953c5d09e505515a5f244d
    • S
      Tests for dynamic universal compaction options · e6605e53
      Sagar Vemuri committed
      Summary:
      Added a test for three dynamic universal compaction options, in the realm of read amplification:
      - size_ratio
      - min_merge_width
      - max_merge_width
      
      Also updated DynamicUniversalCompactionSizeAmplification by adding a check on compaction reason.
      Found a bug in the compaction reason setting while working on this PR, and fixed it in #3412.
      
      TODO for later: Still to add tests for these options: compression_size_percent, stop_style and trivial_move.
      Closes https://github.com/facebook/rocksdb/pull/3419
      
      Differential Revision: D6822217
      
      Pulled By: sagar0
      
      fbshipit-source-id: 074573fca6389053cbac229891a0163f38bb56c4
    • Z
      Use block cache to track memory usage when ReadOptions.fill_cache=false · 3fe09371
      Zhongyi Xie committed
      Summary:
      ReadOptions.fill_cache is set in compaction inputs and can be set by users in their queries too. It tells RocksDB not to put a data block it uses into the block cache.
      
      The memory used by the data block is, however, not trackable by users.
      
      To make the system more manageable, we can charge the block to the block cache while it is in use, and then release the charge afterwards.
      Closes https://github.com/facebook/rocksdb/pull/3333
      
      Differential Revision: D6670230
      
      Pulled By: miasantreble
      
      fbshipit-source-id: ab848d3ed286bd081a13ee1903de357b56cbc308
    • M
      Suppress lint in old files · b8eb32f8
      Mark Isaacson committed
      Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.
      
      Reviewed By: yiwu-arbug
      
      Differential Revision: D6821806
      
      fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
  15. 27 Jan, 2018 1 commit
    • S
      Incorrect Universal Compaction reason · 7fcc1d0d
      Sagar Vemuri committed
      Summary:
      While writing tests for dynamic Universal Compaction options, I found that the compaction reasons we set for size-ratio based and sorted-run based universal compactions are swapped with each other. Fixed it.
      Closes https://github.com/facebook/rocksdb/pull/3412
      
      Differential Revision: D6820540
      
      Pulled By: sagar0
      
      fbshipit-source-id: 270a188968ba25b2c96a8339904416c4c87ff5b3
  16. 24 Jan, 2018 4 commits
  17. 23 Jan, 2018 2 commits
  18. 20 Jan, 2018 2 commits
  19. 19 Jan, 2018 2 commits
    • Y
      Fix Flush() keep waiting after flush finish · f1cb83fc
      Yi Wu committed
      Summary:
      A Flush() call could wait indefinitely if min_write_buffer_number_to_merge is used. Consider the sequence:
      1. The user calls Flush() with flush_options.wait = true.
      2. The manual flush starts in the background.
      3. A new memtable becomes immutable because of writes. The new memtable will not trigger a flush if min_write_buffer_number_to_merge is not reached.
      4. The manual flush finishes.
      
      Because the new memtable created at step 3 has not been flushed, the previous logic of WaitForFlushMemTable() keeps waiting, even though the memtables it intended to flush have been flushed.
      
      Here, instead of only checking whether there are any more memtables to flush, WaitForFlushMemTable() also checks the id of the earliest memtable. If the id is larger than that of the latest memtable at the time the flush was initiated, it means all the memtables that existed when the flush started have been flushed.
      Closes https://github.com/facebook/rocksdb/pull/3378
      
      Differential Revision: D6746789
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 35e698f71c7f90b06337a93e6825f4ea3b619bfa
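The revised wait condition can be sketched as a predicate over the ids of memtables still awaiting flush. The function `ManualFlushDone` and the use of a deque of ids are assumptions made for illustration, not the actual WaitForFlushMemTable() code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// unflushed_ids: ids of immutable memtables not yet flushed, oldest first.
// flush_memtable_id: id of the newest memtable that existed when the manual
// flush was requested.
// The flush is done once nothing is pending, or once the oldest pending
// memtable was created after the flush request.
bool ManualFlushDone(const std::deque<uint64_t>& unflushed_ids,
                     uint64_t flush_memtable_id) {
  return unflushed_ids.empty() || unflushed_ids.front() > flush_memtable_id;
}
```

Checking only `unflushed_ids.empty()` reproduces the old behavior, which keeps waiting in the scenario from step 3 because a newer memtable stays pending.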
    • T
      Fixed get version on Windows, moved throwing exceptions into cc file. · b9873162
      topilski committed
      Summary:
      Fixes for msys2 and mingw; hide exceptions in the cpp file.
      Closes https://github.com/facebook/rocksdb/pull/3377
      
      Differential Revision: D6746707
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 456b38df80bc48b8386a2cf87f669b5a4f9999a4