1. 21 Feb, 2018 4 commits
    • S
      Add rocksdb.iterator.internal-key property · 8ada876d
      Sagar Vemuri committed
      Summary:
      Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped.
      Closes https://github.com/facebook/rocksdb/pull/3525
      
      Differential Revision: D7033694
      
      Pulled By: sagar0
      
      fbshipit-source-id: d51e6c00f5e9d766c6276ef79774b81c6c5216f8
      8ada876d
    • J
      save redundant key lookup in map of locked keys · e9c31ab1
      jsteemann committed
      Summary:
      In case it is found that a key is already marked as locked in a
      stripe's map of locked keys, it is not necessary to look it up
      again using `std::unordered_map<std::string, ...>::at(size_t)`.
      
      Instead, we can use the already found position using the iterator
      produced by the previous `find` operation. Reusing the iterator
      will avoid having to hash the key again and do additional "random"
      memory lookups in the map of keys (though the data will very
      likely sit available in caches here already due to the previous
      find operation)
      Closes https://github.com/facebook/rocksdb/pull/3505
      
      Differential Revision: D7036446
      
      Pulled By: sagar0
      
      fbshipit-source-id: cced51547b2bd2d49394f6bc8c5896f09fa80f68
      e9c31ab1
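The lookup change described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: `LockInfo`, `LockMap`, `FindTwice`, and `FindOnce` are hypothetical names, and the real per-key lock record in RocksDB is different.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical lock record; the real per-key lock info in RocksDB differs.
struct LockInfo {
  int owner_txn_id = 0;
};

using LockMap = std::unordered_map<std::string, LockInfo>;

// Two lookups: find() hashes the key, then at() hashes it again.
LockInfo* FindTwice(LockMap& keys, const std::string& key) {
  if (keys.find(key) == keys.end()) return nullptr;
  return &keys.at(key);
}

// One lookup: reuse the iterator that find() already produced.
LockInfo* FindOnce(LockMap& keys, const std::string& key) {
  auto it = keys.find(key);
  if (it == keys.end()) return nullptr;
  return &it->second;
}
```

Both functions return a pointer to the same element; the second simply avoids hashing the key a second time.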
    • A
      fix handling of empty string as checkpoint directory · 1960e73e
      Andrew Kryczka committed
      Summary:
      - made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
      - made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.
      
      Differential Revision: D6874562
      
      fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
      1960e73e
    • I
      fix shift UBSAN error in col_buf_encoder.cc · 5263da63
      Igor Sugak committed
      Summary:
      Add a static cast so that the left shift is performed on an unsigned type.
      
      make ubsan_check
      Closes https://github.com/facebook/rocksdb/pull/3517
      
      Reviewed By: sagar0
      
      Differential Revision: D7016044
      
      Pulled By: igorsugak
      
      fbshipit-source-id: baf72f6197edd8f7220d010b15a23d6de6a72c49
      5263da63
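The commit above does not quote the expression that was fixed, so the following is only a generic sketch of the technique it names: casting to an unsigned type of sufficient width before a left shift. `PlaceByte` is a hypothetical helper, and the comment assumes a platform where `int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the expression from col_buf_encoder.cc.
// Without the cast, `byte` is promoted to int; with a 32-bit int,
// shifting it by 32 or more bits is undefined behavior.
// `index` selects the target byte and is expected to be in [0, 7].
std::uint64_t PlaceByte(std::uint8_t byte, unsigned index) {
  return static_cast<std::uint64_t>(byte) << (8 * index);
}
```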
  2. 17 Feb, 2018 3 commits
    • P
      Fix build with USE_RTTI=0 · ab446dc2
      Po-Chuan Hsieh committed
      Summary:
      utilities/column_aware_encoding_util.cc:61:23: error: cannot use dynamic_cast with -fno-rtti
        table_reader_.reset(dynamic_cast<BlockBasedTable*>(table_reader.release()));
                            ^
      1 error generated.
      
      It was added as a [local patch](https://svnweb.freebsd.org/ports/head/databases/rocksdb/files/patch-utilities-column_aware_encoding_util.cc) on FreeBSD since RocksDB 5.8.
      It also fixes #2707.
      Closes https://github.com/facebook/rocksdb/pull/3514
      
      Differential Revision: D7005571
      
      Pulled By: siying
      
      fbshipit-source-id: 351a9055d21d0accdd7a932e8e7bfcd3c8e22068
      ab446dc2
    • M
      WritePrepared Txn: optimizations for sysbench update_noindex · c178da05
      Maysam Yabandeh committed
      Summary:
      These are optimizations that we applied to improve sysbench's update_noindex performance.
      1. Make use of LIKELY compiler hint
      2. Move std::atomic to the subclass
      3. Make use of skip_prepared in non-2pc transactions.
      Closes https://github.com/facebook/rocksdb/pull/3512
      
      Differential Revision: D7000075
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ab8292584df1f6305a4992973fb1b7933632181
      c178da05
    • M
      Fix deadlock in ColumnFamilyData::InstallSuperVersion() · 97307d88
      Mike Kolupaev committed
      Summary:
      Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
      
      This deadlock is hit all the time on our workload. It blocks our release.
      
      In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
      
      So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple and lock-free things in them.
      
      This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
      
      I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
      Closes https://github.com/facebook/rocksdb/pull/3510
      
      Reviewed By: sagar0
      
      Differential Revision: D7005346
      
      Pulled By: al13n321
      
      fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
      97307d88
  3. 16 Feb, 2018 6 commits
    • A
      fix advance reservation of arena block addresses · 0454f781
      Andrew Kryczka committed
      Summary:
      Calling `std::vector::reserve()` causes memory to be reallocated and then data to be moved. It was called prior to adding every block. This reallocation could happen a huge number of times, e.g., for users with large index blocks.
      
      Instead, we can simply use `std::vector::emplace_back()` in such a way that preserves the no-memory-leak guarantee, while letting the vector decide when to reallocate space. Now I see reallocation/moving happen O(logN) times, rather than O(N) times, where N is the final size of vector.
      Closes https://github.com/facebook/rocksdb/pull/3508
      
      Differential Revision: D6994228
      
      Pulled By: ajkr
      
      fbshipit-source-id: ab7c11e13ff37c8c6c8249be7a79566a4068cd27
      0454f781
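The difference between the two patterns in the arena commit above can be observed with a small counter. This is a sketch under assumptions: `CountReallocations` is a hypothetical helper, the `std::unique_ptr<char[]>` element type is chosen for brevity, and the exact counts depend on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts how often the vector's capacity changes while adding n blocks.
// Hypothetical helper for illustration; not code from RocksDB's arena.
std::size_t CountReallocations(std::size_t n, bool reserve_each_time) {
  std::vector<std::unique_ptr<char[]>> blocks;
  std::size_t reallocations = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t before = blocks.capacity();
    if (reserve_each_time) {
      // Old pattern: make room first so the insertion cannot throw after `new`.
      blocks.reserve(blocks.size() + 1);
      blocks.emplace_back(new char[16]);
    } else {
      // New pattern: add an empty slot first, then allocate into it.
      // If emplace_back throws, nothing has been allocated yet; if `new`
      // throws, the vector still owns every earlier block.
      blocks.emplace_back();
      blocks.back().reset(new char[16]);
    }
    if (blocks.capacity() != before) ++reallocations;
  }
  return reallocations;
}
```

On common standard library implementations, `reserve(size() + 1)` allocates exactly the requested capacity, so the first variant reallocates on nearly every call, while the second grows geometrically.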
    • Y
      Legocastle job to report lite build binary size to scuba · 989d1231
      Yi Wu committed
      Summary:
      Add a legocastle job to continuously build the last 10 commits every 4 hours and report lite build binary size to scuba.
      Closes https://github.com/facebook/rocksdb/pull/3511
      
      Differential Revision: D7001730
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 7c8ca87c46d663c786a0d32be69ebbe7b19a5eb9
      989d1231
    • M
      Unbreak MemTableRep API change · 8eb1d445
      Maysam Yabandeh committed
      Summary:
      The MemTableRep API was broken by this commit: 813719e9
      This patch reverts the changes and instead adds InsertKey (etc.) overloads to extend the MemTableRep API without breaking the existing classes that inherit from it.
      Closes https://github.com/facebook/rocksdb/pull/3513
      
      Differential Revision: D7004134
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e568d91fe1e17dd76c0c1f6c7dd51a18633b1c4f
      8eb1d445
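The general idea of extending a base class without breaking existing subclasses, as in the MemTableRep commit above, can be sketched like this. `Rep` and `LegacyRep` are simplified, hypothetical stand-ins, and the `bool` return value and the default body of `InsertKey` are assumptions made for the illustration rather than details stated in the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a memtable representation interface.
class Rep {
 public:
  virtual ~Rep() = default;

  // Original entry point that existing subclasses already override.
  virtual void Insert(const std::string& key) = 0;

  // Newer entry point that reports whether the key was inserted. The default
  // forwards to Insert(), so subclasses written against the old API keep working.
  virtual bool InsertKey(const std::string& key) {
    Insert(key);
    return true;
  }
};

// A subclass written before InsertKey existed; it compiles unchanged.
class LegacyRep : public Rep {
 public:
  void Insert(const std::string& key) override { keys_.push_back(key); }
  std::vector<std::string> keys_;
};
```

Callers can switch to `InsertKey` right away, and only subclasses that need the new behavior have to override it.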
    • J
      Several small "fixes" · 4e7a182d
      jsteemann committed
      Summary:
      - removed a few unneeded variables
      - fused some variable declarations and their assignments
      - fixed right-trimming code in string_util.cc to not underflow
      - simplified an assertion
      - moved a non-nullptr check assertion before the dereference of that pointer
      - pass an std::string function parameter by const reference instead of by value (avoiding potential copy)
      Closes https://github.com/facebook/rocksdb/pull/3507
      
      Differential Revision: D7004679
      
      Pulled By: sagar0
      
      fbshipit-source-id: 52944952d9b56dfcac3bea3cd7878e315bb563c4
      4e7a182d
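The right-trimming fix listed in the commit above is not quoted, so the following is only a sketch of how a right-trim can avoid unsigned underflow on empty or all-whitespace input. `TrimRight` is a hypothetical function, not the one in string_util.cc.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical right-trim. `end` is a one-past-the-last index, and the loop
// checks `end > 0` before reading s[end - 1], so the unsigned index never
// wraps around for empty or all-whitespace strings.
std::string TrimRight(const std::string& s) {
  std::size_t end = s.size();
  while (end > 0 && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
    --end;
  }
  return s.substr(0, end);
}
```

A loop that starts at `s.size() - 1` and decrements while the index is `>= 0` would wrap around with `std::size_t`, which is the kind of underflow this shape avoids.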
    • Z
      Tweak external file ingestion seqno logic under universal compaction · c88c57cd
      Zhongyi Xie committed
      Summary:
      Right now it is possible that a file gets assigned to L0 but also assigned the seqno from a higher level which it doesn't fit.
      Under the current implementation, it is possible that the seqno in lower levels (Ln) can be equal to the smallest seqno of higher levels (Ln-1), which is undesirable from universal compaction's point of view.
      This should fix the intermittent failure of `ExternalSSTFileBasicTest.IngestFileWithGlobalSeqnoPickedSeqno`
      Closes https://github.com/facebook/rocksdb/pull/3411
      
      Differential Revision: D6813802
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 693d0462fa94725ccfb9d8858743e6d2d9992d14
      c88c57cd
    • J
      fix wrong indentation · 6a30b98f
      jsteemann committed
      Summary:
      Somehow the indentation was incorrect in this file.
      The only change in this PR is to get it right again in order to make the code more readable.
      Please reject if you think it's not worth it.
      Closes https://github.com/facebook/rocksdb/pull/3504
      
      Differential Revision: D6996011
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 060514a3a8c910d34bad795b36eb4d278512b154
      6a30b98f
  4. 15 Feb, 2018 1 commit
  5. 14 Feb, 2018 5 commits
  6. 13 Feb, 2018 5 commits
    • S
      Customized BlockBasedTableIterator and LevelIterator · b555ed30
      Siying Dong committed
      Summary:
      Use a customized BlockBasedTableIterator and LevelIterator to replace the current implementations that leverage two-level-iterator. Hopefully the customized logic will make the code easier to understand. As a side effect, BlockBasedTableIterator reduces the allocation for the data block iterator object and avoids the virtual function call to it, because we can directly reference BlockIter, a final class. Similarly, LevelIterator reduces virtual function calls to the dummy iterator iterating the file metadata. It also enables further optimization.

      The upper bound check is also moved from index block to data block. This implementation fits this iterator better. After the change, the forward iterator is slightly optimized to ensure we trim those iterators.
      
      The two-level-iterator now is only used by partitioned index, so it is simplified.
      Closes https://github.com/facebook/rocksdb/pull/3406
      
      Differential Revision: D6809041
      
      Pulled By: siying
      
      fbshipit-source-id: 7da3b9b1d3c8e9d9405302c15920af1fcaf50ffa
      b555ed30
    • M
      WritePrepared Txn: use TransactionDBWriteOptimizations (2nd attempt) · 8a04ee4f
      Maysam Yabandeh committed
      Summary:
      TransactionDB::Write can receive some optimization hints from the user. One is to skip the concurrency control mechanism. WritePreparedTxnDB is currently ignoring such hints. This patch optimizes WritePreparedTxnDB::Write for skip_concurrency_control and skip_duplicate_key_check hints.
      Closes https://github.com/facebook/rocksdb/pull/3496
      
      Differential Revision: D6971784
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: cbab10ad538fa2b8bcb47e37c77724afe6e30f03
      8a04ee4f
    • A
      Add delay before flush in CompactRange to avoid write stalling · ee1c8026
      Andrew Kryczka committed
      Summary:
      - Refactored logic for checking write stall condition to a helper function: `GetWriteStallConditionAndCause`. Now it is decoupled from the logic for updating WriteController / stats in `RecalculateWriteStallConditions`, so we can reuse it for predicting whether write stall will occur.
      - Updated `CompactRange` to first check whether the one additional immutable memtable / L0 file would cause stalling before it flushes. If so, it waits until that is no longer true.
      - Updated `bg_cv_` to be signaled on `SetOptions` calls. The stall conditions `CompactRange` cares about can change when (1) flush finishes, (2) compaction finishes, or (3) options dynamically change. The cv was already signaled for (1) and (2) but not yet for (3).
      Closes https://github.com/facebook/rocksdb/pull/3381
      
      Differential Revision: D6754983
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5613e03f1524df7192dc6ae885d40fd8f091d972
      ee1c8026
    • A
      db_bench separate options for partition index and filters · 0a0fad44
      Andrew Kryczka committed
      Summary:
      Some workloads (like my current benchmarking) may want partitioned indexes without partitioned filters. Particularly, when `-optimize_filters_for_hits=true`, the total index size may be larger than the total filter size, so it can make sense to hold all filters in-memory but not all indexes.
      Closes https://github.com/facebook/rocksdb/pull/3492
      
      Differential Revision: D6970092
      
      Pulled By: ajkr
      
      fbshipit-source-id: b7fa1828e1d13829339aefb90fd56eb7c5337f61
      0a0fad44
    • Z
      make flush_reason_ atomic to keep TSAN happy · 3f1bb073
      Zhongyi Xie committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3487
      
      Differential Revision: D6967098
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 48e0accf2e3b3f589ddb797ff8083c8520269bf0
      3f1bb073
  7. 10 Feb, 2018 5 commits
  8. 08 Feb, 2018 4 commits
    • A
      Eliminate a memcpy for uncompressed blocks · e78715c2
      Andrew Kryczka committed
      Summary:
      `ReadBlockFromFile` uses a stack buffer to hold small data blocks before passing them to the compression library, which outputs uncompressed data in a heap buffer. In the case of `kNoCompression` there is a `memcpy` to copy from stack buffer to heap buffer.
      
      This PR optimizes `ReadBlockFromFile` to skip the stack buffer for files whose blocks are known to be uncompressed. We determine this using the SST file property, "compression_name", if it's available.
      Closes https://github.com/facebook/rocksdb/pull/3472
      
      Differential Revision: D6920848
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5c753e804efc178b9229ae5dbe6a4adc32031f07
      e78715c2
    • S
      Fix UBSAN Error in WritePreparedTransactionTest · a0931b31
      Siying Dong committed
      Summary:
      WritePreparedTransactionTest has a UBSAN error because of the wrong order of its parent class construction. Fix it.
      Closes https://github.com/facebook/rocksdb/pull/3478
      
      Differential Revision: D6928975
      
      Pulled By: siying
      
      fbshipit-source-id: 13edfd5cb9cf73f1ac5ae3b6f53061d32783733d
      a0931b31
    • S
      Disable options_settable_test in UBSAN and fix UBSAN failure in blob_db_test · 821e0b16
      Siying Dong committed
      Summary:
      options_settable_test won't pass UBSAN so disable it.
      blob_db_test fails in UBSAN as SnapshotList doesn't initialize all the fields in dummy snapshot. Fix it. I don't understand why only blob_db_test fails though.
      Closes https://github.com/facebook/rocksdb/pull/3477
      
      Differential Revision: D6928681
      
      Pulled By: siying
      
      fbshipit-source-id: e31dd300fcdecdfd4f6af279a0987fd0cdec5122
      821e0b16
    • S
      Disable alignment check in UBSAN · 1336a774
      Siying Dong committed
      Summary:
      Disable the alignment check in UBSAN for now. Right now we can't get a signal on meaningful failures. We can re-enable it after we figure out how to suppress failures in a finer-grained manner.
      Closes https://github.com/facebook/rocksdb/pull/3473
      
      Differential Revision: D6925971
      
      Pulled By: siying
      
      fbshipit-source-id: a0f1a242cde866abbc5c1eeee9ff8d1d7d582ac4
      1336a774
  9. 07 Feb, 2018 4 commits
    • M
      Add skip_cc option to TransactionDB::Write · 8feee280
      Maysam Yabandeh committed
      Summary:
      Compared to DB::Write, TransactionDB::Write has the additional overhead of creating and initializing an internal transaction object, as well as the overhead of locking/unlocking the keys. This patch extends TransactionDB::Write with a skip_cc option that lets users indicate that the write batch does not conflict with others, so the concurrency control and its overhead can be skipped. TransactionDB::Write by default calls DB::Write when skip_cc is set, which works for WriteCommitted WritePolicy. Any other flavor of TransactionDB that is not compatible with this default behavior (such as WritePreparedTxnDB) can extend ::Write and implement their own approach for taking into account the skip_cc optimization.
      Closes https://github.com/facebook/rocksdb/pull/3457
      
      Differential Revision: D6877318
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 56f4e21db87ff71492db4e376fb7c2b03dfeab6b
      8feee280
    • M
      Fix leak report by asan on DuplicateKeys test · 8f8eb4f1
      Maysam Yabandeh committed
      Summary:
      Deletes the transaction object at the end of the test.
      Verified by:
      - COMPILE_WITH_ASAN=1 make -j32 transaction_test
      - ./transaction_test --gtest_filter="DBA**Duplicate*"
      Closes https://github.com/facebook/rocksdb/pull/3470
      
      Differential Revision: D6916473
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8303df25408635d5d3ac2b25f309a3d15957c937
      8f8eb4f1
    • Y
      WritePrepared Txn: update compaction_iterator_test and db_iterator_test · 81736d8a
      Yi Wu committed
      Summary:
      Update compaction_iterator_test with write-prepared transaction DB related tests. Transaction related tests are grouped in CompactionIteratorWithSnapshotCheckerTest. The existing tests are duplicated to make them also test with a dummy SnapshotChecker that will say every key is visible to every snapshot (this is okay, we still compare sequence numbers to verify visibility). Merge related tests are disabled and will be revisited in another PR.
      
      Existing db_iterator_tests are also duplicated to test with dummy read_callback that will say every key is committed.
      Closes https://github.com/facebook/rocksdb/pull/3466
      
      Differential Revision: D6909253
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2ae4656b843a55e2e9ff8beecf21f2832f96cd25
      81736d8a
    • Z
      split RandomizedHarnessTest more ways · 2f299917
      Zhongyi Xie committed
      Summary:
      RandomizedHarnessTest enumerates different combinations of test type, compression type, restart interval, etc. For some combinations it takes very long to finish, causing the test to time out in test infrastructure.
      This PR splits the test input into smaller chunks in the hope that they will fit in the timeout window. Another possibility, of course, is to reduce `num_entries`.
      Closes https://github.com/facebook/rocksdb/pull/3467
      
      Differential Revision: D6910235
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 717246ee5d21a8a48ad82d4d9c04f9051a66f07f
      2f299917
  10. 06 Feb, 2018 2 commits
    • M
      WritePrepared Txn: Duplicate Keys, Txn Part · 88d8b2a2
      Maysam Yabandeh committed
      Summary:
      This patch takes advantage of the memtable being able to detect duplicate <key,seq> and returning TryAgain to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index it detects the existence of at least one duplicate key in the write batch. If a duplicate key was reported, it then pays the cost of counting the number of sub-batches by iterating over the write batch and passes it to DBImpl::Write. DB will make use of the provided batch_count to assign proper sequence numbers before sending them to the WAL. When later inserting the batch to the memtable, it increases the seq each time the memtable reports a duplicate (a sub-batch in our counting) and tries again.
      Closes https://github.com/facebook/rocksdb/pull/3455
      
      Differential Revision: D6873699
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
      88d8b2a2
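The counting step described in the duplicate-keys commit above can be illustrated with a simplified model. This is a sketch under assumptions, not the code from the PR: the write batch is modeled as a plain list of keys, `CountSubBatches` is a hypothetical function, and returning 0 for an empty batch is a choice made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model: a write batch is a sequence of keys. A new sub-batch
// starts whenever a key repeats within the current sub-batch, so that no
// sub-batch contains the same key twice.
std::size_t CountSubBatches(const std::vector<std::string>& keys) {
  if (keys.empty()) return 0;  // choice made for this sketch only
  std::size_t count = 1;
  std::unordered_set<std::string> seen;
  for (const std::string& key : keys) {
    if (!seen.insert(key).second) {
      // Duplicate within the current sub-batch: close it and start a new one.
      ++count;
      seen.clear();
      seen.insert(key);
    }
  }
  return count;
}
```

For the keys `a, b, a, c, c` this yields three groups (`a b`, `a c`, `c`), which in this model corresponds to the number of distinct sequence numbers the batch would need.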
    • A
      Handle error return from WriteBuffer() · 4b124fb9
      Anand Ananthabhotla committed
      Summary:
      There are a couple of places where we swallow any error from
      WriteBuffer() - in SwitchMemtable() and DBImpl::CloseImpl(). Propagate
      the error up in those cases rather than ignoring it.
      Closes https://github.com/facebook/rocksdb/pull/3404
      
      Differential Revision: D6879954
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2ef88b554be5286b0a8bad7384ba17a105395bdb
      4b124fb9
  11. 04 Feb, 2018 1 commit