1. 06 Apr 2018 · 3 commits
    • Fix pre_release callback argument list. · 147dfc7b
      Committed by Dmitri Smirnov
      Summary:
      Top-level const on a primitive (by-value) parameter does not affect the
        signature of the method and has no influence on whether the overriding
        method actually takes that `const bool` instead of plain `bool`. In
        addition, it is rarely useful and produces a compatibility warning
        in the VS 2015 compiler.
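      A minimal sketch (hypothetical names, not the actual RocksDB callback) of why dropping the const is safe: top-level const on a by-value parameter is not part of the function type, so the override matches either way.

      ```cpp
      #include <cassert>
      #include <type_traits>

      struct Listener {
        // Base declares the parameter as `const bool`; the const is
        // top-level and is ignored in the function's signature.
        virtual int OnPreRelease(const bool last) { return last ? 1 : 0; }
        virtual ~Listener() = default;
      };

      struct MyListener : Listener {
        // Overrides cleanly with plain `bool`: same signature as the base.
        int OnPreRelease(bool last) override { return last ? 2 : 0; }
      };

      int main() {
        // void(bool) and void(const bool) are the same function type.
        static_assert(std::is_same<void (*)(bool), void (*)(const bool)>::value,
                      "top-level const does not change a function type");
        MyListener l;
        Listener* base = &l;
        assert(base->OnPreRelease(true) == 2);  // derived override is called
        return 0;
      }
      ```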
      Closes https://github.com/facebook/rocksdb/pull/3663
      
      Differential Revision: D7475739
      
      Pulled By: ajkr
      
      fbshipit-source-id: fb275378b5acc397399420ae6abb4b6bfe5bd32f
      147dfc7b
    • Blob DB: blob_dump to show uncompressed values · 36a9f229
      Committed by Yi Wu
      Summary:
      Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided.
      Closes https://github.com/facebook/rocksdb/pull/3633
      
      Differential Revision: D7348926
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e
      36a9f229
    • fix build for rocksdb lite · c827b2dc
      Committed by Zhongyi Xie
      Summary:
      currently rocksdb lite build fails due to the following errors:
      > db/db_sst_test.cc:29:51: error: ‘FlushJobInfo’ does not name a type
         virtual void OnFlushCompleted(DB* /*db*/, const FlushJobInfo& info) override {
                                                         ^
      db/db_sst_test.cc:29:16: error: ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB*, const int&)’ marked ‘override’, but does not override
         virtual void OnFlushCompleted(DB* /*db*/, const FlushJobInfo& info) override {
                      ^
      db/db_sst_test.cc:24:7: error: ‘class rocksdb::FlushedFileCollector’ has virtual functions and accessible non-virtual destructor [-Werror=non-virtual-dtor]
       class FlushedFileCollector : public EventListener {
             ^
      db/db_sst_test.cc: In member function ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB*, const int&)’:
      db/db_sst_test.cc:31:35: error: request for member ‘file_path’ in ‘info’, which is of non-class type ‘const int’
           flushed_files_.push_back(info.file_path);
                                         ^
      cc1plus: all warnings being treated as errors
      make: *** [db/db_sst_test.o] Error 1
      Closes https://github.com/facebook/rocksdb/pull/3676
      
      Differential Revision: D7493006
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 77dff0a5b23e27db51be9b9798e3744e6fdec64f
      c827b2dc
  2. 05 Apr 2018 · 1 commit
  3. 04 Apr 2018 · 2 commits
    • Make Optimistic Tx database stackable · 2a62ca17
      Committed by Dmitri Smirnov
      Summary:
      This change models the Optimistic Tx DB after the Pessimistic Tx DB. The motivation for this change is to make the pointer polymorphic so it can be held by the same raw or smart pointer.
      
      Currently, because the Optimistic Tx DB's inheritance is not rooted in a single DB base class the way the Pessimistic Tx DB's is, it is harder to write clean code with clear ownership of the database when options dictate instantiating a plain DB, a Pessimistic Tx DB, or an Optimistic Tx DB.
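      The "stackable" pattern the summary describes can be sketched as follows (hypothetical class names; the real classes live in RocksDB's utilities/transactions): both variants derive from one DB root, so a single raw or smart pointer type can own whichever variant the options dictate.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>

      struct DB {
        virtual std::string Kind() const { return "plain"; }
        virtual ~DB() = default;
      };

      // A stackable DB wraps and forwards to an inner DB.
      struct StackableDB : DB {
        explicit StackableDB(std::unique_ptr<DB> inner) : inner_(std::move(inner)) {}
        std::string Kind() const override { return inner_->Kind(); }
       protected:
        std::unique_ptr<DB> inner_;
      };

      struct OptimisticTxDB : StackableDB {
        using StackableDB::StackableDB;
        std::string Kind() const override {
          return "optimistic(" + inner_->Kind() + ")";
        }
      };

      int main() {
        std::unique_ptr<DB> db;  // one owner type regardless of variant
        db = std::unique_ptr<DB>(new OptimisticTxDB(std::unique_ptr<DB>(new DB)));
        assert(db->Kind() == "optimistic(plain)");
        return 0;
      }
      ```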
      Closes https://github.com/facebook/rocksdb/pull/3566
      
      Differential Revision: D7184502
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3
      2a62ca17
    • Reduce default --nooverwritepercent in black-box crash tests · b058a337
      Committed by Andrew Kryczka
      Summary:
      Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.
      
      Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
      Closes https://github.com/facebook/rocksdb/pull/3671
      
      Differential Revision: D7457732
      
      Pulled By: ajkr
      
      fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
      b058a337
  4. 03 Apr 2018 · 6 commits
    • Some small improvements to the build_tools · 12b400e8
      Committed by Adam Retter
      Summary: Closes https://github.com/facebook/rocksdb/pull/3664
      
      Differential Revision: D7459433
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3817e5d45fc70e83cb26f9800eaa0f4566c8dc0e
      12b400e8
    • Level Compaction with TTL · 04c11b86
      Committed by Sagar Vemuri
      Summary:
      Level Compaction with TTL.
      
      As of today, a file could exist in the LSM tree without going through the compaction process for a really long time if there are no updates to the data in the file's key range. For example, in certain use cases, the keys are not actually "deleted"; instead they are just set to empty values. There might not be any more writes to this "deleted" key range, and if so, such data could remain in the LSM for a really long time resulting in wasted space.
      
      Introducing a TTL could solve this problem. Files (and, in turn, data) older than TTL will be scheduled for compaction when there is no other background work. This will make the data go through the regular compaction process and get rid of old unwanted data.
      This also has the (good) side-effect of all the data in the non-bottommost level being newer than ttl, and all data in the bottommost level older than ttl. It could lead to more writes while reducing space.
      
      This functionality can be controlled by the newly introduced column family option -- ttl.
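      The selection rule can be sketched like this (an illustrative stand-in, not the actual compaction-picker code): files whose age exceeds the column family's ttl become candidates for background compaction.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hedged sketch: return indices of files older than ttl_sec.
      // A ttl of 0 means the feature is disabled.
      std::vector<size_t> PickTtlExpiredFiles(
          const std::vector<uint64_t>& file_age_sec, uint64_t ttl_sec) {
        std::vector<size_t> expired;
        for (size_t i = 0; i < file_age_sec.size(); ++i) {
          if (ttl_sec > 0 && file_age_sec[i] > ttl_sec) expired.push_back(i);
        }
        return expired;
      }

      int main() {
        // ttl = 1 day; files aged 2 and 3 days are picked, 1 hour is not.
        const uint64_t kDay = 24 * 60 * 60;
        std::vector<uint64_t> ages = {2 * kDay, 3600, 3 * kDay};
        std::vector<size_t> picked = PickTtlExpiredFiles(ages, kDay);
        assert(picked.size() == 2 && picked[0] == 0 && picked[1] == 2);
        return 0;
      }
      ```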
      
      TODO for later:
      - Make ttl mutable
      - Extend TTL to Universal compaction as well? (TTL is already supported in FIFO)
      - Maybe deprecate CompactionOptionsFIFO.ttl in favor of this new ttl option.
      Closes https://github.com/facebook/rocksdb/pull/3591
      
      Differential Revision: D7275442
      
      Pulled By: sagar0
      
      fbshipit-source-id: dcba484717341200d419b0953dafcdf9eb2f0267
      04c11b86
    • Fix 3-way SSE4.2 crc32c usage in MSVC with CMake · df144244
      Committed by Koby Kahane
      Summary:
      The introduction of the 3-way SSE4.2 optimized crc32c implementation in commit f54d7f5f added the `HAVE_PCLMUL` definition when the compiler supports intrinsics for that instruction, but did not modify CMakeLists.txt to set that definition on MSVC when appropriate. As a result, 3-way SSE4.2 is not used in MSVC builds with CMake although it could be.
      
      Since the existing test program in CMakeLists.txt for `HAVE_SSE42` already uses `_mm_clmulepi64_si128` which is a PCLMUL instruction, this PR sets `HAVE_PCLMUL` as well if that program builds successfully, fixing the problem.
      Closes https://github.com/facebook/rocksdb/pull/3673
      
      Differential Revision: D7473975
      
      Pulled By: miasantreble
      
      fbshipit-source-id: bc346b9eb38920e427aa1a253e6dd9811efa269e
      df144244
    • WritePrepared Txn: smallest_prepare optimization · b225de7e
      Committed by Maysam Yabandeh
      Summary:
      This is an optimization to reduce lookups in the CommitCache when querying IsInSnapshot. The optimization takes the smallest uncommitted sequence number at the time the snapshot was taken and, if the sequence number of the read data is lower than that number, assumes the data is committed.
      To implement this optimization two changes are required: i) the AddPrepared function must be called sequentially to avoid out-of-order insertion into the PrepareHeap (otherwise the top of the heap would not indicate the smallest prepare going forward), and ii) non-2PC transactions also call AddPrepared if they do not commit in one step.
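      The fast path can be sketched as follows (hypothetical signature; a `std::set` stands in for the real CommitCache and is purely illustrative):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>

      // min_uncommitted is the smallest uncommitted seq at snapshot time.
      // Anything below it must already be committed, so the (more
      // expensive) CommitCache lookup is skipped.
      bool IsInSnapshot(uint64_t data_seq, uint64_t snapshot_seq,
                        uint64_t min_uncommitted,
                        const std::set<uint64_t>& commit_cache) {
        (void)snapshot_seq;  // the full check also compares against this
        if (data_seq < min_uncommitted) return true;  // fast path
        return commit_cache.count(data_seq) > 0;      // slow path
      }

      int main() {
        std::set<uint64_t> commit_cache = {120};
        // seq 50 < min_uncommitted 100: committed without touching the cache.
        assert(IsInSnapshot(50, 200, 100, commit_cache));
        // seq 120 >= 100: must consult the cache.
        assert(IsInSnapshot(120, 200, 100, commit_cache));
        assert(!IsInSnapshot(150, 200, 100, commit_cache));
        return 0;
      }
      ```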
      Closes https://github.com/facebook/rocksdb/pull/3649
      
      Differential Revision: D7388630
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b79506238c17467d590763582960d4d90181c600
      b225de7e
    • Enable cancelling manual compactions if they hit the sfm size limit · 1579626d
      Committed by Amy Tai
      Summary:
      Manual compactions should be cancelled, just like scheduled compactions are cancelled, if sfm->EnoughRoomForCompaction is not true.
      Closes https://github.com/facebook/rocksdb/pull/3670
      
      Differential Revision: D7457683
      
      Pulled By: amytai
      
      fbshipit-source-id: 669b02fdb707f75db576d03d2c818fb98d1876f5
      1579626d
    • Revert "Avoid adding tombstones of the same file to RangeDelAggregato… · 44653c7b
      Committed by Zhongyi Xie
      Summary:
      …r multiple times"
      
      This reverts commit e80709a3.
      
      lingbin's PR https://github.com/facebook/rocksdb/pull/3635 is causing a performance regression for seekrandom workloads.
      I'm reverting the commit for now, but feel free to submit new patches 😃
      
      To reproduce the regression, you can run the following db_bench command
      > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none
      
      write stats printed by db_bench:
      
      | run | P50 | P75 | P99 | P99.9 | P99.99 |
      | --- | --- | --- | --- | --- | --- |
      | revert commit | 80.77 | 102.94 | 1786.44 | 1892.39 | 2645.10 |
      | keep commit | 221.72 | 686.62 | 1842.57 | 1899.70 | 2814.29 |
      Closes https://github.com/facebook/rocksdb/pull/3672
      
      Differential Revision: D7463315
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8e779c87591127f2c3694b91a56d9b459011959d
      44653c7b
  5. 02 Apr 2018 · 1 commit
  6. 31 Mar 2018 · 4 commits
    • Throw NoSpace instead of IOError when out of space. · d12112d0
      Committed by Fosco Marotto
      Summary:
      Replaces #1702 and is updated from feedback.
      Closes https://github.com/facebook/rocksdb/pull/3531
      
      Differential Revision: D7457395
      
      Pulled By: gfosco
      
      fbshipit-source-id: 25a21dd8cfa5a6e42e024208b444d9379d920c82
      d12112d0
    • Update buckifier and TARGETS · d9bfb35d
      Committed by Fosco Marotto
      Summary:
      Some flags used via make were not applied in the buckifier/TARGETS file, causing some failures to be missed by the testing infra (e.g. the one fixed by #3434).
      Closes https://github.com/facebook/rocksdb/pull/3452
      
      Differential Revision: D7457419
      
      Pulled By: gfosco
      
      fbshipit-source-id: e4aed2915ca3038c1485bbdeebedfc33d5704a49
      d9bfb35d
    • Update 64-bit shift in compression.h · c3eb762b
      Committed by Fosco Marotto
      Summary:
      This was failing the build on windows with zstd, warning treated as an error, 32-bit shift implicitly converted to 64-bit.
      Closes https://github.com/facebook/rocksdb/pull/3624
      
      Differential Revision: D7307883
      
      Pulled By: gfosco
      
      fbshipit-source-id: 68110e9b5b1b59b668dec6cf86b67556402574e7
      c3eb762b
    • Skip deleted WALs during recovery · 73f21a7b
      Committed by Maysam Yabandeh
      Summary:
      This patch records the deleted WAL numbers in the manifest so that they, and any WAL older than them, are ignored during recovery. This avoids scenarios where there is a gap in the WAL files fed to the recovery procedure. The gap could happen, for example, through out-of-order WAL deletion. Such a gap could cause problems in 2PC recovery, where the prepare and commit entries are placed into two separate WALs: a gap in the WALs could result in not processing the WAL with the commit entry, breaking the 2PC recovery logic.
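      The recovery-side filter can be sketched like this (illustrative only; the real change records the numbers as manifest entries):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // The manifest records the highest deleted WAL number; recovery
      // ignores that WAL and anything older than it.
      std::vector<uint64_t> WalsToRecover(const std::vector<uint64_t>& wal_numbers,
                                          uint64_t max_deleted_wal) {
        std::vector<uint64_t> keep;
        for (uint64_t n : wal_numbers) {
          if (n > max_deleted_wal) keep.push_back(n);
        }
        return keep;
      }

      int main() {
        // WALs up to 9 were deleted; only 10 and 11 are fed to recovery,
        // so a gap among the older numbers cannot confuse 2PC recovery.
        std::vector<uint64_t> keep = WalsToRecover({5, 6, 10, 11}, 9);
        assert(keep.size() == 2 && keep[0] == 10 && keep[1] == 11);
        return 0;
      }
      ```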
      Closes https://github.com/facebook/rocksdb/pull/3488
      
      Differential Revision: D6967893
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 13119feb155a08ab6d4909f437c7a750480dc8a1
      73f21a7b
  7. 30 Mar 2018 · 1 commit
    • WritePrepared Txn: fix a bug in publishing recoverable state seq · 89d989ed
      Committed by Maysam Yabandeh
      Summary:
      When using two_write_queue, the published seq and the last allocated sequence could be ahead of LastSequence, even if both write queues are stopped, as in WriteRecoverableState. The patch fixes a bug in WriteRecoverableState in which LastSequence was used as a reference but the result was applied to the last fetched sequence and the last published seq.
      Closes https://github.com/facebook/rocksdb/pull/3665
      
      Differential Revision: D7446099
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1449bed9aed8e9db6af85946efd347cb8efd3c0b
      89d989ed
  8. 29 Mar 2018 · 3 commits
    • Allow rocksdbjavastatic to also be built as debug build · 3cb59195
      Committed by Adam Retter
      Summary: Closes https://github.com/facebook/rocksdb/pull/3654
      
      Differential Revision: D7417948
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f
      3cb59195
    • WritePrepared Txn: make recoverable state visible after flush · 0377ff9d
      Committed by Maysam Yabandeh
      Summary:
      Currently, if the CommitTimeWriteBatch is set to be used only as state required for recovery, the user cannot see it in the DB until the DB is restarted, even though the state is already inserted into the DB after the memtable flush. It would be useful for debugging to make this state visible to the user after the flush by committing it. The patch does so by invoking a callback that commits the recoverable state.
      Closes https://github.com/facebook/rocksdb/pull/3661
      
      Differential Revision: D7424577
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c
      0377ff9d
    • Fix race condition causing double deletion of ssts · 1f5def16
      Committed by Yanqin Jin
      Summary:
      Possible interleaved execution of a background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and a user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to a race condition in which RocksDB attempts to delete a file twice. The second attempt fails and returns `IO error`. This may occur with other file types, but this PR targets SSTs.
      Also adds a unit test to verify that this PR fixes the issue.
      
      The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows
      ```
      timeline              user_thread                background_compaction thread
      t1   |                                          FindObsoleteFiles(full_scan=false)
      t2   |     FindObsoleteFiles(full_scan=true)
      t3   |                                          PurgeObsoleteFiles
      t4   |     PurgeObsoleteFiles
           V
      ```
      When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`.
      To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.
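      A minimal sketch of the fix, assuming a hypothetical tracker type in place of the real `DBImpl` member: files already grabbed by one thread are skipped when another thread's full scan runs.

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <set>
      #include <string>
      #include <vector>

      // Remembers files already grabbed for purge so a concurrent full
      // scan does not collect (and try to delete) them a second time.
      struct PurgeTracker {
        std::mutex mu;
        std::set<std::string> grabbed;  // analogous to files_grabbed_for_purge_

        // FindObsoleteFiles: return only files not grabbed by another thread.
        std::vector<std::string> Collect(const std::vector<std::string>& files) {
          std::lock_guard<std::mutex> lock(mu);
          std::vector<std::string> out;
          for (const std::string& f : files) {
            if (grabbed.insert(f).second) out.push_back(f);  // newly grabbed
          }
          return out;
        }
      };

      int main() {
        PurgeTracker t;
        // Background thread grabs 000007.sst first; the user thread's full
        // scan then sees it on disk but must not grab it again.
        std::vector<std::string> bg = t.Collect({"000007.sst"});
        std::vector<std::string> user = t.Collect({"000007.sst", "000008.sst"});
        assert(bg.size() == 1 && user.size() == 1 && user[0] == "000008.sst");
        return 0;
      }
      ```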
      
      ajkr could you take a look and comment? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3638
      
      Differential Revision: D7384372
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
      1f5def16
  9. 28 Mar 2018 · 2 commits
    • Update comments about MergeOperator::AllowSingleOperand · 90c54234
      Committed by Sagar Vemuri
      Summary:
      Updated comments around AllowSingleOperand.
      Reason: a couple of users were confused and encountered issues because they had not overridden PartialMerge while setting AllowSingleOperand=true.
      
      I'll also look into modifying the default merge operator implementation so that overriding PartialMerge is not mandatory when AllowSingleOp=true.
      Closes https://github.com/facebook/rocksdb/pull/3659
      
      Differential Revision: D7422691
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3d075a6ced0120f5d65cb7ae5412936f1862f342
      90c54234
    • Fix a leak in FilterBlockBuilder when adding prefix · d6876702
      Committed by Sagar Vemuri
      Summary:
      Our valgrind continuous test found an interesting leak introduced in #3614. We were adding the prefix key before saving the previous prefix start offset, due to which the previous prefix offset was always incorrect. Fixed it by saving the previous state before adding the key.
      Closes https://github.com/facebook/rocksdb/pull/3660
      
      Differential Revision: D7418698
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9933685f943cf2547ed5c553f490035a2fa785cf
      d6876702
  10. 27 Mar 2018 · 4 commits
    • Align SST file data blocks to avoid spanning multiple pages · f9f4d40f
      Committed by Anand Ananthabhotla
      Summary:
      Provide a block_align option in BlockBasedTableOptions to allow
      alignment of SST file data blocks. This will avoid higher
      IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages.
      When this option is set to true, the block alignment is set to lower of
      block size and 4KB.
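      The alignment rule reduces to a one-liner; a hedged sketch (illustrative helper, not the actual table-builder code):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>

      // With block_align enabled, the write alignment is the lower of the
      // configured block size and 4KB, so sub-4KB blocks never straddle
      // two pages.
      size_t BlockAlignment(size_t block_size) {
        return std::min<size_t>(block_size, 4096);
      }

      int main() {
        assert(BlockAlignment(16 * 1024) == 4096);  // large blocks: page-aligned
        assert(BlockAlignment(1024) == 1024);       // small blocks: own size
        return 0;
      }
      ```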
      Closes https://github.com/facebook/rocksdb/pull/3502
      
      Differential Revision: D7400897
      
      Pulled By: anand1976
      
      fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
      f9f4d40f
    • WritePrepared Txn: Increase commit cache size to 2^23 · 0999e9b7
      Committed by Maysam Yabandeh
      Summary:
      The current commit cache size is 2^21; this was due to a typo. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math:
      2^23 entries / 2 (one out of two seq numbers is a commit) / 2^16 TPS = 2^6 = 64s
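      The arithmetic above checks out:

      ```cpp
      #include <cassert>
      #include <cstdint>

      int main() {
        // 2^23 commit entries, half of all sequence numbers are commits,
        // at 2^16 transactions per second.
        const uint64_t entries = 1ull << 23;
        const uint64_t tps = 1ull << 16;
        const uint64_t seconds_before_eviction = entries / 2 / tps;
        assert(seconds_before_eviction == 64);  // = 2^6
        return 0;
      }
      ```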
      Closes https://github.com/facebook/rocksdb/pull/3657
      
      Differential Revision: D7411211
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35
      0999e9b7
    • Fix race condition via concurrent FlushWAL · 35a4469b
      Committed by Maysam Yabandeh
      Summary:
      Currently log_writer->AddRecord in WriteImpl is protected from concurrent FlushWAL calls only if the two_write_queues_ option is set. The patch fixes the problem by i) skipping log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, and ii) protecting log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.
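      Rule (ii) can be sketched as follows (a simplified stand-in for the real log writer; the mutex name mirrors log_write_mutex_):

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <thread>
      #include <vector>

      // When manual WAL flush is on but there is a single write queue, a
      // dedicated log-write mutex guards AddRecord so WriteImpl and
      // FlushWAL cannot append concurrently.
      struct LogWriter {
        std::mutex log_write_mutex;
        std::vector<int> records;
        void AddRecord(int r) {
          std::lock_guard<std::mutex> guard(log_write_mutex);
          records.push_back(r);
        }
      };

      int main() {
        LogWriter w;
        std::thread writer([&] { for (int i = 0; i < 1000; ++i) w.AddRecord(i); });
        std::thread flusher([&] { for (int i = 0; i < 1000; ++i) w.AddRecord(-i); });
        writer.join();
        flusher.join();
        assert(w.records.size() == 2000);  // no lost appends under contention
        return 0;
      }
      ```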
      
      Fixes #3599
      Closes https://github.com/facebook/rocksdb/pull/3656
      
      Differential Revision: D7405608
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934
      35a4469b
    • Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind · 23f9d93f
      Committed by Sagar Vemuri
      Summary:
      I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to timeout after a day. So excluding these tests.
      Closes https://github.com/facebook/rocksdb/pull/3652
      
      Differential Revision: D7400332
      
      Pulled By: sagar0
      
      fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d
      23f9d93f
  11. 24 Mar 2018 · 7 commits
    • WritePrepared Txn: AddPrepared for all sub-batches · 3e417a66
      Committed by Maysam Yabandeh
      Summary:
      Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-batch has moved to old_prepared_ but the seqs of the later ones have not. The patch fixes this by calling AddPrepared for all sub-batches.
      Closes https://github.com/facebook/rocksdb/pull/3651
      
      Differential Revision: D7388635
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
      3e417a66
    • Improve perf of random read and insert compare by suggesting inlining to the compiler · d382ae7d
      Committed by Dmitri Smirnov
      Summary:
      Results are from the VS 2015 compiler. This improves sequential inserts; random read results are inconclusive, but I hope VS 2017 will do a better job at inlining.
      
      Before:
      fillseq      :       **3.638 micros/op 274866 ops/sec;  213.9 MB/s**
      
      After:
      fillseq      :       **3.379 micros/op 295979 ops/sec;  230.3 MB/s**
      Closes https://github.com/facebook/rocksdb/pull/3645
      
      Differential Revision: D7382711
      
      Pulled By: siying
      
      fbshipit-source-id: 092a07ffe8a6e598d1226ceff0f11b35e6c5c8e4
      d382ae7d
    • Refactor sync_point to make implementation either customizable or replaceable · 53d66df0
      Committed by Dmitri Smirnov
      Summary: Closes https://github.com/facebook/rocksdb/pull/3637
      
      Differential Revision: D7354373
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6816c7bbc192ed0fb944942b11c7074bf24eddf1
      53d66df0
    • Add 5.11 and 5.12 to tools/check_format_compatible.sh · a993c013
      Committed by Sagar Vemuri
      Summary: Closes https://github.com/facebook/rocksdb/pull/3646
      
      Differential Revision: D7384727
      
      Pulled By: sagar0
      
      fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38
      a993c013
    • Avoid adding tombstones of the same file to RangeDelAggregator multiple times · e80709a3
      Committed by daheiantian
      Summary:
      RangeDelAggregator will remember the files whose range tombstones have been added,
      so the caller can check whether a file has already been added before calling AddTombstones.
      
      Closes https://github.com/facebook/rocksdb/pull/3635
      
      Differential Revision: D7354604
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9b9f7ec130556028df417e650711554b46d8d107
      e80709a3
    • Add Java-API-Changes section to History · 7ffce280
      Committed by Sagar Vemuri
      Summary:
      We have not been updating our HISTORY.md change log with the RocksJava changes. Going forward, let's add Java changes to HISTORY.md as well.
      There is an old java/HISTORY-JAVA.md, but it hasn't been updated in years. It is much easier to remember to update the change log in a single file, HISTORY.md.
      
      I added information about shared block cache here, which was introduced in #3623.
      Closes https://github.com/facebook/rocksdb/pull/3647
      
      Differential Revision: D7384448
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9b6e569f44e6df5cb7ba06413d9975df0b517d20
      7ffce280
    • InlineSkiplist: don't decode keys unnecessarily during comparisons · 09b6bf82
      Committed by Radoslaw Zarzynski
      Summary:
      `InlineSkipList<>::Insert` takes the `key` parameter as a C-string. It then performs multiple comparisons with it, requiring `GetLengthPrefixedSlice()` to be invoked in `MemTable::KeyComparator::operator()(const char* prefix_len_key1, const char* prefix_len_key2)` on the same data over and over. The patch tries to optimize that.
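      The idea can be sketched with a toy encoding (a 1-byte length prefix stands in for the real `GetLengthPrefixedSlice()`; names are hypothetical): decode the insert key once, then reuse the decoded form for every comparison.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      struct DecodedKey {
        const char* data;
        size_t size;
      };

      // Toy stand-in for GetLengthPrefixedSlice: 1-byte length prefix.
      DecodedKey Decode(const char* prefixed) {
        return DecodedKey{prefixed + 1, static_cast<size_t>(prefixed[0])};
      }

      int Compare(const DecodedKey& a, const DecodedKey& b) {
        std::string sa(a.data, a.size), sb(b.data, b.size);
        return sa.compare(sb);
      }

      int main() {
        const char key1[] = "\x03" "abc";
        const char key2[] = "\x03" "abd";
        DecodedKey k1 = Decode(key1);  // decoded once, reused below
        DecodedKey k2 = Decode(key2);
        assert(Compare(k1, k2) < 0);   // every further compare skips decoding
        assert(Compare(k1, k1) == 0);
        return 0;
      }
      ```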
      
      Rough performance comparison
      =====
      Big keys, no compression.
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.222 micros/op 236836 ops/sec;   80.4 MB/s
      ```
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.064 micros/op 246059 ops/sec;   83.5 MB/s
      ```
      
      TODO
      ======
      In ~~a separated~~ this PR:
      - [x] Go outside the write path. Maybe even eradicate the C-string-taking variant of `KeyIsAfterNode` entirely.
      - [x] Try to cache the transformations applied by `KeyComparator` & friends in situations where we have many comparisons with the same key.
      Closes https://github.com/facebook/rocksdb/pull/3516
      
      Differential Revision: D7059300
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6f027dbb619a488129f79f79b5f7dbe566fb2dbb
      09b6bf82
  12. 23 Mar 2018 · 6 commits
    • FlushReason improvement · 1cbc96d2
      Committed by Zhongyi Xie
      Summary:
      Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full"
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304
      $ grep 'flush_reason' /dev/shm/dbbench/LOG
      ...
      2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      
      With the fix:
      > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"}
      Closes https://github.com/facebook/rocksdb/pull/3627
      
      Differential Revision: D7328772
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8
      1cbc96d2
    • Add unit test for WAL corruption · 82137f0c
      Committed by Andrew Kryczka
      Summary: Closes https://github.com/facebook/rocksdb/pull/3618
      
      Differential Revision: D7301053
      
      Pulled By: ajkr
      
      fbshipit-source-id: a9dde90caa548c294d03d6386f78428c8536ca14
      82137f0c
    • Fsync after writing global seq number in ExternalSstFileIngestionJob · 2e3d4077
      Committed by Sagar Vemuri
      Summary:
      Fsync after writing global sequence number to the ingestion file in ExternalSstFileIngestionJob. Otherwise the file metadata could be incorrect.
      Closes https://github.com/facebook/rocksdb/pull/3644
      
      Differential Revision: D7373813
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4da2c9e71a8beb5c08b4ac955f288ee1576358b8
      2e3d4077
    • Rename function for handling WAL write error · 4d51feab
      Committed by Andrew Kryczka
      Summary:
      It was misnamed. It actually updates `bg_error_` if `PreprocessWrite()` or `WriteToWAL()` fail, not related to the user callback.
      Closes https://github.com/facebook/rocksdb/pull/3485
      
      Differential Revision: D6955787
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd7afc3fdb7a52830c021cbfc25fcbc3ab7d5e10
      4d51feab
    • SstFileManager: add bytes_max_delete_chunk · 118058ba
      Committed by Siying Dong
      Summary:
      Add `bytes_max_delete_chunk` in SstFileManager so that we can drop a large file in multiple batches.
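      The idea behind dropping a large file in batches can be sketched like this (illustrative only; the real SstFileManager truncates via the filesystem):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Instead of unlinking a huge SST at once, shrink it in bounded
      // steps of at most `chunk` bytes; a final size of 0 means unlink.
      std::vector<uint64_t> TruncateSchedule(uint64_t file_size, uint64_t chunk) {
        std::vector<uint64_t> sizes;  // size after each truncate
        while (chunk > 0 && file_size > chunk) {
          file_size -= chunk;
          sizes.push_back(file_size);
        }
        sizes.push_back(0);
        return sizes;
      }

      int main() {
        // A 10-unit file with a 4-unit chunk: 10 -> 6 -> 2 -> gone.
        std::vector<uint64_t> s = TruncateSchedule(10, 4);
        assert(s.size() == 3 && s[0] == 6 && s[1] == 2 && s[2] == 0);
        return 0;
      }
      ```

      Each step is a bounded filesystem operation, so no single delete frees too much space at once.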
      Closes https://github.com/facebook/rocksdb/pull/3640
      
      Differential Revision: D7358679
      
      Pulled By: siying
      
      fbshipit-source-id: ef17f0da2f5723dbece2669485a9b91b3edc0bb7
      118058ba
    • log value of CompressionOptions::zstd_max_train_bytes · 88c3e26c
      Committed by Andrew Kryczka
      Summary: Closes https://github.com/facebook/rocksdb/pull/3587
      
      Differential Revision: D7206901
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5d4b1a2653627b44aa3c22db7d98c9cd5dcdb67a
      88c3e26c