1. 29 March 2018, 2 commits
    • WritePrepared Txn: make recoverable state visible after flush · 0377ff9d
      Maysam Yabandeh committed
      Summary:
      Currently, if the CommitTimeWriteBatch is set to be used only as state required for recovery, the user cannot see it in the DB until the DB is restarted, even though the state is already inserted into the DB after the memtable flush. It would be useful for debugging to make this state visible to the user after the flush by committing it. The patch does this by invoking a callback that performs the commit on the recoverable state.
      Closes https://github.com/facebook/rocksdb/pull/3661
      
      Differential Revision: D7424577
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c
    • Fix race condition causing double deletion of ssts · 1f5def16
      Yanqin Jin committed
      Summary:
      Possible interleaved execution of background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to a race condition in which RocksDB attempts to delete a file twice. The second attempt fails and returns `IO error`. This may also occur for other file types, but this PR targets SST files.
      Also add a unit test to verify that this PR fixes the issue.
      
      The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving of the `user_thread` and the background compaction thread. They execute as follows:
      ```
      timeline              user_thread                background_compaction thread
      t1   |                                          FindObsoleteFiles(full_scan=false)
      t2   |     FindObsoleteFiles(full_scan=true)
      t3   |                                          PurgeObsoleteFiles
      t4   |     PurgeObsoleteFiles
           V
      ```
      When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in the RocksDB directory, including the ones that the background compaction thread has collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because the background compaction thread has already deleted them in its own `PurgeObsoleteFiles`.
      To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.
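      The remembering described above can be sketched as a mutex-protected set. This is a minimal illustration under assumed names (`ObsoleteFileTracker`, `Grab`, `Release` are hypothetical, not the actual `DBImpl` interface):

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <set>
      #include <string>
      #include <vector>

      // Sketch: obsolete files grabbed by one thread are recorded in a
      // shared set, so a concurrent full scan will not claim them again.
      class ObsoleteFileTracker {
       public:
        // Claims the candidates not already grabbed by another thread and
        // returns only those, mimicking a full-scan FindObsoleteFiles that
        // skips files already in files_grabbed_for_purge_.
        std::vector<std::string> Grab(const std::vector<std::string>& candidates) {
          std::lock_guard<std::mutex> lock(mu_);
          std::vector<std::string> grabbed;
          for (const auto& f : candidates) {
            if (files_grabbed_for_purge_.insert(f).second) {
              grabbed.push_back(f);
            }
          }
          return grabbed;
        }

        // Called once the file has actually been deleted in PurgeObsoleteFiles.
        void Release(const std::string& f) {
          std::lock_guard<std::mutex> lock(mu_);
          files_grabbed_for_purge_.erase(f);
        }

       private:
        std::mutex mu_;
        std::set<std::string> files_grabbed_for_purge_;
      };

      int main() {
        ObsoleteFileTracker tracker;
        // Background compaction thread grabs two SSTs first (t1).
        auto first = tracker.Grab({"000007.sst", "000008.sst"});
        assert(first.size() == 2);
        // A full scan at t2 sees the same files plus one more; only the
        // unclaimed file is returned, so no double deletion is attempted.
        auto second = tracker.Grab({"000007.sst", "000008.sst", "000009.sst"});
        assert(second.size() == 1 && second[0] == "000009.sst");
        return 0;
      }
      ```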
      
      ajkr could you take a look and comment? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3638
      
      Differential Revision: D7384372
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
  2. 28 March 2018, 2 commits
    • Update comments about MergeOperator::AllowSingleOperand · 90c54234
      Sagar Vemuri committed
      Summary:
      Updated comments around AllowSingleOperand.
      Reason: A couple of users were confused and encountered issues due to not overriding PartialMerge when AllowSingleOperand=true.
      
      I'll also look into modifying the default merge operator implementation so that overriding PartialMerge is not mandatory when AllowSingleOperand=true.
      Closes https://github.com/facebook/rocksdb/pull/3659
      
      Differential Revision: D7422691
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3d075a6ced0120f5d65cb7ae5412936f1862f342
    • Fix a leak in FilterBlockBuilder when adding prefix · d6876702
      Sagar Vemuri committed
      Summary:
      Our valgrind continuous test found an interesting leak which got introduced in #3614. We were adding the prefix key before saving the previous prefix start offset, due to which the previous prefix offset was always incorrect. Fixed it by saving the previous state before adding the key.
      Closes https://github.com/facebook/rocksdb/pull/3660
      
      Differential Revision: D7418698
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9933685f943cf2547ed5c553f490035a2fa785cf
  3. 27 March 2018, 4 commits
    • Align SST file data blocks to avoid spanning multiple pages · f9f4d40f
      Anand Ananthabhotla committed
      Summary:
      Provide a block_align option in BlockBasedTableOptions to allow
      alignment of SST file data blocks. This avoids the higher
      IOPS/throughput load caused by sub-4KB data blocks spanning two 4KB
      pages. When this option is set to true, the block alignment is set
      to the lower of the block size and 4KB.
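      The alignment rule above can be sketched as follows. This is a minimal model under assumed helper names (`EffectiveAlignment`, `AlignedOffset` are illustrative, not the actual table-builder code):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>

      // The effective alignment is the lower of the block size and 4KB.
      size_t EffectiveAlignment(size_t block_size) {
        return std::min<size_t>(block_size, 4096);
      }

      // Returns the write offset for a block of block_len bytes so that it
      // does not span an alignment boundary (meaningful for block_len <= align).
      size_t AlignedOffset(size_t offset, size_t block_len, size_t align) {
        size_t room = align - offset % align;  // bytes left in current page
        return block_len <= room ? offset : offset + room;  // pad forward
      }

      int main() {
        assert(EffectiveAlignment(16 * 1024) == 4096);  // capped at 4KB
        assert(EffectiveAlignment(1024) == 1024);       // small block size wins
        // A 200-byte block at offset 4000 would span two 4KB pages, so it
        // is padded to start at the next page boundary.
        assert(AlignedOffset(4000, 200, 4096) == 4096);
        assert(AlignedOffset(100, 200, 4096) == 100);   // already fits
        return 0;
      }
      ```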
      Closes https://github.com/facebook/rocksdb/pull/3502
      
      Differential Revision: D7400897
      
      Pulled By: anand1976
      
      fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
    • WritePrepared Txn: Increase commit cache size to 2^23 · 0999e9b7
      Maysam Yabandeh committed
      Summary:
      The current commit cache size is 2^21. This was due to a typo. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math:
      2^23 / 2 (one out of two seq numbers are for commit) / 2^16 TPS = 2^6 = 64s
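      The arithmetic above can be checked at compile time (constant names here are illustrative, not the actual option names):

      ```cpp
      #include <cstdint>

      // With 2^23 commit cache entries, one out of two sequence numbers
      // being commit markers, and a load of 2^16 transactions per second,
      // an entry survives about 64 seconds before it is evicted.
      constexpr uint64_t kCommitCacheEntries = uint64_t{1} << 23;
      constexpr uint64_t kCommitSeqFraction = 2;  // 1 of every 2 seqs
      constexpr uint64_t kTransactionsPerSecond = uint64_t{1} << 16;
      constexpr uint64_t kSecondsBeforeEviction =
          kCommitCacheEntries / kCommitSeqFraction / kTransactionsPerSecond;
      static_assert(kSecondsBeforeEviction == 64,
                    "matches the 64s figure in the summary");

      int main() { return 0; }
      ```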
      Closes https://github.com/facebook/rocksdb/pull/3657
      
      Differential Revision: D7411211
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35
    • Fix race condition via concurrent FlushWAL · 35a4469b
      Maysam Yabandeh committed
      Summary:
      Currently, log_writer->AddRecord in WriteImpl is protected from concurrent calls via FlushWAL only if the two_write_queues_ option is set. The patch fixes the problem by i) skipping log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, and ii) protecting log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.
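      The two rules can be reduced to a small decision sketch (free functions here, not the actual DBImpl methods):

      ```cpp
      #include <cassert>

      // (i) FlushWAL only writes buffered records when manual_wal_flush_
      // is set; otherwise it skips log_writer->AddRecord entirely.
      bool FlushWALShouldAddRecord(bool manual_wal_flush) {
        return manual_wal_flush;
      }

      // (ii) WriteImpl takes log_write_mutex_ around AddRecord exactly when
      // manual_wal_flush_ is set but two_write_queues_ is not; with two
      // write queues, the second queue already serializes WAL writes.
      bool WriteImplNeedsLogMutex(bool manual_wal_flush, bool two_write_queues) {
        return manual_wal_flush && !two_write_queues;
      }

      int main() {
        assert(!FlushWALShouldAddRecord(false));
        assert(FlushWALShouldAddRecord(true));
        assert(WriteImplNeedsLogMutex(true, false));   // the racy configuration
        assert(!WriteImplNeedsLogMutex(true, true));   // 2nd queue serializes
        assert(!WriteImplNeedsLogMutex(false, false));
        return 0;
      }
      ```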
      
      Fixes #3599
      Closes https://github.com/facebook/rocksdb/pull/3656
      
      Differential Revision: D7405608
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934
    • Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind · 23f9d93f
      Sagar Vemuri committed
      Summary:
      I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to time out after a day. So I am excluding these tests.
      Closes https://github.com/facebook/rocksdb/pull/3652
      
      Differential Revision: D7400332
      
      Pulled By: sagar0
      
      fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d
  4. 24 March 2018, 7 commits
    • WritePrepared Txn: AddPrepared for all sub-batches · 3e417a66
      Maysam Yabandeh committed
      Summary:
      Currently, AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-batch moves to old_prepared_ but not the seqs of the later ones. The patch fixes this by calling AddPrepared for all sub-batches.
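      A minimal sketch of the fix (illustrative names, not the actual WritePrepared code): a batch with duplicate keys splits into sub-batches, each consuming one sequence number, and every sub-batch's sequence gets registered as prepared rather than only the first.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Returns the sequence numbers that must be registered as prepared
      // for a batch of sub_batch_cnt sub-batches starting at prepare_seq.
      std::vector<uint64_t> PreparedSeqs(uint64_t prepare_seq,
                                        size_t sub_batch_cnt) {
        std::vector<uint64_t> seqs;
        for (size_t i = 0; i < sub_batch_cnt; ++i) {
          seqs.push_back(prepare_seq + i);  // stands in for AddPrepared(...)
        }
        return seqs;
      }

      int main() {
        // Three sub-batches: all three seqs are prepared, not just 100.
        auto seqs = PreparedSeqs(100, 3);
        assert(seqs == (std::vector<uint64_t>{100, 101, 102}));
        return 0;
      }
      ```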
      Closes https://github.com/facebook/rocksdb/pull/3651
      
      Differential Revision: D7388635
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
    • Improve perf of random read and insert compare by suggesting inlining to the compiler · d382ae7d
      Dmitri Smirnov committed
      Summary:
      Results are from the 2015 compiler. This improves sequential insert. Random read results are inconclusive, but I hope 2017 will do a better job at inlining.
      
      Before:
      fillseq      :       **3.638 micros/op 274866 ops/sec;  213.9 MB/s**
      
      After:
      fillseq      :       **3.379 micros/op 295979 ops/sec;  230.3 MB/s**
      Closes https://github.com/facebook/rocksdb/pull/3645
      
      Differential Revision: D7382711
      
      Pulled By: siying
      
      fbshipit-source-id: 092a07ffe8a6e598d1226ceff0f11b35e6c5c8e4
    • Refactor sync_point to make implementation either customizable or replaceable · 53d66df0
      Dmitri Smirnov committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3637
      
      Differential Revision: D7354373
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6816c7bbc192ed0fb944942b11c7074bf24eddf1
    • Add 5.11 and 5.12 to tools/check_format_compatible.sh · a993c013
      Sagar Vemuri committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3646
      
      Differential Revision: D7384727
      
      Pulled By: sagar0
      
      fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38
    • Avoid adding tombstones of the same file to RangeDelAggregator multiple times · e80709a3
      daheiantian committed
      Summary:
      RangeDelAggregator will remember the files whose range tombstones have been added,
      so the caller can check whether a file has already been added before calling AddTombstones.
      
      Closes https://github.com/facebook/rocksdb/pull/3635
      
      Differential Revision: D7354604
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9b9f7ec130556028df417e650711554b46d8d107
    • Add Java-API-Changes section to History · 7ffce280
      Sagar Vemuri committed
      Summary:
      We have not been updating our HISTORY.md change log with the RocksJava changes. Going forward, let's add Java changes to HISTORY.md as well.
      There is an old java/HISTORY-JAVA.md, but it hasn't been updated in years. It is much easier to remember to update the change log in a single file, HISTORY.md.
      
      I added information about shared block cache here, which was introduced in #3623.
      Closes https://github.com/facebook/rocksdb/pull/3647
      
      Differential Revision: D7384448
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9b6e569f44e6df5cb7ba06413d9975df0b517d20
    • InlineSkiplist: don't decode keys unnecessarily during comparisons · 09b6bf82
      Radoslaw Zarzynski committed
      Summary:
      `InlineSkipList<>::Insert` takes the `key` parameter as a C-string. Then, it performs multiple comparisons with it, each requiring `GetLengthPrefixedSlice()` to be invoked in `MemTable::KeyComparator::operator()(const char* prefix_len_key1, const char* prefix_len_key2)` on the same data over and over. The patch tries to optimize that.
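      The premise can be illustrated with a simplified length-prefixed key (a 1-byte length prefix here instead of the real varint32; `SliceView` and `DecodeLengthPrefixed` are illustrative names): decoding yields a (pointer, length) view, and doing it once per insert instead of inside every comparison removes repeated parsing of the same bytes.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstring>

      // A decoded view of a length-prefixed key: pointer plus length.
      struct SliceView {
        const char* data;
        size_t size;
      };

      // Decode the (simplified) 1-byte length prefix once; comparisons can
      // then reuse the resulting view instead of re-parsing the buffer.
      SliceView DecodeLengthPrefixed(const char* p) {
        size_t len = static_cast<unsigned char>(p[0]);
        return SliceView{p + 1, len};
      }

      int main() {
        const char encoded[] = "\x03key";  // length 3, then the key bytes
        SliceView v = DecodeLengthPrefixed(encoded);
        assert(v.size == 3);
        assert(std::memcmp(v.data, "key", 3) == 0);
        return 0;
      }
      ```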
      
      Rough performance comparison
      =====
      Big keys, no compression.
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.222 micros/op 236836 ops/sec;   80.4 MB/s
      ```
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.064 micros/op 246059 ops/sec;   83.5 MB/s
      ```
      
      TODO
      ======
      In ~~a separated~~ this PR:
      - [x] Go outside the write path. Maybe even eradicate the C-string-taking variant of `KeyIsAfterNode` entirely.
      - [x] Try to cache the transformations applied by `KeyComparator` & friends in situations where we have many comparisons with the same key.
      Closes https://github.com/facebook/rocksdb/pull/3516
      
      Differential Revision: D7059300
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6f027dbb619a488129f79f79b5f7dbe566fb2dbb
  5. 23 March 2018, 10 commits
    • FlushReason improvement · 1cbc96d2
      Zhongyi Xie committed
      Summary:
      Right now the flush reason "SuperVersion Change" covers a few different scenarios, which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full"
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304
      $ grep 'flush_reason' /dev/shm/dbbench/LOG
      ...
      2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      
      With the fix:
      > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"}
      Closes https://github.com/facebook/rocksdb/pull/3627
      
      Differential Revision: D7328772
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8
    • Add unit test for WAL corruption · 82137f0c
      Andrew Kryczka committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3618
      
      Differential Revision: D7301053
      
      Pulled By: ajkr
      
      fbshipit-source-id: a9dde90caa548c294d03d6386f78428c8536ca14
    • Fsync after writing global seq number in ExternalSstFileIngestionJob · 2e3d4077
      Sagar Vemuri committed
      Summary:
      Fsync after writing global sequence number to the ingestion file in ExternalSstFileIngestionJob. Otherwise the file metadata could be incorrect.
      Closes https://github.com/facebook/rocksdb/pull/3644
      
      Differential Revision: D7373813
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4da2c9e71a8beb5c08b4ac955f288ee1576358b8
    • Rename function for handling WAL write error · 4d51feab
      Andrew Kryczka committed
      Summary:
      It was misnamed. It actually updates `bg_error_` if `PreprocessWrite()` or `WriteToWAL()` fail, not related to the user callback.
      Closes https://github.com/facebook/rocksdb/pull/3485
      
      Differential Revision: D6955787
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd7afc3fdb7a52830c021cbfc25fcbc3ab7d5e10
    • SstFileManager: add bytes_max_delete_chunk · 118058ba
      Siying Dong committed
      Summary:
      Add `bytes_max_delete_chunk` in SstFileManager so that we can drop a large file in multiple batches.
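      Dropping a file in batches can be modeled as repeated truncation followed by a final unlink. This is a sketch of the idea only (`DeleteInChunks` is a hypothetical name, and the vector stands in for real filesystem operations):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Shrink the file by at most bytes_max_delete_chunk per step, then
      // unlink; returns the file size after each step.
      std::vector<uint64_t> DeleteInChunks(uint64_t file_size,
                                           uint64_t bytes_max_delete_chunk) {
        std::vector<uint64_t> sizes_after_each_step;
        while (file_size > bytes_max_delete_chunk) {
          file_size -= bytes_max_delete_chunk;  // ftruncate() in a real impl
          sizes_after_each_step.push_back(file_size);
        }
        sizes_after_each_step.push_back(0);  // final unlink removes the rest
        return sizes_after_each_step;
      }

      int main() {
        // A 10-unit file with a 4-unit chunk limit shrinks 10 -> 6 -> 2 -> gone,
        // spreading the deletion cost over several smaller operations.
        auto steps = DeleteInChunks(10, 4);
        assert(steps == (std::vector<uint64_t>{6, 2, 0}));
        return 0;
      }
      ```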
      Closes https://github.com/facebook/rocksdb/pull/3640
      
      Differential Revision: D7358679
      
      Pulled By: siying
      
      fbshipit-source-id: ef17f0da2f5723dbece2669485a9b91b3edc0bb7
    • log value of CompressionOptions::zstd_max_train_bytes · 88c3e26c
      Andrew Kryczka committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3587
      
      Differential Revision: D7206901
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5d4b1a2653627b44aa3c22db7d98c9cd5dcdb67a
    • parse CompressionOptions::zstd_max_train_bytes in options string · 620823f8
      Andrew Kryczka committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3588
      
      Differential Revision: D7208087
      
      Pulled By: ajkr
      
      fbshipit-source-id: 688f7a7c447cb17bee1b410d1fd891c0bf966617
    • Update history for future 5.13 release · de6cf95a
      Fosco Marotto committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3631
      
      Differential Revision: D7367519
      
      Pulled By: gfosco
      
      fbshipit-source-id: 57826cc1c9ffc9f2b351075567b8ad929809cb74
    • WritePrepared Txn: fix race condition on publishing seq · 7429b20e
      Maysam Yabandeh committed
      Summary:
      This commit fixes a race condition on calling SetLastPublishedSequence. The function must be called only from the 2nd write queue when two_write_queues is enabled. However, there was a bug that would also call it from the main write queue if CommitTimeWriteBatch is provided to the commit request while the use_only_the_last_commit_time_batch_for_recovery optimization is not enabled. To fix that, we penalize the commit request in such cases by doing an additional write, solely to publish the seq number from the 2nd queue.
      Closes https://github.com/facebook/rocksdb/pull/3641
      
      Differential Revision: D7361508
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bf8f7a27e5cccf5425dccbce25eb0032e8e5a4d7
    • Fixed buffer overrun in BackupEngineImpl::BackupMeta::StoreToFile · fa8c050e
      Rohan Rathi committed
      Summary:
      The 10MB buffer in BackupEngineImpl::BackupMeta::StoreToFile can be corrupted with a large number of files. Added a check to determine the current buffer length and append the data to the file if the buffer becomes full.
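      The flush-when-full pattern can be sketched as below (an illustrative helper, not the BackupEngine API; `out` stands in for the backup meta file contents):

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Instead of assuming all metadata fits in one fixed-size buffer,
      // flush the buffer to the file whenever the next entry would
      // overflow it, then continue appending into the emptied buffer.
      void AppendWithFlush(std::string& buf, std::string& out,
                           const std::string& entry, size_t buf_capacity) {
        if (buf.size() + entry.size() > buf_capacity) {
          out += buf;  // write the current buffer contents to the file
          buf.clear();
        }
        buf += entry;
      }

      int main() {
        std::string buf, out;
        AppendWithFlush(buf, out, "abc", 5);
        AppendWithFlush(buf, out, "de", 5);
        assert(buf == "abcde" && out.empty());  // exactly fits, no flush yet
        AppendWithFlush(buf, out, "f", 5);      // would overflow: flush first
        assert(out == "abcde" && buf == "f");
        return 0;
      }
      ```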
      
      Resolves https://github.com/facebook/rocksdb/issues/3228
      Closes https://github.com/facebook/rocksdb/pull/3636
      
      Differential Revision: D7354160
      
      Pulled By: ajkr
      
      fbshipit-source-id: eec12d38095a0d17551a4aaee52b99d30a555722
  6. 22 March 2018, 6 commits
  7. 21 March 2018, 3 commits
  8. 20 March 2018, 2 commits
    • fix db_compaction_test when compression disabled · d1b26507
      Andrew Kryczka committed
      Summary:
      Previously, the compaction in `DBCompactionTestWithParam.ForceBottommostLevelCompaction` generated multiple files in the no-compression use case, and one file in the compression use case. I increased `target_file_size_base` so it generates one file in both use cases.
      Closes https://github.com/facebook/rocksdb/pull/3625
      
      Differential Revision: D7311885
      
      Pulled By: ajkr
      
      fbshipit-source-id: 97f249fa83a9924ac34357a4bb3189c969ecb107
    • Enable compilation on OpenBSD · ccb76136
      Tobias Tschinkowitz committed
      Summary:
      I modified the Makefile so that rocksdb can be compiled on OpenBSD.
      The instructions for building have been added to INSTALL.md.
      The whole compilation process works fine like this on OpenBSD-current.
      Closes https://github.com/facebook/rocksdb/pull/3617
      
      Differential Revision: D7323754
      
      Pulled By: siying
      
      fbshipit-source-id: 990037d1cc69138d22f85bd77ef4dc8c1ba9edea
  9. 19 March 2018, 1 commit
    • Fix the command used to generate ctags · 1139422d
      Yanqin Jin committed
      Summary:
      In the original $ROCKSDB_HOME/Makefile, the command used to generate ctags is
      ```
      ctags * -R
      ```
      However, this failed to generate tags for me.
      I did some search on the usage of ctags command and found that it should be
      ```
      ctags -R .
      ```
      or
      ```
      ctags -R *
      ```
      After the change, I can find the tags in vim using `:ts <identifier>`.
      Closes https://github.com/facebook/rocksdb/pull/3626
      
      Reviewed By: ajkr
      
      Differential Revision: D7320217
      
      Pulled By: riversand963
      
      fbshipit-source-id: e4cd8f8a67842370a2343f0213df3cbd07754111
  10. 17 March 2018, 3 commits