1. 24 7月, 2018 1 次提交
    • M
      WriteUnPrepared: Implement unprepared batches for transactions (#4104) · ea212e53
      Manuel Ung 提交于
      Summary:
      This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.
      
      Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
      
      A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
      
      Differential Revision: D8785717
      
      Pulled By: lth
      
      fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
      ea212e53
  2. 29 6月, 2018 1 次提交
    • A
      Allow DB resume after background errors (#3997) · 52d4c9b7
      Anand Ananthabhotla 提交于
      Summary:
      Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts -
      1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
      2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
      3. Provide an API for the user to clear the error and resume the DB instance
      
      This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
      Closes https://github.com/facebook/rocksdb/pull/3997
      
      Differential Revision: D8653831
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
      52d4c9b7
  3. 01 6月, 2018 1 次提交
  4. 23 5月, 2018 1 次提交
    • A
      Avoid sleep in DBTest.GroupCommitTest to fix flakiness · 7db721b9
      Andrew Kryczka 提交于
      Summary:
      DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads.
      Closes https://github.com/facebook/rocksdb/pull/3883
      
      Differential Revision: D8079429
      
      Pulled By: ajkr
      
      fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049
      7db721b9
  5. 04 5月, 2018 1 次提交
    • S
      Skip deleted WALs during recovery · d5954929
      Siying Dong 提交于
      Summary:
      This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.
      
      Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)
      
      This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
      d5954929
  6. 24 4月, 2018 1 次提交
    • M
      Improve write time breakdown stats · affe01b0
      Mike Kolupaev 提交于
      Summary:
      There's a group of stats in PerfContext for profiling the write path. They break down the write time into WAL write, memtable insert, throttling, and everything else. We use these stats a lot for figuring out the cause of slow writes.
      
      These stats got a bit out of date and are now categorizing some interesting things as "everything else", and also do some double counting. This PR fixes it and adds two new stats: time spent waiting for other threads of the batch group, and time spent waiting for scheduling flushes/compactions. Probably these will be enough to explain all the occasional abnormally slow (multiple seconds) writes that we're seeing.
      Closes https://github.com/facebook/rocksdb/pull/3602
      
      Differential Revision: D7251562
      
      Pulled By: al13n321
      
      fbshipit-source-id: 0a2d0f5a4fa5677455e1f566da931cb46efe2a0d
      affe01b0
  7. 30 3月, 2018 1 次提交
    • M
      WritePrepared Txn: fix a bug in publishing recoverable state seq · 89d989ed
      Maysam Yabandeh 提交于
      Summary:
      When using two_write_queue, the published seq and the last allocated sequence could be ahead of the LastSequence, even if both write queues are stopped as in WriteRecoverableState. The patch fixes a bug in WriteRecoverableState in which LastSequence was used as a reference but the result was applied to last fetched sequence and last published seq.
      Closes https://github.com/facebook/rocksdb/pull/3665
      
      Differential Revision: D7446099
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1449bed9aed8e9db6af85946efd347cb8efd3c0b
      89d989ed
  8. 29 3月, 2018 1 次提交
    • M
      WritePrepared Txn: make recoverable state visible after flush · 0377ff9d
      Maysam Yabandeh 提交于
      Summary:
      Currently if the CommitTimeWriteBatch is set to be used only as a state that is required only for recovery , the user cannot see that in DB until it is restarted. This while the state is already inserted into the DB after the memtable flush. It would be useful for debugging if make this state visible to the user after the flush by committing it. The patch does it by a invoking a callback that does the commit on the recoverable state.
      Closes https://github.com/facebook/rocksdb/pull/3661
      
      Differential Revision: D7424577
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c
      0377ff9d
  9. 27 3月, 2018 1 次提交
    • M
      Fix race condition via concurrent FlushWAL · 35a4469b
      Maysam Yabandeh 提交于
      Summary:
      Currently log_writer->AddRecord in WriteImpl is protected from concurrent calls via FlushWAL only if two_write_queues_ option is set. The patch fixes the problem by i) skip log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, ii) protects log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.
      
      Fixes #3599
      Closes https://github.com/facebook/rocksdb/pull/3656
      
      Differential Revision: D7405608
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934
      35a4469b
  10. 24 3月, 2018 1 次提交
    • M
      WritePrepared Txn: AddPrepared for all sub-batches · 3e417a66
      Maysam Yabandeh 提交于
      Summary:
      Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-patch moved to old_prepared_ but not the seq of the later ones. The patch fixes this by calling AddPrepared for all sub-patches.
      Closes https://github.com/facebook/rocksdb/pull/3651
      
      Differential Revision: D7388635
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
      3e417a66
  11. 23 3月, 2018 3 次提交
    • Z
      FlushReason improvement · 1cbc96d2
      Zhongyi Xie 提交于
      Summary:
      Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full"
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304
      $ grep 'flush_reason' /dev/shm/dbbench/LOG
      ...
      2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      
      With the fix:
      > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"}
      Closes https://github.com/facebook/rocksdb/pull/3627
      
      Differential Revision: D7328772
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8
      1cbc96d2
    • A
      Rename function for handling WAL write error · 4d51feab
      Andrew Kryczka 提交于
      Summary:
      It was misnamed. It actually updates `bg_error_` if `PreprocessWrite()` or `WriteToWAL()` fail, not related to the user callback.
      Closes https://github.com/facebook/rocksdb/pull/3485
      
      Differential Revision: D6955787
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd7afc3fdb7a52830c021cbfc25fcbc3ab7d5e10
      4d51feab
    • M
      WritePrepared Txn: fix race condition on publishing seq · 7429b20e
      Maysam Yabandeh 提交于
      Summary:
      This commit fixes a race condition on calling SetLastPublishedSequence. The function must be called only from the 2nd write queue when two_write_queues is enabled. However there was a bug that would also call it from the main write queue if CommitTimeWriteBatch is provided to the commit request and yet use_only_the_last_commit_time_batch_for_recovery optimization is not enabled. To fix that we penalize the commit request in such cases by doing an additional write solely to publish the seq number from the 2nd queue.
      Closes https://github.com/facebook/rocksdb/pull/3641
      
      Differential Revision: D7361508
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bf8f7a27e5cccf5425dccbce25eb0032e8e5a4d7
      7429b20e
  12. 07 3月, 2018 1 次提交
    • F
      uint64_t and size_t changes to compile for iOS · d518fe1d
      Fosco Marotto 提交于
      Summary:
      In attempting to build a static lib for use in iOS, I ran in to lots of type errors between uint64_t and size_t.  This PR contains the changes I made to get `TARGET_OS=IOS make static_lib` to succeed while also getting Xcode to build successfully with the resulting `librocksdb.a` library imported.
      
      This also compiles for me on macOS and tests fine, but I'm really not sure if I made the correct decisions about where to `static_cast` and where to change types.
      
      Also up for discussion: is iOS worth supporting?  Getting the static lib is just part one, we aren't providing any bridging headers or wrappers like the ObjectiveRocks project, it won't be a great experience.
      Closes https://github.com/facebook/rocksdb/pull/3503
      
      Differential Revision: D7106457
      
      Pulled By: gfosco
      
      fbshipit-source-id: 82ac2073de7e1f09b91f6b4faea91d18bd311f8e
      d518fe1d
  13. 06 3月, 2018 1 次提交
  14. 23 2月, 2018 2 次提交
  15. 10 2月, 2018 2 次提交
  16. 06 2月, 2018 2 次提交
    • M
      WritePrepared Txn: Duplicate Keys, Txn Part · 88d8b2a2
      Maysam Yabandeh 提交于
      Summary:
      This patch takes advantage of memtable being able to detect duplicate <key,seq> and returning TryAgain to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index it detects existence of at least a duplicate key in the write batch. If duplicate key was reported, it then pays the cost of counting the number of sub-patches by iterating over the write batch and pass it to DBImpl::Write. DB will make use of the provided batch_count to assign proper sequence numbers before sending them to the WAL. When later inserting the batch to the memtable, it increases the seq each time memtbale reports a duplicate (a sub-patch in our counting) and tries again.
      Closes https://github.com/facebook/rocksdb/pull/3455
      
      Differential Revision: D6873699
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
      88d8b2a2
    • A
      Handle error return from WriteBuffer() · 4b124fb9
      Anand Ananthabhotla 提交于
      Summary:
      There are a couple of places where we swallow any error from
      WriteBuffer() - in SwitchMemtable() and DBImpl::CloseImpl(). Propagate
      the error up in those cases rather than ignoring it.
      Closes https://github.com/facebook/rocksdb/pull/3404
      
      Differential Revision: D6879954
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2ef88b554be5286b0a8bad7384ba17a105395bdb
      4b124fb9
  17. 10 1月, 2018 1 次提交
  18. 21 12月, 2017 1 次提交
    • M
      Disable need_log_sync on bg err · 0ef3fdd7
      Maysam Yabandeh 提交于
      Summary:
      When there is a background error PreprocessWrite returns without marking the logs synced. If we keep need_log_sync to true, it would try to sync them at the end, which would break the logic. The patch would unset need_log_sync if the logs end up not being marked for sync in PreprocessWrite.
      Closes https://github.com/facebook/rocksdb/pull/3293
      
      Differential Revision: D6602347
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 37ee04209e8dcfd78de891654ce50d0954abeb38
      0ef3fdd7
  19. 13 12月, 2017 1 次提交
    • M
      disableWAL with WriteImplWALOnly · 546a6327
      Maysam Yabandeh 提交于
      Summary:
      Currently WriteImplWALOnly simply returns when disableWAL is set. This is an incorrect behavior since it does not allocated the sequence number, which is a side-effect of writing to the WAL. This patch fixes the issue.
      Closes https://github.com/facebook/rocksdb/pull/3262
      
      Differential Revision: D6550974
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 745a83ae8f04e7ca6c8ffb247d6ef16c287c52e7
      546a6327
  20. 01 12月, 2017 1 次提交
    • M
      WritePrepared Txn: PreReleaseCallback · 18dcf7f9
      Maysam Yabandeh 提交于
      Summary:
      Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
      These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default, the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
      Closes https://github.com/facebook/rocksdb/pull/3205
      
      Differential Revision: D6438959
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
      18dcf7f9
  21. 29 11月, 2017 1 次提交
    • Y
      Fix IOError on WAL write doesn't propagate to write group follower · 3cf562be
      Yi Wu 提交于
      Summary:
      This is a simpler version of #3097 by removing all unrelated changes.
      
      Fixing the bug where concurrent writes may get Status::OK while it actually gets IOError on WAL write. This happens when multiple writes form a write batch group, and the leader get an IOError while writing to WAL. The leader failed to pass the error to followers in the group, and the followers end up returning Status::OK() while actually writing nothing. The bug only affect writes in a batch group. Future writes after the batch group will correctly return immediately with the IOError.
      Closes https://github.com/facebook/rocksdb/pull/3201
      
      Differential Revision: D6421644
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 1c2a455c5b73f6842423785eb8a9dbfbb191dc0e
      3cf562be
  22. 16 11月, 2017 1 次提交
  23. 11 11月, 2017 2 次提交
  24. 02 11月, 2017 1 次提交
    • M
      WritePrepared Txn: Optimize for recoverable state · 17731a43
      Maysam Yabandeh 提交于
      Summary:
      GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit.
      This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush.
      Closes https://github.com/facebook/rocksdb/pull/3071
      
      Differential Revision: D6148023
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
      17731a43
  25. 29 10月, 2017 1 次提交
  26. 07 10月, 2017 1 次提交
  27. 06 10月, 2017 1 次提交
  28. 29 9月, 2017 1 次提交
  29. 28 9月, 2017 1 次提交
    • Q
      Make bytes_per_sync and wal_bytes_per_sync mutable · 6a541afc
      Quinn Jarrell 提交于
      Summary:
      SUMMARY
      Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed.
      TEST PLAN
      ran make check
      all passed
      
      Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value.
      Closes https://github.com/facebook/rocksdb/pull/2893
      
      Reviewed By: yiwu-arbug
      
      Differential Revision: D5845814
      
      Pulled By: TheRushingWookie
      
      fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
      6a541afc
  30. 19 9月, 2017 1 次提交
    • M
      WritePrepared Txn: Advance seq one per batch · 60beefd6
      Maysam Yabandeh 提交于
      Summary:
      By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions.
      
      Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch.
      Closes https://github.com/facebook/rocksdb/pull/2885
      
      Differential Revision: D5837843
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
      60beefd6
  31. 12 9月, 2017 1 次提交
  32. 17 8月, 2017 2 次提交
    • F
      fix some misspellings · ac8fb77a
      follitude 提交于
      Summary:
      PTAL ajkr
      Closes https://github.com/facebook/rocksdb/pull/2750
      
      Differential Revision: D5648052
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98
      ac8fb77a
    • M
      Update WritePrepared with the pseudo code · eb642530
      Maysam Yabandeh 提交于
      Summary:
      Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides a IsInSnapshot method that could be later called form the read path to decide if a version is in the read snapshot or it should other be skipped.
      
      This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is that to have the API specified so that we can work on related tasks in parallel.
      Closes https://github.com/facebook/rocksdb/pull/2713
      
      Differential Revision: D5640021
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
      eb642530
  33. 26 7月, 2017 1 次提交