1. 27 Mar 2018, 1 commit
    • WritePrepared Txn: Increase commit cache size to 2^23 · 0999e9b7
      Maysam Yabandeh committed
      Summary:
      Current commit cache size is 2^21. This was due to a typo. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math:
      2^23 / 2 (one out of every two seq numbers is for commit) / 2^16 TPS = 2^6 = 64s
      Closes https://github.com/facebook/rocksdb/pull/3657
      
      Differential Revision: D7411211
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35
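The sizing arithmetic above can be checked mechanically. This is only an illustrative C++ sketch: `MaxTxnSecondsBeforeEviction` is a hypothetical helper, not a RocksDB function, and it assumes, as the summary does, that every second seq number is a commit and that load is a steady 2^16 TPS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of RocksDB): seconds a transaction can stay
// uncommitted before its entry is pushed out of a commit cache holding
// 2^cache_bits entries.
//   seqs_per_commit: seq numbers consumed per committed txn (2 in the summary)
//   tps:             sustained transactions per second
constexpr uint64_t MaxTxnSecondsBeforeEviction(unsigned cache_bits,
                                               uint64_t seqs_per_commit,
                                               uint64_t tps) {
  return (uint64_t{1} << cache_bits) / seqs_per_commit / tps;
}

// Old size 2^21 vs. new size 2^23, both at 2^16 TPS.
static_assert(MaxTxnSecondsBeforeEviction(21, 2, uint64_t{1} << 16) == 16,
              "2^21 entries -> 16s");
static_assert(MaxTxnSecondsBeforeEviction(23, 2, uint64_t{1} << 16) == 64,
              "2^23 entries -> 64s");
```

Under the same assumptions, the old 2^21 size leaves only a 16s window.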
  2. 24 Mar 2018, 1 commit
    • WritePrepared Txn: AddPrepared for all sub-batches · 3e417a66
      Maysam Yabandeh committed
      Summary:
      Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-batch is moved to old_prepared_ but the seq numbers of the later ones are not. The patch fixes this by calling AddPrepared for all sub-batches.
      Closes https://github.com/facebook/rocksdb/pull/3651
      
      Differential Revision: D7388635
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
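A minimal sketch of the fix, assuming (as in the "Duplicate Keys, Txn Part" entry) that the sub-batches of one prepared batch occupy consecutive seq numbers starting at the prepare seq. `PreparedSeqs` and `AddPreparedForAllSubBatches` are hypothetical names; the real code tracks prepared seqs in PreparedHeap and old_prepared_.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the structures that track prepared,
// not-yet-committed seq numbers.
class PreparedSeqs {
 public:
  void AddPrepared(uint64_t seq) { prepared_.insert(seq); }
  bool IsPrepared(uint64_t seq) const { return prepared_.count(seq) != 0; }
  std::size_t size() const { return prepared_.size(); }

 private:
  std::set<uint64_t> prepared_;
};

// Assumed layout: sub-batch i of the batch uses seq prepare_seq + i.
// Register all of them, not only the first one (prepare_seq).
void AddPreparedForAllSubBatches(PreparedSeqs& prepared, uint64_t prepare_seq,
                                 uint64_t sub_batch_cnt) {
  for (uint64_t i = 0; i < sub_batch_cnt; ++i) {
    prepared.AddPrepared(prepare_seq + i);
  }
}
```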
  3. 23 Mar 2018, 1 commit
    • WritePrepared Txn: fix race condition on publishing seq · 7429b20e
      Maysam Yabandeh committed
      Summary:
      This commit fixes a race condition on calling SetLastPublishedSequence. The function must be called only from the 2nd write queue when two_write_queues is enabled. However, a bug would also call it from the main write queue if a CommitTimeWriteBatch is provided to the commit request but the use_only_the_last_commit_time_batch_for_recovery optimization is not enabled. To fix that, in such cases we penalize the commit request with an additional write whose sole purpose is to publish the seq number from the 2nd queue.
      Closes https://github.com/facebook/rocksdb/pull/3641
      
      Differential Revision: D7361508
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bf8f7a27e5cccf5425dccbce25eb0032e8e5a4d7
  4. 09 Mar 2018, 1 commit
  5. 07 Mar 2018, 1 commit
    • Windows cumulative patch · c364eb42
      Dmitri Smirnov committed
      Summary:
      This patch addressed several issues.
        Portability including db_test std::thread -> port::Thread Cc: @
        and %z to ROCKSDB portable macro. Cc: maysamyabandeh
      
        Implement Env::AreFilesSame
      
        Make the implementation of file unique number more robust
      
        Get rid of C-runtime and go directly to Windows API when dealing
        with file primitives.
      
        Implement GetSectorSize() and align unbuffered reads on the value if
        available.
      
        Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976
      
        Fix test running script issue where $status var was of incorrect scope
        so the failures were swallowed and not reported.
      
        DestroyDB() creates a logger and opens a LOG file in the directory
        being cleaned up. This holds a lock on the folder and prevents the
        cleanup, which fails one of the checkpoint tests. We observe the same in production.
        We close the log file in this change.

        Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
        attempts to open a directory with NewRandomAccessFile, which does not
        work on Windows.

        Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
      Closes https://github.com/facebook/rocksdb/pull/3552
      
      Differential Revision: D7156304
      
      Pulled By: siying
      
      fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
  6. 06 Mar 2018, 2 commits
    • WritePrepared Txn: Move DuplicateDetector to util · 62277e15
      Maysam Yabandeh committed
      Summary:
      Move DuplicateDetector and SetComparator to their own header file in util. This also addresses a complaint in the unity test.
      Closes https://github.com/facebook/rocksdb/pull/3567
      
      Differential Revision: D7163268
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6ddf82773473646dbbc1284ae601a78c4907c778
    • WritePrepared Txn: Fix bug with duplicate keys during recovery · 680864ae
      Maysam Yabandeh committed
      Summary:
      Fix the following bugs:
      - During recovery a duplicate key was inserted twice into the write batch of the recovery transaction:
      once when the memtable returned false (because the key was a duplicate) and once for the 2nd attempt. This resulted in a different sub-batch count being measured when the recovered transaction commits.
      - If a cf is flushed during recovery, the memtable is not available to assist in detecting the duplicate key. This could result in not advancing the sequence number when iterating over duplicate keys of a flushed cf, and hence inserting the next key with the wrong sequence number.
      - SubBatchCounter would reset the comparator to the default comparator after the first duplicate key. The 2nd duplicate key would hence go through the wrong comparator and not be detected.
      Closes https://github.com/facebook/rocksdb/pull/3562
      
      Differential Revision: D7149440
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77
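The comparator bug in the last item can be illustrated with a simplified sub-batch counter. This is a hypothetical sketch, not the SubBatchCounter code: it ignores column families (a real counter has to track keys per cf) and starts a new sub-batch whenever a key repeats within the current one, under a caller-supplied comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Strict-weak-ordering comparator; two keys are duplicates when neither is
// less than the other under it.
using KeyCmp = std::function<bool(const std::string&, const std::string&)>;

// Example of a non-default comparator: case-insensitive ordering.
bool CaseInsensitiveLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return std::tolower(static_cast<unsigned char>(x)) <
               std::tolower(static_cast<unsigned char>(y));
      });
}

// Hypothetical simplified counter (single column family).
std::size_t CountSubBatches(const std::vector<std::string>& keys,
                            const KeyCmp& cmp) {
  if (keys.empty()) return 0;
  std::set<std::string, KeyCmp> current(cmp);
  std::size_t count = 1;
  for (const auto& key : keys) {
    if (!current.insert(key).second) {
      // Duplicate: start a new sub-batch containing just this key.
      // clear() keeps the comparator `current` was constructed with;
      // rebuilding the set with a default comparator here would reproduce
      // the bug described above.
      ++count;
      current.clear();
      current.insert(key);
    }
  }
  return count;
}
```

With a case-insensitive comparator, "a" followed by "A" is a duplicate and opens a second sub-batch; under bytewise ordering it is not.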
  7. 22 Feb 2018, 1 commit
    • WritePrepared Txn: fix non-emptied PreparedHeap bug · 828211e9
      Maysam Yabandeh committed
      Summary:
      Under a certain sequence of accesses to PreparedHeap, a bug would prevent the heap from being fully emptied. This would result in performance issues when the heap content is moved to old_prepared_ after max_evicted_seq_ advances past the orphan prepared sequence numbers. The patch fixes the bug and adds more unit tests. It also logs more when the unlikely scenarios are encountered.
      Closes https://github.com/facebook/rocksdb/pull/3526
      
      Differential Revision: D7038486
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f1e40bea558f67b03d2a29131fcb8734c65fce97
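The summary does not spell out the heap's structure. One common way to support erasing arbitrary entries from a min-heap, sketched here under that assumption with hypothetical names, is lazy deletion with a second heap: erased seqs that are not at the top are remembered and dropped once they surface, so the heap can still drain to empty.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical min-heap of prepared seqs with lazy erase.
class PreparedHeapSketch {
 public:
  bool empty() const { return heap_.empty(); }
  uint64_t top() const { return heap_.top(); }  // requires !empty()
  void push(uint64_t seq) { heap_.push(seq); }

  // Remove `seq`, which is expected to be in the heap.
  void erase(uint64_t seq) {
    if (heap_.empty()) return;
    if (seq == heap_.top()) {
      heap_.pop();
    } else if (seq > heap_.top()) {
      erased_.push(seq);  // not at the top: drop it once it surfaces
    }
    // Drain erased entries that have reached the top. Without this loop,
    // out-of-order erases would leave stale seqs behind.
    while (!heap_.empty() && !erased_.empty() &&
           erased_.top() <= heap_.top()) {
      if (erased_.top() == heap_.top()) heap_.pop();
      erased_.pop();
    }
    if (heap_.empty()) erased_ = MinHeap();  // nothing left to match against
  }

 private:
  using MinHeap = std::priority_queue<uint64_t, std::vector<uint64_t>,
                                      std::greater<uint64_t>>;
  MinHeap heap_;
  MinHeap erased_;
};
```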
  8. 17 Feb 2018, 1 commit
  9. 13 Feb 2018, 1 commit
  10. 07 Feb 2018, 1 commit
    • Add skip_cc option to TransactionDB::Write · 8feee280
      Maysam Yabandeh committed
      Summary:
      Compared to DB::Write, TransactionDB::Write has the additional overhead of creating and initializing an internal transaction object, as well as the overhead of locking/unlocking the keys. This patch extends TransactionDB::Write with a skip_cc option that lets users indicate that the write batch does not conflict with others, so the concurrency control and its overhead can be skipped. When skip_cc is set, TransactionDB::Write by default calls DB::Write, which works for the WriteCommitted WritePolicy. Any other flavor of TransactionDB that is not compatible with this default behavior (such as WritePreparedTxnDB) can override ::Write and implement its own approach to honoring the skip_cc optimization.
      Closes https://github.com/facebook/rocksdb/pull/3457
      
      Differential Revision: D6877318
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 56f4e21db87ff71492db4e376fb7c2b03dfeab6b
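The dispatch described above can be sketched with hypothetical stand-in classes; they are not the RocksDB types, and the real method signature is not reproduced here. The base class forwards a skip_cc batch straight to a plain write. The WritePrepared-like subclass overrides Write because, as an assumption made for this sketch, it must still record the commit in its own commit map.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a write batch.
struct Batch {
  std::vector<std::string> keys;
};

// Hypothetical stand-in for TransactionDB.
class TxnDBSketch {
 public:
  virtual ~TxnDBSketch() = default;

  virtual void Write(const Batch& batch, bool skip_cc) {
    if (skip_cc) {
      RawDBWrite(batch);      // plain DB write: no internal txn, no key locks
    } else {
      WriteWithLocks(batch);  // create internal txn, lock keys, commit
    }
  }

  int raw_writes = 0;
  int locked_writes = 0;

 protected:
  void RawDBWrite(const Batch&) { ++raw_writes; }
  void WriteWithLocks(const Batch&) { ++locked_writes; }
};

// A policy where the default skip_cc path is not enough (assumed here:
// commit-map bookkeeping is still required).
class WritePreparedSketch : public TxnDBSketch {
 public:
  void Write(const Batch& batch, bool skip_cc) override {
    if (skip_cc) {
      RawDBWrite(batch);
      ++commit_map_updates;
    } else {
      TxnDBSketch::Write(batch, skip_cc);
    }
  }

  int commit_map_updates = 0;
};
```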
  11. 06 Feb 2018, 1 commit
    • WritePrepared Txn: Duplicate Keys, Txn Part · 88d8b2a2
      Maysam Yabandeh committed
      Summary:
      This patch takes advantage of the memtable being able to detect a duplicate <key,seq> and return TryAgain, in order to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index it detects whether the write batch contains at least one duplicate key. If a duplicate key is reported, it then pays the cost of counting the number of sub-batches by iterating over the write batch, and passes the count to DBImpl::Write. The DB uses the provided batch_count to assign proper sequence numbers before sending the batch to the WAL. When later inserting the batch into the memtable, it increments the seq each time the memtable reports a duplicate (a sub-batch in our counting) and tries again.
      Closes https://github.com/facebook/rocksdb/pull/3455
      
      Differential Revision: D6873699
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
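The insert-with-retry step can be illustrated with a toy memtable keyed on <key, seq>; `MemTableSketch` and `InsertBatch` are hypothetical, a std::set stand-in rather than RocksDB's MemTable. A rejected insert bumps the seq, i.e. opens a new sub-batch, so the number of seqs consumed equals the sub-batch count the DB must have reserved up front.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical memtable stand-in.
class MemTableSketch {
 public:
  // Returns false if <key, seq> is already present (the real memtable
  // reports this case as TryAgain).
  bool Add(const std::string& key, uint64_t seq) {
    return entries_.insert(std::make_pair(key, seq)).second;
  }
  bool Contains(const std::string& key, uint64_t seq) const {
    return entries_.count(std::make_pair(key, seq)) != 0;
  }

 private:
  std::set<std::pair<std::string, uint64_t>> entries_;
};

// Inserts `keys` starting at `first_seq`. A rejection means the key repeats
// within the current sub-batch, so the seq is bumped (a new sub-batch
// begins) and the insert is retried. Returns the last seq used.
uint64_t InsertBatch(MemTableSketch& mem, const std::vector<std::string>& keys,
                     uint64_t first_seq) {
  uint64_t seq = first_seq;
  for (const auto& key : keys) {
    while (!mem.Add(key, seq)) ++seq;
  }
  return seq;
}
```

For the batch {"a", "b", "a"} starting at seq 10, the second "a" lands at seq 11: two sub-batches, two seqs.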
  12. 30 Jan 2018, 1 commit
    • Split SnapshotConcurrentAccessTest into 20 sub tests · 3073b1c5
      Maysam Yabandeh committed
      Summary:
      SnapshotConcurrentAccessTest sometimes times out when running on the test infra. This patch splits the test into smaller sub-tests to avoid the timeout. It also benefits from lower run-time of each sub-test and increases the coverage of the test. The overall run-time of each final sub-test is at most half of the original test so we should no longer see a timeout.
      Closes https://github.com/facebook/rocksdb/pull/3435
      
      Differential Revision: D6839427
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d53fdb157109e2438ca7fe447d0cf4b71f304bd8
  13. 10 Jan 2018, 1 commit
  14. 19 Dec 2017, 1 commit
    • WritePrepared Txn: non-2pc write in one round · a6d3c762
      Maysam Yabandeh committed
      Summary:
      Currently non-2pc writes do a 2nd dummy write to actually commit the transaction. This was necessary to ensure that the commit sequence number is published from only one queue (the queue that does not write to the memtable). This is however not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues is disabled by updating the commit map in the 1st write.
      Closes https://github.com/facebook/rocksdb/pull/3277
      
      Differential Revision: D6575392
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
  15. 13 Dec 2017, 1 commit
  16. 01 Dec 2017, 1 commit
    • WritePrepared Txn: PreReleaseCallback · 18dcf7f9
      Maysam Yabandeh committed
      Summary:
      Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePreparedTxn to i) update the commit map, and ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits go to the 2nd queue.
      These changes ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots use the seq number published by the 2nd queue. If we use one write queue (the default), the snapshots use the last seq number in the memtable, which also indicates the last published seq number.
      Closes https://github.com/facebook/rocksdb/pull/3205
      
      Differential Revision: D6438959
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
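The ordering guarantee can be sketched with a single-writer toy queue; the names are hypothetical and this is not the WriteImpl signature. The callback runs with the newly assigned seq before that seq is stored to the published counter with release semantics, so a reader that loads the published seq with acquire semantics also sees the callback's effects.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical single-writer write queue (not RocksDB's WriteImpl).
class WriteQueueSketch {
 public:
  using PreReleaseCallback = std::function<void(uint64_t seq)>;

  // Assigns the next seq, runs the callback, and only then publishes the seq.
  uint64_t Write(const PreReleaseCallback& callback) {
    const uint64_t seq = ++last_allocated_;
    if (callback) callback(seq);  // e.g. add the commit to the commit map
    // Release store: a reader that acquire-loads this value also observes
    // everything the callback wrote.
    last_published_.store(seq, std::memory_order_release);
    return seq;
  }

  // What a new snapshot would read as its seq.
  uint64_t LastPublished() const {
    return last_published_.load(std::memory_order_acquire);
  }

 private:
  uint64_t last_allocated_ = 0;  // touched only by the single writer
  std::atomic<uint64_t> last_published_{0};
};
```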
  17. 28 Nov 2017, 1 commit
  18. 16 Nov 2017, 1 commit
  19. 03 Nov 2017, 1 commit
  20. 02 Nov 2017, 1 commit
    • WritePrepared Txn: Optimize for recoverable state · 17731a43
      Maysam Yabandeh committed
      Summary:
      GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery, so it does not need to be stored in the memtable right after each commit.
      This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to the memtable during recovery when the WAL is replayed. To cover the case when the WAL is deleted after a memtable flush, the batch is also buffered and written to the memtable right before each memtable flush.
      Closes https://github.com/facebook/rocksdb/pull/3071
      
      Differential Revision: D6148023
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
  21. 10 Oct 2017, 1 commit
    • WritePrepared Txn: Iterator · 8c392a31
      Yi Wu committed
      Summary:
      On iterator creation, take a snapshot, create a ReadCallback, and pass the ReadCallback to the underlying DBIter to check if a key is committed.
      Closes https://github.com/facebook/rocksdb/pull/2981
      
      Differential Revision: D6001471
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe
  22. 07 Oct 2017, 2 commits
    • WritePrepare Txn: Cancel flush/compaction before destruction · 17c6325e
      Yi Wu committed
      Summary:
      When WritePreparedTxnDB is destructed, there could be a running compaction/flush holding a SnapshotChecker, which holds a pointer back to WritePreparedTxnDB. Make sure those jobs have finished before destructing WritePreparedTxnDB.
      
      This is caught by TransactionTest::SeqAdvanceTest.
      Closes https://github.com/facebook/rocksdb/pull/2982
      
      Differential Revision: D6002957
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f1e70390c9798d1bd7959f5c8e2a1c14100773c3
    • WritePrepared Txn: Compaction/Flush · d1b74b0c
      Yi Wu committed
      Summary:
      Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker, which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator uses it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator:
      * check if key has been committed. If not, output uncommitted keys AS-IS.
      * use SnapshotChecker to check if key is visible to a snapshot when in need.
      * do not output key with seq = 0 if the key is not committed.
      Closes https://github.com/facebook/rocksdb/pull/2926
      
      Differential Revision: D5902907
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 945e037fdf0aa652dc5ba0ad879461040baa0320
  23. 03 Oct 2017, 1 commit
    • WritePrepared Txn: Rollback · d27258d3
      Maysam Yabandeh committed
      Summary:
      Implement the rollback of WritePrepared txns. For each modified value, it reads the value before the txn and writes it back. This cancels out the effect of the transaction. It also removes the rolled-back txn from the prepared heap.
      Closes https://github.com/facebook/rocksdb/pull/2946
      
      Differential Revision: D5937575
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a6d3c47f44db3729f44b287a80f97d08dc4e888d
  24. 29 Sep 2017, 1 commit
  25. 28 Sep 2017, 1 commit
  26. 14 Sep 2017, 1 commit
    • WritePrepared Txn: Lock-free CommitMap · 09713a64
      Maysam Yabandeh committed
      Summary:
      We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals.
      
      In this impl each entry is a std::atomic<uint64_t>, accessed via memory_order_acquire/release. On x86_64 these compile to plain reads and writes from memory.
      Closes https://github.com/facebook/rocksdb/pull/2861
      
      Differential Revision: D5800724
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda
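A minimal sketch of such a lock-free commit cache. The layout here is this sketch's own assumption, not necessarily the patch's: a fixed number of slots indexed by prep_seq % size, both seqs fitting in 32 bits so a <prep, commit> pair packs into one std::atomic<uint64_t>, and 0 reserved for empty slots (so seqs start at 1). A later AddCommitted that maps to the same slot overwrites, i.e. evicts, the older entry.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size commit cache; each slot is one atomic word.
template <std::size_t kSize>
class CommitCacheSketch {
 public:
  void AddCommitted(uint64_t prep_seq, uint64_t commit_seq) {
    // Assumes prep_seq and commit_seq < 2^32.
    const uint64_t packed = (prep_seq << 32) | (commit_seq & 0xffffffffu);
    slots_[prep_seq % kSize].store(packed, std::memory_order_release);
  }

  // Returns false if the slot is empty or now holds a different prep_seq
  // (the entry was evicted or never added).
  bool GetCommit(uint64_t prep_seq, uint64_t* commit_seq) const {
    const uint64_t packed =
        slots_[prep_seq % kSize].load(std::memory_order_acquire);
    if (packed == 0 || (packed >> 32) != prep_seq) return false;
    *commit_seq = packed & 0xffffffffu;
    return true;
  }

 private:
  std::array<std::atomic<uint64_t>, kSize> slots_{};  // zero = empty
};
```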
  27. 12 Sep 2017, 1 commit
    • write-prepared txn: call IsInSnapshot · f46464d3
      Maysam Yabandeh committed
      Summary:
      This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
      Closes https://github.com/facebook/rocksdb/pull/2850
      
      Differential Revision: D5787375
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
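A simplified sketch of this read path, with hypothetical names: a commit map from prepare seq to commit seq, a visibility check in the spirit of IsInSnapshot, and a read loop that skips versions the callback rejects. It ignores the commit-cache eviction (max_evicted_seq_) and old_prepared_ cases that a full implementation has to handle, and it needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical commit map: prepare seq -> commit seq.
using CommitMap = std::unordered_map<uint64_t, uint64_t>;

// Simplified visibility check: a version written at prep_seq is visible to
// snapshot_seq iff its txn has committed at or before the snapshot.
bool IsInSnapshotSketch(const CommitMap& commits, uint64_t prep_seq,
                        uint64_t snapshot_seq) {
  auto it = commits.find(prep_seq);
  return it != commits.end() && it->second <= snapshot_seq;
}

struct Version {
  uint64_t seq;
  std::string value;
};

// Read path for one key: versions are ordered newest first; return the first
// one the callback accepts and move on past the rejected ones.
std::optional<std::string> GetVisible(
    const std::vector<Version>& newest_first,
    const std::function<bool(uint64_t)>& read_callback) {
  for (const auto& v : newest_first) {
    if (read_callback(v.seq)) return v.value;
  }
  return std::nullopt;
}
```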
  28. 09 Sep 2017, 1 commit
    • Advance max evicted seq in coarser granularity · fce6c892
      Maysam Yabandeh committed
      Summary:
      This patch advances max_evicted_seq_ in larger increments to reduce the overhead of updating the relevant data structures.
      
      It also refactors the related code and adds tests for it. As part of this patch, some of the TODOs for removing usage of non-static const members are also addressed.
      Closes https://github.com/facebook/rocksdb/pull/2844
      
      Differential Revision: D5772928
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c
  29. 01 Sep 2017, 1 commit
  30. 26 Aug 2017, 1 commit
    • WriteAtPrepare: Efficient read from snapshot list · fbfa3e7a
      Maysam Yabandeh committed
      Summary:
      Divide the old snapshots into two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former optimizes concurrent reads from snapshots without requiring locks. It is done with an array of std::atomic, whose std::memory_order_acquire reads compile to simple read instructions on most x86_64 architectures.
      Closes https://github.com/facebook/rocksdb/pull/2758
      
      Differential Revision: D5660504
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c
  31. 24 Aug 2017, 1 commit
  32. 18 Aug 2017, 1 commit
    • Added mechanism to track deadlock chain · bddd5d36
      Archit Mishra committed
      Summary:
      Changes:
      * extended the wait_txn_map to track additional information
      * designed a circular buffer to store information about the n latest deadlocks
      * added test coverage to verify the additional information tracked is accurately stored in the buffer
      Closes https://github.com/facebook/rocksdb/pull/2630
      
      Differential Revision: D5478025
      
      Pulled By: armishra
      
      fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7
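A fixed-capacity circular buffer of the kind described in the second bullet can be sketched as follows. `DeadlockPathSketch` and `DeadlockRingBuffer` are hypothetical names, and the recorded field is a placeholder rather than what the commit actually stores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one detected deadlock.
struct DeadlockPathSketch {
  std::vector<std::string> waiting_txns;  // the cycle, in wait order
};

// Keeps the n most recent deadlocks; once full, a new entry overwrites the
// oldest one.
class DeadlockRingBuffer {
 public:
  explicit DeadlockRingBuffer(std::size_t capacity) : buf_(capacity) {}

  void Add(DeadlockPathSketch path) {
    if (buf_.empty()) return;
    buf_[next_] = std::move(path);
    next_ = (next_ + 1) % buf_.size();
    if (count_ < buf_.size()) ++count_;
  }

  // Returns the stored deadlocks, newest first.
  std::vector<DeadlockPathSketch> Snapshot() const {
    std::vector<DeadlockPathSketch> out;
    for (std::size_t i = 0; i < count_; ++i) {
      const std::size_t idx = (next_ + buf_.size() - 1 - i) % buf_.size();
      out.push_back(buf_[idx]);
    }
    return out;
  }

 private:
  std::vector<DeadlockPathSketch> buf_;
  std::size_t next_ = 0;   // slot the next Add writes to
  std::size_t count_ = 0;  // number of valid entries
};
```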
  33. 17 Aug 2017, 1 commit
    • Update WritePrepared with the pseudo code · eb642530
      Maysam Yabandeh committed
      Summary:
      Implement the main body of the WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted, which updates the commit map. It also provides an IsInSnapshot method that could later be called from the read path to decide if a version is in the read snapshot or should otherwise be skipped.
      
      This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is to have the API specified so that we can work on related tasks in parallel.
      Closes https://github.com/facebook/rocksdb/pull/2713
      
      Differential Revision: D5640021
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
  34. 08 Aug 2017, 1 commit
    • Refactor PessimisticTransaction · bdc056f8
      Maysam Yabandeh committed
      Summary:
      This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves the lock-related logic to PessimisticTransaction to be reused by all child classes and moves the existing impl of the db-write-related logic to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
      Closes https://github.com/facebook/rocksdb/pull/2691
      
      Differential Revision: D5569464
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325
  35. 06 Aug 2017, 1 commit
  36. 03 Aug 2017, 1 commit
    • Refactor TransactionImpl · c3d5c4d3
      Maysam Yabandeh committed
      Summary:
      This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added, which will later be completed to provide an alternative implementation.
      Closes https://github.com/facebook/rocksdb/pull/2676
      
      Differential Revision: D5549998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
  37. 16 Jul 2017, 1 commit
  38. 28 Apr 2017, 1 commit