1. 13 Aug, 2019 1 commit
  2. 07 Jun, 2019 1 commit
  3. 01 Jun, 2019 1 commit
  4. 31 May, 2019 2 commits
  5. 29 May, 2019 1 commit
    • M
      WritePrepared: disableWAL in commit without prepare (#5327) · f5576c33
      Maysam Yabandeh committed
      Summary:
      When committing a transaction without prepare, WritePrepared simply writes the batch to the DB and adds the commit entry to CommitCache. When two_write_queues=true, following the rule of committing only from the 2nd write queue, the first write writes the batch, and the only thing the 2nd write does is write the commit entry to CommitCache. Currently the write batch in the 2nd write is set to an empty LogData entry, whereas the write to the WAL could simply be disabled entirely.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5327
      
      Differential Revision: D15424546
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3d9ea3922d5196984c584d62a3ed57e1f7ca7b9f
      f5576c33
  6. 16 May, 2019 1 commit
    • M
      WritePrepared: Fix deadlock in WriteRecoverableState (#5306) · f0e82161
      Maysam Yabandeh committed
      Summary:
      The recent improvement in https://github.com/facebook/rocksdb/pull/3661 could cause a deadlock: when writing recoverable state, we also commit its sequence number to the commit table, which could evict an existing commit entry, which could advance max_evicted_seq_, which requires getting snapshots from the database, which in turn requires obtaining the db mutex. The patch releases db_mutex before calling the callback in WriteRecoverableState to avoid the potential deadlock. It also improves the stress tests so that the issue manifests in the tests.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5306
      
      Differential Revision: D15341458
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 05dcbed7e21b789fd1e5fd5ee8eea08077162323
      f0e82161
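The locking pattern behind the fix in #5306 can be illustrated with a minimal, self-contained sketch. `ToyDB`, `SnapshotCount`, and `RunDeadlockFreeWrite` are hypothetical names, not RocksDB APIs; the sketch only assumes a non-recursive mutex that the callback needs to acquire on its own.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a DB guarded by one non-recursive mutex.
class ToyDB {
 public:
  // Acquires the mutex itself, as a snapshot lookup would.
  int SnapshotCount() {
    std::lock_guard<std::mutex> guard(mutex_);
    return snapshot_count_;
  }

  // Updates state under the mutex, but drops the lock around the callback.
  // Calling the callback while still holding mutex_ would self-deadlock
  // as soon as the callback calls SnapshotCount().
  int WriteRecoverableState(const std::function<int(ToyDB&)>& callback) {
    std::unique_lock<std::mutex> lock(mutex_);
    ++writes_;
    lock.unlock();  // release before invoking the callback
    int result = callback(*this);
    lock.lock();    // re-acquire to finish the bookkeeping
    ++callbacks_done_;
    return result;
  }

 private:
  std::mutex mutex_;
  int snapshot_count_ = 3;
  int writes_ = 0;
  int callbacks_done_ = 0;
};

// Runs a write whose callback needs the same mutex; returns the snapshot count.
int RunDeadlockFreeWrite() {
  ToyDB db;
  return db.WriteRecoverableState([](ToyDB& d) { return d.SnapshotCount(); });
}
```

Removing the `lock.unlock()` call in this sketch makes `RunDeadlockFreeWrite` hang, which is the shape of the problem the commit message describes.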
  7. 14 May, 2019 1 commit
    • M
      Unordered Writes (#5218) · f383641a
      Maysam Yabandeh committed
      Summary:
      Perform unordered writes in RocksDB when the unordered_write option is set to true. When enabled, writes to the memtable are done without joining any write thread. This offers much higher write throughput since upcoming writes do not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around it. Using TransactionDB with the WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without compromising the snapshot guarantees.
      The patch is based on an original by siying.
      Existing unit tests are extended to include unordered_write option.
      
      Benchmark Results:
      ```
      TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions  --unordered_write=1
      ```
      With WAL
      - Vanilla RocksDB: 78.6 MB/s
      - WRITE_PREPARED with unordered_write: 177.8 MB/s (2.2x)
      - unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees)
      
      Without WAL
      - Vanilla RocksDB: 111.3 MB/s
      - WRITE_PREPARED with unordered_write: 259.3 MB/s (2.3x)
      - unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees)
      
      - WRITE_PREPARED with unordered_write and concurrency control disabled: 185.3 MB/s (2.35x)
      
      Limitations:
      - The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218
      
      Differential Revision: D15219029
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
      f383641a
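The limitations listed for #5218 amount to a compatibility check between options. A minimal sketch of such a check follows; `ToyOptions` and `ValidateUnorderedWrite` are hypothetical, and the struct only mirrors the option names mentioned in the commit message (the default values are assumptions made for this sketch).

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the option fields named in the commit message.
struct ToyOptions {
  bool unordered_write = false;
  bool enable_pipelined_write = false;
  bool allow_concurrent_memtable_write = true;
  unsigned max_successive_merges = 0;
};

// Returns an empty string if the combination is allowed, else the reason
// it is rejected. Only the three limitations from the commit are checked.
std::string ValidateUnorderedWrite(const ToyOptions& o) {
  if (!o.unordered_write) return "";
  if (o.enable_pipelined_write) {
    return "unordered_write is incompatible with enable_pipelined_write";
  }
  if (!o.allow_concurrent_memtable_write) {
    return "unordered_write requires allow_concurrent_memtable_write";
  }
  if (o.max_successive_merges > 0) {
    return "unordered_write does not support max_successive_merges > 0";
  }
  return "";
}
```

A caller would run the check once before opening the database and refuse to proceed on a non-empty result.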
  8. 20 Feb, 2019 1 commit
    • M
      WritePrepared: Improve stress tests with slow threads (#4974) · 0f4244fe
      Maysam Yabandeh committed
      Summary:
      The transaction stress tests stress a high-concurrency scenario. In WritePrepared/WriteUnPrepared we also need to stress scenarios in which an inserting or reading transaction is very slow. This stresses the corner cases in which the caching is not sufficient and other, slower data structures are engaged. To emulate such cases, we use slow inserter/verifier threads and also reduce the size of the cache data structures.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4974
      
      Differential Revision: D14143070
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 81eb674678faf9fae0f654cd60ebcc74e26aeee7
      0f4244fe
  9. 22 Aug, 2018 1 commit
  10. 10 Aug, 2018 1 commit
    • M
      Index value delta encoding (#3983) · caf0f53a
      Maysam Yabandeh committed
      Summary:
      Given that an index value is a BlockHandle, which is basically an <offset, size> pair, we can apply delta encoding to the values. The first value at each index restart interval encodes the full BlockHandle, but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the details of the encoding. This reduces the index size, which helps use the block cache more efficiently. The feature is enabled by using format_version 4.
      
      The feature comes with a bit of CPU overhead, which should be paid back by the higher cache hit rate due to the smaller index block size.
      Results with sysbench read-only using 4k blocks and an index restart interval of 16:
      Format 2:
      19585   rocksdb read-only range=100
      Format 3:
      19569   rocksdb read-only range=100
      Format 4:
      19352   rocksdb read-only range=100
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3983
      
      Differential Revision: D8361343
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651
      caf0f53a
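The idea in #3983 can be sketched in a few lines. This is a simplified illustration, not the on-disk format: `ToyBlockHandle`, `EncodeHandles`, `DecodeHandles`, and the other names are hypothetical, values are stored as plain 64-bit words rather than varints, and the sketch assumes that data blocks are laid out back to back with a fixed-size trailer (`kAssumedTrailerSize`) so that a dropped offset can be recomputed from the previous handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyBlockHandle {
  uint64_t offset;
  uint64_t size;
};

constexpr uint64_t kAssumedTrailerSize = 5;  // assumption for this sketch
constexpr size_t kRestartInterval = 16;      // matches the benchmark setting

// Builds handles for blocks stored back to back, each followed by a trailer.
std::vector<ToyBlockHandle> MakeContiguousHandles(
    uint64_t first_offset, const std::vector<uint64_t>& sizes) {
  std::vector<ToyBlockHandle> handles;
  uint64_t offset = first_offset;
  for (uint64_t size : sizes) {
    handles.push_back({offset, size});
    offset += size + kAssumedTrailerSize;
  }
  return handles;
}

// Full <offset, size> at each restart point, size only for the rest.
std::vector<uint64_t> EncodeHandles(const std::vector<ToyBlockHandle>& handles) {
  std::vector<uint64_t> out;
  for (size_t i = 0; i < handles.size(); ++i) {
    if (i % kRestartInterval == 0) out.push_back(handles[i].offset);
    out.push_back(handles[i].size);
  }
  return out;
}

// Rebuilds each dropped offset from the previous handle.
std::vector<ToyBlockHandle> DecodeHandles(const std::vector<uint64_t>& encoded,
                                          size_t count) {
  std::vector<ToyBlockHandle> handles;
  size_t pos = 0;
  for (size_t i = 0; i < count; ++i) {
    ToyBlockHandle h{0, 0};
    if (i % kRestartInterval == 0) {
      assert(pos < encoded.size());
      h.offset = encoded[pos++];
    } else {
      const ToyBlockHandle& prev = handles.back();
      h.offset = prev.offset + prev.size + kAssumedTrailerSize;
    }
    assert(pos < encoded.size());
    h.size = encoded[pos++];
    handles.push_back(h);
  }
  return handles;
}

// True if encoding followed by decoding reproduces the input handles.
bool RoundTrips(const std::vector<ToyBlockHandle>& handles) {
  std::vector<ToyBlockHandle> decoded =
      DecodeHandles(EncodeHandles(handles), handles.size());
  if (decoded.size() != handles.size()) return false;
  for (size_t i = 0; i < handles.size(); ++i) {
    if (decoded[i].offset != handles[i].offset) return false;
    if (decoded[i].size != handles[i].size) return false;
  }
  return true;
}
```

With 40 handles and a restart interval of 16, the sketch stores 43 words instead of 80, since only the handles at positions 0, 16, and 32 keep their offset.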
  11. 24 Jul, 2018 1 commit
    • M
      WriteUnPrepared: Implement unprepared batches for transactions (#4104) · ea212e53
      Manuel Ung committed
      Summary:
      This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.
      
      Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
      
      A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
      
      Differential Revision: D8785717
      
      Pulled By: lth
      
      fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
      ea212e53
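The size check described in #4104 can be sketched as follows. `ToyTransaction` and its members are hypothetical; the sketch only models the rule that each modifying call first checks whether the accumulated batch has exceeded the threshold and, if so, writes it out as an unprepared batch before buffering the new entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical transaction that spills its write batch once it grows too big.
class ToyTransaction {
 public:
  explicit ToyTransaction(size_t max_write_batch_size)
      : max_write_batch_size_(max_write_batch_size) {}

  void Put(const std::string& key, const std::string& value) {
    MaybeFlushUnprepared();  // check the threshold before buffering
    batch_.push_back(key + "=" + value);
    batch_bytes_ += key.size() + value.size();
  }

  size_t unprepared_batches() const { return unprepared_batches_; }
  size_t pending_entries() const { return batch_.size(); }

 private:
  // Stands in for writing the batch to the DB as an unprepared batch.
  void MaybeFlushUnprepared() {
    if (batch_bytes_ <= max_write_batch_size_) return;
    ++unprepared_batches_;
    batch_.clear();
    batch_bytes_ = 0;
  }

  size_t max_write_batch_size_;
  std::vector<std::string> batch_;
  size_t batch_bytes_ = 0;
  size_t unprepared_batches_ = 0;
};
```

Because the check runs before the new entry is added, the batch can temporarily exceed the threshold by one entry; it is flushed on the next modifying call.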
  12. 14 Jul, 2018 2 commits
    • M
      Per-thread unique test db names (#4135) · 8581a93a
      Maysam Yabandeh committed
      Summary:
      The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
      Example: `~/gtest-parallel/gtest-parallel ./table_test`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
      
      Differential Revision: D8846653
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
      8581a93a
    • M
      Exclude StackableDB from transaction stress tests (#4132) · 537a2339
      Maysam Yabandeh committed
      Summary:
      The transactions are currently tested with and without StackableDB. This is mostly to check that the code path is consistent with StackableDB as well. Slow stress tests, however, do not benefit from being run again with StackableDB. The patch excludes StackableDB from such tests.
      On a single core it reduced the runtime of transaction_test from 199s to 135s.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132
      
      Differential Revision: D8841655
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
      537a2339
  13. 29 Jun, 2018 1 commit
    • M
      WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) · 8ad63a4b
      Manuel Ung committed
      Summary:
      This adds a new WAL marker of type kTypeBeginUnprepareXID.
      
      Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy.
      
      Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not.
      Closes https://github.com/facebook/rocksdb/pull/4069
      
      Differential Revision: D8675099
      
      Pulled By: lth
      
      fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749
      8ad63a4b
  14. 02 Jun, 2018 1 commit
  15. 12 May, 2018 1 commit
  16. 21 Apr, 2018 1 commit
    • M
      WritePrepared Txn: enable TryAgain for duplicates at the end of the batch · c3d1e36c
      Maysam Yabandeh committed
      Summary:
      WriteBatch::Iterate will retry with a larger sequence number if the memtable reports a duplicate. This condition is signaled with the TryAgain status. So far the assumption was that the last entry in the batch would never return TryAgain, which is correct when the WAL is created via WritePrepared, since it always appends a batch separator if a natural one does not exist. However, when reading a WAL generated by WriteCommitted, this batch separator might not exist. Although WritePrepared is not supposed to be able to read a WAL generated by WriteCommitted, we should avoid confusing scenarios in which the behavior becomes unpredictable. The patch fixes that by allowing TryAgain even for the last entry of the write batch.
      Closes https://github.com/facebook/rocksdb/pull/3747
      
      Differential Revision: D7708391
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d
      c3d1e36c
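The retry rule described above can be sketched with a toy memtable. `ToyStatus`, `ToyMemTable`, and `IterateBatch` are hypothetical names; the sketch assumes that a duplicate means the same key at the same sequence number, and that the memtable holds no entries newer than the batch being replayed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class ToyStatus { kOk, kTryAgain };

// Hypothetical memtable that rejects a second insert of the same <key, seq>.
class ToyMemTable {
 public:
  ToyStatus Add(const std::string& key, uint64_t seq) {
    bool inserted = entries_.emplace(key, seq).second;
    return inserted ? ToyStatus::kOk : ToyStatus::kTryAgain;
  }
  size_t size() const { return entries_.size(); }

 private:
  std::set<std::pair<std::string, uint64_t>> entries_;
};

// Inserts every key; on a duplicate it retries with the next sequence number.
// The retry applies to every entry, including the last one in the batch.
// Returns the last sequence number used.
uint64_t IterateBatch(const std::vector<std::string>& keys, uint64_t first_seq,
                      ToyMemTable* mem) {
  uint64_t seq = first_seq;
  for (const std::string& key : keys) {
    ToyStatus s = mem->Add(key, seq);
    if (s == ToyStatus::kTryAgain) {
      ++seq;  // start a new sub-batch at a larger sequence number
      s = mem->Add(key, seq);
    }
    assert(s == ToyStatus::kOk);
    (void)s;  // silence unused-variable warnings when asserts are disabled
  }
  return seq;
}
```

A batch such as `{"x", "x"}` exercises the case the commit addresses: the duplicate is the final entry, and it is still retried with a larger sequence number.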
  17. 13 Apr, 2018 1 commit
    • M
      WritePrepared Txn: rollback_merge_operands hack · d15397ba
      Maysam Yabandeh committed
      Summary:
      This is a hack that serves as a temporary fix for MyRocks when rolling back merge operands. MyRocks uses merge operands without the protection of locks, which violates the assumption behind the rollback algorithm. They are OK with the operands not being rolled back, as it would just create a gap in the autoincrement column. The hack adds an option that disables the rollback of merge operands by default and enables it only to let the unit test pass.
      Closes https://github.com/facebook/rocksdb/pull/3711
      
      Differential Revision: D7597177
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 544be0f666c7e7abb7f651ec8b23124e05056728
      d15397ba
  18. 06 Mar, 2018 1 commit
    • M
      WritePrepared Txn: Fix bug with duplicate keys during recovery · 680864ae
      Maysam Yabandeh committed
      Summary:
      Fix the following bugs:
      - During recovery, a duplicate key was inserted twice into the write batch of the recovery transaction: once when the memtable returned false (because it was a duplicate) and once for the 2nd attempt. This would result in a different SubBatch count being measured when the recovered transaction is committing.
      - If a cf is flushed during recovery, the memtable is not available to assist in detecting the duplicate key. This could result in not advancing the sequence number when iterating over duplicate keys of a flushed cf and hence inserting the next key with the wrong sequence number.
      - SubBatchCounter would reset the comparator to the default comparator after the first duplicate key. The 2nd duplicate key would hence have gone through the wrong comparator and not been detected.
      Closes https://github.com/facebook/rocksdb/pull/3562
      
      Differential Revision: D7149440
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77
      680864ae
  19. 13 Feb, 2018 1 commit
  20. 31 Jan, 2018 1 commit
    • M
      Make WithParamInterface virtual in transaction_test · ec225d2e
      Maysam Yabandeh committed
      Summary:
      Without this patch, ubsan_check is currently failing with this error:
      ```
      utilities/transactions/write_prepared_transaction_test.cc:369:63: runtime error: member call on address 0x0000051649f8 which does not point to an object of type 'WithParamInterface'
      0x0000051649f8: note: object has invalid vptr
      ```
      Tested by `COMPILE_WITH_UBSAN=1 make -j32 transaction_test` and running `./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccessTest1/0`
      Closes https://github.com/facebook/rocksdb/pull/3444
      
      Differential Revision: D6850087
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5b254da8504b8757f7aec8a820ad464154da1a1d
      ec225d2e
  21. 30 Jan, 2018 1 commit
    • M
      Split SnapshotConcurrentAccessTest into 20 sub tests · 3073b1c5
      Maysam Yabandeh committed
      Summary:
      SnapshotConcurrentAccessTest sometimes times out when running on the test infra. This patch splits the test into smaller sub-tests to avoid the timeout. It also benefits from lower run-time of each sub-test and increases the coverage of the test. The overall run-time of each final sub-test is at most half of the original test so we should no longer see a timeout.
      Closes https://github.com/facebook/rocksdb/pull/3435
      
      Differential Revision: D6839427
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d53fdb157109e2438ca7fe447d0cf4b71f304bd8
      3073b1c5
  22. 05 Jan, 2018 1 commit
    • M
      Remove assert(s.ok()) from ::DeleteFile · 1c9ada59
      Maysam Yabandeh committed
      Summary:
      DestroyDB, which is used in tests, loops over the files returned by ::GetChildren and deletes them one by one. The deletion of such files might already have been issued to the file system (during DeleteObsoleteFileImpl, for example) but take effect with a delay, sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
      Closes https://github.com/facebook/rocksdb/pull/3324
      
      Differential Revision: D6659545
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
      1c9ada59
  23. 19 Dec, 2017 1 commit
    • M
      WritePrepared Txn: non-2pc write in one round · a6d3c762
      Maysam Yabandeh committed
      Summary:
      Currently non-2pc writes do a 2nd dummy write to actually commit the transaction. This was necessary to ensure that the commit sequence number is published from only one queue (the queue that does not write to the memtable). This is, however, not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues is disabled by updating the commit map in the 1st write.
      Closes https://github.com/facebook/rocksdb/pull/3277
      
      Differential Revision: D6575392
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
      a6d3c762
  24. 01 Dec, 2017 1 commit
    • M
      WritePrepared Txn: PreReleaseCallback · 18dcf7f9
      Maysam Yabandeh committed
      Summary:
      Add PreReleaseCallback, to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePreparedTxn to i) update the commit map, and ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits go to the 2nd queue.
      These changes ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default), the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
      Closes https://github.com/facebook/rocksdb/pull/3205
      
      Differential Revision: D6438959
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
      18dcf7f9
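The ordering guarantee described above (update the commit map first, publish the sequence number second) can be sketched in a single-threaded toy. `ToyWriter`, its `WriteImpl`, and `CommitVisibleBeforePublish` are hypothetical and leave out the two write queues entirely; only the position of the callback relative to publishing is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical writer that invokes a callback before publishing a sequence.
class ToyWriter {
 public:
  using PreReleaseCallback = std::function<void(uint64_t seq)>;

  uint64_t WriteImpl(const PreReleaseCallback& pre_release) {
    uint64_t seq = ++last_allocated_seq_;
    // The sequence number is allocated but not yet visible to readers.
    if (pre_release) pre_release(seq);
    published_seq_ = seq;  // publish only after the callback has run
    return seq;
  }

  uint64_t published_seq() const { return published_seq_; }

 private:
  uint64_t last_allocated_seq_ = 0;
  uint64_t published_seq_ = 0;
};

// Checks that the commit map entry exists by the time the seq is published.
bool CommitVisibleBeforePublish() {
  ToyWriter writer;
  std::map<uint64_t, uint64_t> commit_map;  // prepare seq -> commit seq
  bool unpublished_in_callback = false;
  uint64_t seq = writer.WriteImpl([&](uint64_t s) {
    unpublished_in_callback = writer.published_seq() < s;
    commit_map[s] = s;  // update the commit map before publishing
  });
  return unpublished_in_callback && writer.published_seq() == seq &&
         commit_map.count(seq) == 1;
}
```

A reader that takes a snapshot at `published_seq()` in this sketch can never observe a sequence number whose commit map entry is missing.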
  25. 16 Nov, 2017 1 commit
  26. 12 Nov, 2017 1 commit
    • M
      WritePrepared Txn: cross-compatibility test · 2edc92bc
      Maysam Yabandeh committed
      Summary:
      Add tests to ensure that the WritePrepared and WriteCommitted policies are cross-compatible when the db WAL is empty. This is important when the admin wants to switch between the policies. In such a case, before the switch the admin needs to empty the WAL by i) committing or rolling back all the pending transactions, and ii) FlushMemTables.
      Closes https://github.com/facebook/rocksdb/pull/3118
      
      Differential Revision: D6227247
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bcde3d92c1e89cda3b9cfa69f6a20af5d8993db7
      2edc92bc
  27. 11 Nov, 2017 1 commit
  28. 24 Oct, 2017 1 commit
  29. 07 Oct, 2017 1 commit
  30. 29 Sep, 2017 1 commit
  31. 01 Sep, 2017 1 commit