1. 06 Aug, 2019 1 commit
    • M
      WritePrepared: fix Get without snapshot (#5664) · 208556ee
      Maysam Yabandeh committed
      Summary:
      If read_options.snapshot is not set, ::Get takes the last sequence number after acquiring a super-version and uses it as the read sequence number. Theoretically, max_evicted_seq_ could advance past this sequence number. In that case ::IsInSnapshot, which is invoked by the ReadCallback, would notice the absence of the snapshot. The ReadCallback should therefore pass a non-null value for snap_released so that ::IsInSnapshot can set it. The patch does that and adds a unit test to verify it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664
      
      Differential Revision: D16614033
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
  2. 01 Aug, 2019 2 commits
    • M
      WriteUnPrepared: savepoint support (#5627) · f622ca2c
      Manuel Ung committed
      Summary:
      Add savepoint support when the current transaction has flushed unprepared batches.

      Rolling back to a savepoint is similar to rolling back a transaction. It requires finding the set of keys that have changed since the savepoint, re-reading those keys at the snapshot taken at that savepoint, and then restoring the old values by writing out another unprepared batch.
      
      For this strategy to work though, we must be capable of reading keys at a savepoint. This does not work if keys were written out using the same sequence number before and after a savepoint. Therefore, when we flush out unprepared batches, we must split the batch by savepoint if any savepoints exist.
      
      eg. If we have the following:
      ```
      Put(A)
      Put(B)
      Put(C)
      SetSavePoint()
      Put(D)
      Put(E)
      SetSavePoint()
      Put(F)
      ```
      
      Then we will write out 3 separate unprepared batches:
      ```
      Put(A) 1
      Put(B) 1
      Put(C) 1
      Put(D) 2
      Put(E) 2
      Put(F) 3
      ```
      
      This is so that when we roll back to, e.g., the first savepoint, we can just read keys at snapshot_seq = 1.
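The splitting rule above can be sketched as follows. This is a minimal illustration, not the RocksDB implementation: `PendingWrite` and `SplitBySavePoint` are hypothetical names, and the sketch assumes one sequence number per flushed batch, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one buffered write; not a RocksDB type.
struct PendingWrite {
  std::string key;
  uint64_t seq;  // sequence number assigned when the batch is flushed
};

// Splits the buffered keys at every savepoint boundary and assigns one
// sequence number per resulting batch, starting at first_seq.
// savepoints[i] is the number of writes buffered when savepoint i was set
// (assumed to be non-decreasing).
std::vector<std::vector<PendingWrite>> SplitBySavePoint(
    const std::vector<std::string>& keys,
    const std::vector<size_t>& savepoints, uint64_t first_seq) {
  std::vector<std::vector<PendingWrite>> batches;
  uint64_t seq = first_seq;
  size_t begin = 0;
  auto emit = [&](size_t end) {
    if (end <= begin) return;  // no writes since the previous boundary
    std::vector<PendingWrite> batch;
    for (size_t i = begin; i < end; ++i) batch.push_back({keys[i], seq});
    batches.push_back(std::move(batch));
    ++seq;
    begin = end;
  };
  for (size_t sp : savepoints) {
    assert(sp <= keys.size());
    emit(sp);
  }
  emit(keys.size());  // writes after the last savepoint
  return batches;
}
```

With keys A to F and savepoints set after 3 and 5 writes, this yields three batches with sequence numbers 1, 2 and 3, so reading at sequence 1 sees only A, B and C.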
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5627
      
      Differential Revision: D16584130
      
      Pulled By: lth
      
      fbshipit-source-id: 6d100dd548fb20c4b76661bd0f8a2647e64477fa
    • M
      WriteUnPrepared: use WriteUnpreparedTxnReadCallback for ValidateSnapshot (#5657) · d599135a
      Manuel Ung committed
      Summary:
      In DeferSnapshotSavePointTest, writes were failing with a snapshot validation error because the key with the latest sequence number was an unprepared key from the current transaction.
      
      Fix this by passing down the correct read callback.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5657
      
      Differential Revision: D16582466
      
      Pulled By: lth
      
      fbshipit-source-id: 11645dac0e7c1374d917ef5fdf757d13c1d1108d
  3. 30 Jul, 2019 1 commit
  4. 27 Jul, 2019 2 commits
    • M
      Use int64_t instead of ssize_t (#5638) · 80d7067c
      Manuel Ung committed
      Summary:
      The ssize_t type was introduced in https://github.com/facebook/rocksdb/pull/5633, but it seems to be a POSIX-specific type.
      
      I just need a signed type to represent number of bytes, so use int64_t instead. It seems like we have a typedef from SSIZE_T for Windows, but it doesn't seem like we ever include "port/port.h" in our public header files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5638
      
      Differential Revision: D16526269
      
      Pulled By: lth
      
      fbshipit-source-id: 8d3a5c41003951b74b29bc5f1d949b2b22da0cee
    • M
      WriteUnPrepared: Add new variable write_batch_flush_threshold (#5633) · 41df7348
      Manuel Ung committed
      Summary:
      Instead of reusing `TransactionOptions::max_write_batch_size` for determining when to flush a write batch for write unprepared, add a new variable called `write_batch_flush_threshold` for this use case instead.
      
      Also add `TransactionDBOptions::default_write_batch_flush_threshold` which sets the default value if `TransactionOptions::write_batch_flush_threshold` is unspecified.
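A rough sketch of how the per-transaction value might fall back to the DB-wide default. The two structs are hypothetical stand-ins that mirror only the fields named above, and treating a negative per-transaction value as "unspecified" is an assumption of this sketch, not something the commit message states.

```cpp
#include <cstdint>

// Hypothetical stand-in for TransactionDBOptions (only the relevant field).
struct DbOptionsSketch {
  int64_t default_write_batch_flush_threshold = 0;
};

// Hypothetical stand-in for TransactionOptions (only the relevant field).
// Assumption: a negative value means "unspecified".
struct TxnOptionsSketch {
  int64_t write_batch_flush_threshold = -1;
};

// Returns the threshold a transaction would use: its own value when one was
// given, otherwise the DB-wide default.
int64_t EffectiveFlushThreshold(const DbOptionsSketch& db,
                                const TxnOptionsSketch& txn) {
  return txn.write_batch_flush_threshold < 0
             ? db.default_write_batch_flush_threshold
             : txn.write_batch_flush_threshold;
}
```

A signed type is what makes a negative "unspecified" sentinel possible in this sketch.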
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5633
      
      Differential Revision: D16520364
      
      Pulled By: lth
      
      fbshipit-source-id: d75ae5a2141ce7708982d5069dc3f0b58d250e8c
  5. 25 Jul, 2019 1 commit
    • M
      Simplify WriteUnpreparedTxnReadCallback and fix some comments (#5621) · 66b524a9
      Manuel Ung committed
      Summary:
      Simplify WriteUnpreparedTxnReadCallback so we just have one function `CalcMaxVisibleSeq`. Also, there's no need for the read callback to hold onto the transaction any more, so just hold the set of unprep_seqs, reducing the amount of indirection in `IsVisibleFullCheck`.
      
      Also, some comments about using transaction snapshot were out of date, so remove them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5621
      
      Differential Revision: D16459883
      
      Pulled By: lth
      
      fbshipit-source-id: cd581323fd18982e817d99af57b6eaba59e599bb
  6. 23 Jul, 2019 1 commit
    • M
      WriteUnPrepared: improve read your own write functionality (#5573) · eae83274
      Manuel Ung committed
      Summary:
      There are a number of fixes in this PR (with most bugs found via the added stress tests):
      1. Re-enable reseek optimization. This was initially disabled to avoid infinite loops in https://github.com/facebook/rocksdb/pull/3955 but this can be resolved by remembering not to reseek after a reseek has already been done. This problem only affects forward iteration in `DBIter::FindNextUserEntryInternal`, as we already disable reseeking in `DBIter::FindValueForCurrentKeyUsingSeek`.
      2. Verify that ReadOption.snapshot can be safely used for iterator creation. Some snapshots would not give correct results because snapshot validation would not be enforced, breaking some assumptions in Prev() iteration.
      3. In the non-snapshot Get() case, reads done at `LastPublishedSequence` may not be enough, because unprepared sequence numbers are not published. Use `std::max(published_seq, max_visible_seq)` to do lookups instead.
      4. Add stress test to test reading own writes.
      5. Minor bug in the allow_concurrent_memtable_write case where we forgot to pass in batch_per_txn_.
      6. Minor performance optimization in `CalcMaxUnpreparedSequenceNumber` by assigning by reference instead of value.
      7. Add some more comments everywhere.
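Item 3 can be illustrated with a small sketch. The names `MaxUnpreparedSeq` and `ReadSeqWithoutSnapshot` are hypothetical, and a plain `std::set` stands in for whatever structure the transaction uses to remember its unprepared sequence numbers.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

using SequenceNumber = uint64_t;

// Largest unprepared sequence number written by the current transaction,
// or 0 if it has not flushed any unprepared batch.
SequenceNumber MaxUnpreparedSeq(const std::set<SequenceNumber>& unprep_seqs) {
  return unprep_seqs.empty() ? 0 : *unprep_seqs.rbegin();
}

// Sequence number to read at when no snapshot is given. The published
// sequence alone can be too small, because unprepared sequence numbers are
// not published, so the transaction would miss its own writes.
SequenceNumber ReadSeqWithoutSnapshot(
    SequenceNumber published_seq, const std::set<SequenceNumber>& unprep_seqs) {
  return std::max(published_seq, MaxUnpreparedSeq(unprep_seqs));
}
```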
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5573
      
      Differential Revision: D16276089
      
      Pulled By: lth
      
      fbshipit-source-id: 18029c944eb427a90a87dee76ac1b23f37ec1ccb
  7. 17 Jul, 2019 1 commit
    • M
      WriteUnPrepared: use tracked_keys_ to track keys needed for rollback (#5562) · 0acaa1a8
      Manuel Ung committed
      Summary:
      Currently, we are tracking keys we need to roll back via a separate structure specific to WriteUnprepared in write_set_keys_.

      We already have a data structure called tracked_keys_ used to track which keys to unlock on transaction termination. This is exactly what we want, since we should only roll back keys that we have locked anyway.
      
      Save some memory by reusing that data structure instead of making our own.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5562
      
      Differential Revision: D16206484
      
      Pulled By: lth
      
      fbshipit-source-id: 5894d2b824a4b19062d84adbd6e6e86f00047488
  8. 13 Apr, 2019 1 commit
    • M
      WritePrepared: fix race condition in reading batch with duplicate keys (#5147) · fe642cbe
      Maysam Yabandeh committed
      Summary:
      When ReadOption doesn't specify a snapshot, WritePrepared::Get used kMaxSequenceNumber to avoid the cost of creating a new snapshot object (which requires synchronization over db_mutex). This creates a race condition when it reads the writes of a transaction that had duplicate keys: each instance of a duplicate key is inserted with a different sequence number, and depending on the ordering, ::Get might skip the newer one and read the older, obsolete one.
      The patch fixes that by using the last published seq as the snapshot sequence number. It also adds a check after the read is done to ensure that max_evicted_seq has not advanced past the aforementioned seq, which is a very unlikely event. If it did, the read is not valid, since the seq is not backed by an actual snapshot that would let IsInSnapshot handle the case properly when an overlapping commit is evicted from the commit cache.
      A unit test is added to reproduce the race condition with duplicate keys.
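The post-read check described above might look roughly like this. `UnbackedReadIsValid` is a hypothetical name, and the exact boundary (strictly less than) is an assumption of this sketch rather than something the commit message specifies.

```cpp
#include <atomic>
#include <cstdint>

using SequenceNumber = uint64_t;

// Returns true if a read taken at read_seq, which is not backed by a real
// snapshot, can still be trusted after the read has finished.
// Assumption: the read is valid only while max_evicted_seq is still below
// the sequence number the read was taken at.
bool UnbackedReadIsValid(SequenceNumber read_seq,
                         const std::atomic<SequenceNumber>& max_evicted_seq) {
  return max_evicted_seq.load(std::memory_order_acquire) < read_seq;
}
```

A caller would take the last published sequence number, perform the read, and then run this check, treating a `false` result as a failed read.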
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5147
      
      Differential Revision: D14758815
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a56915657132cf6ba5e3f5ea1b5d78c803407719
  9. 04 Apr, 2019 1 commit
    • M
      WriteUnPrepared: fix ubsan complaint (#5148) · 7441a0ec
      Maysam Yabandeh committed
      Summary:
      UBSan complains that, in the initialization of WriteUnpreparedTxnReadCallback, a method of the child class is used before the parent class is constructed. The patch fixes that by making the aforementioned method static.
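The pattern behind this fix can be shown in isolation. The classes below are simplified, hypothetical stand-ins; only the name `CalcMaxVisibleSeq` is borrowed from the commit messages on this page, and its body here is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

using SequenceNumber = uint64_t;

// Simplified stand-in for the parent class.
class ReadCallbackSketch {
 public:
  explicit ReadCallbackSketch(SequenceNumber max_visible_seq)
      : max_visible_seq_(max_visible_seq) {}
  virtual ~ReadCallbackSketch() = default;
  SequenceNumber max_visible_seq() const { return max_visible_seq_; }

 private:
  SequenceNumber max_visible_seq_;
};

// Simplified stand-in for the child class.
class UnpreparedReadCallbackSketch : public ReadCallbackSketch {
 public:
  UnpreparedReadCallbackSketch(SequenceNumber snapshot_seq,
                               const std::set<SequenceNumber>& unprep_seqs)
      // A static member function does not use `this`, so calling it in the
      // base-class initializer touches no part of the object before the
      // parent has been constructed.
      : ReadCallbackSketch(CalcMaxVisibleSeq(snapshot_seq, unprep_seqs)) {}

  static SequenceNumber CalcMaxVisibleSeq(
      SequenceNumber snapshot_seq,
      const std::set<SequenceNumber>& unprep_seqs) {
    SequenceNumber max_unprep =
        unprep_seqs.empty() ? 0 : *unprep_seqs.rbegin();
    return std::max(snapshot_seq, max_unprep);
  }
};
```

Had `CalcMaxVisibleSeq` been a non-static member, the initializer would call a member function on an object whose base class is not yet constructed, which is the kind of use the sanitizer reports.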
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5148
      
      Differential Revision: D14760098
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: cf19b7c1fdb5de0a54e62c1deebe09a0fa048ded
  10. 03 Apr, 2019 1 commit
    • M
      WriteUnPrepared: less virtual in iterator callback (#5049) · 14b3f683
      Maysam Yabandeh committed
      Summary:
      WriteUnPrepared adds a virtual function, MaxUnpreparedSequenceNumber, to ReadCallback, which returns 0 unless WriteUnPrepared is enabled and the transaction has uncommitted data written to the DB. Together with snapshot sequence number, this determines the last sequence that is visible to reads.
      The patch clarifies the guarantees of the GetIterator API in WriteUnPrepared transactions and makes use of that to statically initialize the read callback and thus avoid the virtual call.
      Furthermore it increases the minimum value for min_uncommitted from 0 to 1 as seq 0 is used only for last level keys that are committed in all snapshots.
      
      The following benchmark shows +0.26% higher throughput in seekrandom benchmark.
      
      Benchmark:
      ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench
      
      ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20355 ops/sec;  225.2 MB/sec
      seekrandom [MEDIAN 10 runs] : 20425 ops/sec;  225.9 MB/sec
      
      ./db_bench_lessvirtual3 --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20409 ops/sec;  225.8 MB/sec
      seekrandom [MEDIAN 10 runs] : 20487 ops/sec;  226.6 MB/sec
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5049
      
      Differential Revision: D14366459
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ebaff8908332a5ae9af7defeadabcb624be660ef
  11. 27 Feb, 2019 1 commit
    • M
      WritePrepared: optimize read path by avoiding virtual (#5018) · a661c0d2
      Maysam Yabandeh committed
      Summary:
      The read path includes a callback function, ReadCallback, which eventually calls IsInSnapshot to figure out whether a particular seq is in the reading snapshot. This callback is virtual, which adds the cost of multiple virtual function calls to each read. The first few checks in IsInSnapshot, however, are quite trivial and take care of the majority of cases. The patch moves those to a non-virtual function in the parent class, ReadCallback, to lower the virtual callback cost.
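A simplified sketch of the idea: the parent class answers the cheap cases in a non-virtual function and only falls through to the virtual call when it has to. The class names and the exact checks are assumptions for illustration. In particular, the sketch assumes every sequence number below `min_uncommitted` is committed and visible.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using SequenceNumber = uint64_t;

class ReadCallbackSketch {
 public:
  ReadCallbackSketch(SequenceNumber max_visible_seq,
                     SequenceNumber min_uncommitted)
      : max_visible_seq_(max_visible_seq), min_uncommitted_(min_uncommitted) {
    assert(min_uncommitted_ >= 1);
  }
  virtual ~ReadCallbackSketch() = default;

  // Non-virtual fast path: the trivial cases never reach the virtual call.
  bool IsVisible(SequenceNumber seq) {
    if (seq < min_uncommitted_) return true;   // older than any uncommitted data
    if (seq > max_visible_seq_) return false;  // newer than the read point
    return IsVisibleFullCheck(seq);            // rare case: full check
  }

 protected:
  virtual bool IsVisibleFullCheck(SequenceNumber seq) = 0;

 private:
  SequenceNumber max_visible_seq_;
  SequenceNumber min_uncommitted_;
};

// Hypothetical subclass that counts how often the virtual path is taken.
class CountingCallback : public ReadCallbackSketch {
 public:
  CountingCallback(SequenceNumber max_visible_seq,
                   SequenceNumber min_uncommitted,
                   std::set<SequenceNumber> committed)
      : ReadCallbackSketch(max_visible_seq, min_uncommitted),
        committed_(std::move(committed)) {}
  int full_checks() const { return full_checks_; }

 protected:
  bool IsVisibleFullCheck(SequenceNumber seq) override {
    ++full_checks_;
    return committed_.count(seq) > 0;
  }

 private:
  std::set<SequenceNumber> committed_;
  int full_checks_ = 0;
};
```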
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5018
      
      Differential Revision: D14226562
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6feed5b34f3b082e52092c5ef143e29b49c46b44
  12. 07 Dec, 2018 1 commit
    • M
      Extend Transaction::GetForUpdate with do_validate (#4680) · b878f93c
      Maysam Yabandeh committed
      Summary:
      Transaction::GetForUpdate is extended with a do_validate parameter with a default value of true. If false, it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly, ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with a default value of false. If true, it indicates that the call is assumed to be after a ::GetForUpdate(do_validate=false).
      The Java APIs are accordingly updated.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680
      
      Differential Revision: D13068508
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
  13. 24 Jul, 2018 1 commit
    • M
      WriteUnPrepared: Implement unprepared batches for transactions (#4104) · ea212e53
      Manuel Ung committed
      Summary:
      This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.
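The check-then-write flow described above can be sketched with a toy transaction. `UnpreparedTxnSketch` is a hypothetical class that only counts bytes and flushes. It does not write to any DB, and treating a non-positive threshold as "never flush" is an assumption.

```cpp
#include <cstdint>
#include <string>

// Toy model of a transaction that flushes its write batch as an unprepared
// batch once the batch has grown past a size threshold.
class UnpreparedTxnSketch {
 public:
  explicit UnpreparedTxnSketch(int64_t threshold) : threshold_(threshold) {}

  void Put(const std::string& key, const std::string& value) {
    MaybeFlush();  // check the current batch size before adding the new write
    batch_bytes_ += static_cast<int64_t>(key.size() + value.size());
  }

  int flushed_batches() const { return flushed_batches_; }
  int64_t batch_bytes() const { return batch_bytes_; }

 private:
  void MaybeFlush() {
    // Assumption: a threshold <= 0 disables flushing.
    if (threshold_ > 0 && batch_bytes_ > threshold_) {
      ++flushed_batches_;  // stands in for writing the unprepared batch
      batch_bytes_ = 0;    // start a fresh batch
    }
  }

  int64_t threshold_;
  int64_t batch_bytes_ = 0;
  int flushed_batches_ = 0;
};
```

Delete, SingleDelete and Merge would go through the same `MaybeFlush` step in this model.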
      
      Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
      
      A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
      
      Differential Revision: D8785717
      
      Pulled By: lth
      
      fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
  14. 07 Jul, 2018 1 commit
    • M
      WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) · b9846370
      Manuel Ung committed
      Summary:
      This adds support for recovering WriteUnprepared transactions through the following changes:
      - The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
      - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
      - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will roll back unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.
      
      Commit/Rollback of live transactions is still unimplemented and will come later.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078
      
      Differential Revision: D8703382
      
      Pulled By: lth
      
      fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
  15. 28 Jun, 2018 1 commit
    • M
      WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) · a16e00b7
      Manuel Ung committed
      Summary:
      This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data.
      
      These are the places where reads had to be modified.
      - Get and Seek/Next were just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key is visible.
      - Prev did not need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key.
      - Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case.
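The Get and Seek/Next behaviour in the first bullet can be modelled on the versions of a single user key. This is a toy sketch under strong assumptions: `FirstVisibleVersion` is a hypothetical helper, every version at or below `snapshot_seq` is treated as committed, and the only other visible versions are the transaction's own unprepared ones.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

using SequenceNumber = uint64_t;

// versions_desc holds the sequence numbers of all versions of one user key,
// sorted newest first. Returns the sequence number of the first version the
// reader may see, or 0 if none is visible.
SequenceNumber FirstVisibleVersion(
    const std::vector<SequenceNumber>& versions_desc,
    SequenceNumber snapshot_seq,
    const std::set<SequenceNumber>& own_unprepared) {
  SequenceNumber max_unprep =
      own_unprepared.empty() ? 0 : *own_unprepared.rbegin();
  // Seek to max(snapshot_seq, max unprepared seq) instead of snapshot_seq,
  // so the transaction's own newer writes are not skipped.
  SequenceNumber seek_seq = std::max(snapshot_seq, max_unprep);
  auto it = std::lower_bound(versions_desc.begin(), versions_desc.end(),
                             seek_seq, std::greater<SequenceNumber>());
  // Then iterate until a visible version is found.
  for (; it != versions_desc.end(); ++it) {
    if (*it <= snapshot_seq || own_unprepared.count(*it) > 0) return *it;
  }
  return 0;
}
```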
      Closes https://github.com/facebook/rocksdb/pull/3955
      
      Differential Revision: D8286688
      
      Pulled By: lth
      
      fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe
  16. 01 Jun, 2018 1 commit
  17. 16 Jul, 2017 1 commit
  18. 10 Feb, 2016 1 commit
  19. 24 Dec, 2014 1 commit
  20. 08 Nov, 2014 1 commit
  21. 19 Nov, 2013 1 commit
  22. 09 Oct, 2013 1 commit
  23. 05 Oct, 2013 1 commit
  24. 24 Aug, 2013 1 commit
  25. 20 Aug, 2013 1 commit
  26. 08 Dec, 2012 1 commit
    • A
      GetUpdatesSince API to enable replication. · 80550089
      Abhishek Kona committed
      Summary:
      How it works:
      * GetUpdatesSince takes a SequenceNumber.
      * A LogFile with the first SequenceNumber nearest to and less than the requested SequenceNumber is found.
      * Seek in the LogFile until the requested SequenceNumber is found.
      * Return an iterator which contains logic to return records one by one.
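The file-selection step in the list above amounts to a binary search over the log files' first sequence numbers. The sketch below uses hypothetical names (`LogFileSketch`, `FindLogFileFor`) and assumes the files are sorted by first sequence number. It also accepts a file whose first sequence number equals the requested one, which is an interpretation of "nearest and less than".

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one log file.
struct LogFileSketch {
  uint64_t file_number;
  uint64_t first_seq;  // sequence number of the first record in the file
};

// files must be sorted by first_seq in ascending order. Returns the index of
// the file with the largest first_seq that does not exceed requested_seq, or
// -1 if every file starts after requested_seq.
int FindLogFileFor(const std::vector<LogFileSketch>& files,
                   uint64_t requested_seq) {
  auto it = std::upper_bound(
      files.begin(), files.end(), requested_seq,
      [](uint64_t seq, const LogFileSketch& f) { return seq < f.first_seq; });
  if (it == files.begin()) return -1;
  return static_cast<int>(it - files.begin()) - 1;
}
```

Reading would then start in the selected file and skip records until the requested sequence number is reached.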
      
      Test Plan:
      * Test case included to check the good code path.
      * Will update with more test-cases.
      * Feedback required on test-cases.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7119