1. 06 Oct, 2017 3 commits
  2. 05 Oct, 2017 4 commits
    • M
      WritePrepared Txn: duplicate keys · 4e3c3d8c
      Maysam Yabandeh committed
      Summary:
      With WriteCommitted, when the write batch has duplicate keys, the txn db simply inserts them into the db with different seq numbers and lets the db ignore/merge the duplicate values at read time. With WritePrepared, all the entries of the batch are inserted with the same seq number, which prevents us from benefiting from this simple solution.
      
      This patch applies a hackish solution to unblock the end-to-end testing. The hack is to be replaced with a proper solution soon. The patch simply detects duplicate key insertions and marks the previous one as obsolete. Then, before writing to the db, it rewrites the batch, eliminating the obsolete keys. This incurs a memcpy cost. Furthermore, handling duplicate merges would require doing a FullMerge instead of simply ignoring the previous value, which is not handled by this patch.
      Closes https://github.com/facebook/rocksdb/pull/2969
      
      Differential Revision: D5976337
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 114e65b66f137d8454ff2d1d782b8c05da95f989
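The detect-and-rewrite idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the patch's code: `BatchEntry`, `MarkDuplicates`, and `RewriteBatch` are hypothetical names, and the real write batch is a serialized byte buffer rather than a vector of structs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one write-batch entry (not the RocksDB format).
struct BatchEntry {
  std::string key;
  std::string value;
  bool obsolete;
};

// Marks every earlier insertion of a key as obsolete so that only the
// last write to each key survives.
inline void MarkDuplicates(std::vector<BatchEntry>& batch) {
  std::unordered_map<std::string, std::size_t> last_index;
  for (std::size_t i = 0; i < batch.size(); ++i) {
    auto it = last_index.find(batch[i].key);
    if (it != last_index.end()) {
      batch[it->second].obsolete = true;  // older write to the same key
      it->second = i;
    } else {
      last_index.emplace(batch[i].key, i);
    }
  }
}

// Rewrites the batch without the obsolete entries. The copy made here
// corresponds to the memcpy cost mentioned in the summary.
inline std::vector<BatchEntry> RewriteBatch(
    const std::vector<BatchEntry>& batch) {
  std::vector<BatchEntry> out;
  out.reserve(batch.size());
  for (const BatchEntry& e : batch) {
    if (!e.obsolete) out.push_back(e);
  }
  return out;
}
```

Dropping the older entry is only valid for plain puts; as the summary notes, a duplicate merge would need a FullMerge instead, which this sketch does not attempt.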
    • A
      rate limit auto-tuning · 1026e794
      Andrew Kryczka committed
      Summary:
      Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:
      
      - make rate limiter's `Env*` configurable for testing
      - track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
      Closes https://github.com/facebook/rocksdb/pull/2899
      
      Differential Revision: D5858704
      
      Pulled By: ajkr
      
      fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
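A rough sketch of the multiplicative adjustment described above. The watermarks, factor, and bounds in `TuneParams` are illustrative assumptions, not the fixed values in `GenericRateLimiter::Tune`, and `TuneRate` is a hypothetical free function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// All fields are assumed tuning knobs for illustration only.
struct TuneParams {
  double low_watermark_pct;   // drained less often than this: lower the limit
  double high_watermark_pct;  // drained more often than this: raise the limit
  double adjust_factor;       // same factor is used in both directions
  std::int64_t min_rate;
  std::int64_t max_rate;
};

// Returns the new rate limit given how many of the elapsed refill
// intervals ended with the limiter fully drained.
inline std::int64_t TuneRate(std::int64_t current_rate,
                             std::int64_t drained_intervals,
                             std::int64_t elapsed_intervals,
                             const TuneParams& p) {
  assert(p.adjust_factor > 1.0);
  if (elapsed_intervals <= 0) return current_rate;
  const double drained_pct = 100.0 * static_cast<double>(drained_intervals) /
                             static_cast<double>(elapsed_intervals);
  double next = static_cast<double>(current_rate);
  if (drained_pct > p.high_watermark_pct) {
    next *= p.adjust_factor;  // demand is high: allow more background I/O
  } else if (drained_pct < p.low_watermark_pct) {
    next /= p.adjust_factor;  // demand is low: tighten the limit
  }
  const std::int64_t rounded = static_cast<std::int64_t>(next);
  return std::min(p.max_rate, std::max(p.min_rate, rounded));
}
```

Counting drained intervals inside the limiter itself, as the second bullet describes, is what makes an input like `drained_intervals` available without relying on shared stats.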
    • A
      Added CPU prefetch for skiplist · 75f7f42d
      Adam Kupczyk committed
      Summary:
      This change alters the result of the following test:
      ./db_bench --writes 10000000 --benchmarks="fillrandom" --compression_type none
      from
      fillrandom   :       3.177 micros/op 314804 ops/sec;   34.8 MB/s
      to
      fillrandom   :       2.777 micros/op 360087 ops/sec;   39.8 MB/s
      Closes https://github.com/facebook/rocksdb/pull/2961
      
      Differential Revision: D5977822
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 1ea77707bffa978b1592b0c5d0fe76bfa1930f8d
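The commit message does not show how the prefetch is issued. As a generic illustration of the technique, assuming a GCC or Clang toolchain and a simplified single-level list instead of the real skiplist, a traversal can hint the next node into cache before comparing the current one:

```cpp
#include <cstddef>

// Hypothetical single-level node; a real skiplist node has one next
// pointer per level.
struct Node {
  int key;
  Node* next;
};

// Returns the first node whose key is >= target, or nullptr if none.
inline Node* FindGreaterOrEqual(Node* head, int target) {
  Node* cur = head;
  while (cur != nullptr) {
    Node* next = cur->next;
#if defined(__GNUC__) || defined(__clang__)
    // Hint: start loading the next node while the current key is compared.
    if (next != nullptr) {
      __builtin_prefetch(next, 0 /* read */, 1 /* low temporal locality */);
    }
#endif
    if (cur->key >= target) return cur;
    cur = next;
  }
  return nullptr;
}
```

The prefetch is only a hint, so the function returns the same result with or without it; any speedup depends on the memory access pattern and hardware.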
    • M
      Allow upgrades from nullptr to some merge operator · 88ed1f6e
      Manuel Ung committed
      Summary:
      Currently, RocksDB does not allow a preexisting DB that was created with no merge operator to be reopened with a merge operator defined. This means that if a DB ever wants to add a merge operator, there is currently no way to do so.
      
      Fix this by adding a new verification type `kByNameAllowFromNull` which will allow old values to be nullptr, and new values to be non-nullptr.
      Closes https://github.com/facebook/rocksdb/pull/2958
      
      Differential Revision: D5961131
      
      Pulled By: lth
      
      fbshipit-source-id: 06179bebd0d90db3d43690b5eb7345e2d5bab1eb
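The intended rule can be sketched as a small compatibility check. This is a simplified model under stated assumptions: the enum below is a hypothetical stand-in for the options verification types, and an empty string stands in for a `nullptr` merge operator.

```cpp
#include <string>

// Hypothetical stand-in; only kByNameAllowFromNull is named in the commit.
enum class Verification { kByName, kByNameAllowFromNull };

// Returns true if an option persisted as old_name may be reopened as
// new_name. An empty name models "no merge operator" (nullptr).
inline bool OptionIsCompatible(Verification mode,
                               const std::string& old_name,
                               const std::string& new_name) {
  if (old_name == new_name) return true;
  // Upgrade path: nothing was configured before, something is now.
  if (mode == Verification::kByNameAllowFromNull && old_name.empty()) {
    return true;
  }
  return false;
}
```

Note that the rule is one-directional in this sketch: going from a named operator back to none is still rejected.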
  3. 04 Oct, 2017 6 commits
    • A
      Prevent threads from respawning during joining · 5b2cb64b
      Andrew Kryczka committed
      Summary:
      Previously the thread pool might be non-empty after joining since concurrent submissions could spawn new threads. This problem didn't affect our background flush/compaction thread pools because the `shutting_down_` flag prevented new jobs from being submitted during/after joining. But I wanted to be able to reuse the `ThreadPool` without such external synchronization.
      Closes https://github.com/facebook/rocksdb/pull/2953
      
      Differential Revision: D5951920
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0efec7d0056d36d1338367da75e8b0c089bbc973
    • A
      pin L0 filters/indexes for compaction outputs · 82188703
      Andrew Kryczka committed
      Summary:
      We need to tell the iterator the compaction output file's level so it can apply the proper optimizations, like pinning filter and index blocks when the user enables `pin_l0_filter_and_index_blocks_in_cache` and the output file's level is zero.
      Closes https://github.com/facebook/rocksdb/pull/2949
      
      Differential Revision: D5945597
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2389decf9026ffaa32d45801a77d002529f64a62
    • M
      fix valgrind leak report in unit test · 283d6076
      Maysam Yabandeh committed
      Summary:
      I cannot reproduce the valgrind leak report locally, but based on code inspection, not deleting txn1 might be the reason.
      ```
      ==197848== 2,990 (544 direct, 2,446 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 16
      ==197848==    at 0x4C2D06F: operator new(unsigned long) (in /usr/local/fbcode/gcc-5-glibc-2.23/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==197848==    by 0x7D5B31: rocksdb::WritePreparedTxnDB::BeginTransaction(rocksdb::WriteOptions const&, rocksdb::TransactionOptions const&, rocksdb::Transaction*) (pessimistic_transaction_db.cc:173)
      ==197848==    by 0x7D80C1: rocksdb::PessimisticTransactionDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> > const&) (pessimistic_transaction_db.cc:115)
      ==197848==    by 0x7DC42F: rocksdb::WritePreparedTxnDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> > const&) (pessimistic_transaction_db.cc:151)
      ==197848==    by 0x7D8CA0: rocksdb::TransactionDB::WrapDB(rocksdb::DB*, rocksdb::TransactionDBOptions const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> > const&, rocksdb::TransactionDB**) (pessimistic_transaction_db.cc:275)
      ==197848==    by 0x7D9F26: rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::TransactionDB**) (pessimistic_transaction_db.cc:227)
      ==197848==    by 0x7DB349: rocksdb::TransactionDB::Open(rocksdb::Options const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::TransactionDB**) (pessimistic_transaction_db.cc:198)
      ==197848==    by 0x52ABD2: rocksdb::TransactionTest::ReOpenNoDelete() (transaction_test.h:87)
      ==197848==    by 0x51F7B8: rocksdb::WritePreparedTransactionTest_BasicRecoveryTest_Test::TestBody() (write_prepared_transaction_test.cc:843)
      ==197848==    by 0x857557: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest-all.cc:3824)
      ==197848==    by 0x857557: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest-all.cc:3860)
      ==197848==    by 0x84E7EB: testing::Test::Run() [clone .part.485] (gtest-all.cc:3897)
      ==197848==    by 0x84E9BC: Run (gtest-all.cc:3888)
      ==197848==    by 0x84E9BC: testing::TestInfo::Run() [clone .part.486] (gtest-all.cc:4072)
      ```
      Closes https://github.com/facebook/rocksdb/pull/2963
      
      Differential Revision: D5968856
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2ac512bbcad37dc8eeeffe4f363978913354180c
    • S
      Fix DBOptionsTest.SetBytesPerSync test when run with no compression · 377e0040
      Sagar Vemuri committed
      Summary:
      Also made the test easier to understand:
      - changed the value size to ~1MB.
      - switched to NoCompression. We don't need compression in this test for dynamic options anyway.
      
      The test failures started with #2893.
      Closes https://github.com/facebook/rocksdb/pull/2957
      
      Differential Revision: D5959392
      
      Pulled By: sagar0
      
      fbshipit-source-id: 2d55641e429246328bc6d10fcb9ef540d6ce07da
    • Y
      speedup 'make check' · 92ccae71
      Yi Wu committed
      Summary:
      Make SnapshotConcurrentAccessTest run in the beginning of the queue.
      
      Test Plan
      `make all check -j64` on devserver
      Closes https://github.com/facebook/rocksdb/pull/2962
      
      Differential Revision: D5965871
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8cb5a47c2468be0fbbb929226a143ec5848bfaa9
    • Y
      Add ValueType::kTypeBlobIndex · d1cab2b6
      Yi Wu committed
      Summary:
      Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
      1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex type.
      2. Make rocksdb able to detect whether the db contains values written by blob db, and if so return an error.
      3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).
      
      The root db (DBImpl) basically pretends that kTypeBlobIndex entries are normal values on write. On Get, if is_blob is provided, it returns whether the value read is of kTypeBlobIndex type, or returns a Status::NotSupported() status if is_blob is not provided. On scan, the allow_blob flag is passed in, and if the flag is true, whether the value is of kTypeBlobIndex type is returned via iter->IsBlob().
      
      Changes on blob db side will be in a separate patch.
      Closes https://github.com/facebook/rocksdb/pull/2886
      
      Differential Revision: D5838431
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
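The Get behavior described above can be modeled with a small sketch. Everything here is a hypothetical stand-in (`ValueKind`, `Code`, `Entry`, and this free `Get` are not RocksDB types), and it assumes that NotSupported is returned only when a blob index is actually encountered without an `is_blob` out-parameter.

```cpp
#include <map>
#include <string>

enum class ValueKind { kValue, kBlobIndex };        // models kTypeValue / kTypeBlobIndex
enum class Code { kOk, kNotFound, kNotSupported };  // models rocksdb::Status codes

struct Entry {
  ValueKind kind;
  std::string payload;  // the value itself, or an encoded blob offset
};

// is_blob may be nullptr, which models a caller that is not blob-aware.
inline Code Get(const std::map<std::string, Entry>& db, const std::string& key,
                std::string* value, bool* is_blob) {
  auto it = db.find(key);
  if (it == db.end()) return Code::kNotFound;
  const bool blob = it->second.kind == ValueKind::kBlobIndex;
  if (blob && is_blob == nullptr) {
    // A plain reader must not silently receive a blob offset as a value.
    return Code::kNotSupported;
  }
  if (is_blob != nullptr) *is_blob = blob;
  *value = it->second.payload;
  return Code::kOk;
}
```

This is how a plain rocksdb reader can detect data written by blob db (purpose 2 in the list) instead of misinterpreting the stored offset.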
  4. 03 Oct, 2017 6 commits
  5. 30 Sep, 2017 2 commits
  6. 29 Sep, 2017 7 commits
    • M
      Fix for when block.cache_handle is nullptr · ab0542f5
      Maysam Yabandeh committed
      Summary:
      When used with a compressed cache, it is possible that the status is ok but the block is not actually added to the block cache. The patch takes this case into account.
      Closes https://github.com/facebook/rocksdb/pull/2945
      
      Differential Revision: D5937613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5428cf1115e5046b3d01ab78d26cb181122af4c6
    • A
      fix deletion-triggered compaction in table builder · 5df172da
      Andrew Kryczka committed
      Summary:
      It was broken when `NotifyCollectTableCollectorsOnFinish` was introduced. That function called `Finish` on each of the `TablePropertiesCollector`s, and `CompactOnDeletionCollector::Finish()` was resetting all its internal state. Then, when we checked whether compaction is necessary, the flag had already been cleared.
      
      Fixed the above issue by no longer resetting internal state during `Finish()`. Multiple calls to `Finish()` are allowed, but callers cannot invoke `AddUserKey()` on the collector after any call to `Finish()`.
      Closes https://github.com/facebook/rocksdb/pull/2936
      
      Differential Revision: D5918659
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4f05e9d80e50ee762ba1e611d8d22620029dca6b
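The contract after the fix can be illustrated with a simplified collector. This is a sketch only: `DeletionCollector` is hypothetical and uses a plain deletion count as the trigger, which is an assumption made to keep the example short and may differ from how the real `CompactOnDeletionCollector` decides.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical collector illustrating the Finish() contract.
class DeletionCollector {
 public:
  explicit DeletionCollector(std::size_t deletion_trigger)
      : deletion_trigger_(deletion_trigger) {}

  void AddUserKey(bool is_deletion) {
    assert(!finished_);  // no keys may be added after Finish()
    if (is_deletion && ++num_deletions_ >= deletion_trigger_) {
      need_compaction_ = true;
    }
  }

  // Idempotent: does not reset state, so NeedCompact() stays valid.
  void Finish() { finished_ = true; }

  bool NeedCompact() const { return need_compaction_; }

 private:
  std::size_t deletion_trigger_;
  std::size_t num_deletions_ = 0;
  bool need_compaction_ = false;
  bool finished_ = false;
};
```

Because `Finish()` leaves `need_compaction_` untouched, the order in which the table builder calls `Finish()` and checks `NeedCompact()` no longer matters.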
    • M
      WritePrepared Txn: Recovery · 385049ba
      Maysam Yabandeh committed
      Summary:
      Recover txns from the WAL. Also added some unit tests.
      Closes https://github.com/facebook/rocksdb/pull/2901
      
      Differential Revision: D5859596
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0
    • Y
      Default one to rocksdb:x64-windows · 8c724f5c
      Yu Shu committed
      Summary:
      The default will try to install rocksdb:x86-windows, which makes the build fail at the last step (CMake Error, Rocksdb only supports x64), because it tries to install a series of x86 packages, and those cannot proceed to building rocksdb:x86-windows. By using rocksdb:x64-windows, we make sure the x64 version is installed.
      Tested on Win10 x64.
      Closes https://github.com/facebook/rocksdb/pull/2941
      
      Differential Revision: D5937139
      
      Pulled By: sagar0
      
      fbshipit-source-id: 15637fe23df59326a0e607bd4d5c48733e20bae3
    • S
      Introduce conditional merge-operator invocation in point lookups · 93c2b917
      Sagar Vemuri committed
      Summary:
      For every merge operand encountered for a key in the read path, we now have the ability to decide whether to look further (to retrieve more merge operands for the key) or to stop and invoke the merge operator to return the value. To use this facility, the user needs to override the `ShouldMerge()` method with a condition that terminates the search when it returns true.
      
      This has a couple of advantages:
      1. It helps limit the number of merge operands that are looked at to compute a value as part of a user Get operation.
      2. It allows peeking at a merge key-value to see whether further merge operands need to be looked at.
      
      Example: Limiting the number of merge operands that are looked at: Let's say you have 10 merge operands for a key spread over various levels. If you only want RocksDB to look at the latest two merge operands instead of all 10 to compute the value, it is now possible with this PR. You can set the condition in `ShouldMerge()` to return true when the size of the operand list is 2. Look at the example implementation in the unit test. Without this PR, a Get might look at all 10 merge operands in different levels before invoking the merge operator.
      
      Added a new unit test.
      Made sure that there is no perf regression by running benchmarks.
      
      Command line to Load data:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="mergerandom" --merge_operator="uint64add" --num=10000000
      ...
      mergerandom  :      12.861 micros/op 77757 ops/sec;    8.6 MB/s ( updates:10000000)
      ```
      
      **ReadRandomMergeRandom benchmark results:**
      Command line:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="readrandommergerandom" --merge_operator="uint64add" --num=10000000
      ```
      
      Base -- Without this code change (on commit fc7476be):
      ```
      readrandommergerandom :      38.586 micros/op 25916 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8)
      ```
      
      With this code change:
      ```
      readrandommergerandom :      38.653 micros/op 25870 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8)
      ```
      Closes https://github.com/facebook/rocksdb/pull/2923
      
      Differential Revision: D5898239
      
      Pulled By: sagar0
      
      fbshipit-source-id: daefa325019f77968639a75c851d46352c2303ef
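The "latest two operands" example from this commit can be modeled outside RocksDB as follows. This is a sketch under assumptions: `LatestTwoOperator` and `CollectOperands` are hypothetical, operands are plain `std::string`s rather than `Slice`s, and the loop only imitates how a point lookup would gather operands from newest to oldest.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a merge operator that overrides ShouldMerge(); it asks the
// read path to stop once two operands have been gathered.
struct LatestTwoOperator {
  bool ShouldMerge(const std::vector<std::string>& operands) const {
    return operands.size() >= 2;
  }
};

// Imitates the read path: gathers operands newest first and stops looking
// further as soon as ShouldMerge() returns true.
template <typename Op>
std::vector<std::string> CollectOperands(
    const Op& op, const std::vector<std::string>& newest_to_oldest) {
  std::vector<std::string> seen;
  for (const std::string& operand : newest_to_oldest) {
    seen.push_back(operand);
    if (op.ShouldMerge(seen)) break;  // skip operands in deeper levels
  }
  return seen;
}
```

With 10 operands available, only the two newest are collected, and the remaining eight are never read.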
    • A
      Use RAII instead of pointers in cf_info_map · a48a398e
      Aliaksei Sandryhaila committed
      Summary:
      There is no need for smart pointers in cf_info_map, so use RAII. This should also placate valgrind.
      Closes https://github.com/facebook/rocksdb/pull/2943
      
      Differential Revision: D5932941
      
      Pulled By: asandryh
      
      fbshipit-source-id: 2c37df88573a9df2557880a31193926e4425e054
    • M
      Blog post for 5.8 release · c7058662
      Maysam Yabandeh committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/2942
      
      Differential Revision: D5932858
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e11f52a0b08d65149bb49d99d1dbc82cb5a96fa0
  7. 28 Sep, 2017 3 commits
  8. 27 Sep, 2017 2 commits
  9. 26 Sep, 2017 1 commit
  10. 23 Sep, 2017 3 commits
    • Z
      Add test kPointInTimeRecoveryCFConsistency · 1d6700f9
      Zhongyi Xie committed
      Summary:
      Context/problem:
      
      - CFs may be flushed at different times
      - A WAL can only be deleted after all CFs have flushed beyond end of that WAL.
      - Point-in-time recovery might stop upon reaching the first corruption.
      - Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs.
      Closes https://github.com/facebook/rocksdb/pull/2900
      
      Differential Revision: D5863281
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e
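The inconsistency described in the bullets above can be expressed as a small check. This is only a conceptual sketch: `RecoveryIsConsistent` is a hypothetical helper, and representing each column family by a single "flushed up to" sequence number is an assumption made for illustration.

```cpp
#include <cstdint>
#include <vector>

// recovery_stop_seq: last sequence number replayed before the first WAL
// corruption. cf_flushed_seqs: for each column family, the sequence number
// up to which its data has already been flushed (assumed representation).
inline bool RecoveryIsConsistent(
    std::uint64_t recovery_stop_seq,
    const std::vector<std::uint64_t>& cf_flushed_seqs) {
  bool any_ahead = false;   // some CF already holds data past the stop point
  bool any_behind = false;  // some CF depends on the WAL up to the stop point
  for (std::uint64_t flushed : cf_flushed_seqs) {
    if (flushed > recovery_stop_seq) {
      any_ahead = true;
    } else {
      any_behind = true;
    }
  }
  // A mix means the CFs would reflect different points in time, so Open()
  // should fail rather than proceed.
  return !(any_ahead && any_behind);
}
```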
    • Y
      Fix WritePreparedTransactionTest::SeqAdvanceTest ASAN failure · be97dbb1
      Yi Wu committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/2922
      
      Differential Revision: D5895310
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2
    • A
      Repair DBs with trailing slash in name · 4708a687
      Andrew Kryczka committed
      Summary:
      Problem:
      
      - `DB::SanitizeOptions` strips trailing slash from `wal_dir` but not `dbname`
      - We check whether `wal_dir` and `dbname` refer to the same directory using string equality: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L258
      - Providing `dbname` with trailing slash causes default `wal_dir` to be misidentified as a separate directory.
      - Then the repair tries to add all SST files to the `VersionEdit` twice (once for `dbname` dir, once for `wal_dir`) and fails with coredump.
      
      Solution:
      
      - Add a new `Env` function, `AreFilesSame`, which uses device and inode number to check whether files are the same. It's currently only implemented in `PosixEnv`.
      - Migrate repair to use `AreFilesSame` to check whether `dbname` and `wal_dir` are the same. If unsupported, it falls back to string comparison.
      Closes https://github.com/facebook/rocksdb/pull/2827
      
      Differential Revision: D5761349
      
      Pulled By: ajkr
      
      fbshipit-source-id: c839d548678b742af1166d60b09abd94e5476238
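The device-and-inode comparison mentioned in the solution can be sketched with POSIX `stat`. This is not the `PosixEnv` implementation: `SameFile` is a hypothetical helper that reports failure as `false` instead of returning a status.

```cpp
#include <sys/stat.h>

#include <string>

// POSIX-only sketch. Two paths refer to the same file or directory when
// their device and inode numbers match, regardless of how they are spelled.
inline bool SameFile(const std::string& a, const std::string& b) {
  struct stat sa;
  struct stat sb;
  if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) {
    return false;  // treat paths that cannot be stat'ed as "not the same"
  }
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Unlike string equality, this treats a path with and without a trailing slash as the same directory, which is exactly the case that broke repair.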
  11. 22 Sep, 2017 3 commits