1. 11 Nov, 2017 1 commit
  2. 10 Nov, 2017 2 commits
  3. 09 Nov, 2017 2 commits
    • Y
      Blob DB: Fix race condition between flush and write · 5e9e5a47
      Yi Wu committed
      Summary:
      A race condition will happen when:
      * a user thread writes a value while holding blob_db_impl.write_mutex_, but hits the write stop condition because there are too many unflushed memtables.
      * a flush is triggered, calls the flush-begin listener, and tries to acquire blob_db_impl.write_mutex_.
      
      Fixing it.
      Closes https://github.com/facebook/rocksdb/pull/3149
      
      Differential Revision: D6279805
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40
    • Y
      Blob DB: Fix release build · ca75f0a6
      Yi Wu committed
      Summary:
      `compression` shadows the method name in `BlobFile`. Rename it.
      Closes https://github.com/facebook/rocksdb/pull/3148
      
      Differential Revision: D6274498
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 7d293596530998b23b6b8a8940f983f9b6343a98
  4. 08 Nov, 2017 3 commits
  5. 07 Nov, 2017 1 commit
    • M
      Add lock wait time as a perf context counter · e03377c7
      Manuel Ung committed
      Summary:
      Adds two new counters:
      
      `key_lock_wait_count` counts how many times a lock was blocked by another transaction and had to wait, instead of being granted the lock immediately.
      `key_lock_wait_time` counts the time spent acquiring locks.
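
As a rough illustration of what the two counters measure, here is a minimal sketch that uses a plain `std::mutex` instead of the transaction lock manager. The `LockWaitCounters` struct and `LockWithCounters` helper are hypothetical names, and nanoseconds are an assumed unit; none of this is the RocksDB implementation.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical counters named after the two new perf context fields.
struct LockWaitCounters {
  std::uint64_t key_lock_wait_count = 0;  // times a lock was not granted immediately
  std::uint64_t key_lock_wait_time = 0;   // time spent acquiring locks (nanoseconds assumed)
};

// Acquire `mu`, recording whether we had to wait and how long acquisition took.
inline void LockWithCounters(std::mutex& mu, LockWaitCounters& counters) {
  const auto start = std::chrono::steady_clock::now();
  if (!mu.try_lock()) {
    // Not granted immediately: count the wait, then block until it is free.
    ++counters.key_lock_wait_count;
    mu.lock();
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  counters.key_lock_wait_time += static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
}
```

The caller is still responsible for calling `mu.unlock()`; the helper only wraps acquisition.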
      Closes https://github.com/facebook/rocksdb/pull/3107
      
      Differential Revision: D6217332
      
      Pulled By: lth
      
      fbshipit-source-id: 55d4f46da5550c333e523263422fd61d6a46deb9
  6. 04 Nov, 2017 6 commits
    • Y
      Fix PinnableSlice move assignment · be410ded
      Yi Wu committed
      Summary:
      After move assignment, we need to re-initialize the moved-from PinnableSlice.
      
      Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move.
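
The general pattern can be sketched with a small stand-in class. `OwningSlice` below is hypothetical and much simpler than `PinnableSlice`; it only shows why a type that caches a pointer into its own buffer has to re-point after a move and reset the moved-from object.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in: a slice whose data pointer refers to its own buffer.
class OwningSlice {
 public:
  OwningSlice() = default;
  explicit OwningSlice(std::string s) : buf_(std::move(s)) { Sync(); }
  OwningSlice(const OwningSlice&) = delete;
  OwningSlice& operator=(const OwningSlice&) = delete;

  OwningSlice& operator=(OwningSlice&& other) noexcept {
    if (this != &other) {
      buf_ = std::move(other.buf_);
      Sync();         // re-point at our own buffer, not the source's
      other.Reset();  // leave the moved-from object in a known empty state
    }
    return *this;
  }

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  void Sync() {
    data_ = buf_.data();
    size_ = buf_.size();
  }
  void Reset() {
    buf_.clear();
    Sync();
  }

  std::string buf_;
  const char* data_ = "";
  std::size_t size_ = 0;
};
```

Even with the reset in place, callers are better off not reading from the moved-from object, which is the point of the blob_db_impl.cc change.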
      Closes https://github.com/facebook/rocksdb/pull/3127
      
      Differential Revision: D6238585
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e
    • S
      Remove unnecessary status check in TableCache::NewIterator · a6d8e30c
      Sagar Vemuri committed
      Summary:
      While investigating the usage of the `new_table_iterator_nanos` perf counter, I saw some code wrapped in an unnecessary status check, so I removed it.
      Closes https://github.com/facebook/rocksdb/pull/3120
      
      Differential Revision: D6229181
      
      Pulled By: sagar0
      
      fbshipit-source-id: f8a44fe67f5a05df94553fdb233b21e54e88cc34
    • P
      util: Fix coverity issues · 4c8f3364
      Prashant D committed
      Summary:
      util/concurrent_arena.h:
      CID 1396145 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      2. uninit_member: Non-static class member free_begin_ is not initialized in this constructor nor in any functions that it calls.
       94    Shard() : allocated_and_unused_(0) {}
      
      util/dynamic_bloom.cc:
      	1. Condition hash_func == NULL, taking true branch.
      
      CID 1322821 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      3. uninit_member: Non-static class member data_ is not initialized in this constructor nor in any functions that it calls.
      47      hash_func_(hash_func == nullptr ? &BloomHash : hash_func) {}
      48
      
      util/file_reader_writer.h:
      204 private:
      205  AlignedBuffer buffer_;
         	member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_offset_.
      206  uint64_t buffer_offset_;
      
      CID 1418246 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_len_.
      207  size_t buffer_len_;
      208};
      
      util/thread_local.cc:
      341#endif
      
      CID 1322795 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      3. uninit_member: Non-static class member pthread_key_ is not initialized in this constructor nor in any functions that it calls.
      342}
      
      40struct ThreadData {
         	2. uninit_member: Non-static class member next is not initialized in this constructor nor in any functions that it calls.
      
      CID 1400668 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      4. uninit_member: Non-static class member prev is not initialized in this constructor nor in any functions that it calls.
       41  explicit ThreadData(ThreadLocalPtr::StaticMeta* _inst) : entries(), inst(_inst) {}
       42  std::vector<Entry> entries;
         	1. member_decl: Class member declaration for next.
       43  ThreadData* next;
         	3. member_decl: Class member declaration for prev.
       44  ThreadData* prev;
       45  ThreadLocalPtr::StaticMeta* inst;
       46};
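
One common way to silence an UNINIT_CTOR report like the `ThreadData` one above is to initialize every member in the constructor. The sketch below uses hypothetical `*Sketch` types modeled on the snippet in the report; it does not claim to match the exact change made in this commit.

```cpp
#include <vector>

// Hypothetical placeholders for the types named in the Coverity report.
struct StaticMetaSketch {};
struct EntrySketch {};

struct ThreadDataSketch {
  // Every pointer member is given a value, so none is left indeterminate.
  explicit ThreadDataSketch(StaticMetaSketch* _inst)
      : entries(), next(nullptr), prev(nullptr), inst(_inst) {}
  std::vector<EntrySketch> entries;
  ThreadDataSketch* next;
  ThreadDataSketch* prev;
  StaticMetaSketch* inst;
};
```

Default member initializers (for example `ThreadDataSketch* next = nullptr;`) would achieve the same result and also cover any constructors added later.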
      Closes https://github.com/facebook/rocksdb/pull/3123
      
      Differential Revision: D6233566
      
      Pulled By: sagar0
      
      fbshipit-source-id: aa2068790ea69787a0035c0db39d59b0c25108db
    • A
      fix CopyFile status checks · cfb120f7
      Andrew Kryczka committed
      Summary:
      copied from internal diff D6156261
      Closes https://github.com/facebook/rocksdb/pull/3124
      
      Differential Revision: D6230167
      
      Pulled By: ajkr
      
      fbshipit-source-id: 17926bb1152d607556364e3aacfec0ef3c115748
    • Y
      Fix clang build error · d9561695
      Yi Wu committed
      Summary:
      Fix cast from size_t to unsigned int.
      Closes https://github.com/facebook/rocksdb/pull/3125
      
      Differential Revision: D6232863
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 4c6131168b1faec26f7820b2cf4a09c242d323b7
    • Y
      Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure · 2581c0a5
      Yi Wu committed
      Summary:
      Fix unreleased snapshot at the end of the test.
      Closes https://github.com/facebook/rocksdb/pull/3126
      
      Differential Revision: D6232867
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26
  7. 03 Nov, 2017 13 commits
    • A
      pass key/value samples through zstd compression dictionary generator · 24ad4306
      Andrew Kryczka committed
      Summary:
      Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when `CompressionOptions::zstd_max_train_bytes` is set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before.
      
      Note this is the first step of #2987, extracted into a separate PR per reviewer request.
      Closes https://github.com/facebook/rocksdb/pull/3057
      
      Differential Revision: D6116891
      
      Pulled By: ajkr
      
      fbshipit-source-id: 70ab13cc4c734fa02e554180eed0618b75255497
    • A
      dynamically change current memtable size · c4c1f961
      Andrew Kryczka committed
      Summary:
      Previously setting `write_buffer_size` with `SetOptions` would only apply to new memtables. An internal user wanted it to take effect immediately, instead of at an arbitrary future point, to prevent OOM.
      
      This PR makes the memtable's size mutable, and makes `SetOptions()` mutate it. There is one case when we preserve the old behavior, which is when memtable prefix bloom filter is enabled and the user is increasing the memtable's capacity. That's because the prefix bloom filter's size is fixed and wouldn't work as well on a larger memtable.
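
The rule described above can be written down as a small predicate. `AppliesToCurrentMemtable` and its parameters are hypothetical names for illustration only, not functions from the RocksDB code base.

```cpp
#include <cstddef>

// Returns true if a new write_buffer_size should take effect on the current
// memtable, and false if it should only apply to memtables created later.
// Hypothetical helper that mirrors the rule in the commit summary.
inline bool AppliesToCurrentMemtable(bool prefix_bloom_enabled,
                                     std::size_t old_size,
                                     std::size_t new_size) {
  // The prefix bloom filter's size is fixed, so growing a memtable that
  // uses one keeps the old behavior.
  const bool growing = new_size > old_size;
  return !(prefix_bloom_enabled && growing);
}
```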
      Closes https://github.com/facebook/rocksdb/pull/3119
      
      Differential Revision: D6228304
      
      Pulled By: ajkr
      
      fbshipit-source-id: e44bd9d10a5f8c9d8c464bf7436070bb3eafdfc9
    • Z
      add missing else · 30e4e01e
      Zhongyi Xie committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3121
      
      Differential Revision: D6229415
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 57c7ad2fddf5dd6b8d7e3aaf6f62348151327dfb
    • P
      Fix coverity issues in include/rocksdb · 602fe945
      Prashant D committed
      Summary:
      include/rocksdb/metadata.h:
      struct ColumnFamilyMetaData {
      
      CID 1322804 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      2. uninit_member: Non-static class member file_count is not initialized in this constructor nor in any functions that it calls.
      
      struct SstFileMetaData {
              2. uninit_member: Non-static class member size is not initialized in this constructor nor in any functions that it calls.
              4. uninit_member: Non-static class member smallest_seqno is not initialized in this constructor nor in any functions that it calls.
              6. uninit_member: Non-static class member largest_seqno is not initialized in this constructor nor in any functions that it calls.
              8. uninit_member: Non-static class member num_reads_sampled is not initialized in this constructor nor in any functions that it calls.
      
      CID 1322807 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      10. uninit_member: Non-static class member being_compacted is not initialized in this constructor nor in any functions that it calls.
      
      include/rocksdb/sst_file_writer.h:
      struct ExternalSstFileInfo {
              2. uninit_member: Non-static class member sequence_number is not initialized in this constructor nor in any functions that it calls.
              4. uninit_member: Non-static class member file_size is not initialized in this constructor nor in any functions that it calls.
              6. uninit_member: Non-static class member num_entries is not initialized in this constructor nor in any functions that it calls.
      
      CID 1351697 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      8. uninit_member: Non-static class member version is not initialized in this constructor nor in any functions that it calls.
       31  ExternalSstFileInfo() {}
      
      include/rocksdb/utilities/transaction.h:
      explicit Transaction(const TransactionDB* db) {}
              2. uninit_member: Non-static class member log_number_ is not initialized in this constructor nor in any functions that it calls.
      
      CID 1396133 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      4. uninit_member: Non-static class member field txn_state_._M_i is not initialized in this constructor nor in any functions that it calls.
      473  Transaction() {}
      Closes https://github.com/facebook/rocksdb/pull/3100
      
      Differential Revision: D6227651
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5caa4a2cf9471d1f9c3c073f81473636e1f0aa14
    • Y
      Blob DB: Add compaction filter to remove expired blob index entries · 62578d80
      Yi Wu committed
      Summary:
      After adding expiration to blob index in #3066, we are now able to add a compaction filter to clean up expired blob index entries.
      Closes https://github.com/facebook/rocksdb/pull/3090
      
      Differential Revision: D6183812
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83
    • S
      Add Memtable Read Tier to RocksJava · 76c3fbd6
      Sagar Vemuri committed
      Summary:
      This option was introduced in the C++ API in #1953.
      Closes https://github.com/facebook/rocksdb/pull/3064
      
      Differential Revision: D6139010
      
      Pulled By: sagar0
      
      fbshipit-source-id: 164de11d539d174cf3afe7cd40e667049f44b0bc
    • Y
      Blob DB: fix snapshot handling · 7bfa8803
      Yi Wu committed
      Summary:
      Blob DB keeps a blob file if data in the file is visible to an active snapshot. Before this patch, it checked whether any active snapshot has a sequence number greater than the earliest sequence in the file. This is problematic because we take a snapshot on every read: if reads keep coming, old blob files will never be cleaned up. Change it to check whether any active snapshot falls in the range [earliest_sequence, obsolete_sequence), where obsolete_sequence is:
      1. if data is relocated to another file by garbage collection, the latest sequence at the time garbage collection finishes
      2. otherwise, the latest sequence of the file
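
The half-open range check can be sketched as follows, assuming the active snapshot sequence numbers are available as a sorted vector. The function name and that representation are assumptions for illustration; the actual Blob DB code may look different.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns true if some active snapshot falls in
// [earliest_sequence, obsolete_sequence), meaning the blob file must be kept.
// `sorted_snapshots` must be sorted in ascending order.
inline bool BlobFileVisibleToSnapshot(
    const std::vector<std::uint64_t>& sorted_snapshots,
    std::uint64_t earliest_sequence, std::uint64_t obsolete_sequence) {
  // First snapshot at or after the earliest sequence in the file.
  const auto it = std::lower_bound(sorted_snapshots.begin(),
                                   sorted_snapshots.end(), earliest_sequence);
  // It only pins the file if it was taken before the file became obsolete.
  return it != sorted_snapshots.end() && *it < obsolete_sequence;
}
```

With snapshots at sequences 5 and 20, a file covering [10, 15) is not pinned, whereas under the old check the snapshot at 20 would have kept it alive.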
      Closes https://github.com/facebook/rocksdb/pull/3087
      
      Differential Revision: D6182519
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45
    • Y
      Blob DB: option to enable garbage collection · f662f8f0
      Yi Wu committed
      Summary:
      Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file.
      
      Auto garbage collection is disabled by default for now, since the whole logic is not fully tested and we plan to make major changes to it.
      Closes https://github.com/facebook/rocksdb/pull/3117
      
      Differential Revision: D6224756
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe
    • Y
      Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test · 167ba599
      Yi Wu committed
      Summary:
      The test intends to wait until the key has been overwritten before proceeding with garbage collection, but it failed to wait for `PutUntil` to finish. Fixing it.
      Closes https://github.com/facebook/rocksdb/pull/3116
      
      Differential Revision: D6222833
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5
    • S
      Blob DB: Evict oldest blob file when close to blob db size limit · 25ac1697
      Sagar Vemuri committed
      Summary:
      Evict the oldest blob file and put it in the obsolete_files list when close to the blob db size limit. The file will be deleted the next time the `DeleteObsoleteFiles` background job runs.
      For now I set the `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option prematurely as there are already too many :).
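
A threshold check of this kind might look like the sketch below. Only the constant name and the 90% value come from the summary; the function, its parameters, the treatment of a zero limit as "no limit", and the use of `>=` at the boundary are all assumptions.

```cpp
#include <cstdint>

// Fraction of the size limit at which the oldest blob file gets evicted.
constexpr double kEvictOldestFileAtSize = 0.9;

// Hypothetical helper: should the oldest blob file be moved to the
// obsolete_files list? A limit of 0 is assumed to mean "no limit".
inline bool ShouldEvictOldestFile(std::uint64_t total_blob_size,
                                  std::uint64_t size_limit) {
  if (size_limit == 0) {
    return false;
  }
  return static_cast<double>(total_blob_size) >=
         kEvictOldestFileAtSize * static_cast<double>(size_limit);
}
```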
      Closes https://github.com/facebook/rocksdb/pull/3094
      
      Differential Revision: D6187340
      
      Pulled By: sagar0
      
      fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421
    • P
      HistogramStat: Handle divide by zero situation · 3c208e76
      Prashant D committed
      Summary:
      num() might return 0 for cur_num, so we now make sure that
      cur_num is not 0 further down the path. The mult variable is set to
      100.0/cur_num, which makes the program crash when cur_num is 0.
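
One way to guard the computation is sketched below. `PercentMultiplier` is a hypothetical helper, and returning 0.0 for an empty histogram is an assumed choice; the commit may handle the zero case differently.

```cpp
#include <cstdint>

// Hypothetical helper: multiplier that turns a bucket count into a percentage
// of the total, without dividing by zero when the histogram is empty.
inline double PercentMultiplier(std::uint64_t cur_num) {
  if (cur_num == 0) {
    return 0.0;  // assumed fallback: nothing recorded, so every share is 0%
  }
  return 100.0 / static_cast<double>(cur_num);
}
```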
      Closes https://github.com/facebook/rocksdb/pull/3105
      
      Differential Revision: D6222594
      
      Pulled By: ajkr
      
      fbshipit-source-id: 986154709897ff4dbbeb0e8aa81eb8c0b2a2db76
    • M
      Remove the experimental notes about partitioning · 25fbd9a9
      Maysam Yabandeh committed
      Summary:
      This patch will remove the existing comments that declare partitioning indexes and filters as experimental.
      Closes https://github.com/facebook/rocksdb/pull/3115
      
      Differential Revision: D6222227
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6179ec43b22c518494051b674d91c9e1b54d4ac0
    • M
      WritePrepared Txn: Move DB class to its own file · 60d83df2
      Maysam Yabandeh committed
      Summary:
      Move WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h.
      Closes https://github.com/facebook/rocksdb/pull/3114
      
      Differential Revision: D6220987
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b
  8. 02 Nov, 2017 5 commits
    • A
      fix duplicate definition of GetEntryType() · 6778690b
      Andrew Kryczka committed
      Summary:
      It's also defined in db/dbformat.cc per 7fe3b328
      Closes https://github.com/facebook/rocksdb/pull/3111
      
      Differential Revision: D6219140
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f2b14e41457334a4665c6b7e3f42f1a060a0f35
    • A
      release 5.9 · cd124215
      Andrew Kryczka committed
      Summary:
      updated HISTORY.md and version.h for the release.
      Closes https://github.com/facebook/rocksdb/pull/3110
      
      Differential Revision: D6218645
      
      Pulled By: ajkr
      
      fbshipit-source-id: 99ab8473e9088b02d7596e92351cce7a60a99e93
    • M
      WritePrepared Txn: ValidateSnapshot · 02693f64
      Maysam Yabandeh committed
      Summary:
      Implements ValidateSnapshot for WritePrepared txns and also adds a unit test to clarify the contract of this function.
      Closes https://github.com/facebook/rocksdb/pull/3101
      
      Differential Revision: D6199405
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ace509934c307ea5d26f4bbac5f836d7c80fd240
    • M
      Added support for differential snapshots · 7fe3b328
      Mikhail Antonov committed
      Summary:
      The motivation for this PR is to add support to RocksDB for differential (incremental) snapshots, i.e. a snapshot of the DB changes between two points in time (one can think of it as the diff between two sequence numbers, or the diff D, which can be thought of as an SST file or just a set of KVs that can be applied to sequence number S1 to get the database to the state at sequence number S2).
      
      This feature would be useful for various distributed storage layers built on top of RocksDB, as it should help reduce the resources (time and network bandwidth) needed to recover and rebuild DB instances as replicas in the context of distributed storage.
      
      From the API standpoint, that would look like a client app requesting an iterator between (start seqnum) and the current DB state, and reading the "diff".
      
      This is a very early draft PR for initial review and discussion of the approach; I'm going to rework some parts and keep updating the PR.
      
      For now, what's done here according to initial discussions:
      
      Preserving deletes:
       - We want to be able to optionally preserve recent deletes for some defined period of time, so that if a delete came in recently and might need to be included in the next incremental snapshot, it wouldn't get dropped by a compaction. This is done by adding a new param to Options (preserve deletes flag) and a new variable to DB Impl where we keep track of the sequence number after which we don't want to drop tombstones, even if they are otherwise eligible for deletion.
       - I also added a new API call for clients to be able to advance this cutoff seqnum after which we drop deletes; I assume it's more flexible to let clients control this, since otherwise we'd need to keep some kind of timestamp <-> seqnum mapping inside the DB, which sounds messy and painful to support. Clients could make use of it by periodically calling GetLatestSequenceNumber(), noting the timestamp, doing some calculation and figuring out by how much we need to advance the cutoff seqnum.
       - The compaction codepath in compaction_iterator.cc has been modified to avoid dropping tombstones with seqnum > cutoff seqnum.
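
The tombstone rule in the bullets above can be condensed into a single predicate. This is a hypothetical sketch with made-up names, not the code in compaction_iterator.cc.

```cpp
#include <cstdint>

// Hypothetical helper: may compaction drop this tombstone?
// `otherwise_droppable` stands for the pre-existing eligibility checks.
inline bool CanDropTombstone(bool otherwise_droppable, bool preserve_deletes,
                             std::uint64_t tombstone_seqnum,
                             std::uint64_t cutoff_seqnum) {
  // With preserve deletes enabled, tombstones newer than the cutoff are kept
  // so that a later differential snapshot can still observe them.
  const bool preserved = preserve_deletes && tombstone_seqnum > cutoff_seqnum;
  return otherwise_droppable && !preserved;
}
```

Advancing the cutoff through the new API call is what eventually lets older tombstones be dropped again.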
      
      Iterator changes:
       - A couple of params were added to ReadOptions, to optionally allow the client to request internal keys instead of user keys (so that the client can get the latest value of a key, be it a delete marker or a put), as well as min timestamp and min seqnum.
      
      TableCache changes:
       - I modified the table_cache code to be able to quickly exclude SST files from the iterator heap if creation_time on the file is less than iter_start_ts as passed in ReadOptions. That would help a lot in some DB settings (like reading very recent data only or using FIFO compactions), but not so much for universal compaction with a more or less long iterator time span.
      
      What's left:
      
       - Still looking at how best to plug that into the DBIter codepath. So far it seems that FindNextUserKeyInternal only parses values as UserKeys, and the iter->key() call generally returns the user key. Can we add a new API to DBIter as internal_key(), and modify this internal method to optionally set saved_key_ to point to the full internal key? I don't need to store the actual seqnum there, but I do need to store the type.
      Closes https://github.com/facebook/rocksdb/pull/2999
      
      Differential Revision: D6175602
      
      Pulled By: mikhail-antonov
      
      fbshipit-source-id: c779a6696ee2d574d86c69cec866a3ae095aa900
    • M
      WritePrepared Txn: Optimize for recoverable state · 17731a43
      Maysam Yabandeh committed
      Summary:
      GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery, so it does not need to be stored in the memtable right after each commit.
      This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to the memtable during recovery when the WAL is replayed. To cover the case when the WAL is deleted after a memtable flush, the batch is also buffered and written to the memtable right before each memtable flush.
      Closes https://github.com/facebook/rocksdb/pull/3071
      
      Differential Revision: D6148023
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
  9. 01 Nov, 2017 4 commits
  10. 31 Oct, 2017 2 commits
  11. 30 Oct, 2017 1 commit