1. 21 Apr, 2018 4 commits
    • Add a stat for MultiGet keys found, update memtable hit/miss stats · dbdaa466
      Authored by Anand Ananthabhotla
      Summary:
      1. Add a new ticker stat rocksdb.number.multiget.keys.found to track the
      number of keys successfully read
      2. Update rocksdb.memtable.hit/miss in DBImpl::MultiGet(). It was being done in
      DBImpl::GetImpl(), but not MultiGet
      Closes https://github.com/facebook/rocksdb/pull/3730
      
      Differential Revision: D7677364
      
      Pulled By: anand1976
      
      fbshipit-source-id: af22bd0ef8ddc5cf2b4244b0a024e539fe48bca5
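
As a rough illustration of the accounting described above, here is a minimal self-contained sketch. It does not use the RocksDB API: `Stats`, `Table`, `Lookup`, and the free function `MultiGet` are hypothetical stand-ins, and the two-level lookup (memtable first, then SST data) is a simplifying assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical counters standing in for the RocksDB tickers named above.
struct Stats {
  uint64_t multiget_keys_found = 0;  // models rocksdb.number.multiget.keys.found
  uint64_t memtable_hit = 0;         // models rocksdb.memtable.hit
  uint64_t memtable_miss = 0;        // models rocksdb.memtable.miss
};

using Table = std::map<std::string, std::string>;

struct Lookup {
  bool found;
  std::string value;
};

// Looks up each key in the memtable first, then in the modelled SST data,
// updating the counters once per key.
std::vector<Lookup> MultiGet(const Table& memtable, const Table& sst,
                             const std::vector<std::string>& keys,
                             Stats* stats) {
  std::vector<Lookup> results;
  results.reserve(keys.size());
  for (const auto& key : keys) {
    auto mem_it = memtable.find(key);
    if (mem_it != memtable.end()) {
      ++stats->memtable_hit;
      ++stats->multiget_keys_found;
      results.push_back({true, mem_it->second});
      continue;
    }
    ++stats->memtable_miss;
    auto sst_it = sst.find(key);
    if (sst_it != sst.end()) {
      ++stats->multiget_keys_found;
      results.push_back({true, sst_it->second});
    } else {
      results.push_back({false, std::string()});
    }
  }
  return results;
}
```

With one key in the memtable, one in the SST data, and one missing, the sketch counts two keys found, one memtable hit, and two memtable misses.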
    • WritePrepared Txn: enable TryAgain for duplicates at the end of the batch · c3d1e36c
      Authored by Maysam Yabandeh
      Summary:
      WriteBatch::Iterate retries with a larger sequence number if the memtable reports a duplicate; the memtable signals this with a TryAgain status. So far the assumption was that the last entry in the batch never returns TryAgain, which is correct when the WAL is created via WritePrepared, since it always appends a batch separator if a natural one does not exist. However, when reading a WAL generated by WriteCommitted, this batch separator might not exist. Although WritePrepared is not supposed to be able to read a WAL generated by WriteCommitted, we should avoid confusing scenarios in which the behavior becomes unpredictable. The patch fixes that by allowing TryAgain even for the last entry of the write batch.
      Closes https://github.com/facebook/rocksdb/pull/3747
      
      Differential Revision: D7708391
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d
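
The retry rule can be pictured with a toy model. Everything here is an assumption made for illustration: `ToyMemTable`, `Status`, and `IterateBatch` are hypothetical names, and the real `WriteBatch::Iterate` is considerably more involved. The only point carried over from the summary is that a duplicate triggers a retry with a larger sequence number for every entry, the last one included.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Status { kOk, kTryAgain };

// Toy memtable: a (key, sequence number) pair may be inserted only once.
class ToyMemTable {
 public:
  Status Add(const std::string& key, uint64_t seq) {
    return entries_.emplace(key, seq).second ? Status::kOk
                                             : Status::kTryAgain;
  }

 private:
  std::set<std::pair<std::string, uint64_t>> entries_;
};

// Inserts every key of the batch starting at `seq` and returns the last
// sequence number used. A duplicate bumps the sequence number and retries;
// no entry is exempt, not even the final one.
uint64_t IterateBatch(const std::vector<std::string>& batch, uint64_t seq,
                      ToyMemTable* mem) {
  for (const auto& key : batch) {
    if (mem->Add(key, seq) == Status::kTryAgain) {
      ++seq;               // start a new sub-batch
      mem->Add(key, seq);  // cannot collide: this sequence number is fresh
    }
  }
  return seq;
}
```

In this model a batch `{"a", "b", "a"}` starting at sequence number 10 ends at 11, because the trailing duplicate forces one retry.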
    • Propagate fill_cache config to partitioned index iterator · 17e04039
      Authored by Maysam Yabandeh
      Summary:
      Currently the partitioned index iterator creates a new ReadOptions which ignores the fill_cache config set to ReadOptions passed by the user. The patch propagates fill_cache from the user's ReadOptions to that of partition index iterator.
      It also clarifies the contract of fill_cache: i) it does not apply to filters, ii) it still charges the block cache for the size of the data block, and it still pins the block if it is already in the block cache.
      Closes https://github.com/facebook/rocksdb/pull/3739
      
      Differential Revision: D7678308
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 53ed96424ae922e499e2d4e3580ddc3f0db893da
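
The propagation amounts to copying one field into the freshly built options. A minimal sketch follows, assuming a stand-in `ReadOptions` struct and a hypothetical helper `MakePartitionIndexReadOptions`; neither is the actual RocksDB code.

```cpp
// Hypothetical stand-in for the relevant part of rocksdb::ReadOptions.
struct ReadOptions {
  bool fill_cache = true;
};

// Builds the options used by the partition index iterator. The iterator
// still starts from fresh defaults, but fill_cache now follows the caller.
ReadOptions MakePartitionIndexReadOptions(const ReadOptions& user_options) {
  ReadOptions index_options;
  index_options.fill_cache = user_options.fill_cache;
  return index_options;
}
```

A caller that sets `fill_cache = false` for a large scan then gets the same setting on the index partitions it touches.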
    • Fix GitHub issue #3716: gcc-8 warnings · dee95a1a
      Authored by przemyslaw.skibinski@percona.com
      Summary:
      Fix the following gcc-8 warnings:
      - conflicting C language linkage declaration [-Werror]
      - writing to an object with no trivial copy-assignment [-Werror=class-memaccess]
      - array subscript -1 is below array bounds [-Werror=array-bounds]
      
      Solves https://github.com/facebook/rocksdb/issues/3716
      Closes https://github.com/facebook/rocksdb/pull/3736
      
      Differential Revision: D7684161
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 47c0423d26b74add251f1d3595211eee1e41e54a
  2. 20 Apr, 2018 3 commits
  3. 19 Apr, 2018 3 commits
    • Add block cache related DB properties · ad511684
      Authored by Yi Wu
      Summary:
      Add DB properties "rocksdb.block-cache-capacity", "rocksdb.block-cache-usage", "rocksdb.block-cache-pinned-usage" to show block cache usage.
      Closes https://github.com/facebook/rocksdb/pull/3734
      
      Differential Revision: D7657180
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: dd34a019d5878dab539c51ee82669e97b2b745fd
    • include thread-pool priority in thread names · 3cea6139
      Authored by Andrew Kryczka
      Summary:
      Previously threads were named "rocksdb:bg\<index in thread pool\>", so the first thread in all thread pools would be named "rocksdb:bg0". Users want to be able to distinguish threads used for flush (high-pri) vs regular compaction (low-pri) vs compaction to bottom-level (bottom-pri). So I changed the thread naming convention to include the thread-pool priority.
      Closes https://github.com/facebook/rocksdb/pull/3702
      
      Differential Revision: D7581415
      
      Pulled By: ajkr
      
      fbshipit-source-id: ce04482b6acd956a401ef22dc168b84f76f7d7c1
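
A small sketch of such a naming scheme is shown below. The exact format string is an assumption (the summary only says the priority is included), and `Priority`, `PriorityToString`, and `ThreadName` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <string>

enum class Priority { kBottom, kLow, kHigh };

// Maps a pool priority to the short label embedded in the thread name.
std::string PriorityToString(Priority p) {
  switch (p) {
    case Priority::kBottom:
      return "bottom";
    case Priority::kLow:
      return "low";
    case Priority::kHigh:
      return "high";
  }
  return "unknown";
}

// Builds a per-thread name that includes both the pool priority and the
// thread's index within that pool.
std::string ThreadName(Priority p, std::size_t index_in_pool) {
  return "rocksdb:" + PriorityToString(p) + std::to_string(index_in_pool);
}
```

Under this scheme the first thread of the high-priority pool and the first thread of the low-priority pool no longer share a name.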
    • Improve db_stress with transactions · 6d06be22
      Authored by Maysam Yabandeh
      Summary:
      db_stress was already capable of running transactions by setting use_txn. Running it under stress showed a couple of problems, which are fixed in this patch.
      - The uncommitted transaction must be either rolled back or committed after recovery.
      - Current implementation of WritePrepared transaction cannot handle cf drop before crash. Clarified that in the comments and added safety checks. When running with use_txn, clear_column_family_one_in must be set to 0.
      Closes https://github.com/facebook/rocksdb/pull/3733
      
      Differential Revision: D7654419
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a024bad80a9dc99677398c00d29ff17d4436b7f3
  4. 18 Apr, 2018 1 commit
  5. 17 Apr, 2018 3 commits
  6. 16 Apr, 2018 4 commits
  7. 14 Apr, 2018 5 commits
    • Implemented Knuth shuffle to construct permutation for selecting no_o… · 28087acd
      Authored by Amy Tai
      Summary:
      …verwrite_keys. Also changed each no_overwrite_key set to an unordered set, otherwise Knuth shuffle only gets you 2x time improvement, because insertion (and subsequent internal sorting) into an ordered set is the bottleneck.
      
      With this change, each iteration of permutation construction and prefix selection takes around 40 secs, as opposed to 360 secs previously. However, this still means that with the default 10 CF per blackbox test case, the test is going to time out given the default interval of 200 secs.
      
      Also, there is currently an assertion error affecting all blackbox tests in db_crashtest.py; this assertion error will be fixed in a future PR.
      Closes https://github.com/facebook/rocksdb/pull/3699
      
      Differential Revision: D7624616
      
      Pulled By: amytai
      
      fbshipit-source-id: ea64fbe83407ff96c1c0ecabbc6c830576939393
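
The two ingredients mentioned above, a Knuth (Fisher-Yates) shuffle and an unordered set for the selected keys, can be sketched as follows. The function name `SelectNoOverwriteKeys`, its parameters, and the choice of `std::mt19937_64` are assumptions for illustration and are not taken from db_stress.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Picks `count` distinct keys from [0, max_key) by shuffling the identity
// permutation and taking its prefix.
std::unordered_set<uint64_t> SelectNoOverwriteKeys(uint64_t max_key,
                                                   std::size_t count,
                                                   uint64_t seed) {
  std::vector<uint64_t> permutation(max_key);
  std::iota(permutation.begin(), permutation.end(), uint64_t{0});

  std::mt19937_64 rng(seed);
  // Knuth shuffle: walk from the back, swapping each slot with a uniformly
  // chosen slot at or before it. This is O(n) swaps in total.
  for (std::size_t i = permutation.size(); i > 1; --i) {
    std::uniform_int_distribution<std::size_t> pick(0, i - 1);
    std::swap(permutation[i - 1], permutation[pick(rng)]);
  }

  if (count > permutation.size()) {
    count = permutation.size();
  }
  // Hash-based set: average O(1) insertion, with no ordering to maintain.
  return std::unordered_set<uint64_t>(
      permutation.begin(),
      permutation.begin() + static_cast<std::ptrdiff_t>(count));
}
```

The shuffle keeps permutation construction linear, and the unordered set avoids the per-insert ordering cost that the summary identifies as the bottleneck.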
    • Make database files' permissions configurable · a0102aa6
      Authored by Xiaofei Du
      Summary: Closes https://github.com/facebook/rocksdb/pull/3709
      
      Differential Revision: D7610227
      
      Pulled By: xiaofeidu008
      
      fbshipit-source-id: 88a52f0f9f96e2195fccde995cf9760b785e9f07
    • add kEntryRangeDeletion · 31ee4bf2
      Authored by zhangjinpeng1987
      Summary:
      When there are many range deletions in a range, we want to trigger manual compaction on this range to reclaim disk space as soon as possible and speed up reads.
      After this change, we can collect information about range deletions and store it in user properties, which can guide our manual compaction.
      Closes https://github.com/facebook/rocksdb/pull/3695
      
      Differential Revision: D7570322
      
      Pulled By: ajkr
      
      fbshipit-source-id: c358fa43b0aac6cc954d2eadc7d3bd8015373369
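
One way to picture the use case is a collector that counts range deletions per table and exposes the count as a user property. The sketch below is self-contained and only mirrors the idea: `RangeDeletionCollector`, `ShouldCompact`, and the property name `example.num-range-deletions` are hypothetical, and the real `TablePropertiesCollector` interface takes more arguments.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Simplified entry types; kEntryRangeDeletion is the value added by the
// commit, and the other names are listed here for context.
enum EntryType {
  kEntryPut,
  kEntryDelete,
  kEntrySingleDelete,
  kEntryMerge,
  kEntryRangeDeletion,
  kEntryOther
};

// Counts range deletions seen while a table is being built.
class RangeDeletionCollector {
 public:
  void AddUserKey(EntryType type) {
    if (type == kEntryRangeDeletion) {
      ++num_range_deletions_;
    }
  }

  // Publishes the count under a hypothetical property name.
  void Finish(std::map<std::string, std::string>* properties) const {
    (*properties)["example.num-range-deletions"] =
        std::to_string(num_range_deletions_);
  }

 private:
  uint64_t num_range_deletions_ = 0;
};

// Decides whether a table deserves a manual compaction, based on the
// collected property and a caller-chosen threshold.
bool ShouldCompact(const std::map<std::string, std::string>& properties,
                   uint64_t threshold) {
  auto it = properties.find("example.num-range-deletions");
  return it != properties.end() && std::stoull(it->second) >= threshold;
}
```

An application could then scan table properties and issue a manual compaction on ranges whose tables cross the threshold.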
    • Merge raw and shared pointer log method impls · 1f5457ef
      Authored by Steven Fackler
      Summary:
      Calling rocksdb::Log, rocksdb::Info, etc with a `shared_ptr<Logger>` should behave the same as calling those functions with a `Logger *`. This PR achieves it by making the `shared_ptr<Logger>` versions delegate to the `Logger *` versions.
      
      Closes #3689
      Closes https://github.com/facebook/rocksdb/pull/3710
      
      Differential Revision: D7595557
      
      Pulled By: ajkr
      
      fbshipit-source-id: 64dd7f20fd42dc821bac7b8032705c35b483e00d
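
The delegation pattern can be sketched in a few lines. This is a simplified, self-contained model rather than the RocksDB source: the `Logger` interface is reduced to one method, and `StringLogger` is a hypothetical helper that records the last formatted message.

```cpp
#include <cstdarg>
#include <cstdio>
#include <memory>
#include <string>

// Reduced logger interface for the sketch.
class Logger {
 public:
  virtual ~Logger() = default;
  virtual void Logv(const char* format, va_list ap) = 0;
};

// The single implementation lives in the raw-pointer path.
void Logv(Logger* logger, const char* format, va_list ap) {
  if (logger != nullptr) {
    logger->Logv(format, ap);
  }
}

void Log(Logger* logger, const char* format, ...) {
  va_list ap;
  va_start(ap, format);
  Logv(logger, format, ap);
  va_end(ap);
}

// The shared_ptr overload only unwraps the pointer and delegates, so both
// overloads behave identically.
void Log(const std::shared_ptr<Logger>& logger, const char* format, ...) {
  va_list ap;
  va_start(ap, format);
  Logv(logger.get(), format, ap);
  va_end(ap);
}

// Hypothetical logger that keeps the last formatted message.
class StringLogger : public Logger {
 public:
  void Logv(const char* format, va_list ap) override {
    char buffer[256];
    std::vsnprintf(buffer, sizeof(buffer), format, ap);
    last_ = buffer;
  }
  const std::string& last() const { return last_; }

 private:
  std::string last_;
};
```

Keeping one implementation means any future change to the raw-pointer path is picked up by the `shared_ptr` callers as well.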
    • Improve accuracy of I/O stats collection of external SST ingestion. · c81b0abe
      Authored by Yanqin Jin
      Summary:
      RocksDB supports ingestion of external ssts. If ingestion_options.move_files is true, when performing ingestion, RocksDB first tries to link external ssts. If an external SST file resides on a different FS, or the underlying FS does not support hard links, then RocksDB performs an actual file copy. However, no matter which choice is made, the current code increases bytes-written when updating compaction stats, which is inaccurate when RocksDB does NOT copy the file.
      
      Rename a sync point.
      Closes https://github.com/facebook/rocksdb/pull/3713
      
      Differential Revision: D7604151
      
      Pulled By: riversand963
      
      fbshipit-source-id: dd0c0d9b9a69c7d9ffceafc3d9c23371aa413586
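
A minimal sketch of the corrected accounting follows. The struct, the enum, and the decision to record linked files under a separate `bytes_moved` counter are assumptions made for illustration; the summary only states that bytes-written should not grow when no copy happens.

```cpp
#include <cstdint>

// Hypothetical subset of the compaction stats touched by ingestion.
struct IngestionStats {
  uint64_t bytes_written = 0;  // bytes physically written by RocksDB
  uint64_t bytes_moved = 0;    // bytes made visible without a copy
};

enum class IngestMethod { kLinked, kCopied };

// Charges bytes_written only when the file was actually copied.
void RecordIngestedFile(IngestMethod method, uint64_t file_size,
                        IngestionStats* stats) {
  if (method == IngestMethod::kCopied) {
    stats->bytes_written += file_size;
  } else {
    stats->bytes_moved += file_size;
  }
}
```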
  8. 13 Apr, 2018 2 commits
  9. 12 Apr, 2018 2 commits
    • WritePrepared Txn: fix smallest_prep atomicity issue · 6f5e6445
      Authored by Maysam Yabandeh
      Summary:
      We introduced the smallest_prep optimization in commit b225de7e, which stores the smallest uncommitted sequence number along with the snapshot. This lets readers that read from the snapshot skip further checks and safely assume the data is committed if its sequence number is less than the smallest uncommitted sequence number when the snapshot was taken. The problem was that the smallest uncommitted sequence number and the snapshot must be taken atomically; the lack of atomicity led to readers using a smallest uncommitted value obtained after the snapshot was taken and hence mistakenly skipping some data.
      This patch fixes the problem by i) separating the removal of prepare entries from the AddCommitted function, ii) removing the prepare entries AFTER the committed sequence number is published, iii) getting the smallest uncommitted (from the prepare list) BEFORE taking a snapshot. This guarantees that the smallest uncommitted value accompanying a snapshot is less than or equal to the value that would have been obtained atomically.
      
      Tested by running MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest that was failing sporadically.
      Closes https://github.com/facebook/rocksdb/pull/3703
      
      Differential Revision: D7581934
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: dc9d6f4fb477eba75d4d5927326905b548a96a32
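
The ordering in steps ii) and iii) can be shown with a single-threaded toy model. It is only a sketch: `ToyTxnDB`, `Snapshot`, and `CanSkipCommitCheck` are hypothetical, no real concurrency is modelled, and the fallback of "published sequence number plus one" when nothing is prepared is an assumption.

```cpp
#include <cstdint>
#include <set>

struct Snapshot {
  uint64_t seq;              // latest published sequence number
  uint64_t min_uncommitted;  // smallest prepared-but-uncommitted sequence
};

class ToyTxnDB {
 public:
  void Prepare(uint64_t prepare_seq) { prepared_.insert(prepare_seq); }

  void Commit(uint64_t prepare_seq, uint64_t commit_seq) {
    published_seq_ = commit_seq;  // ii) publish the commit first ...
    prepared_.erase(prepare_seq); // ... then drop the prepare entry
  }

  Snapshot TakeSnapshot() const {
    // iii) read the smallest uncommitted value BEFORE reading the snapshot
    // sequence, so it can only be too small, never too large.
    uint64_t min_uncommitted =
        prepared_.empty() ? published_seq_ + 1 : *prepared_.begin();
    uint64_t seq = published_seq_;
    return Snapshot{seq, min_uncommitted};
  }

 private:
  std::set<uint64_t> prepared_;
  uint64_t published_seq_ = 0;
};

// Reader-side shortcut: data below min_uncommitted needs no further check.
bool CanSkipCommitCheck(uint64_t data_seq, const Snapshot& snapshot) {
  return data_seq < snapshot.min_uncommitted;
}
```

A value that is too small only costs the reader an extra check, whereas a value that is too large would let it skip data it must examine, which is why the conservative order is the safe one.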
    • Improve visibility into the reasons for compaction. · d42bd041
      Authored by Yanqin Jin
      Summary:
      Add `compaction_reason` as part of event log for event `compaction started`.
      Add counters for each `CompactionReason`.
      Closes https://github.com/facebook/rocksdb/pull/3679
      
      Differential Revision: D7550348
      
      Pulled By: riversand963
      
      fbshipit-source-id: a19cff3a678c785aa5ef41aac78b9a5968fcc34d
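
Per-reason counters are commonly kept as an array indexed by the enum. The sketch below assumes a shortened `CompactionReason` enum (the real one has more values) and a hypothetical `CompactionCounters` class.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Shortened, illustrative subset of compaction reasons. The trailing
// sentinel gives the number of counters to allocate.
enum class CompactionReason : int {
  kUnknown = 0,
  kLevelL0FilesNum,
  kLevelMaxLevelSize,
  kManualCompaction,
  kNumOfReasons
};

class CompactionCounters {
 public:
  // Called when a compaction starts.
  void Record(CompactionReason reason) {
    ++counts_[static_cast<std::size_t>(reason)];
  }

  uint64_t Count(CompactionReason reason) const {
    return counts_[static_cast<std::size_t>(reason)];
  }

 private:
  std::array<uint64_t, static_cast<std::size_t>(
                           CompactionReason::kNumOfReasons)>
      counts_{};  // value-initialized to zero
};
```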
  10. 11 Apr, 2018 3 commits
    • fix calling SetOptions on deprecated options · 019d7894
      Authored by Andrew Kryczka
      Summary:
      In `cf_options_type_info`, the deprecated options are all considered to have offset zero in the `MutableCFOptions` struct. Previously we weren't checking in `GetMutableOptionsFromStrings` whether the provided option was deprecated or not and simply writing the provided value to the offset specified by `cf_options_type_info`. That meant setting any deprecated option would overwrite the first element in the struct, which is `write_buffer_size`. `db_stress` hit this often since it calls `SetOptions` with `soft_rate_limit=0` and `hard_rate_limit=0`, which are both deprecated so cause `write_buffer_size` to be set to zero, which causes it to crash on the following assertion:
      
      ```
      db_stress: db/memtable.cc:106: rocksdb::MemTable::MemTable(const rocksdb::InternalKeyComparator&, const rocksdb::ImmutableCFOptions&, const rocksdb::MutableCFOptions&, rocksdb::WriteBufferManager*, rocksdb::SequenceNumber, uint32_t): Assertion `!ShouldScheduleFlush()' failed.
      ```
      
      We fix it by skipping deprecated options (and logging a warning) when users provide them to `SetOptions`. I didn't want to fail the call for compatibility reasons.
      Closes https://github.com/facebook/rocksdb/pull/3700
      
      Differential Revision: D7572596
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd5d84e14c0c39f30c5d4c6df7c1503d2c28ecf1
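
The failure mode and the fix can be reproduced in a small model. The struct layout, the `TypeInfo` table, and `SetOption` below are hypothetical simplifications (all options are treated as `uint64_t`); only the idea of skipping deprecated entries, which carry offset zero, comes from the summary.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Simplified options struct; write_buffer_size is the first member.
struct MutableCFOptions {
  uint64_t write_buffer_size = uint64_t{64} << 20;
  uint64_t max_write_buffer_number = 2;
};

struct OptionTypeInfo {
  std::size_t offset;  // deprecated options all report offset 0
  bool deprecated;
};

const std::map<std::string, OptionTypeInfo>& TypeInfo() {
  static const std::map<std::string, OptionTypeInfo> info = {
      {"write_buffer_size",
       {offsetof(MutableCFOptions, write_buffer_size), false}},
      {"max_write_buffer_number",
       {offsetof(MutableCFOptions, max_write_buffer_number), false}},
      {"soft_rate_limit", {0, true}},
      {"hard_rate_limit", {0, true}},
  };
  return info;
}

// Returns false for unknown options. Deprecated options are accepted for
// compatibility but skipped with a warning instead of being written.
bool SetOption(const std::string& name, uint64_t value,
               MutableCFOptions* options,
               std::vector<std::string>* warnings) {
  auto it = TypeInfo().find(name);
  if (it == TypeInfo().end()) {
    return false;
  }
  if (it->second.deprecated) {
    warnings->push_back(name + " is deprecated and was ignored");
    return true;
  }
  std::memcpy(reinterpret_cast<char*>(options) + it->second.offset, &value,
              sizeof(value));
  return true;
}
```

Without the `deprecated` check, setting `soft_rate_limit=0` in this model would write zero at offset 0 and clobber `write_buffer_size`, which is the behavior the commit describes.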
    • fix some text in comments. · d95014b9
      Authored by Yanqin Jin
      Summary:
      1. Remove redundant text.
      2. Make terminology consistent across all comments and doc of RocksDB. Also do
         our best to conform to conventions. Specifically, use 'callback' instead of
         'call-back' [wikipedia](https://en.wikipedia.org/wiki/Callback_(computer_programming)).
      Closes https://github.com/facebook/rocksdb/pull/3693
      
      Differential Revision: D7560396
      
      Pulled By: riversand963
      
      fbshipit-source-id: ba8c251c487f4e7d1872a1a8dc680f9e35a6ffb8
    • make MockTimeEnv::current_time_ atomic to fix data race · 2770a94c
      Authored by Zhongyi Xie
      Summary:
      fix a new TSAN failure
      https://gist.github.com/miasantreble/7599c33f4e17da1024c67d4540dbe397
      Closes https://github.com/facebook/rocksdb/pull/3694
      
      Differential Revision: D7565310
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f672c96e925797b34dec6e20b59527e8eebaa825
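
The kind of change named in the title can be sketched as below. This is a reduced model, not the actual class: the method signatures are simplified assumptions, and only the `std::atomic` member reflects the fix.

```cpp
#include <atomic>
#include <cstdint>

// Reduced mock clock. Making current_time_ atomic lets a test thread
// advance the clock while background threads read it, without a data race.
class MockTimeEnv {
 public:
  void set_current_time(uint64_t time) { current_time_.store(time); }
  uint64_t GetCurrentTime() const { return current_time_.load(); }

 private:
  std::atomic<uint64_t> current_time_{0};
};
```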
  11. 10 Apr, 2018 5 commits
  12. 08 Apr, 2018 2 commits
    • WritePrepared Txn: add stats · bde1c1a7
      Authored by Maysam Yabandeh
      Summary:
      Adding some stats that would be helpful to monitor if the DB has gone to unlikely states that would hurt the performance. These are mostly when we end up needing to acquire a mutex.
      Closes https://github.com/facebook/rocksdb/pull/3683
      
      Differential Revision: D7529393
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f7d36279a8f39bd84d8ddbf64b5c97f670c5d6d9
    • WritePrepared Txn: add write_committed option to dump_wal · eb5a2954
      Authored by Maysam Yabandeh
      Summary:
      Currently dump_wal cannot print the prepared records from a WAL generated by the WRITE_PREPARED write policy, since the handler's default reaction is to return NotSupported when it encounters WRITE_PREPARED markers. This patch lets the admin pass the --write_committed=false option, which is passed on to the handler. Note that DBFileDumperCommand and DBDumperCommand are not yet updated by this patch: they are not urgent, and the approach needs to be revised later when WRITE_UNPREPARED markers are added, so I leave that for future work.
      
      Tested by running it on a WAL generated by WRITE_PREPARED:
      $ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log  | grep BEGIN_PREARE | head -1
      1,2,70,0,BEGIN_PREARE
      $ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log --write_committed=false | grep BEGIN_PREARE | head -1
      1,2,70,0,BEGIN_PREARE PUT(0) : 0x30303031313330313938 PUT(0) : 0x30303032353732313935 END_PREPARE(0x74786E31313535383434323738303738363938313335312D30)
      Closes https://github.com/facebook/rocksdb/pull/3682
      
      Differential Revision: D7522090
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a0332207261c61e18b2f9dfbe9feecd9a1339aca
  13. 07 Apr, 2018 2 commits
  14. 06 Apr, 2018 1 commit
    • protect valid backup files when max_valid_backups_to_open is set · faba3fb5
      Authored by Andrew Kryczka
      Summary:
      When `max_valid_backups_to_open` is set, the `BackupEngine` doesn't know about the files referenced by existing backups. This PR prevents us from deleting valid files when that option is set, in cases where we are unable to accurately determine refcount. There are warnings logged when we may miss deleting unreferenced files, and a recommendation in the header for users to periodically unset this option and run a full `GarbageCollect`.
      Closes https://github.com/facebook/rocksdb/pull/3518
      
      Differential Revision: D7008331
      
      Pulled By: ajkr
      
      fbshipit-source-id: 87907f964dc9716e229d08636a895d2fc7b72305