1. 24 Jul, 2018 3 commits
  2. 21 Jul, 2018 5 commits
    • Avoid unnecessary big for-loop when reporting ticker stats stored in GetContext (#3490) · f95a5b24
      Committed by Zhongyi Xie
      Summary:
      Currently in `Version::Get`, reporting the ticker stats stored in `GetContext` runs a big for-loop over every `Ticker`, which adds unnecessary CPU cost. Since only a small fraction of all tickers are used in `Get()` calls, we can optimize by storing just those ticker values in a new struct, `GetContextStats`. For comparison, the new approach visits only 17 values, while the old approach visits 100+ `Ticker`s.
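The difference between the two reporting strategies can be sketched as follows. This is a minimal illustration, not the code from the PR: the three-entry `Ticker` enum, the `TickerArray` alias, and the `ReportAll` / `ReportGetStats` helpers are hypothetical stand-ins, and only the struct name `GetContextStats` comes from the summary above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the full ticker enum (the real one has 100+ entries).
enum Ticker : std::size_t {
  kBlockCacheHit = 0,
  kBlockCacheMiss,
  kBlockCacheBytesRead,
  // ... many tickers that Get() never touches would follow here ...
  kTickerMax = 128
};

using TickerArray = std::array<std::uint64_t, kTickerMax>;

// Only the counters a Get() call can touch (three here; 17 in the commit).
struct GetContextStats {
  std::uint64_t num_cache_hit = 0;
  std::uint64_t num_cache_miss = 0;
  std::uint64_t num_cache_bytes_read = 0;
};

// Old approach: visit every ticker slot, even though almost all are zero.
inline std::size_t ReportAll(const TickerArray& local, TickerArray& global) {
  std::size_t visited = 0;
  for (std::size_t t = 0; t < kTickerMax; ++t) {
    ++visited;
    if (local[t] > 0) global[t] += local[t];
  }
  return visited;
}

// New approach: visit only the fields of the small struct.
inline std::size_t ReportGetStats(const GetContextStats& s, TickerArray& global) {
  std::size_t visited = 0;
  auto add = [&](Ticker t, std::uint64_t v) {
    assert(t < kTickerMax);
    ++visited;
    if (v > 0) global[t] += v;
  };
  add(kBlockCacheHit, s.num_cache_hit);
  add(kBlockCacheMiss, s.num_cache_miss);
  add(kBlockCacheBytesRead, s.num_cache_bytes_read);
  return visited;
}
```

Both helpers return the number of slots they visited, so the saving per `Get()` call is easy to compare: 3 versus 128 in this toy setup.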
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3490
      
      Differential Revision: D6969154
      
      Pulled By: miasantreble
      
      fbshipit-source-id: fc27072965a3a94125a3e6883d20dafcf5b84029
    • Fixed the db_bench MergeRandom only access CF_default (#4155) · 6811fb06
      Committed by Zhichao Cao
      Summary:
      While running tracing and analysis, I found that the MergeRandom benchmark in db_bench only accesses the default column family, even when -num_column_families is set to a value > 1.
      
      Changes: use db_with_cfh as the DB and randomly select the column family on which to execute the Merge operation when -num_column_families is set to a value > 1.
      
      Tested with make asan_check and verified in tracing
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155
      
      Differential Revision: D8907888
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850
    • Reformatting some recent changes (#4161) · a5e851e1
      Committed by Siying Dong
      Summary:
      Lint is not happy with some recently committed code. Format it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161
      
      Differential Revision: D8940582
      
      Pulled By: siying
      
      fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
    • BlockBasedTableReader: automatically adjust tail prefetch size (#4156) · 8425c8bd
      Committed by Siying Dong
      Summary:
      Right now we use one hard-coded prefetch size to prefetch data from the tail of SST files. This wastes reads in some use cases and is too small to be efficient in others.
      Introduce a way to adjust the prefetch size by tracking the 32 most recent reads and picking a value with which less than 10% of the read is wasted.
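One way to read the heuristic above is sketched below. It is an assumption-laden illustration rather than the code from the PR: the class name `TailPrefetchTracker`, its methods, and the exact waste formula are hypothetical; only the window of 32 samples and the 10% threshold come from the summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumTracked = 32;  // window size named in the summary

// Hypothetical tracker of the tail sizes that recent file opens actually needed.
class TailPrefetchTracker {
 public:
  // Remember one observed tail size, overwriting the oldest once the window is full.
  void Record(std::size_t tail_size) {
    assert(next_ < kNumTracked);
    if (sizes_.size() < kNumTracked) {
      sizes_.push_back(tail_size);
    } else {
      sizes_[next_] = tail_size;
    }
    next_ = (next_ + 1) % kNumTracked;
  }

  // Largest recorded size for which, had every tracked open prefetched that
  // much, less than 10% of the prefetched bytes would have been wasted.
  std::size_t Suggest() const {
    std::vector<std::size_t> sorted(sizes_);
    std::sort(sorted.begin(), sorted.end());
    std::size_t best = 0;
    std::size_t smaller_sum = 0;  // sum of the sizes below the candidate
    for (std::size_t i = 0; i < sorted.size(); ++i) {
      const std::size_t candidate = sorted[i];
      const std::size_t read = candidate * sorted.size();
      // Opens that needed less than `candidate` waste the difference.
      const std::size_t wasted = candidate * i - smaller_sum;
      if (wasted * 10 < read) best = candidate;
      smaller_sum += candidate;
    }
    return best;
  }

 private:
  std::vector<std::size_t> sizes_;
  std::size_t next_ = 0;
};
```

With nine opens that needed 1000 bytes and one that needed 100000, `Suggest()` stays at 1000, because prefetching 100000 bytes everywhere would waste about 89% of the read.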
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4156
      
      Differential Revision: D8916847
      
      Pulled By: siying
      
      fbshipit-source-id: 8413f9eb3987e0033ed0bd910f83fc2eeaaf5758
    • Write properties metablock last in block-based tables (#4158) · ab35505e
      Committed by Andrew Kryczka
      Summary:
      The properties meta-block should come at the end since we always need to
      read it when opening a file, unlike index/filter/other meta-blocks, which
      are sometimes read depending on the user's configuration. This ordering
      will allow us to (in a future PR) do a small readahead on the end of the file
      to read properties and meta-index blocks with one I/O.
      
      The bulk of this PR is a refactoring of the `BlockBasedTableBuilder::Finish`
      function. It was previously too large with inconsistent error handling, which
      made it difficult to change. So I broke it up into one function per meta-block
      write, and tried to make error handling consistent within those functions.
      Then reordering the metablocks was trivial -- just reorder the calls to these
      helper functions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4158
      
      Differential Revision: D8921705
      
      Pulled By: ajkr
      
      fbshipit-source-id: 96c9cc3182eb1adf11af46adab79dbeba7b12fcc
  3. 20 Jul, 2018 3 commits
    • Fix a bug in MANIFEST group commit (#4157) · 2736752b
      Committed by Yanqin Jin
      Summary:
      PR #3944 introduces group commit of `VersionEdit` in MANIFEST. The
      implementation has a bug. When updating the log file number of each column
      family, we must consider only `VersionEdit`s that operate on the same column
      family. Otherwise, a column family may accidentally set its log file number
      higher than the actual value, which signals that log files with smaller
      numbers can be ignored, so some updates are lost.
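The rule described above (only edits for the same column family may raise that family's log number) can be sketched like this. The `Edit` struct and both helper functions are hypothetical simplifications for illustration, not the real `VersionEdit` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a VersionEdit in a commit group.
struct Edit {
  std::uint32_t column_family;
  bool has_log_number;
  std::uint64_t log_number;
};

// Buggy variant: takes the maximum over every edit in the group,
// regardless of which column family the edit belongs to.
inline std::uint64_t MaxLogNumberAnyCf(const std::vector<Edit>& group,
                                       std::uint64_t current) {
  std::uint64_t result = current;
  for (const Edit& e : group) {
    if (e.has_log_number) result = std::max(result, e.log_number);
  }
  return result;
}

// Fixed variant: considers only edits that operate on column family `cf`.
inline std::uint64_t MaxLogNumberForCf(const std::vector<Edit>& group,
                                       std::uint32_t cf,
                                       std::uint64_t current) {
  std::uint64_t result = current;
  for (const Edit& e : group) {
    if (e.column_family == cf && e.has_log_number) {
      result = std::max(result, e.log_number);
    }
  }
  return result;
}
```

For a group holding an edit for column family 0 with log number 10 and one for column family 1 with log number 20, the buggy variant would move column family 0 to log 20, so its data in logs 10 through 19 would be skipped on recovery.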
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4157
      
      Differential Revision: D8916650
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8f456cf688f17bf35ad87b38e30e899aa162f201
    • Smaller tail readahead when not reading index/filters (#4159) · b5613227
      Committed by Andrew Kryczka
      Summary:
      In all cases during `BlockBasedTable::Open`, we issue at least three read requests to the file's tail: (1) footer, (2) metaindex block, and (3) properties block. Depending on the config, we may also read other metablocks like filter and index.
      
      This PR issues smaller readahead when we expect to do only the three necessary reads mentioned above. Then, 4KB should be enough (ignoring the case where there are lots of user-defined properties). We can keep doing 512KB readahead when additional reads are expected.
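The decision described above boils down to a two-way choice. The sketch below uses the 4KB and 512KB figures from the summary; the function and parameter names are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSmallTailReadahead = 4 * 1024;    // footer + metaindex + properties
constexpr std::size_t kLargeTailReadahead = 512 * 1024;  // also covers index/filter blocks

// Pick the tail readahead size for opening a table file.
// `will_read_index_or_filter` is a hypothetical flag derived from the config.
inline std::size_t TailReadaheadSize(bool will_read_index_or_filter) {
  return will_read_index_or_filter ? kLargeTailReadahead : kSmallTailReadahead;
}
```

As the summary notes, the small value ignores the case of many user-defined properties, where the properties block alone may exceed 4KB.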
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4159
      
      Differential Revision: D8924002
      
      Pulled By: ajkr
      
      fbshipit-source-id: cfc713275de4d05ce11f18571f1d72e27ccd3356
    • Return new operator for Status allocations for Windows (#4128) · 78ab11cd
      Committed by Dmitri Smirnov
      Summary: Windows requires new/delete for memory allocations to be overridden. Refactor to be less intrusive.
      
      Differential Revision: D8878047
      
      Pulled By: siying
      
      fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36
  4. 19 Jul, 2018 4 commits
  5. 18 Jul, 2018 5 commits
    • Release 5.15. (#4148) · 79f009f2
      Committed by Yanqin Jin
      Summary:
      Cut 5.15.fb
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4148
      
      Differential Revision: D8886802
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6b6427ce97f5b323a7eebf92458fda8b24b0cece
    • DBSSTTest.DeleteSchedulerMultipleDBPaths data race (#4146) · 37e0fdc8
      Committed by Siying Dong
      Summary:
      Fix a minor data race in DBSSTTest.DeleteSchedulerMultipleDBPaths reported by TSAN
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4146
      
      Differential Revision: D8880945
      
      Pulled By: siying
      
      fbshipit-source-id: 25c632f685757735c59ad4ff26b2f346a443a446
    • Fix write get stuck when pipelined write is enabled (#4143) · d538ebdf
      Committed by Yi Wu
      Summary:
      Fix an issue where, with pipelined write enabled, writers can get stuck indefinitely and never finish the write. The following example shows how. Assume there are 4 writers W1, W2, W3, W4 (W1 is the first, W4 is the last).
      
      T1: All writers are pending in the WAL writer queue:
      WAL writer queue: W1, W2, W3, W4
      memtable writer queue: empty
      
      T2: W1 finishes its WAL write and moves to the memtable writer queue:
      WAL writer queue: W2, W3, W4
      memtable writer queue: W1
      
      T3: W2 and W3 finish their WAL write as a batch group. W2 enters ExitAsBatchGroupLeader and moves the group to the memtable writer queue, but has not yet woken up the next leader.
      WAL writer queue: W4
      memtable writer queue: W1, W2, W3
      
      T4: W1, W2, W3 finish their memtable write as a batch group. Note that W2 is still inside the earlier ExitAsBatchGroupLeader call, although W1 has already done the memtable write for W2.
      WAL writer queue: W4
      memtable writer queue: empty
      
      T5: The thread corresponding to W3 creates another writer W3' with the same address as W3.
      WAL writer queue: W4, W3'
      memtable writer queue: empty
      
      T6: W2 continues with ExitAsBatchGroupLeader. Because the address of W3' is the same as that of W3, the last writer in its group, W2 thinks there are no pending writers, so it resets newest_writer_ to null, emptying the queue. W4 and W3' are dropped from the queue and will never be woken up.
      
      The issue has existed since pipelined write was introduced in 5.5.0.
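The T5/T6 steps are an instance of the ABA problem: a check that compares only addresses cannot tell W3 from W3'. The single-threaded model below reproduces just that effect. The `Writer` struct, `Enqueue`, and `TryMarkQueueEmpty` are hypothetical simplifications; they are not the actual write-thread code or the fix from the PR.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, stripped-down writer node; the real one carries far more state.
struct Writer {
  int id;
  Writer* link_older = nullptr;
  explicit Writer(int i) : id(i) {}
};

// Link `w` in as the newest writer (single-threaded model, so no CAS loop).
inline void Enqueue(std::atomic<Writer*>& newest_writer, Writer* w) {
  assert(w != nullptr);
  w->link_older = newest_writer.load();
  newest_writer.store(w);
}

// The leader's exit check: "if the newest writer is still the last writer of
// my group, nobody else is pending, so reset the queue to empty".
// Comparing by address alone is what makes the check vulnerable.
inline bool TryMarkQueueEmpty(std::atomic<Writer*>& newest_writer,
                              Writer* last_writer_in_group) {
  Writer* expected = last_writer_in_group;
  return newest_writer.compare_exchange_strong(expected, nullptr);
}
```

To replay the scenario, enqueue W4, reuse W3's storage for W3' and enqueue it, then call `TryMarkQueueEmpty` with W3's old address. The compare-and-swap succeeds and the queue head becomes null, although W4 and W3' are still waiting.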
      
      Closes #3704
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4143
      
      Differential Revision: D8871599
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3502674e51066a954a0660257e24ac588f815e2a
    • Remove managed iterator · ddc07b40
      Committed by Siying Dong
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4124
      
      Differential Revision: D8829910
      
      Pulled By: siying
      
      fbshipit-source-id: f3e952ccf3a631071a5d77c48e327046f8abb560
    • Pending output file number should be released after bulkload failure (#4145) · 995fcf75
      Committed by Siying Dong
      Summary:
      If a bulk load fails because of an input error, the pending output file number is not released. As a result, no future file with a number larger than the current one is ever deleted, even after it has been compacted. This commit fixes the bug.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4145
      
      Differential Revision: D8877900
      
      Pulled By: siying
      
      fbshipit-source-id: 080be92a23d43305ca1e13fe1c06eb4cd0b01466
  6. 17 Jul, 2018 5 commits
  7. 14 Jul, 2018 10 commits
    • Support range deletion tombstones in IngestExternalFile SSTs (#3778) · ef7815b8
      Committed by Nathan VanBenschoten
      Summary:
      Fixes #3391.
      
      This change adds a `DeleteRange` method to `SstFileWriter` and adds
      support for ingesting SSTs with range deletion tombstones. This is
      important for applications that need to atomically ingest SSTs while
      clearing out any existing keys in a given key range.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3778
      
      Differential Revision: D8821836
      
      Pulled By: anand1976
      
      fbshipit-source-id: ca7786c1947ff129afa703dab011d524c7883844
    • Exclude time waiting for rate limiter from rocksdb.sst.read.micros (#4102) · 91d7c03c
      Committed by Zhongyi Xie
      Summary:
      Our "rocksdb.sst.read.micros" stat includes time spent waiting for the rate limiter; this change excludes that wait time. It probably only affects people who rate limit compaction reads, which is fairly rare.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4102
      
      Differential Revision: D8848506
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 01258ac5ae56e4eee372978cfc9143a6869f8bfc
    • Relax VersionStorageInfo::GetOverlappingInputs check (#4050) · 90fc4069
      Committed by Peter Mattis
      Summary:
      Do not consider the range tombstone sentinel key as causing 2 adjacent
      sstables in a level to overlap. When a range tombstone's end key is the
      largest key in an sstable, the sstable's end key is set to a "sentinel"
      value that is the smallest key in the next sstable with a sequence
      number of kMaxSequenceNumber. This "sentinel" is guaranteed to not
      overlap in internal-key space with the next sstable. Unfortunately,
      GetOverlappingFiles uses user-keys to determine overlap and was thus
      considering 2 adjacent sstables in a level to overlap if they were
      separated by this sentinel key. This in turn would cause compactions to
      be larger than necessary.
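The gap between user-key overlap and internal-key overlap can be illustrated with a small model. This is a sketch under simplifying assumptions, not the patch itself: `InternalKey`, `FileRange`, and the two overlap helpers are hypothetical, and the internal key is reduced to a user key plus a sequence number (ordered by user key ascending, then sequence number descending).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Largest sequence number in this simplified model (56 bits).
constexpr std::uint64_t kMaxSequenceNumber = (std::uint64_t{1} << 56) - 1;

// Hypothetical simplified internal key: user key plus sequence number.
struct InternalKey {
  std::string user_key;
  std::uint64_t seq;
};

// Internal-key order: user key ascending, then sequence number descending.
inline bool InternalLess(const InternalKey& a, const InternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.seq > b.seq;
}

struct FileRange {
  InternalKey smallest;
  InternalKey largest;
};

// Both helpers assume `left` precedes `right` within the level.
inline bool OverlapByUserKey(const FileRange& left, const FileRange& right) {
  return left.largest.user_key >= right.smallest.user_key;
}

inline bool OverlapByInternalKey(const FileRange& left, const FileRange& right) {
  return !InternalLess(left.largest, right.smallest);
}
```

If the left file ends at the sentinel `("c", kMaxSequenceNumber)` and the right file starts at `("c", 5)`, the user-key check reports an overlap while the internal-key check does not, which is the false positive the summary describes.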
      
      Note that this conflicts with
      https://github.com/facebook/rocksdb/pull/2769 and causes
      `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to
      fail.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050
      
      Differential Revision: D8844423
      
      Pulled By: ajkr
      
      fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
    • Reduce execution time of IngestFileWithGlobalSeqnoRandomized (#4131) · 21171615
      Committed by Yanqin Jin
      Summary:
      Make `ExternalSSTFileTest.IngestFileWithGlobalSeqnoRandomized` run faster.
      
      `make format`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4131
      
      Differential Revision: D8839952
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4a7e842fde1cde4dc902e928a1cf511322578521
    • Per-thread unique test db names (#4135) · 8581a93a
      Committed by Maysam Yabandeh
      Summary:
      The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
      Example: `~/gtest-parallel/gtest-parallel ./table_test`
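One simple way to derive a path that differs per thread is to append a value based on the thread id. The helper below is a hypothetical sketch of that idea, not the function added by the patch; the name `PerThreadDbPath` and the use of `std::hash<std::thread::id>` are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <thread>

// Build "<base_dir>/<name>_<hash of the calling thread's id>".
// Distinct live threads have distinct ids, so in practice each test thread
// gets its own directory (a hash collision is theoretically possible).
inline std::string PerThreadDbPath(const std::string& base_dir,
                                   const std::string& name) {
  assert(!name.empty());
  std::ostringstream out;
  out << base_dir << '/' << name << '_'
      << std::hash<std::thread::id>{}(std::this_thread::get_id());
  return out.str();
}
```

Calling the helper twice from the same thread returns the same path, so a test can reopen its own database while never touching the directory of a test running on another thread.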
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
      
      Differential Revision: D8846653
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
    • db_bench: enable setting cache_size when loading options file · 23b76252
      Committed by Zhongyi Xie
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4118
      
      Differential Revision: D8845554
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 13bd3c1259a7c30bad762a413fe3bb24eea650ba
    • Converted db/merge_test.cc to use gtest (#4114) · 8527012b
      Committed by Fosco Marotto
      Summary:
      Picked up a task to convert this to use the gtest framework.  It can't be this simple, can it?
      
      It works, but should all the std::cout be removed?
      
      ```
      [$] ~/git/rocksdb [gft !]: ./merge_test
      [==========] Running 2 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 2 tests from MergeTest
      [ RUN      ] MergeTest.MergeDbTest
      Test read-modify-write counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Compaction started ...
      Compaction ended
      a: 3
      b: 1225
      Test merge-based counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test merge in memtable...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test Partial-Merge
      Test merge-operator not set after reopen
      [       OK ] MergeTest.MergeDbTest (93 ms)
      [ RUN      ] MergeTest.MergeDbTtlTest
      Opening database with TTL
      Test read-modify-write counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Compaction started ...
      Compaction ended
      a: 3
      b: 1225
      Test merge-based counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test merge in memtable...
      Opening database with TTL
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test Partial-Merge
      Opening database with TTL
      Opening database with TTL
      Opening database with TTL
      Opening database with TTL
      Test merge-operator not set after reopen
      [       OK ] MergeTest.MergeDbTtlTest (97 ms)
      [----------] 2 tests from MergeTest (190 ms total)
      
      [----------] Global test environment tear-down
      [==========] 2 tests from 1 test case ran. (190 ms total)
      [  PASSED  ] 2 tests.
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4114
      
      Differential Revision: D8822886
      
      Pulled By: gfosco
      
      fbshipit-source-id: c299d008e883c3bb911d2b357a2e9e4423f8e91a
    • Exclude StackableDB from transaction stress tests (#4132) · 537a2339
      Committed by Maysam Yabandeh
      Summary:
      Transactions are currently tested both with and without StackableDB, mostly to check that the code path is consistent with StackableDB as well. Slow stress tests, however, do not benefit from being run a second time with StackableDB. The patch excludes StackableDB from such tests.
      On a single core it reduced the runtime of transaction_test from 199s to 135s.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132
      
      Differential Revision: D8841655
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
    • Re-enable kUniversalSubcompactions option_config (#4125) · e3eba52a
      Committed by Anand Ananthabhotla
      Summary:
      1. Move kUniversalSubcompactions up before kEnd in db_test_util.h, so
      tests that cycle through all the option_configs include this
      2. Skip kUniversalSubcompactions wherever kUniversalCompaction and
      kUniversalCompactionMultilevel are skipped
      
      Related to #3935
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4125
      
      Differential Revision: D8828637
      
      Pulled By: anand1976
      
      fbshipit-source-id: 650dee15fd27d85281cf9bb4ca8ab460e04cac6f
    • Add GCC 8 to Travis (#3433) · 7bee48bd
      Committed by Tamir Duberstein
      Summary:
      - Avoid `strdup` to use jemalloc on Windows
      - Use `size_t` for consistency
      - Add GCC 8 to Travis
      - Add CMAKE_BUILD_TYPE=Release to Travis
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3433
      
      Differential Revision: D6837948
      
      Pulled By: sagar0
      
      fbshipit-source-id: b8543c3a4da9cd07ee9a33f9f4623188e233261f
  8. 13 Jul, 2018 5 commits
    • Support compaction filter in db_bench (#4106) · de98fd88
      Committed by Zhongyi Xie
      Summary:
      Right now there is no support for enabling a compaction filter in db_bench. We should add it to facilitate testing of compaction filters.
      This PR adds a compaction filter called KeepFilter whose `Filter` always returns false, making it essentially a no-op compaction filter. This allows us to test the compaction filter code path without having to support arbitrary compaction filters.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4106
      
      Differential Revision: D8828517
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 9ad76d04103eaa9d00da98334b4a39e542d26c41
    • Fix unsigned int flag in db_bench (#4129) · 97fe23fc
      Committed by Andrew Kryczka
      Summary:
      `DEFINE_uint32` was unavailable on some platforms, e.g., https://travis-ci.org/facebook/rocksdb/jobs/403352902. Use `DEFINE_uint64` instead which should work as it's used many times elsewhere in this file.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4129
      
      Differential Revision: D8830311
      
      Pulled By: ajkr
      
      fbshipit-source-id: b4fc90ba3f50e649c070ce8069c68e530d731f05
    • Disable EnvPosixTest.RunImmediately, add EnvPosixTest.RunEventually. (#4126) · 520bbb17
      Committed by Yanqin Jin
      Summary:
      The original `EnvPosixTest.RunImmediately` assumes that after scheduling
      a background thread, the thread is guaranteed to complete within 0.1 second.
      I do not know of any non-real-time OS/runtime that provides this guarantee. Nor
      does the C++11 standard say anything about this in the documentation of `std::thread`.
      In fact, we have observed this test failure multiple times on appveyor, and we
      haven't been able to reproduce the failure deterministically. Therefore,
      I disable this test for now until we know for sure how it used to fail.
      
      Instead, I add another test `EnvPosixTest.RunEventually` that checks that
      a thread will be scheduled eventually.
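The "eventually" style of check can be sketched as a wait on a flag instead of a fixed sleep followed by an assertion. The helper below is a hypothetical illustration and not the test from the PR; the name `WaitUntilSet`, the 1 ms polling interval, and the generous timeout are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Poll `flag` until it becomes true or `timeout` elapses.
// Returns true if the flag was observed set, false on timeout.
inline bool WaitUntilSet(const std::atomic<bool>& flag,
                         std::chrono::milliseconds timeout) {
  assert(timeout.count() >= 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!flag.load()) {
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```

A test would have the scheduled background job set the flag and then assert `WaitUntilSet(flag, timeout)` with a timeout far larger than any plausible scheduling delay, so the test only fails if the job never runs rather than when the scheduler is merely slow.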
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4126
      
      Differential Revision: D8827086
      
      Pulled By: riversand963
      
      fbshipit-source-id: abc5cb655f90d50b791493da5eeb3716885dfe93
    • Reduce execution time of a test. (#4127) · 90ebf1a2
      Committed by Yanqin Jin
      Summary:
      Reduce the number of key ranges in `ExternalSSTFileTest.OverlappingRanges` so
      that the test completes in shorter time to avoid timeouts.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4127
      
      Differential Revision: D8827851
      
      Pulled By: riversand963
      
      fbshipit-source-id: a16387b0cc92a7c872b1c50f0cfbadc463afc9db
    • Refactor BlockIter (#4121) · d4ad32d7
      Committed by Maysam Yabandeh
      Summary:
      BlockIter is getting crowded with details that are specific to either index or data blocks. The patch moves those details down into DataBlockIter and IndexBlockIter, both of which inherit from BlockIter.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4121
      
      Differential Revision: D8816832
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d492e74155c11d8a0c1c85cd7ee33d24c7456197