1. 07 Aug, 2018 5 commits
    • BlobDB: Can return expiration together with Get() (#4227) · c9703585
      Yi Wu committed
      Summary:
      Add API to allow fetching expiration of a key with `Get()`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4227
      
      Differential Revision: D9169897
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2a6f216c493dc75731ddcef1daa689b517fab31b
    • BlobDB: Fix VisibleToActiveSnapshot() (#4236) · 4cb7068c
      Yi Wu committed
      Summary:
      There are two issues with `VisibleToActiveSnapshot`:
      1. If there are no snapshots, `oldest_snapshot` will be 0 and `VisibleToActiveSnapshot` will always return true. Since the method is used to decide whether it is safe to delete obsolete files, obsolete files can never be deleted in this case.
      2. The `auto` keyword in `auto snapshots = db_impl_->snapshots()` deduces a copy of `const SnapshotList` instead of a reference. Since the copy constructor of `SnapshotList` is not defined, using the copy may yield unexpected results.
      
      Issue 2 actually hid issue 1 from being caught by tests. During tests, `snapshots.empty()` can return false when the list should actually be empty, and `snapshots.oldest()` returns an invalid address, making `oldest_snapshot` some random large number.
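
The deduction behavior behind issue 2 can be shown in isolation. The following is a minimal sketch, not RocksDB code: `FakeSnapshotList` and `FakeDB` are hypothetical stand-ins, and the only point is that plain `auto` drops both `const` and the reference, while `const auto&` keeps referring to the original object.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; these are not the real RocksDB classes.
struct FakeSnapshotList {
  int count = 0;
  bool empty() const { return count == 0; }
};

struct FakeDB {
  FakeSnapshotList list;
  const FakeSnapshotList& snapshots() const { return list; }
};

// Returns true when plain `auto` yields a separate object while
// `const auto&` aliases the list owned by the DB.
inline bool AutoCopiesButRefAliases() {
  FakeDB db;
  auto copy = db.snapshots();        // deduced as FakeSnapshotList (a copy)
  const auto& ref = db.snapshots();  // deduced as const FakeSnapshotList&
  static_assert(std::is_same<decltype(copy), FakeSnapshotList>::value,
                "auto drops both const and the reference");
  static_assert(std::is_same<decltype(ref), const FakeSnapshotList&>::value,
                "const auto& keeps the reference");
  return &copy != &db.list && &ref == &db.list;
}
```

In this toy type the implicit copy is harmless. The commit message suggests that for a list type without a properly defined copy constructor, the copy can end up in an inconsistent state.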
      
      The issue was originally reported by a BlobDB early adopter at Kuaishou.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4236
      
      Differential Revision: D9188706
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: a0f2624b927cf9bf28c1bb534784fee5d106f5ea
    • Support dictionary compression in stress/crash tests (#4234) · 6175b4b2
      Andrew Kryczka committed
      Summary:
      - Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
      - Randomly enable/disable the above flags in crash test
      - Set `--compression_type=zstd` in FB-specific crash test runs
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234
      
      Differential Revision: D9187207
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
    • BlobDB: Cleanup TTLExtractor interface (#4229) · 140f256d
      Yi Wu committed
      Summary:
      Clean up the TTLExtractor interface. Its original purpose was to let users keep using the existing `Write()` interface while allowing it to accept a TTL via `TTLExtractor`. However, the interface is confusing. It will be replaced with something like `WriteWithTTL(batch, ttl)` in the future.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4229
      
      Differential Revision: D9174390
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 68201703d784408b851336ab4dd9b84188245b2d
    • Improve FullFilterBitsReader::HashMayMatch's doc (#4202) · ceb5fea1
      Jingguo Yao committed
      Summary:
      HashMayMatch is related to AddKey() instead of CreateFilter().
      Also applies some minor fixes. Fixes #4191 #4200 #3910
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4202
      
      Differential Revision: D9180945
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6f07b81c5bb9bda5c0273475b486ba8a030471e6
  2. 05 Aug, 2018 1 commit
  3. 04 Aug, 2018 3 commits
    • Update JobContext. (#3949) · 1f802773
      Yanqin Jin committed
      Summary:
      In the past, we assumed that a job modifies a single column family. Therefore, a job could create at most one superversion, since each superversion corresponds to one column family. This assumption is why `JobContext` has only one member variable called `superversion_context`.
      Now we want to support group flush of column families, which means that each job can create multiple superversions. Therefore, we need to make the following change to accommodate this new feature.
      
      Add a vector of `SuperVersionContext` to `JobContext` to support installing
      superversions for multiple column families in one job context.
      
      This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3949
      
      Differential Revision: D8864895
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5937a48817276370d3c8172db9c8aafc826d97ca
    • Modify verification logic of ObsoleteOptionsFileTest (#4218) · 22368965
      Yanqin Jin committed
      Summary:
      The current verification logic does not consider the case in which multiple
      threads (foreground and background) may execute `PurgeObsoleteFiles` function
      simultaneously. Each invocation will trigger the callback adding elements to
      a vector. Then we verify the elements in the vector, which can fail sometimes.
      
      The solution is to give up checking the elements. Instead, we check the number
      of OPTIONS files in the database dir.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4218
      
      Differential Revision: D9128727
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2b13b705fb21bc0ddd41940c4ec9b6b0c8d88224
    • Fix lite build failure in db_bench due to trace/replay (#4225) · fefdac10
      Sagar Vemuri committed
      Summary:
      Fix lite build failure in db_bench due to trace/replay feature.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4225
      
      Differential Revision: D9153303
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9f7a8035429d0dcdbe99616d11389ed7bccf44be
  4. 03 Aug, 2018 2 commits
  5. 02 Aug, 2018 2 commits
    • Advisor: README and blog, and also tests for DBBenchRunner, DatabaseOptions (#4201) · 892a1562
      Pooja Malik committed
      Summary:
      This pull request adds a README file and a blog post for the Advisor tool. It also adds the missing tests for some Optimizer modules. Some comments are added to the classes being tested for improved readability.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4201
      
      Reviewed By: maysamyabandeh
      
      Differential Revision: D9125311
      
      Pulled By: poojam23
      
      fbshipit-source-id: aefcf2f06eaa05490cc2834ef5aa6e21f0d1dc55
    • Skip range deletions at seqno zero when collapsing (#4216) · f8f6983f
      Andrew Kryczka committed
      Summary:
      `CollapsedRangeDelMap` internally uses seqno zero as a sentinel value to
      denote a gap between range tombstones or the end of range tombstones. It
      therefore expects to never have consecutive sentinel tombstones.
      
      However, since `DeleteRange` is now supported in `SstFileWriter`, an
      ingested file may contain range tombstones, and that ingested file may
      be assigned global seqno zero. When such tombstones are added to the
      collapsed map, they resemble sentinel tombstones due to having seqno
      zero. Then, the invariant mentioned above about never having consecutive
      sentinel tombstones can be violated.
      
      The symptom of this violation was dereferencing the `end()` iterator
      (#4204). The fix in this PR is to not add range tombstones with seqno
      zero to the collapsed map. They're not needed anyways since they can't
      possibly cover anything (in case of a key and a range tombstone with the
      same seqno, the key is visible).
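
The argument that a seqno-zero range tombstone cannot cover anything can be sketched as follows. This is an illustration only: `SketchTombstone`, `Covers`, and `DropSeqnoZero` are hypothetical names, not the actual `CollapsedRangeDelMap` code. It assumes the rule stated above, namely that a tombstone hides a key only when the key's seqno is strictly smaller.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tombstone covering user keys in [start, end).
struct SketchTombstone {
  std::string start;
  std::string end;
  uint64_t seqno;
};

// A tombstone covers a key only if the key lies in range and the key's
// seqno is strictly smaller, so equal seqnos leave the key visible.
inline bool Covers(const SketchTombstone& t, const std::string& key,
                   uint64_t key_seqno) {
  return t.start <= key && key < t.end && key_seqno < t.seqno;
}

// Skip seqno-zero tombstones before they reach a structure that reserves
// seqno zero as a sentinel.
inline std::vector<SketchTombstone> DropSeqnoZero(
    const std::vector<SketchTombstone>& in) {
  std::vector<SketchTombstone> out;
  for (const SketchTombstone& t : in) {
    if (t.seqno != 0) out.push_back(t);
  }
  return out;
}
```

Because no key has a seqno below zero, `Covers` is false for every key when `t.seqno == 0`, so dropping such tombstones does not change what is visible.
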
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4216
      
      Differential Revision: D9121716
      
      Pulled By: ajkr
      
      fbshipit-source-id: f5b78a70bea9527354603ea7ac8542a7e2b6a210
  6. 01 Aug, 2018 2 commits
    • Trace and Replay for RocksDB (#3837) · 12b6cdee
      Sagar Vemuri committed
      Summary:
      A framework for tracing and replaying RocksDB operations.
      
      A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.
      
      - Column-families are supported
      - Multi-threaded tracing is supported.
      - TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
      - This is not yet suitable for production use because of its large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.
      
      Currently supported DB operations:
      - Writes:
      -- Put
      -- Merge
      -- Delete
      -- SingleDelete
      -- DeleteRange
      -- Write
      - Reads:
      -- Get (point lookups)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837
      
      Differential Revision: D7974837
      
      Pulled By: sagar0
      
      fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
    • DataBlockHashIndex: Specify that DataBlockHashIndex is not yet implemented in the comment · ee761716
      Fenggang Wu committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4203
      
      Differential Revision: D9090912
      
      Pulled By: fgwu
      
      fbshipit-source-id: 6a68be83693ddf2a5c060290382141f0d2fb400b
  7. 31 Jul, 2018 2 commits
    • Avoid integer division in filter probing (#4071) · a1a546a6
      Andrew Kryczka committed
      Summary:
      The cache line size was computed dynamically based on the length of the filter bits, and the number of cache-lines encoded in the footer. This calculation had to be dynamic in case users migrate their data between platforms with different cache line sizes. The downside, though, was bloom filter probing became expensive as it did integer mod and division.
      
      However, since we know all possible cache line sizes are powers of two, we should be able to use bit shift to find the cache line, and bitwise-and to find the bit within the cache line. To do this, we compute the log-base-two of cache line size in the constructor, and use that in bitwise operations to replace division/mod.
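
The equivalence this relies on can be sketched generically. The helper names below are hypothetical and this is not the actual filter-probing code: for a power-of-two line size, a right shift by its log-base-two gives the same result as division, and a bitwise-and with (size - 1) gives the same result as mod.

```cpp
#include <cassert>
#include <cstdint>

struct BitLocation {
  uint32_t line;         // which cache line the bit falls in
  uint32_t bit_in_line;  // offset of the bit within that line
};

// Assumes `x` is a nonzero power of two; would be computed once, e.g. in a
// constructor.
inline uint32_t Log2OfPowerOfTwo(uint32_t x) {
  uint32_t log2 = 0;
  while ((x >> log2) > 1) ++log2;
  return log2;
}

// Baseline: integer division and mod on every probe.
inline BitLocation LocateWithDivision(uint32_t bit_index,
                                      uint32_t bits_per_line) {
  return {bit_index / bits_per_line, bit_index % bits_per_line};
}

// Same result using only a shift and a mask.
inline BitLocation LocateWithShift(uint32_t bit_index,
                                   uint32_t log2_bits_per_line) {
  const uint32_t mask = (1u << log2_bits_per_line) - 1;
  return {bit_index >> log2_bits_per_line, bit_index & mask};
}
```

For example, with 64-byte cache lines there are 512 bits per line, so the shift amount is 9 and the mask is 511.
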
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4071
      
      Differential Revision: D8684067
      
      Pulled By: ajkr
      
      fbshipit-source-id: 50298872fba5acd01e8269cd7abcc51a095e0f61
    • Generalize parameters generation. (#4046) · 8abafb1f
      Yanqin Jin committed
      Summary:
      Make generation of column families and keys virtual functions so that
      subclasses of StressTest can override them to provide custom parameter
      generation for more flexibility. This will be useful for future tests.
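
A minimal sketch of the pattern, with hypothetical class and method names (the real `StressTest` signatures are not shown in this log): the base class exposes an overridable hook for picking column families, and a subclass swaps in its own policy. Key generation could be made overridable in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical base class; not the real StressTest interface.
class SketchStressTest {
 public:
  virtual ~SketchStressTest() = default;

  // Test operations call this and never pick column families directly.
  std::vector<int> ColumnFamiliesFor(int num_cfs, uint64_t rand) const {
    return GenerateColumnFamilies(num_cfs, rand);
  }

 protected:
  // Default policy: one pseudo-randomly chosen column family.
  virtual std::vector<int> GenerateColumnFamilies(int num_cfs,
                                                  uint64_t rand) const {
    return {static_cast<int>(rand % static_cast<uint64_t>(num_cfs))};
  }
};

// A subclass that always operates on every column family.
class AllColumnFamiliesStressTest : public SketchStressTest {
 protected:
  std::vector<int> GenerateColumnFamilies(int num_cfs,
                                          uint64_t /*rand*/) const override {
    std::vector<int> all;
    for (int i = 0; i < num_cfs; ++i) all.push_back(i);
    return all;
  }
};
```
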
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046
      
      Differential Revision: D9073382
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665
  8. 28 Jul, 2018 5 commits
  9. 27 Jul, 2018 3 commits
  10. 26 Jul, 2018 1 commit
  11. 25 Jul, 2018 2 commits
    • Increase version number to 5.16 (#4176) · 18f53803
      Yanqin Jin committed
      Summary:
      Given that we have cut 5.15, we should bump the version number to the next
      version, i.e. 5.16.
      Also update HISTORY.md
      cc sagar0
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4176
      
      Differential Revision: D8977965
      
      Pulled By: riversand963
      
      fbshipit-source-id: 481d75d2f446946f0eb2afb7e94ef894c8c87e1e
    • DataBlockHashIndex: Standalone Implementation with Unit Test (#4139) · 8805ec2f
      Fenggang Wu committed
      Summary:
      The first step of the `DataBlockHashIndex` implementation. A string based hash table is implemented and unit-tested.
      
      `DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>`, and formats them into a string when `Finish()` is called.
      `DataBlockHashIndex`: initialized by the formatted string, and can interpret it as a hash table. Looking up a key is supported through an iterator operation.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139
      
      Reviewed By: sagar0
      
      Differential Revision: D8866764
      
      Pulled By: fgwu
      
      fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10
  12. 24 Jul, 2018 4 commits
    • WriteUnPrepared: Implement unprepared batches for transactions (#4104) · ea212e53
      Manuel Ung committed
      Summary:
      This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.
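
The check-then-flush pattern described here might look roughly like the sketch below. `SketchUnpreparedTxn` and all of its members are hypothetical, and the byte accounting is simplified to key plus value sizes. The real transaction classes and the exact semantics of `max_write_batch_size` are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical transaction that tracks only sizes, not actual data.
class SketchUnpreparedTxn {
 public:
  explicit SketchUnpreparedTxn(size_t max_batch_bytes)
      : max_batch_bytes_(max_batch_bytes) {}

  // Every modifying call first checks whether the pending batch has grown
  // past the threshold, and only then appends the new entry.
  void Put(const std::string& key, const std::string& value) {
    MaybeFlushUnprepared();
    pending_bytes_ += key.size() + value.size();
  }

  size_t pending_bytes() const { return pending_bytes_; }
  size_t unprepared_batches() const { return unprepared_batches_; }

 private:
  void MaybeFlushUnprepared() {
    if (pending_bytes_ > max_batch_bytes_) {
      ++unprepared_batches_;  // stands in for writing the batch to the DB
      pending_bytes_ = 0;
    }
  }

  size_t max_batch_bytes_;
  size_t pending_bytes_ = 0;
  size_t unprepared_batches_ = 0;
};
```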
      
      Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
      
      A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
      
      Differential Revision: D8785717
      
      Pulled By: lth
      
      fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
    • move static msgs out of Status class (#4144) · 374c37da
      Chang Su committed
      Summary:
      The member `msgs` of class Status contains all types of status messages.
      When users dump a Status object, `msgs` is confusing. So move it out
      of class Status by making it a file-local static variable.
      
      Closes #3831 .
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4144
      
      Differential Revision: D8941419
      
      Pulled By: sagar0
      
      fbshipit-source-id: 56b0510258465ff26db15aa6b04e01532e053e3d
    • Build improvements: Split docker targets and parallelize java builds · c6d2a7f8
      Adam Retter committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4165
      
      Differential Revision: D8955531
      
      Pulled By: sagar0
      
      fbshipit-source-id: 97d5a1375e200bde3c6414f94703504a4ed7536a
    • db_stress to cover upper bound in iterators (#4162) · 4b0a4357
      Siying Dong committed
      Summary:
      db_stress doesn't cover upper or lower bounds in iterators. Try to cover them by randomly assigning a random bound. Also, in prefix scan tests, set the next prefix as the upper bound with 50% probability.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162
      
      Differential Revision: D8953507
      
      Pulled By: siying
      
      fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
  13. 21 Jul, 2018 5 commits
    • Avoid unnecessary big for-loop when reporting ticker stats stored in GetContext (#3490) · f95a5b24
      Zhongyi Xie committed
      Summary:
      Currently in `Version::Get`, when reporting ticker stats stored in `GetContext`, there is a big for-loop through all `Ticker` values, which adds unnecessary cost to overall CPU usage. We can optimize this by storing only the ticker values that are used in `Get()` calls in a new struct `GetContextStats`, since only a small fraction of all tickers are used in `Get()` calls. For comparison, with the new approach we only need to visit 17 values, while the old approach requires visiting 100+ `Ticker` values.
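
The idea can be sketched with a small per-call struct that holds only the counters a lookup can touch. All names and the ticker set below are hypothetical; the real `GetContextStats` has its own fields. Reporting then visits only those fields instead of looping over every ticker.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical global ticker ids; the real enum has 100+ entries.
enum SketchTicker : uint32_t {
  kBlockCacheHit,
  kBlockCacheMiss,
  kBytesRead,
  kTickerMax = 100
};

using GlobalTickers = std::array<uint64_t, kTickerMax>;

// Per-call stats holding only what a point lookup can update.
struct SketchGetStats {
  uint64_t block_cache_hit = 0;
  uint64_t block_cache_miss = 0;
};

// Visits two fields rather than all kTickerMax slots.
inline void ReportGetStats(const SketchGetStats& local, GlobalTickers& global) {
  if (local.block_cache_hit > 0) {
    global[kBlockCacheHit] += local.block_cache_hit;
  }
  if (local.block_cache_miss > 0) {
    global[kBlockCacheMiss] += local.block_cache_miss;
  }
}
```
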
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3490
      
      Differential Revision: D6969154
      
      Pulled By: miasantreble
      
      fbshipit-source-id: fc27072965a3a94125a3e6883d20dafcf5b84029
    • Fixed the db_bench MergeRandom only access CF_default (#4155) · 6811fb06
      Zhichao Cao committed
      Summary:
      While running tracing and analysis, I found that the MergeRandom benchmark in db_bench only accesses the default column family, even when -num_column_families is set to a value greater than 1.

      Changes: use db_with_cfh as the DB to randomly select the column family on which to execute the Merge operation when -num_column_families is greater than 1.

      Tested with make asan_check and verified with tracing.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155
      
      Differential Revision: D8907888
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850
    • Reformatting some recent changes (#4161) · a5e851e1
      Siying Dong committed
      Summary:
      Lint is not happy with some recently committed code. Format it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161
      
      Differential Revision: D8940582
      
      Pulled By: siying
      
      fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
    • BlockBasedTableReader: automatically adjust tail prefetch size (#4156) · 8425c8bd
      Siying Dong committed
      Summary:
      Right now we use one hard-coded prefetch size to prefetch data from the tail of the SST files. However, this wastes reads for some use cases while being too small to be efficient for others.
      Introduce a way to adjust this prefetch size by tracking the 32 most recent tail reads and picking a value with which the wasted read is less than 10%.
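
One way such a heuristic could work is sketched below. `TailSizeTracker` is a hypothetical class and the selection rule is an assumption made for illustration, not necessarily the rule the PR implements: keep the last 32 observed tail sizes, then choose the largest observed size for which prefetching that much on every open would waste less than 10% of the bytes prefetched.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tracker of how many tail bytes recent file opens used.
class TailSizeTracker {
 public:
  static constexpr size_t kNumTracked = 32;

  // Remember one observation, overwriting the oldest once the ring is full.
  void Record(size_t tail_bytes_used) {
    sizes_[next_ % kNumTracked] = tail_bytes_used;
    ++next_;
    if (count_ < kNumTracked) ++count_;
  }

  // Largest observed size whose waste stays under 10% of the total prefetch.
  size_t Suggest() const {
    if (count_ == 0) return 0;
    std::vector<size_t> sorted(sizes_.begin(), sizes_.begin() + count_);
    std::sort(sorted.begin(), sorted.end());
    size_t best = sorted[0];
    size_t wasted = 0;
    for (size_t i = 1; i < sorted.size(); ++i) {
      // Raising the prefetch from sorted[i-1] to sorted[i] wastes the
      // difference on each of the i opens that needed less.
      wasted += (sorted[i] - sorted[i - 1]) * i;
      const size_t total_read = sorted[i] * sorted.size();
      if (wasted * 10 < total_read) best = sorted[i];
    }
    return best;
  }

 private:
  std::array<size_t, kNumTracked> sizes_{};
  size_t next_ = 0;
  size_t count_ = 0;
};
```

With observations of 100, 100, 100, and 105 bytes, prefetching 105 bytes wastes 15 of 420 bytes, so 105 qualifies. With observations of 10 and 100 bytes, prefetching 100 bytes would waste 90 of 200 bytes, so the sketch stays at 10.
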
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4156
      
      Differential Revision: D8916847
      
      Pulled By: siying
      
      fbshipit-source-id: 8413f9eb3987e0033ed0bd910f83fc2eeaaf5758
    • Write properties metablock last in block-based tables (#4158) · ab35505e
      Andrew Kryczka committed
      Summary:
      The properties meta-block should come at the end since we always need to
      read it when opening a file, unlike index/filter/other meta-blocks, which
      are sometimes read depending on the user's configuration. This ordering
      will allow us to (in a future PR) do a small readahead on the end of the file
      to read properties and meta-index blocks with one I/O.
      
      The bulk of this PR is a refactoring of the `BlockBasedTableBuilder::Finish`
      function. It was previously too large with inconsistent error handling, which
      made it difficult to change. So I broke it up into one function per meta-block
      write, and tried to make error handling consistent within those functions.
      Then reordering the metablocks was trivial -- just reorder the calls to these
      helper functions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4158
      
      Differential Revision: D8921705
      
      Pulled By: ajkr
      
      fbshipit-source-id: 96c9cc3182eb1adf11af46adab79dbeba7b12fcc
  14. 20 Jul, 2018 3 commits
    • Fix a bug in MANIFEST group commit (#4157) · 2736752b
      Yanqin Jin committed
      Summary:
      PR #3944 introduces group commit of `VersionEdit` in MANIFEST. The
      implementation has a bug. When updating the log file number of each column
      family, we must consider only `VersionEdit`s that operate on the same column
      family. Otherwise, a column family may accidentally set its log file number
      higher than the actual value, meaning that log files with smaller file numbers
      will be ignored, thus causing some updates to be lost.
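
The corrected rule can be sketched as a per-column-family maximum. `SketchEdit` and `MaxLogNumberPerColumnFamily` are hypothetical names and the real `VersionEdit` handling is more involved. The point is only that an edit's log number contributes to its own column family and to no other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, reduced view of an edit in a group commit.
struct SketchEdit {
  uint32_t column_family;
  uint64_t log_number;
};

// For each column family, take the max log number over that family's own
// edits only, never over the whole group.
inline std::map<uint32_t, uint64_t> MaxLogNumberPerColumnFamily(
    const std::vector<SketchEdit>& group) {
  std::map<uint32_t, uint64_t> result;
  for (const SketchEdit& e : group) {
    uint64_t& current = result[e.column_family];  // starts at 0 if absent
    current = std::max(current, e.log_number);
  }
  return result;
}
```

Taking the maximum over the whole group instead would assign the largest log number to every column family in the group, which is the kind of overshoot described above.
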
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4157
      
      Differential Revision: D8916650
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8f456cf688f17bf35ad87b38e30e899aa162f201
    • Smaller tail readahead when not reading index/filters (#4159) · b5613227
      Andrew Kryczka committed
      Summary:
      In all cases during `BlockBasedTable::Open`, we issue at least three read requests to the file's tail: (1) footer, (2) metaindex block, and (3) properties block. Depending on the config, we may also read other metablocks like filter and index.
      
      This PR issues smaller readahead when we expect to do only the three necessary reads mentioned above. Then, 4KB should be enough (ignoring the case where there are lots of user-defined properties). We can keep doing 512KB readahead when additional reads are expected.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4159
      
      Differential Revision: D8924002
      
      Pulled By: ajkr
      
      fbshipit-source-id: cfc713275de4d05ce11f18571f1d72e27ccd3356
    • Return new operator for Status allocations for Windows (#4128) · 78ab11cd
      Dmitri Smirnov committed
      Summary: Windows requires new/delete for memory allocations to be overridden. Refactor to be less intrusive.
      
      Differential Revision: D8878047
      
      Pulled By: siying
      
      fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36