1. 24 Jul, 2018 1 commit
  2. 14 Jul, 2018 1 commit
    • M
      Exclude StackableDB from transaction stress tests (#4132) · 537a2339
      Maysam Yabandeh committed
      Summary:
      The transactions are currently tested with and without StackableDB, mostly to check that the code path is consistent with StackableDB as well. Slow stress tests, however, do not benefit from being run again with StackableDB. This patch excludes StackableDB from such tests.
      On a single core it reduced the runtime of transaction_test from 199s to 135s.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132
      
      Differential Revision: D8841655
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
  3. 29 Jun, 2018 1 commit
    • A
      Allow DB resume after background errors (#3997) · 52d4c9b7
      Anand Ananthabhotla committed
      Summary:
      Currently, if RocksDB encounters an error during a write operation (user-requested or background), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of three parts:
      1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
      2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
      3. Provide an API for the user to clear the error and resume the DB instance
      
      This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
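      
      The three parts above can be sketched in a simplified model. This is only an illustration: `Severity`, `BgStatus`, `ErrorHandler`, `SetBGError`, `CanWrite` and `Resume` are hypothetical names chosen here, not the actual RocksDB classes or signatures, and the rule that only out-of-space errors are recoverable is an assumption taken from the paragraph above.
      
      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical severity levels; the real Status::Severity values may differ.
      enum class Severity { kNoError, kSoftError, kFatalError };

      // Hypothetical stand-in for a background error status.
      struct BgStatus {
        bool ok = true;
        Severity severity = Severity::kNoError;
        std::string msg;
      };

      class ErrorHandler {
       public:
        // Single place where the background error is recorded and its
        // severity is decided.
        void SetBGError(bool no_space, const std::string& msg) {
          bg_error_.ok = false;
          bg_error_.msg = msg;
          // Assumption: only out-of-space errors are treated as recoverable.
          bg_error_.severity =
              no_space ? Severity::kSoftError : Severity::kFatalError;
        }

        // Writes fail while a background error is set.
        bool CanWrite() const { return bg_error_.ok; }

        // User-facing call: clears the error if it is recoverable.
        // Returns true if the DB can accept writes again.
        bool Resume() {
          if (bg_error_.ok) return true;
          if (bg_error_.severity == Severity::kFatalError) return false;
          bg_error_ = BgStatus();
          return true;
        }

       private:
        BgStatus bg_error_;
      };
      ```
      
      Keeping the severity decision inside `SetBGError` mirrors part 2: callers report what happened and a single function decides whether it is recoverable.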
      Closes https://github.com/facebook/rocksdb/pull/3997
      
      Differential Revision: D8653831
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
  4. 26 Jun, 2018 1 commit
    • D
      Align StatisticsImpl / StatisticsData (#4036) · 346d1069
      Daniel Black committed
      Summary:
      Pin the alignment of StatisticsData to the cache line size rather than just extending its size, which could still span two cache lines if the allocation is unaligned.
      
      Avoid compile errors in the process, as per the individual commit messages.
      
      Strengthen the static_assert to CACHELINE rather than the highest common multiple.
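      
      The difference between padding a struct and pinning its alignment can be sketched as follows. The names (`kCacheLineSize`, `PaddedCounters`, `AlignedCounters`, `IsCacheLineAligned`) and the 64-byte line size are assumptions for illustration, not the actual StatisticsData definition.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Assumed cache line size for this sketch; real code picks it per platform.
      constexpr std::size_t kCacheLineSize = 64;

      // Padding only: the size is one full cache line, but the type is only
      // 8-byte aligned, so an instance may start mid-line and span two lines.
      struct PaddedCounters {
        std::uint64_t tickers[4];
        char padding[kCacheLineSize - 4 * sizeof(std::uint64_t)];
      };

      // Pinned alignment: every instance starts on a cache line boundary.
      struct alignas(kCacheLineSize) AlignedCounters {
        std::uint64_t tickers[4];
      };

      static_assert(sizeof(AlignedCounters) % kCacheLineSize == 0,
                    "size must be a multiple of the cache line size");
      static_assert(alignof(AlignedCounters) == kCacheLineSize,
                    "type must be aligned to the cache line size");

      // Helper to check an address at run time.
      inline bool IsCacheLineAligned(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) % kCacheLineSize == 0;
      }
      ```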
      Closes https://github.com/facebook/rocksdb/pull/4036
      
      Differential Revision: D8582844
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71
  5. 23 Jun, 2018 1 commit
  6. 02 Jun, 2018 1 commit
  7. 17 May, 2018 1 commit
    • M
      Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4
      Mike Kolupaev committed
      Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on GitHub had misleading, incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc. reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
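      
      A toy model of the new convention and of the caller pattern it enables. `ToyIterator`, `status_ok()` and `CountKeys` are hypothetical names for this sketch and are not RocksDB classes; the injected error at `bad_index` stands in for a failed read.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // Toy iterator following the convention: a non-ok status implies
      // Valid() == false, and any seek resets the status.
      class ToyIterator {
       public:
        ToyIterator(std::vector<std::string> keys, std::size_t bad_index)
            : keys_(std::move(keys)), bad_index_(bad_index) {}

        void SeekToFirst() {
          ok_ = true;  // every seek resets the status
          pos_ = 0;
          CheckPosition();
        }
        void Next() {
          assert(Valid());
          ++pos_;
          CheckPosition();
        }
        bool Valid() const { return ok_ && pos_ < keys_.size(); }
        bool status_ok() const { return ok_; }
        const std::string& key() const { return keys_[pos_]; }

       private:
        // Simulates a read error when the iterator lands on bad_index_.
        void CheckPosition() {
          if (pos_ == bad_index_) ok_ = false;
        }

        std::vector<std::string> keys_;
        std::size_t bad_index_;
        std::size_t pos_ = 0;
        bool ok_ = true;
      };

      // Caller pattern: loop while Valid(), then check the status once after
      // the loop to tell "end of data" apart from "stopped on an error".
      inline bool CountKeys(ToyIterator& it, std::size_t* count) {
        *count = 0;
        for (it.SeekToFirst(); it.Valid(); it.Next()) ++*count;
        return it.status_ok();
      }
      ```
      
      Because a failed iterator is never valid, the loop body never sees a record from a failed operation, and a single status check after the loop is enough.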
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small change.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
  8. 05 May, 2018 1 commit
  9. 13 Apr, 2018 1 commit
  10. 06 Apr, 2018 1 commit
  11. 29 Mar, 2018 2 commits
    • A
      Allow rocksdbjavastatic to also be built as debug build · 3cb59195
      Adam Retter committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3654
      
      Differential Revision: D7417948
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f
    • Y
      Fix race condition causing double deletion of ssts · 1f5def16
      Yanqin Jin committed
      Summary:
      A possible interleaving of the background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and a user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to a race condition in which RocksDB attempts to delete a file twice. The second attempt fails and returns `IO error`. This may happen to other file types as well, but this PR targets sst files.
      Also add a unit test to verify that this PR fixes the issue.
      
      The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving of the `user_thread` and the background compaction thread. They execute as follows:
      ```
      timeline              user_thread                background_compaction thread
      t1   |                                          FindObsoleteFiles(full_scan=false)
      t2   |     FindObsoleteFiles(full_scan=true)
      t3   |                                          PurgeObsoleteFiles
      t4   |     PurgeObsoleteFiles
           V
      ```
      When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in the RocksDB directory, including the ones that the background compaction thread has already collected in its job context. `user_thread` then sees an IO error when trying to delete these files in `PurgeObsoleteFiles`, because the background compaction thread has already deleted them in its own `PurgeObsoleteFiles`.
      To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.
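      
      The idea behind the fix can be sketched with a small model. Only the name `files_grabbed_for_purge_` comes from the commit message; `ObsoleteFileTracker`, its method signatures, and the use of file numbers are simplifying assumptions, not the DBImpl code.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <mutex>
      #include <set>
      #include <utility>
      #include <vector>

      // Simplified model: files are identified by number, and "deleting" a
      // file removes it from an in-memory set that stands in for the disk.
      class ObsoleteFileTracker {
       public:
        explicit ObsoleteFileTracker(std::set<std::uint64_t> files_on_disk)
            : on_disk_(std::move(files_on_disk)) {}

        // Claims the candidates that no other caller has claimed yet and
        // returns only those, so a file is handed to at most one purger.
        std::vector<std::uint64_t> FindObsoleteFiles(
            const std::vector<std::uint64_t>& candidates) {
          std::lock_guard<std::mutex> lock(mu_);
          std::vector<std::uint64_t> claimed;
          for (std::uint64_t f : candidates) {
            if (files_grabbed_for_purge_.insert(f).second) claimed.push_back(f);
          }
          return claimed;
        }

        // Deletes the given files and returns the number of failed deletions
        // (a failure here means the file was already gone).
        int PurgeObsoleteFiles(const std::vector<std::uint64_t>& files) {
          std::lock_guard<std::mutex> lock(mu_);
          int failures = 0;
          for (std::uint64_t f : files) {
            if (on_disk_.erase(f) == 0) ++failures;
            files_grabbed_for_purge_.erase(f);
          }
          return failures;
        }

       private:
        std::mutex mu_;
        std::set<std::uint64_t> on_disk_;
        std::set<std::uint64_t> files_grabbed_for_purge_;
      };
      ```
      
      Replaying the t1..t4 timeline against this model, the full-scan caller no longer receives the file already claimed by the compaction thread, so neither purge reports a failed deletion.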
      
      ajkr could you take a look and comment? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3638
      
      Differential Revision: D7384372
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
  12. 27 Mar, 2018 1 commit
  13. 20 Mar, 2018 1 commit
    • T
      Enable compilation on OpenBSD · ccb76136
      Tobias Tschinkowitz committed
      Summary:
      I modified the Makefile so that we can compile rocksdb on OpenBSD.
      The instructions for building have been added to INSTALL.md.
      The whole compilation process works fine like this on OpenBSD-current.
      Closes https://github.com/facebook/rocksdb/pull/3617
      
      Differential Revision: D7323754
      
      Pulled By: siying
      
      fbshipit-source-id: 990037d1cc69138d22f85bd77ef4dc8c1ba9edea
  14. 19 Mar, 2018 1 commit
    • Y
      Fix the command used to generate ctags · 1139422d
      Yanqin Jin committed
      Summary:
      In the original $ROCKSDB_HOME/Makefile, the command used to generate ctags is
      ```
      ctags * -R
      ```
      However, this failed to generate tags for me.
      I looked up the usage of the ctags command and found that it should be
      ```
      ctags -R .
      ```
      or
      ```
      ctags -R *
      ```
      After the change, I can find the tags in vim using `:ts <identifier>`.
      Closes https://github.com/facebook/rocksdb/pull/3626
      
      Reviewed By: ajkr
      
      Differential Revision: D7320217
      
      Pulled By: riversand963
      
      fbshipit-source-id: e4cd8f8a67842370a2343f0213df3cbd07754111
  15. 14 Mar, 2018 1 commit
  16. 06 Mar, 2018 1 commit
  17. 14 Feb, 2018 1 commit
  18. 08 Feb, 2018 2 commits
  19. 01 Feb, 2018 1 commit
  20. 13 Jan, 2018 2 commits
  21. 12 Jan, 2018 1 commit
  22. 06 Jan, 2018 2 commits
  23. 04 Jan, 2018 1 commit
  24. 20 Dec, 2017 1 commit
    • Y
      Port 3 way SSE4.2 crc32c implementation from Folly · f54d7f5f
      yingsu00 committed
      Summary:
      **# Summary**
      
      RocksDB uses SSE crc32 intrinsics to calculate crc32 values, but it does so in a single-way fashion (not pipelined on a single CPU core). Intel's whitepaper describes an algorithm that uses 3-way pipelining for the crc32 intrinsics and then uses the pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead of its own, this algorithm shows perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB calls crc32c on are over 4KB. Initial db_bench runs show a tremendous CPU gain.
      
      This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag, NO_THREEWAY_CRC32C. If the user compiles the code with NO_THREEWAY_CRC32C=1, the old SSE crc32c algorithm is used. If the server does not have SSE4.2 at run time, the slow (non-SSE) path is used.
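      
      For reference, whichever path is taken must produce the same CRC32C (Castagnoli) checksum. The sketch below is a plain bitwise software version: it is neither RocksDB's code nor the 3-way algorithm, only a slow baseline that an accelerated implementation could be checked against. `Crc32cExtend` is a hypothetical name for this sketch.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Bitwise CRC32C using the reflected polynomial 0x82F63B78.
      // Pass the previous result as `crc` to checksum a buffer in pieces;
      // start from 0 for a fresh checksum.
      inline std::uint32_t Crc32cExtend(std::uint32_t crc, const char* data,
                                        std::size_t n) {
        crc = ~crc;
        for (std::size_t i = 0; i < n; ++i) {
          crc ^= static_cast<unsigned char>(data[i]);
          for (int bit = 0; bit < 8; ++bit) {
            // Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
          }
        }
        return ~crc;
      }
      ```
      
      The standard check value for the ASCII string "123456789" is 0xE3069283, and extending a checksum piece by piece gives the same result as a single pass over the whole buffer.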
      
      **# Performance Test Results**
      We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
      
      Before this change the CRC32 value computation took about 11.5% of the total CPU cost; with the new 3-way algorithm it is reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
      
      1) ReadRandom in db_bench overall metrics
      
          PER RUN
          Algorithm  | run | micros/op | ops/sec | Throughput (MB/s)
          3-way      | 1   | 4.143     | 241387  | 26.7
          3-way      | 2   | 3.775     | 264872  | 29.3
          3-way      | 3   | 4.116     | 242929  | 26.9
          FastCrc32c | 1   | 4.037     | 247727  | 27.4
          FastCrc32c | 2   | 4.648     | 215166  | 23.8
          FastCrc32c | 3   | 4.352     | 229799  | 25.4
      
          AVG
          Algorithm  | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
          3-way      | 4.01                 | 249,729            | 27.63
          FastCrc32c | 4.35                 | 230,897            | 25.53
      
      2) Crc32c computation CPU cost (inclusive samples percentage)
      
          PER RUN
          Implementation | run | TotalSamples  | Crc32c percentage
          3-way          | 1   | 4,572,250,000 | 4.37%
          3-way          | 2   | 3,779,250,000 | 4.62%
          3-way          | 3   | 4,129,500,000 | 4.48%
          FastCrc32c     | 1   | 4,663,500,000 | 11.24%
          FastCrc32c     | 2   | 4,047,500,000 | 12.34%
          FastCrc32c     | 3   | 4,366,750,000 | 11.68%
      
      **# Test Plan**
      
          make -j64 corruption_test && ./corruption_test
      
      By default it uses the 3-way SSE algorithm.
      
          NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
      
          make clean && DEBUG_LEVEL=0 make -j64 db_bench
          make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
      Closes https://github.com/facebook/rocksdb/pull/3173
      
      Differential Revision: D6330882
      
      Pulled By: yingsu00
      
      fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
  25. 19 Dec, 2017 1 commit
  26. 16 Nov, 2017 1 commit
    • Y
      Suppress valgrind "unimplemented functionality" error · bbcd3b0b
      Yi Wu committed
      Summary:
      Add a ROCKSDB_VALGRIND_RUN macro and suppress the false-positive "unimplemented functionality" error thrown by valgrind for stream hints.
      
      Another approach would be to add a valgrind suppression file. Valgrind is supposed to print the suppression when given the "--gen-suppressions=all" param, and that output is supposed to be the content of the suppression file. But it doesn't print it.
      Closes https://github.com/facebook/rocksdb/pull/3174
      
      Differential Revision: D6338786
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa
  27. 14 Nov, 2017 1 commit
  28. 13 Oct, 2017 1 commit
  29. 06 Oct, 2017 1 commit
  30. 05 Oct, 2017 1 commit
    • A
      rate limit auto-tuning · 1026e794
      Andrew Kryczka committed
      Summary:
      Dynamic adjustment of the rate limit according to demand for background I/O. The limit increases by a factor when the limiter is drained too frequently, and decreases by the same factor when the limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:
      
      - make the rate limiter's `Env*` configurable for testing
      - track the number of drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
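      
      One tuning step of this kind can be sketched as below. The thresholds, the 5% factor, the rate bounds, and the names `TuneParams` and `TuneRate` are illustrative assumptions; they are not the values or code in `GenericRateLimiter::Tune`.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>

      // Hypothetical tuning parameters for this sketch.
      struct TuneParams {
        int low_watermark_pct = 50;   // below this, the limit is lowered
        int high_watermark_pct = 90;  // above this, the limit is raised
        int adjust_factor_pct = 5;    // same factor in both directions
        std::int64_t min_rate = 1;
        std::int64_t max_rate = INT64_MAX;
      };

      // Returns the next bytes-per-second limit given how many of the recent
      // refill intervals ended with the limiter fully drained.
      inline std::int64_t TuneRate(std::int64_t current_rate,
                                   std::int64_t drained_intervals,
                                   std::int64_t total_intervals,
                                   const TuneParams& p) {
        assert(total_intervals > 0);
        const std::int64_t drained_pct = drained_intervals * 100 / total_intervals;
        std::int64_t next = current_rate;
        if (drained_pct > p.high_watermark_pct) {
          // Drained too often: demand exceeds the limit, so raise it.
          next = current_rate * (100 + p.adjust_factor_pct) / 100;
        } else if (drained_pct < p.low_watermark_pct) {
          // Rarely drained: the limit is looser than needed, so lower it.
          next = current_rate * 100 / (100 + p.adjust_factor_pct);
        }
        return std::max(p.min_rate, std::min(p.max_rate, next));
      }
      ```
      
      Counting drained intervals inside the limiter itself, as the second bullet describes, is what makes the `drained_intervals` input independent of DB-level statistics.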
      Closes https://github.com/facebook/rocksdb/pull/2899
      
      Differential Revision: D5858704
      
      Pulled By: ajkr
      
      fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
  31. 04 Oct, 2017 2 commits
    • Y
      speedup 'make check' · 92ccae71
      Yi Wu committed
      Summary:
      Make SnapshotConcurrentAccessTest run at the beginning of the queue.
      
      Test Plan
      `make all check -j64` on devserver
      Closes https://github.com/facebook/rocksdb/pull/2962
      
      Differential Revision: D5965871
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8cb5a47c2468be0fbbb929226a143ec5848bfaa9
    • Y
      Add ValueType::kTypeBlobIndex · d1cab2b6
      Yi Wu committed
      Summary:
      Add the kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
      1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex type.
      2. Make rocksdb able to detect whether the db contains values written by blob db and, if so, return an error.
      3. Make it possible to have blob db optionally store a value in an SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).
      
      The root db (DBImpl) basically pretends that kTypeBlobIndex entries are normal values on write. On Get, if is_blob is provided, it returns whether the value read is of kTypeBlobIndex type; if is_blob is not provided, it returns a Status::NotSupported() status. On scan, an allow_blob flag is passed in, and if the flag is true, whether the value is of kTypeBlobIndex type is returned via iter->IsBlob().
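      
      The Get behavior can be sketched with a toy model. `Entry`, `GetResult` and `GetImpl` are hypothetical names, and this sketch assumes that NotSupported is returned only when the entry read is a blob index and the caller did not pass `is_blob`; the paragraph above does not spell out that detail.
      
      ```cpp
      #include <cassert>
      #include <string>

      // Toy value types; only the two relevant to this commit are modeled.
      enum class ValueType { kTypeValue, kTypeBlobIndex };
      enum class GetResult { kOk, kNotSupported };

      // Hypothetical stored entry: a type tag plus the raw bytes.
      struct Entry {
        ValueType type;
        std::string value;
      };

      // If `is_blob` is null, the caller cannot interpret blob indexes, so
      // reading one is rejected. Otherwise the raw bytes are returned and
      // `*is_blob` reports whether they are a blob index.
      inline GetResult GetImpl(const Entry& e, std::string* value, bool* is_blob) {
        const bool blob = (e.type == ValueType::kTypeBlobIndex);
        if (blob && is_blob == nullptr) return GetResult::kNotSupported;
        *value = e.value;
        if (is_blob != nullptr) *is_blob = blob;
        return GetResult::kOk;
      }
      ```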
      
      Changes on blob db side will be in a separate patch.
      Closes https://github.com/facebook/rocksdb/pull/2886
      
      Differential Revision: D5838431
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
  32. 03 Oct, 2017 1 commit
  33. 01 Sep, 2017 2 commits
  34. 15 Aug, 2017 1 commit