  1. 03 May, 2019 · 1 commit
  2. 01 May, 2019 · 1 commit
  3. 27 Mar, 2019 · 1 commit
    • Support for single-primary, multi-secondary instances (#4899) · 9358178e
      Yanqin Jin committed
      Summary:
      This PR allows RocksDB to run in single-primary, multi-secondary process mode.
      The writer is a regular RocksDB instance (a `DBImpl`) playing the role of the primary.
      Multiple `DBImplSecondary` processes (secondaries) share the same SST files, MANIFEST, and WAL files with the primary. Secondaries tail the primary's MANIFEST and apply the updates to their own in-memory representation of the file system state, e.g. `VersionStorageInfo`.
      
      This PR has several components:
      1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
      
      2. (Similar to #4602). Add `FragmentBufferedReader` to handle a partially read trailing record at the end of a log, so that a future read can continue from where the previous one stopped.
      
      3. (Originally in #4710 and #4820). Add the implementation of the secondary, i.e. `DBImplSecondary`.
      3.1 Tail the primary's MANIFEST during recovery.
      3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
      3.3 Tailing the WAL will be added in a future PR.
      
      4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
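      
      A minimal C++ sketch of the resumable-read idea in item 2, under simplifying assumptions: records are framed as a 4-byte little-endian length followed by the payload (this is not RocksDB's log format, and there are no checksums), the shared file is simulated by a `std::string`, and `ResumableReader`/`AppendRecord` are hypothetical names rather than the `FragmentBufferedReader` API. Bytes of an incomplete trailing record stay buffered until the writer appends the rest.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      
      // Hypothetical reader over a growing, append-only byte log whose records
      // are framed as a 4-byte little-endian length followed by the payload.
      class ResumableReader {
       public:
        explicit ResumableReader(const std::string* log) : log_(log) {}
      
        // Returns true and fills `record` if a complete record is available.
        // If only part of the trailing record exists yet, the partial bytes stay
        // in `pending_` and false is returned; a later call resumes from there.
        bool ReadRecord(std::string* record) {
          // Pull in any bytes the writer appended since the last call.
          if (offset_ < log_->size()) {
            pending_.append(*log_, offset_, log_->size() - offset_);
            offset_ = log_->size();
          }
          if (pending_.size() < 4) return false;  // length header incomplete
          uint32_t len = 0;
          for (int i = 3; i >= 0; --i) {
            len = (len << 8) | static_cast<unsigned char>(pending_[i]);
          }
          const std::size_t total = 4 + static_cast<std::size_t>(len);
          if (pending_.size() < total) return false;  // payload incomplete
          record->assign(pending_, 4, len);
          pending_.erase(0, total);
          return true;
        }
      
       private:
        const std::string* log_;
        std::size_t offset_ = 0;  // bytes of the log already moved into pending_
        std::string pending_;     // buffered bytes not yet returned as records
      };
      
      // Writer side (hypothetical helper): appends one framed record.
      inline void AppendRecord(std::string* log, const std::string& payload) {
        const uint32_t len = static_cast<uint32_t>(payload.size());
        for (int i = 0; i < 4; ++i) {
          log->push_back(static_cast<char>((len >> (8 * i)) & 0xff));
        }
        log->append(payload);
      }
      ```
      
      A secondary would call `ReadRecord` in a loop each time it polls, stopping at the first `false` and trying again later.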
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
      
      Differential Revision: D14510945
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
  4. 23 Feb, 2019 · 1 commit
  5. 20 Feb, 2019 · 1 commit
  6. 14 Feb, 2019 · 1 commit
  7. 29 Jan, 2019 · 1 commit
    • Change the command to invoke parallel tests (#4922) · 95604d13
      Yanqin Jin committed
      Summary:
      We used to call `printf $(t_run)` and later feed the result to GNU parallel in the recipe of target `check_0`. However, this approach fails when the length of `$(t_run)` exceeds the maximum length of a command, because the `printf` command then cannot be executed. Instead, we use `find -print` to avoid generating an overly long command.
      
      **This PR is actually the last commit of #4916. Prefer to merge this PR separately.**
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4922
      
      Differential Revision: D13845883
      
      Pulled By: riversand963
      
      fbshipit-source-id: b56de7f7af43337c6ec89b931de843c9667cb679
  8. 25 Jan, 2019 · 1 commit
  9. 24 Jan, 2019 · 1 commit
  10. 12 Jan, 2019 · 1 commit
  11. 11 Jan, 2019 · 1 commit
  12. 20 Dec, 2018 · 1 commit
  13. 19 Dec, 2018 · 1 commit
  14. 18 Dec, 2018 · 2 commits
  15. 28 Nov, 2018 · 1 commit
    • Add SstFileReader to read sst files (#4717) · 5e72bc11
      Huachao Huang committed
      Summary:
      A user-friendly SST file reader is useful when we want to access SST
      files outside of RocksDB. For example, we can generate an SST file
      with SstFileWriter and send it elsewhere, then use SstFileReader
      to read the file and process the entries in other ways.
      
      Also rename the original SstFileReader to SstFileDumper to avoid a
      name conflict; SstFileDumper also seems a more appropriate name for tools.
      
      TODO: there is only a very simple test now, because I want to get some feedback first.
      If the changes look good, I will add more tests soon.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717
      
      Differential Revision: D13212686
      
      Pulled By: ajkr
      
      fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
  16. 22 Nov, 2018 · 1 commit
    • Introduce RangeDelAggregatorV2 (#4649) · 457f77b9
      Abhishek Madan committed
      Summary:
      The old RangeDelAggregator did expensive pre-processing work
      to create a collapsed, binary-searchable representation of range
      tombstones. With FragmentedRangeTombstoneIterator, much of this work is
      now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking
      in each iterator to find a covering tombstone in ShouldDelete, while
      doing minimal work in AddTombstones. The old RangeDelAggregator is still
      used during flush/compaction for now, though RangeDelAggregatorV2 will
      support those uses in a future PR.
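      
      A rough C++ sketch of why fragmentation makes `ShouldDelete` cheap: once tombstones are split into non-overlapping `[start, end)` fragments sorted by start key, a covering fragment is found with a single binary search, with no collapsing step. `Fragment` and `ShouldDelete` below are hypothetical simplifications, not the `FragmentedRangeTombstoneIterator` interface; this sketch keeps only the largest sequence number per fragment and ignores snapshots.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>
      
      // Hypothetical fragment: a non-overlapping key range [start, end) plus the
      // largest sequence number of any range tombstone covering it.
      struct Fragment {
        std::string start;
        std::string end;
        uint64_t seq;
      };
      
      // `frags` must be sorted by `start` and non-overlapping. Returns true if a
      // point key written at `key_seq` is covered by a newer tombstone.
      bool ShouldDelete(const std::vector<Fragment>& frags, const std::string& key,
                        uint64_t key_seq) {
        // First fragment whose start is strictly greater than `key`...
        auto it = std::upper_bound(
            frags.begin(), frags.end(), key,
            [](const std::string& k, const Fragment& f) { return k < f.start; });
        if (it == frags.begin()) return false;  // key precedes every fragment
        --it;  // ...so the only candidate is the fragment just before it
        return key < it->end && it->seq > key_seq;
      }
      ```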
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649
      
      Differential Revision: D13146964
      
      Pulled By: abhimadan
      
      fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3
  17. 13 Nov, 2018 · 1 commit
  18. 10 Nov, 2018 · 1 commit
  19. 08 Nov, 2018 · 1 commit
  20. 31 Oct, 2018 · 1 commit
  21. 25 Oct, 2018 · 1 commit
    • Use only "local" range tombstones during Get (#4449) · 8c78348c
      Abhishek Madan committed
      Summary:
      Previously, range tombstones were accumulated from every level, which
      was necessary if a range tombstone in a higher level covered a key in a lower
      level. However, RangeDelAggregator::AddTombstones's complexity is based on
      the number of tombstones that are currently stored in it, which is wasteful in
      the Get case, where we only need to know the highest sequence number of range
      tombstones that cover the key from higher levels, and compute the highest covering
      sequence number at the current level. This change introduces this optimization, and
      removes the use of RangeDelAggregator from the Get path.
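      
      The idea can be sketched in C++ as follows: instead of accumulating every tombstone into an aggregator, `Get` carries a single number down the levels, the highest sequence number of any range tombstone seen so far that covers the key. `Tombstone`, `Level`, and `Get` are hypothetical toy structures, not RocksDB internals, and each level's tombstones are scanned linearly for brevity.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <optional>
      #include <string>
      #include <utility>
      #include <vector>
      
      // Hypothetical toy LSM tree: lower-indexed levels hold newer data.
      struct Tombstone {
        std::string start, end;  // deletes keys in [start, end)
        uint64_t seq;
      };
      struct Level {
        std::vector<Tombstone> tombstones;
        std::map<std::string, std::pair<uint64_t, std::string>> entries;  // key -> (seq, value)
      };
      
      std::optional<std::string> Get(const std::vector<Level>& levels,
                                     const std::string& key) {
        uint64_t max_covering_seq = 0;  // carried from level to level
        for (const Level& level : levels) {
          // Only this level's ("local") tombstones are examined.
          for (const Tombstone& t : level.tombstones) {
            if (t.start <= key && key < t.end) {
              max_covering_seq = std::max(max_covering_seq, t.seq);
            }
          }
          auto it = level.entries.find(key);
          if (it != level.entries.end()) {
            // Newest version of the key: deleted if a newer covering tombstone
            // was seen at this level or any level above it.
            if (it->second.first < max_covering_seq) return std::nullopt;
            return it->second.second;
          }
        }
        return std::nullopt;
      }
      ```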
      
      For the benchmarks, the following command was used to initialize the database:
      ```
      ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
      ```
      
      ...and the following command was used to measure read throughput:
      ```
      ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
      ```
      
      The filluniquerandom command was only run once, and the resulting database was used
      to measure read performance before and after the PR. Both binaries were compiled with
      `DEBUG_LEVEL=0`.
      
      Readrandom results before PR:
      ```
      readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
      ```
      
      Readrandom results after PR:
      ```
      readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
      ```
      
      So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).
      
      ----
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449
      
      Differential Revision: D10370575
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
  22. 24 Oct, 2018 · 1 commit
    • Fix compile error with aligned-new (#4576) · 742302a1
      Yi Wu committed
      Summary:
      In fbcode, when we build with clang7++, `-faligned-new` is available at compile time, but we link against an older version of libstdc++.a that does not come with aligned-new support (e.g. `nm libstdc++.a | grep align_val_t` returns nothing). In this case the previous `-faligned-new` detection passes, but the build ends with a link error. This fixes it by running the detection only for non-fbcode builds.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4576
      
      Differential Revision: D10500008
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: b375de4fbb61d2a08e54ab709441aa8e7b4b08cf
  23. 28 Sep, 2018 · 1 commit
    • Utility to run task periodically in a thread (#4423) · d6f2ecf4
      Yi Wu committed
      Summary:
      Introduce a `RepeatableThread` utility to run a task periodically in a separate thread. It is basically the same as the class of the same name in fbcode, and additionally provides a helper method that lets tests mock time and trigger one execution at a time.
      
      We can use this class to replace `TimerQueue` in #4382 and `BlobDB`.
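      
      A minimal sketch of the pattern using only the C++ standard library: a background thread runs a task every `period`, and `Cancel()` wakes it early and joins it. `PeriodicRunner` is a hypothetical name, and unlike the `RepeatableThread` described above it has no hook for mocking time in tests.
      
      ```cpp
      #include <cassert>
      #include <chrono>
      #include <condition_variable>
      #include <functional>
      #include <mutex>
      #include <thread>
      #include <utility>
      
      // Hypothetical sketch: runs `task` every `period` on a dedicated thread.
      class PeriodicRunner {
       public:
        PeriodicRunner(std::function<void()> task, std::chrono::milliseconds period)
            : task_(std::move(task)), period_(period), thread_([this] { Loop(); }) {}
      
        ~PeriodicRunner() { Cancel(); }
      
        // Stops and joins the thread; later calls are no-ops.
        void Cancel() {
          {
            std::lock_guard<std::mutex> lock(mu_);
            if (!running_) return;
            running_ = false;
          }
          cv_.notify_all();
          if (thread_.joinable()) thread_.join();
        }
      
       private:
        void Loop() {
          std::unique_lock<std::mutex> lock(mu_);
          while (running_) {
            // Sleep one period, but wake immediately if Cancel() is called.
            if (cv_.wait_for(lock, period_, [this] { return !running_; })) break;
            lock.unlock();
            task_();  // run the task without holding the lock
            lock.lock();
          }
        }
      
        std::function<void()> task_;
        std::chrono::milliseconds period_;
        std::mutex mu_;
        std::condition_variable cv_;
        bool running_ = true;
        std::thread thread_;  // declared last so it starts after the other members
      };
      ```
      
      Waiting on a condition variable with a timeout, rather than sleeping, is what lets `Cancel()` return promptly even with a long period.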
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423
      
      Differential Revision: D10020932
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087
  24. 18 Sep, 2018 · 1 commit
    • Add RangeDelAggregator microbenchmarks (#4363) · 1626f6ab
      Abhishek Madan committed
      Summary:
      To measure the results of upcoming DeleteRange v2 work, this commit adds
      simple benchmarks for RangeDelAggregator. It measures the average time
      for AddTombstones and ShouldDelete calls.
      
      Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:
      
      Before #4014:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          1356.28 us
      ShouldDelete:           0.401732 us
      ```
      
      Latest master:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          740.82 us
      ShouldDelete:           0.383271 us
      ```
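      
      With the figures above, AddTombstones dropped from 1356.28 us to 740.82 us (about 1.8x faster), while ShouldDelete barely changed (0.401732 us to 0.383271 us). A per-call average of this kind comes from timing many calls and dividing by the count; the helper below is a hypothetical C++ sketch using `std::chrono::steady_clock`, not the benchmark added by this commit.
      
      ```cpp
      #include <cassert>
      #include <chrono>
      #include <cstddef>
      
      // Hypothetical helper: calls `op` `iterations` times and returns the mean
      // wall-clock time per call in microseconds.
      template <typename Op>
      double AverageMicros(std::size_t iterations, Op&& op) {
        assert(iterations > 0);
        const auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iterations; ++i) op();
        const auto elapsed = std::chrono::steady_clock::now() - start;
        return std::chrono::duration<double, std::micro>(elapsed).count() /
               static_cast<double>(iterations);
      }
      ```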
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363
      
      Differential Revision: D9881676
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
  25. 12 Sep, 2018 · 1 commit
    • Fix Makefile target 'jtest' on PowerPC (#4357) · 3ba3b153
      Yanqin Jin committed
      Summary:
      Before the fix:
      On a PowerPC machine, run the following
      ```
      $ make jtest
      ```
      The command will fail with "undefined symbol: crc32c_ppc". This was caused by
      the 'rocksdbjava' Makefile target not including the crc32c_ppc object files when
      generating the shared library. The fix is simple.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357
      
      Differential Revision: D9779474
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51
  26. 31 Aug, 2018 · 1 commit
    • Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323) · 1cf17ba5
      Zhongyi Xie committed
      Summary:
      Currently the unity test is failing because both trace_replay.cc and trace_analyzer_tool.cc define `DecodeCFAndKey` in an anonymous namespace. This would normally be fine, but the unity test concatenates all source files into one, so now we have a conflict.
      Another issue with trace_analyzer_tool.cc is that it uses some utility functions from ldb_cmd, which is not included in the Makefile for unity_test. I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323
      
      Differential Revision: D9599170
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
  27. 28 Aug, 2018 · 1 commit
  28. 22 Aug, 2018 · 1 commit
    • Adjusted the Makefile of trace_analyzer to isolate the Gflags from other (#4290) · 9e2d5ab6
      Zhichao Cao committed
      Summary:
      Previously, trace_analyzer_tool was compiled together with the other libobjects, which made the gflags of trace_analyzer appear in other tools (e.g., db_bench and rocksdb_dump). When running one of those tools with '--help', the help text of trace_analyzer appeared in that tool's help output, which was confusing.
      
      Now, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test, which avoids these issues.
      
      Tested with make asan_check.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290
      
      Differential Revision: D9413163
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4
  29. 14 Aug, 2018 · 1 commit
    • RocksDB Trace Analyzer (#4091) · 999d955e
      Zhichao Cao committed
      Summary:
      A framework for analyzing RocksDB traces.
      
      After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), users can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
      **Input:**
      1. Trace file
      2. Whole key space file
      
      **Statistics:**
      1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
      2. Key hotness (access count) of each key
      3. Key space separation based on a given prefix
      4. Key size distribution
      5. Value size distribution, if applicable
      6. Top K accessed keys
      7. QPS statistics, including the average QPS and peak QPS
      8. Top K accessed prefixes
      9. Query correlation analysis: the number of X after Y and the corresponding average time intervals
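      
      As an illustration of statistics 6 and 7, the C++ sketch below counts accesses per key, extracts the top K keys, and buckets timestamps into one-second windows to compute the average and peak QPS. The `Access` record, `Stats`, and `Analyze` are hypothetical; they do not reflect the trace file format or the analyzer's actual code.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <map>
      #include <string>
      #include <unordered_map>
      #include <utility>
      #include <vector>
      
      // Hypothetical trace entry: which key was accessed and when (microseconds).
      struct Access {
        std::string key;
        uint64_t ts_us;
      };
      
      struct Stats {
        std::vector<std::pair<std::string, uint64_t>> top_keys;  // (key, count)
        double avg_qps = 0;
        uint64_t peak_qps = 0;
      };
      
      Stats Analyze(const std::vector<Access>& trace, std::size_t k) {
        Stats s;
        std::unordered_map<std::string, uint64_t> counts;
        std::map<uint64_t, uint64_t> per_second;  // second -> number of accesses
        for (const Access& a : trace) {
          ++counts[a.key];
          ++per_second[a.ts_us / 1000000];
        }
        // Top K by access count; ties are broken by key for a deterministic order.
        s.top_keys.assign(counts.begin(), counts.end());
        auto by_count = [](const auto& x, const auto& y) {
          return x.second != y.second ? x.second > y.second : x.first < y.first;
        };
        k = std::min(k, s.top_keys.size());
        std::partial_sort(s.top_keys.begin(), s.top_keys.begin() + k,
                          s.top_keys.end(), by_count);
        s.top_keys.resize(k);
        // Average QPS over the traced span of seconds; peak QPS in any one second.
        if (!per_second.empty()) {
          const uint64_t span =
              per_second.rbegin()->first - per_second.begin()->first + 1;
          s.avg_qps = static_cast<double>(trace.size()) / static_cast<double>(span);
          for (const auto& entry : per_second) {
            s.peak_qps = std::max(s.peak_qps, entry.second);
          }
        }
        return s;
      }
      ```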
      
      **Output:**
      1. Key access heat map (either in the accessed key space or the whole key space)
      2. Trace sequence file (the raw trace file interpreted as a line-based text file for future use)
      3. Time series (the key space ID and its access time)
      4. Key access count distribution
      5. Key size distribution
      6. Value size distribution (in each interval)
      7. Whole key space separation by prefix
      8. Accessed key space separation by prefix
      9. QPS of each operation and each column family
      10. Top K QPS and their accessed prefix range
      
      **Test:**
      1. Added unit tests for analyzing Get, Put, Delete, SingleDelete, DeleteRange, and Merge
      2. Generated a trace and analyzed it
      
      **Implemented but not tested (due to the limitations of trace_replay):**
      1. Analyzing iterators, including Seek() and SeekForPrev()
      2. Analyzing the number of keys found by Get
      
      **Future Work:**
      1. Support analyzing the execution time of each request
      2. Support analyzing cache hits and block reads of Get
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091
      
      Differential Revision: D9256157
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
  30. 27 Jul, 2018 · 1 commit
    • Enable cscope to exclude test source files (#4190) · bdc6abd0
      Yanqin Jin committed
      Summary:
      When using cscope, the query results usually contain many function calls from tests, which makes them hard to browse. This PR provides an option to exclude test source files.
      
      Add a new PHONY target, tags0, to exclude test source files while using cscope.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4190
      
      Differential Revision: D9015901
      
      Pulled By: riversand963
      
      fbshipit-source-id: ea9a45756ccff5b26344d37e9ff1c02c5d9736d6
  31. 25 Jul, 2018 · 1 commit
    • DataBlockHashIndex: Standalone Implementation with Unit Test (#4139) · 8805ec2f
      Fenggang Wu committed
      Summary:
      The first step of the `DataBlockHashIndex` implementation: a string-based hash table is implemented and unit-tested.
      
      `DataBlockHashIndexBuilder`: `Add()` takes `<key, restart_index>` pairs and formats them into a string when `Finish()` is called.
      `DataBlockHashIndex`: initialized from the formatted string, which it interprets as a hash table. Key lookup is supported through an iterator.
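      
      The builder/reader split can be sketched in C++ as below: the builder hashes each key into a bucket holding a one-byte restart index and serializes the buckets, plus a bucket count, into a string; the reader interprets that string in place. Everything here is a hypothetical simplification: the FNV-1a hash, the 2-byte bucket-count trailer, and the `kNoEntry`/`kCollision` marker values are assumptions rather than the format this commit defines, and a plain `Lookup` function stands in for the iterator-based lookup.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <utility>
      #include <vector>
      
      // Hypothetical bucket markers; real restart indexes must stay below these.
      constexpr uint8_t kNoEntry = 255;    // no key hashed to this bucket
      constexpr uint8_t kCollision = 254;  // several keys share it: fall back to binary search
      
      // FNV-1a, chosen only so that the sketch is deterministic.
      inline uint32_t Hash(const std::string& s) {
        uint32_t h = 2166136261u;
        for (unsigned char c : s) {
          h ^= c;
          h *= 16777619u;
        }
        return h;
      }
      
      class HashIndexBuilder {
       public:
        explicit HashIndexBuilder(uint16_t num_buckets) : num_buckets_(num_buckets) {
          assert(num_buckets > 0);
        }
        void Add(const std::string& key, uint8_t restart_index) {
          assert(restart_index < kCollision);
          entries_.emplace_back(key, restart_index);
        }
        // Layout: one byte per bucket, then num_buckets as 2 little-endian bytes.
        std::string Finish() const {
          std::string buf(num_buckets_, static_cast<char>(kNoEntry));
          for (const auto& [key, restart] : entries_) {
            char& bucket = buf[Hash(key) % num_buckets_];
            const uint8_t cur = static_cast<uint8_t>(bucket);
            bucket = static_cast<char>(cur == kNoEntry ? restart : kCollision);
          }
          buf.push_back(static_cast<char>(num_buckets_ & 0xff));
          buf.push_back(static_cast<char>(num_buckets_ >> 8));
          return buf;
        }
      
       private:
        uint16_t num_buckets_;
        std::vector<std::pair<std::string, uint8_t>> entries_;
      };
      
      // Reader: interprets the serialized string directly as the hash table.
      inline uint8_t Lookup(const std::string& index, const std::string& key) {
        const std::size_t n = index.size();
        assert(n > 2);
        const uint16_t num_buckets = static_cast<uint16_t>(
            static_cast<uint8_t>(index[n - 2]) |
            (static_cast<uint8_t>(index[n - 1]) << 8));
        return static_cast<uint8_t>(index[Hash(key) % num_buckets]);
      }
      ```
      
      In this sketch a key that was never added can still land in another key's bucket, so a caller must confirm the key by searching the returned restart interval.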
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139
      
      Reviewed By: sagar0
      
      Differential Revision: D8866764
      
      Pulled By: fgwu
      
      fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10
  32. 24 Jul, 2018 · 1 commit
  33. 14 Jul, 2018 · 1 commit
    • Exclude StackableDB from transaction stress tests (#4132) · 537a2339
      Maysam Yabandeh committed
      Summary:
      Transactions are currently tested both with and without StackableDB, mostly to check that the code path is consistent with StackableDB as well. Slow stress tests, however, do not benefit from being run again with StackableDB, so this patch excludes StackableDB from such tests.
      On a single core, this reduced the runtime of transaction_test from 199s to 135s.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132
      
      Differential Revision: D8841655
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
  34. 29 Jun, 2018 · 1 commit
    • Allow DB resume after background errors (#3997) · 52d4c9b7
      Anand Ananthabhotla committed
      Summary:
      Currently, if RocksDB encounters errors during a write operation (user-requested or background operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts:
      1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
      2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
      3. Provide an API for the user to clear the error and resume the DB instance
      
      This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
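      
      A hypothetical C++ sketch of the control flow described above: background errors are recorded in one place, classified by severity, and a resume call clears them only when the severity allows it. The enumerator names are loosely modeled on `Status::Severity`, but `ErrorHandler`, `ErrorKind`, and the rule "only no-space errors are resumable" are assumptions chosen to match the initial scope described above, not RocksDB's actual logic. RocksDB exposes the user-facing operation as `DB::Resume()`.
      
      ```cpp
      #include <cassert>
      
      // Hypothetical types, loosely modeled on the description above.
      enum class Severity { kNoError, kSoftError, kHardError, kFatalError };
      enum class ErrorKind { kNone, kNoSpace, kIOError, kCorruption };
      
      class ErrorHandler {
       public:
        // Called by background flush/compaction jobs; classification happens in
        // this one place. The mapping below is an assumption for illustration.
        void SetBGError(ErrorKind kind) {
          const Severity sev = (kind == ErrorKind::kNoSpace) ? Severity::kHardError
                                                             : Severity::kFatalError;
          if (sev > severity_) {  // keep the most severe error seen so far
            error_ = kind;
            severity_ = sev;
          }
        }
      
        // Foreground writes are rejected while any background error is set.
        bool WriteAllowed() const { return severity_ == Severity::kNoError; }
        ErrorKind error() const { return error_; }
      
        // User-facing resume: clears the error only if it is recoverable.
        bool Resume() {
          if (severity_ >= Severity::kFatalError) return false;
          error_ = ErrorKind::kNone;
          severity_ = Severity::kNoError;
          return true;
        }
      
       private:
        ErrorKind error_ = ErrorKind::kNone;
        Severity severity_ = Severity::kNoError;
      };
      ```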
      Closes https://github.com/facebook/rocksdb/pull/3997
      
      Differential Revision: D8653831
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
  35. 26 Jun, 2018 · 1 commit
    • Align StatisticsImpl / StatisticsData (#4036) · 346d1069
      Daniel Black committed
      Summary:
      Pinned the alignment of StatisticsData to the cache line size rather than just extending its size, which could span two cache lines if the allocation is unaligned.
      
      Avoided compile errors in the process, as described in the individual commit messages.
      
      Strengthened the static_assert to check against CACHELINE rather than the highest common multiple.
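      
      A minimal C++ illustration of the difference between padding and alignment, assuming a 64-byte cache line (the hypothetical `kCacheLine` constant; the real value depends on the CPU) and a hypothetical field layout: padding only rounds the size up, whereas `alignas` also forces every instance to start on a line boundary, so it cannot straddle an extra line.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      
      constexpr std::size_t kCacheLine = 64;  // assumption; depends on the CPU
      
      // Padded only: the size is a multiple of the line, but an instance may
      // still start mid-line and therefore touch two cache lines.
      struct PaddedStats {
        uint64_t tickers[4];
        char padding[kCacheLine - 4 * sizeof(uint64_t)];
      };
      
      // Aligned: every instance starts on a cache-line boundary.
      struct alignas(kCacheLine) AlignedStats {
        uint64_t tickers[4];
      };
      
      static_assert(sizeof(PaddedStats) % kCacheLine == 0, "size is padded");
      static_assert(alignof(AlignedStats) == kCacheLine, "alignment is pinned");
      static_assert(sizeof(AlignedStats) % kCacheLine == 0, "size rounds up to alignment");
      ```
      
      Heap allocations honor the extended alignment only with C++17 aligned `new` (or GCC/Clang's `-faligned-new`).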
      Closes https://github.com/facebook/rocksdb/pull/4036
      
      Differential Revision: D8582844
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71
  36. 23 Jun, 2018 · 1 commit
  37. 02 Jun, 2018 · 1 commit
  38. 17 May, 2018 · 1 commit
    • Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4
      Mike Kolupaev committed
      Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on GitHub had misleading, incorrect information). In the code, there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc. reset the status or not. There were a number of bugs caused by this confusion, both inside RocksDB and in the code that uses RocksDB (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
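      
      Under the new convention, a caller drives the loop with `Valid()` alone and checks the status once afterwards. The toy iterator below is a hypothetical C++ illustration of the two rules (a non-ok status implies `!Valid()`, and every seek resets the status); `VecIter`, its `ok()` stand-in for `status().ok()`, and the `InjectError` hook are invented for this sketch, while the loop shape follows the usual pattern for RocksDB iterators.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>
      
      // Hypothetical iterator over sorted keys that obeys the convention:
      //  * if !ok(), Valid() is always false;
      //  * every Seek*() resets the status.
      class VecIter {
       public:
        explicit VecIter(std::vector<std::string> keys) : keys_(std::move(keys)) {}
      
        bool Valid() const { return ok_ && pos_ < keys_.size(); }
        bool ok() const { return ok_; }  // stands in for status().ok()
        const std::string& key() const {
          assert(Valid());
          return keys_[pos_];
        }
      
        void Seek(const std::string& target) {
          ok_ = true;  // seeks reset the status
          pos_ = static_cast<std::size_t>(
              std::lower_bound(keys_.begin(), keys_.end(), target) - keys_.begin());
        }
        void SeekToFirst() {
          ok_ = true;
          pos_ = 0;
        }
        void Next() {
          assert(Valid());
          if (++pos_ == fail_at_) ok_ = false;  // simulated I/O error
        }
        void InjectError(std::size_t at) { fail_at_ = at; }  // test hook
      
       private:
        std::vector<std::string> keys_;
        std::size_t pos_ = 0;
        std::size_t fail_at_ = static_cast<std::size_t>(-1);
        bool ok_ = true;
      };
      
      // Canonical scan: loop on Valid(), then check the status once.
      bool ScanAll(VecIter& it, std::vector<std::string>* out) {
        for (it.SeekToFirst(); it.Valid(); it.Next()) out->push_back(it.key());
        return it.ok();  // false means the scan stopped early because of an error
      }
      ```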
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small changes.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
  39. 05 May, 2018 · 1 commit