1. December 18, 2018 (1 commit)
  2. November 28, 2018 (1 commit)
    • Add SstFileReader to read sst files (#4717) · 5e72bc11
      Huachao Huang committed
      Summary:
      A user-friendly sst file reader is useful when we want to access sst
      files outside of RocksDB. For example, we can generate an sst file
      with SstFileWriter and send it elsewhere, then use SstFileReader
      to read the file and process the entries in other ways.
      
      Also rename the original SstFileReader to SstFileDumper to resolve the
      name conflict; SstFileDumper also seems more appropriate for a tool.
      
      TODO: there is only a very simple test now, because I want to get some feedback first.
      If the changes look good, I will add more tests soon.
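      The intended workflow (write an sst file in one place, ship it, read and process it elsewhere) can be sketched like this. The real API is C++ (`SstFileWriter` / `SstFileReader`); this Python stand-in uses a toy file format and only mirrors the shape of that workflow:

```python
import json

# Toy stand-ins for SstFileWriter / SstFileReader: entries are written to a
# file in one place, then the file can be opened independently and its
# entries iterated in key order, like an sst iterator would return them.
class ToySstWriter:
    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value

    def finish(self, path):
        # Persist all entries; sorting mimics an sst file's sorted layout.
        with open(path, "w") as f:
            json.dump(self._entries, f, sort_keys=True)

class ToySstReader:
    def __init__(self, path):
        with open(path) as f:
            self._entries = json.load(f)

    def new_iterator(self):
        # Entries come back in key order.
        return iter(sorted(self._entries.items()))
```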
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717
      
      Differential Revision: D13212686
      
      Pulled By: ajkr
      
      fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
  3. November 22, 2018 (1 commit)
    • Introduce RangeDelAggregatorV2 (#4649) · 457f77b9
      Abhishek Madan committed
      Summary:
      The old RangeDelAggregator did expensive pre-processing work
      to create a collapsed, binary-searchable representation of range
      tombstones. With FragmentedRangeTombstoneIterator, much of this work is
      now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking
      in each iterator to find a covering tombstone in ShouldDelete, while
      doing minimal work in AddTombstones. The old RangeDelAggregator is still
      used during flush/compaction for now, though RangeDelAggregatorV2 will
      support those uses in a future PR.
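      The seek-based ShouldDelete described above can be sketched as follows; the class and field names are illustrative, not RocksDB's actual fragmented-tombstone API:

```python
import bisect

# Each level keeps its range tombstones pre-fragmented into sorted,
# non-overlapping [start, end) ranges, so finding a covering tombstone is
# a binary search instead of the old collapse-everything preprocessing.
class FragmentedTombstoneList:
    def __init__(self, fragments):
        # fragments: (start_key, end_key, seqnum) tuples, non-overlapping
        self.fragments = sorted(fragments)
        self.starts = [f[0] for f in self.fragments]

    def covering_seqnum(self, key):
        # Seek to the last fragment whose start <= key.
        i = bisect.bisect_right(self.starts, key) - 1
        if i >= 0:
            start, end, seq = self.fragments[i]
            if start <= key < end:
                return seq
        return 0  # no covering tombstone at this level

def should_delete(levels, key, key_seqnum):
    # A key is deleted if any level holds a tombstone over it that is
    # newer (higher seqnum) than the key itself.
    return any(lvl.covering_seqnum(key) > key_seqnum for lvl in levels)
```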
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649
      
      Differential Revision: D13146964
      
      Pulled By: abhimadan
      
      fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3
  4. November 13, 2018 (1 commit)
  5. November 10, 2018 (1 commit)
  6. November 8, 2018 (1 commit)
  7. October 31, 2018 (1 commit)
  8. October 25, 2018 (1 commit)
    • Use only "local" range tombstones during Get (#4449) · 8c78348c
      Abhishek Madan committed
      Summary:
      Previously, range tombstones were accumulated from every level, which
      was necessary if a range tombstone in a higher level covered a key in a lower
      level. However, RangeDelAggregator::AddTombstones's complexity is based on
      the number of tombstones that are currently stored in it, which is wasteful in
      the Get case, where we only need to know the highest sequence number of range
      tombstones that cover the key from higher levels, and compute the highest covering
      sequence number at the current level. This change introduces this optimization, and
      removes the use of RangeDelAggregator from the Get path.
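      A minimal sketch of the idea, assuming a simplified level layout (this is illustrative Python, not the actual Get path): instead of feeding every level's tombstones into an aggregator, Get carries down only the highest sequence number of a covering tombstone seen so far.

```python
# levels: newest (memtable/L0) to oldest, each with "local" tombstones
# (start_key, end_key, seqnum) and entries (key, seqnum, value).
def get(levels, key):
    max_covering_seq = 0
    for level in levels:
        # Look only at this level's tombstones that cover the lookup key,
        # and remember just the highest covering sequence number.
        for start, end, seq in level["tombstones"]:
            if start <= key < end:
                max_covering_seq = max(max_covering_seq, seq)
        for k, seq, value in level["entries"]:
            if k == key:
                # Visible only if no newer tombstone covers it.
                return None if seq < max_covering_seq else value
    return None
```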
      
      In the benchmark results, the following command was used to initialize the database:
      ```
      ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
      ```
      
      ...and the following command was used to measure read throughput:
      ```
      ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
      ```
      
      The filluniquerandom command was only run once, and the resulting database was used
      to measure read performance before and after the PR. Both binaries were compiled with
      `DEBUG_LEVEL=0`.
      
      Readrandom results before PR:
      ```
      readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
      ```
      
      Readrandom results after PR:
      ```
      readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
      ```
      
      So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).
      
      ----
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449
      
      Differential Revision: D10370575
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
  9. October 24, 2018 (1 commit)
    • Fix compile error with aligned-new (#4576) · 742302a1
      Yi Wu committed
      Summary:
      In fbcode, when we build with clang7++, -faligned-new is available during the compile phase, but we link against an older version of libstdc++.a that doesn't come with aligned-new support (e.g. `nm libstdc++.a | grep align_val_t` returns empty). In this case the previous -faligned-new detection passes but the build ends with a link error. Fix it by running the detection only for non-fbcode builds.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4576
      
      Differential Revision: D10500008
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: b375de4fbb61d2a08e54ab709441aa8e7b4b08cf
  10. September 28, 2018 (1 commit)
    • Utility to run task periodically in a thread (#4423) · d6f2ecf4
      Yi Wu committed
      Summary:
      Introduce the `RepeatableThread` utility to run a task periodically in a separate thread. It is basically the same as the class of the same name in fbcode, and additionally provides a helper method to let tests mock time and trigger executions one at a time.
      
      We can use this class to replace `TimerQueue` in #4382 and `BlobDB`.
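      A minimal sketch of such a utility, assuming a simple stop-event design (names and details are not RocksDB's actual `RepeatableThread` API):

```python
import threading

# Run `function` every `period_sec` seconds in a background thread until
# cancelled; Event.wait doubles as an interruptible sleep between runs.
class RepeatableThread:
    def __init__(self, function, period_sec):
        self._function = function
        self._period = period_sec
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # wait() returns True once the stop event is set, ending the loop.
        while not self._stop.wait(self._period):
            self._function()

    def cancel(self):
        self._stop.set()
        self._thread.join()
```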
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423
      
      Differential Revision: D10020932
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087
  11. September 18, 2018 (1 commit)
    • Add RangeDelAggregator microbenchmarks (#4363) · 1626f6ab
      Abhishek Madan committed
      Summary:
      To measure the results of upcoming DeleteRange v2 work, this commit adds
      simple benchmarks for RangeDelAggregator. It measures the average time
      for AddTombstones and ShouldDelete calls.
      
      Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:
      
      Before #4014:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          1356.28 us
      ShouldDelete:           0.401732 us
      ```
      
      Latest master:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          740.82 us
      ShouldDelete:           0.383271 us
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363
      
      Differential Revision: D9881676
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
  12. September 12, 2018 (1 commit)
    • Fix Makefile target 'jtest' on PowerPC (#4357) · 3ba3b153
      Yanqin Jin committed
      Summary:
      Before the fix:
      On a PowerPC machine, run the following
      ```
      $ make jtest
      ```
      The command will fail with "undefined symbol: crc32c_ppc". This was caused by
      the 'rocksdbjava' Makefile target not including the crc32c_ppc object files when
      generating the shared lib. The fix is simple.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357
      
      Differential Revision: D9779474
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51
  13. August 31, 2018 (1 commit)
    • Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323) · 1cf17ba5
      Zhongyi Xie committed
      Summary:
      Currently the unity test is failing because both trace_replay.cc and trace_analyzer_tool.cc define `DecodeCFAndKey` in an anonymous namespace. That would normally be fine, except the unity test compiles all source files together, so now we have a conflict.
      Another issue with trace_analyzer_tool.cc is that it uses some utility functions from ldb_cmd, which is not included in the Makefile for the unity test; I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323
      
      Differential Revision: D9599170
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
  14. August 28, 2018 (1 commit)
  15. August 22, 2018 (1 commit)
    • Adjusted the Makefile of trace_analyzer to isolate the Gflags from other (#4290) · 9e2d5ab6
      Zhichao Cao committed
      Summary:
      Previously, trace_analyzer_tool was compiled with the other libobjects, which caused the GFLAGS of trace_analyzer to appear in other tools (e.g., db_bench, rocksdb_dump, etc.). When using '--help', the help information of trace_analyzer would appear in other tools' help output, which caused confusion.
      
      Now, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test to avoid this issue.
      
      Tested with make asan_check.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290
      
      Differential Revision: D9413163
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4
  16. August 14, 2018 (1 commit)
    • RocksDB Trace Analyzer (#4091) · 999d955e
      Zhichao Cao committed
      Summary:
      A framework for trace analysis in RocksDB.
      
      After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), users can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
      **Input:**
      1. trace file
      2. Whole key space file
      
      **Statistics:**
      1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family
      2. Key hotness (access count) of each key
      3. Key space separation based on a given prefix
      4. Key size distribution
      5. Value size distribution, if applicable
      6. Top K accessed keys
      7. QPS statistics, including the average QPS and peak QPS
      8. Top K accessed prefixes
      9. Query correlation analysis: output the number of accesses of X after Y and the corresponding average time intervals
      
      **Output:**
      1. Key access heat map (either in the accessed key space or the whole key space)
      2. Trace sequence file (interprets the raw trace file into a line-based text file for future use)
      3. Time series (the key space ID and its access time)
      4. Key access count distribution
      5. Key size distribution
      6. Value size distribution (in each interval)
      7. Whole key space separation by the prefix
      8. Accessed key space separation by the prefix
      9. QPS of each operation and each column family
      10. Top K QPS and their accessed prefix ranges
      
      **Test:**
      1. Added unit tests for analyzing Get, Put, Delete, SingleDelete, DeleteRange, and Merge
      2. Generated a trace and analyzed it
      
      **Implemented but not tested (due to the limitation of trace_replay):**
      1. Analyzing Iterator, supporting Seek() and SeekForPrev() analysis
      2. Analyzing the number of keys found by Get
      
      **Future Work:**
      1. Support execution-time analysis of each request
      2. Support analysis of cache hits and block reads for Get
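      Two of the statistics listed above (QPS and key hotness) can be sketched from a generic (timestamp, operation, key) trace; the record format, windowing, and function names here are assumptions, not the tool's actual implementation:

```python
from collections import Counter

def qps_stats(trace, window=1.0):
    # Average and peak queries-per-second over fixed time windows.
    per_window = Counter(int(ts // window) for ts, _, _ in trace)
    if not per_window:
        return 0.0, 0
    total = sum(per_window.values())
    span = max(per_window) - min(per_window) + 1  # windows covered
    return total / span, max(per_window.values())

def top_k_keys(trace, k):
    # Key hotness: the k most frequently accessed keys with their counts.
    return Counter(key for _, _, key in trace).most_common(k)
```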
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091
      
      Differential Revision: D9256157
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
  17. July 27, 2018 (1 commit)
    • Enable cscope to exclude test source files (#4190) · bdc6abd0
      Yanqin Jin committed
      Summary:
      Usually when using cscope, the query results contain a lot of function calls from test code, making them hard to browse. This PR provides an option to exclude test source files.
      
      Add a new PHONY target, tags0, to exclude test source files while using cscope.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4190
      
      Differential Revision: D9015901
      
      Pulled By: riversand963
      
      fbshipit-source-id: ea9a45756ccff5b26344d37e9ff1c02c5d9736d6
  18. July 25, 2018 (1 commit)
    • DataBlockHashIndex: Standalone Implementation with Unit Test (#4139) · 8805ec2f
      Fenggang Wu committed
      Summary:
      The first step of the `DataBlockHashIndex` implementation: a string-based hash table is implemented and unit-tested.
      
      `DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>` and formats them into a string when `Finish()` is called.
      `DataBlockHashIndex`: initialized from the formatted string, which it interprets as a hash table. Lookup of a key is supported via iterator operations.
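      The Add()/Finish() split can be sketched with a toy serialization format (the real on-disk layout is different; the bucket count and encoding here are assumptions made for illustration):

```python
NUM_BUCKETS = 8

def build(pairs):
    # Builder side: bucket each (key, restart_index) pair by key hash,
    # then serialize all buckets into one string, as Finish() might.
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, restart_index in pairs:
        buckets[hash(key) % NUM_BUCKETS].append(f"{key}:{restart_index}")
    return "|".join(",".join(b) for b in buckets)

def lookup(formatted, key):
    # Reader side: reinterpret the string as a hash table and probe only
    # the bucket the key hashes to.
    bucket = formatted.split("|")[hash(key) % NUM_BUCKETS]
    for entry in filter(None, bucket.split(",")):
        k, _, idx = entry.rpartition(":")
        if k == key:
            return int(idx)
    return None  # key absent
```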
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139
      
      Reviewed By: sagar0
      
      Differential Revision: D8866764
      
      Pulled By: fgwu
      
      fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10
  19. July 24, 2018 (1 commit)
  20. July 14, 2018 (1 commit)
    • Exclude StackableDB from transaction stress tests (#4132) · 537a2339
      Maysam Yabandeh committed
      Summary:
      The transactions are currently tested with and without StackableDB, mostly to check that the code path is consistent with StackableDB as well. Slow stress tests, however, do not benefit from being run again with StackableDB, so the patch excludes StackableDB from such tests.
      On a single core it reduced the runtime of transaction_test from 199s to 135s.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132
      
      Differential Revision: D8841655
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
  21. June 29, 2018 (1 commit)
    • Allow DB resume after background errors (#3997) · 52d4c9b7
      Anand Ananthabhotla committed
      Summary:
      Currently, if RocksDB encounters an error during a write operation (user-requested or background), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed after certain classes of errors. It consists of 3 parts:
      1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
      2. Refactor the error handling code so that setting bg_error_ and deciding on severity happen in one place
      3. Provide an API for the user to clear the error and resume the DB instance
      
      This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
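      The three parts can be sketched together as follows; the Severity values and method names are illustrative, not the actual rocksdb::Status / DB API:

```python
from enum import Enum

class Severity(Enum):
    NO_ERROR = 0
    SOFT_ERROR = 1   # recoverable, e.g. NoSpace during background flush
    FATAL_ERROR = 2  # unrecoverable; resume() must refuse

class ToyDB:
    def __init__(self):
        self.bg_error = Severity.NO_ERROR

    def on_background_error(self, severity):
        # One place decides severity and records bg_error_.
        if severity.value > self.bg_error.value:
            self.bg_error = severity

    def write(self, key, value):
        # Writes fail while a background error is set.
        if self.bg_error != Severity.NO_ERROR:
            raise IOError("writes stalled by background error")

    def resume(self):
        # User-facing API: clear the error only if it is recoverable.
        if self.bg_error == Severity.SOFT_ERROR:
            self.bg_error = Severity.NO_ERROR
        return self.bg_error == Severity.NO_ERROR
```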
      Closes https://github.com/facebook/rocksdb/pull/3997
      
      Differential Revision: D8653831
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
  22. June 26, 2018 (1 commit)
    • Align StatisticsImpl / StatisticsData (#4036) · 346d1069
      Daniel Black committed
      Summary:
      Pinned the alignment of StatisticsData to the cache line size, rather than just extending its size, which could span two cache lines if the allocation was unaligned.
      
      Avoided compile errors in the process, as per the individual commit messages.
      
      Strengthened the static_assert to CACHELINE rather than the highest common multiple.
      Closes https://github.com/facebook/rocksdb/pull/4036
      
      Differential Revision: D8582844
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71
  23. June 23, 2018 (1 commit)
  24. June 2, 2018 (1 commit)
  25. May 17, 2018 (1 commit)
    • Change and clarify the relationship between Valid(), status() and Seek*() for all iterators; also fix some bugs · 8bf555f4
      Mike Kolupaev committed
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
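      The new convention can be modeled in a few lines (illustrative only, not RocksDB code): non-ok status always implies Valid() == false, and every seek resets status.

```python
class ConventionIterator:
    def __init__(self, keys, fail_on=None):
        self.keys = sorted(keys)
        self.fail_on = fail_on      # key that simulates a read error
        self.pos = len(self.keys)
        self.ok = True

    def status_ok(self):
        return self.ok

    def valid(self):
        # Rule 1: non-ok status always means invalid.
        return self.ok and 0 <= self.pos < len(self.keys)

    def seek(self, target):
        self.ok = True              # Rule 2: any seek resets status.
        self.pos = next(
            (i for i, k in enumerate(self.keys) if k >= target),
            len(self.keys))
        if self.pos < len(self.keys) and self.keys[self.pos] == self.fail_on:
            self.ok = False         # simulated corruption while reading

    def key(self):
        assert self.valid()
        return self.keys[self.pos]
```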
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small change.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
  26. May 5, 2018 (1 commit)
  27. April 13, 2018 (1 commit)
  28. April 6, 2018 (1 commit)
  29. March 29, 2018 (2 commits)
    • Allow rocksdbjavastatic to also be built as debug build · 3cb59195
      Adam Retter committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3654
      
      Differential Revision: D7417948
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f
    • Fix race condition causing double deletion of ssts · 1f5def16
      Yanqin Jin committed
      Summary:
      Possible interleaved execution of a background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and a user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to a race condition in which RocksDB attempts to delete a file twice. The second attempt will fail and return `IO error`. This may occur with other files, but this PR targets SSTs.
      Also add a unit test to verify that this PR fixes the issue.
      
      The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving of the `user_thread` and the background compaction thread. They execute as follows:
      ```
      timeline              user_thread                background_compaction thread
      t1   |                                          FindObsoleteFiles(full_scan=false)
      t2   |     FindObsoleteFiles(full_scan=true)
      t3   |                                          PurgeObsoleteFiles
      t4   |     PurgeObsoleteFiles
           V
      ```
      When `user_thread` invokes `FindObsoleteFiles` with a full scan, it collects ALL files in the RocksDB directory, including the ones that the background compaction thread has collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles`, because the background compaction thread has already deleted them in its own `PurgeObsoleteFiles`.
      To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with a full scan, it will not collect such files.
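      The fix can be sketched as follows; the logic is simplified, and only the member name `files_grabbed_for_purge_` is taken from the PR description:

```python
import threading

# A thread records the files it grabbed in find_obsolete_files; another
# thread's full scan then skips those files instead of deleting them twice.
class FileDeleter:
    def __init__(self, live_files):
        self.disk = set(live_files)     # files currently present on disk
        self.grabbed_for_purge = set()  # shared across threads
        self.lock = threading.Lock()
        self.errors = []

    def find_obsolete_files(self, candidates):
        # Skip files another thread has already grabbed for purging.
        with self.lock:
            mine = [f for f in candidates if f not in self.grabbed_for_purge]
            self.grabbed_for_purge.update(mine)
        return mine

    def purge_obsolete_files(self, files):
        with self.lock:
            for f in files:
                if f in self.disk:
                    self.disk.remove(f)
                else:
                    self.errors.append(f"IO error: {f} already deleted")
                self.grabbed_for_purge.discard(f)
```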
      
      ajkr could you take a look and comment? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3638
      
      Differential Revision: D7384372
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
  30. March 27, 2018 (1 commit)
  31. March 20, 2018 (1 commit)
    • Enable compilation on OpenBSD · ccb76136
      Tobias Tschinkowitz committed
      Summary:
      I modified the Makefile so that we can compile rocksdb on OpenBSD.
      The instructions for building have been added to INSTALL.md.
      The whole compilation process works fine like this on OpenBSD-current.
      Closes https://github.com/facebook/rocksdb/pull/3617
      
      Differential Revision: D7323754
      
      Pulled By: siying
      
      fbshipit-source-id: 990037d1cc69138d22f85bd77ef4dc8c1ba9edea
  32. March 19, 2018 (1 commit)
    • Fix the command used to generate ctags · 1139422d
      Yanqin Jin committed
      Summary:
      In the original $ROCKSDB_HOME/Makefile, the command used to generate ctags is
      ```
      ctags * -R
      ```
      However, this failed to generate tags for me.
      I did some searching on the usage of the ctags command and found that it should be
      ```
      ctags -R .
      ```
      or
      ```
      ctags -R *
      ```
      After the change, I can find the tags in vim using `:ts <identifier>`.
      Closes https://github.com/facebook/rocksdb/pull/3626
      
      Reviewed By: ajkr
      
      Differential Revision: D7320217
      
      Pulled By: riversand963
      
      fbshipit-source-id: e4cd8f8a67842370a2343f0213df3cbd07754111
  33. March 14, 2018 (1 commit)
  34. March 6, 2018 (1 commit)
  35. February 14, 2018 (1 commit)
  36. February 8, 2018 (2 commits)
  37. February 1, 2018 (1 commit)
  38. January 13, 2018 (1 commit)