1. 06 Apr, 2017 1 commit
  2. 05 Apr, 2017 1 commit
    • Level-based L0->L0 compaction · d659faad
      Andrew Kryczka authored
      Summary:
      Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.
      
      - L0->L0 is triggered when base level is unavailable due to pending compactions
      - L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
      - Subcompactions are disabled for L0->L0 since we want to output one file.
      - Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes the number of files in L0; a sketch of the span-selection idea follows.
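      
      A minimal sketch of the span-selection idea (`FileMeta`, `PickL0Span`, and `max_output_bytes` are illustrative stand-ins, not RocksDB's internals): slide a window over the sorted L0 file list, skip files already being compacted, and keep the longest window whose total size fits under the output-file limit.
      ```
      #include <cstdint>
      #include <utility>
      #include <vector>
      
      struct FileMeta {
        uint64_t size;
        bool being_compacted;
      };
      
      // Returns [begin, end) indices of the chosen span; {0, 0} means none.
      std::pair<size_t, size_t> PickL0Span(const std::vector<FileMeta>& l0_files,
                                           uint64_t max_output_bytes) {
        size_t best_begin = 0, best_end = 0, begin = 0;
        uint64_t span_bytes = 0;
        for (size_t end = 0; end < l0_files.size(); ++end) {
          if (l0_files[end].being_compacted) {
            begin = end + 1;  // restart after a file already being compacted
            span_bytes = 0;
            continue;
          }
          span_bytes += l0_files[end].size;
          while (span_bytes > max_output_bytes) {
            span_bytes -= l0_files[begin++].size;  // shrink to fit the limit
          }
          if (end + 1 - begin > best_end - best_begin) {
            best_begin = begin;
            best_end = end + 1;
          }
        }
        return {best_begin, best_end};
      }
      ```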
      Closes https://github.com/facebook/rocksdb/pull/2027
      
      Differential Revision: D4760318
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9d07183
  3. 04 Apr, 2017 1 commit
  4. 23 Mar, 2017 1 commit
    • Fix clang compile error - [-Werror,-Wunused-lambda-capture] · f4fce475
      Daniel Black authored
      Summary:
      Errors were:
      
      db/version_set.cc:1535:20: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
                        [this](const Fsize& f1, const Fsize& f2) -> bool {
                         ^
      db/version_set.cc:1541:20: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
                        [this](const Fsize& f1, const Fsize& f2) -> bool {
                         ^
      db/db_test.cc:2983:27: error: lambda capture 'kNumPutsBeforeWaitForFlush' is not required to be captured for this use [-Werror,-Wunused-lambda-capture]
        auto gen_l0_kb = [this, kNumPutsBeforeWaitForFlush](int size) {
                                ^
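      
      The fix, sketched on a stand-in `Fsize` struct (the real one lives in db/version_set.cc): drop the capture that the lambda body never uses.
      ```
      #include <algorithm>
      #include <cstddef>
      #include <vector>
      
      struct Fsize { size_t size; };
      
      void SortFileSizes(std::vector<Fsize>& temp) {
        // Before (warns under -Wunused-lambda-capture because the body never
        // touches a member):
        //   std::sort(temp.begin(), temp.end(),
        //             [this](const Fsize& f1, const Fsize& f2) -> bool {...});
        // After: an empty capture list, since the comparator only reads its
        // parameters.
        std::sort(temp.begin(), temp.end(),
                  [](const Fsize& f1, const Fsize& f2) -> bool {
                    return f1.size > f2.size;
                  });
      }
      ```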
      Closes https://github.com/facebook/rocksdb/pull/1972
      
      Differential Revision: D4685991
      
      Pulled By: siying
      
      fbshipit-source-id: 9125379
  5. 16 Mar, 2017 1 commit
    • Add macros to include file name and line number during Logging · e1916368
      Islam AbdelRahman authored
      Summary:
      Current logging:
      ```
      2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
      2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
      2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
      2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
      2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
      2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
      2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
      2017/03/14-14:20:31.
      ```
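      
      A minimal sketch of the mechanism (the macro name and stderr sink are illustrative; the diff's actual macros wrap RocksDB's Logger): `__FILE__` and `__LINE__` expand at the call site, so every message carries its origin.
      ```
      #include <cstdio>
      
      // Prepend the call site's file and line to each message.
      #define LOG_WITH_LOCATION(fmt, ...) \
        std::fprintf(stderr, "[%s:%d] " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__)
      
      int main() {
        LOG_WITH_LOCATION("[JOB %d] Syncing log #%d", 3, 6);
        return 0;
      }
      ```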
      Closes https://github.com/facebook/rocksdb/pull/1990
      
      Differential Revision: D4708695
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: cb8968f
  6. 14 Mar, 2017 1 commit
    • Pinnableslice (2nd attempt) · 11526252
      Maysam Yabandeh authored
      Summary:
      PinnableSlice
      
          Summary:
          Currently the point lookup values are copied to a string provided by the
          user. This incurs an extra memcpy cost. This patch allows doing point lookup
          via a PinnableSlice which pins the source memory location (instead of
          copying their content) and releases them after the content is consumed
          by the user. The old API of Get(string) is translated to the new API
          underneath.
      
          Here is the summary of improvements:
      
          value 100 byte: 1.8% regular, 1.2% merge values
          value 1k byte: 11.5% regular, 7.5% merge values
          value 10k byte: 26% regular, 29.9% merge values
          The improvement for merge could be more if we extend this approach to
          pin the merge output and delay the full merge operation until the user
          actually needs it. We have put that for future work.
      
          PS:
          Sometimes we observe a small decrease in performance when switching from
          t5452014 to this patch but with the old Get(string) API. The d
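      
      A usage sketch of the new API (Get with a PinnableSlice out-parameter is the real RocksDB signature; the path and keys are illustrative):
      ```
      #include <rocksdb/db.h>
      
      #include <cassert>
      
      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/pinnable_demo", &db);
        assert(s.ok());
      
        s = db->Put(rocksdb::WriteOptions(), "key", "value");
        assert(s.ok());
      
        // The value stays pinned in the source memory (e.g., a block-cache
        // block) until `pinned` is Reset or destroyed -- no memcpy into a
        // user-owned std::string.
        rocksdb::PinnableSlice pinned;
        s = db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), "key",
                    &pinned);
        assert(s.ok() && pinned == "value");
      
        delete db;
        return 0;
      }
      ```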
      Closes https://github.com/facebook/rocksdb/pull/1756
      
      Differential Revision: D4391738
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6f3edd3
  7. 24 Feb, 2017 1 commit
  8. 22 Feb, 2017 1 commit
    • level compaction expansion · 2a0f3d0d
      Aaron Gao authored
      Summary:
      Reimplement the compaction expansion on the lower level.
      
      Consider such a case:
      input level files: 1[B E] 2[F G] 3[H I] 4[J M]
      output level files: 5[A C] 6[D K] 7[L O]
      
      If we initially pick file 2, we will compact files 2 and 6. But we can safely compact 2, 3, and 6 without expanding the output level.
      
      The previous code is messy and wrong.
      
      In this diff, I first determine the input range [a, b] and the output range [c, d],
      then we get the range [e, f] = [min(a, c), max(b, d)] and put all eligible clean-cut files within [e, f] into this compaction (see the sketch below).
      
      **Note: clean-cut means no chosen file shares a boundary user key with a file that is not chosen in this compaction.**
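      
      An illustrative sketch of the range-union step (hypothetical types, assuming the default bytewise key order; not RocksDB's actual code):
      ```
      #include <algorithm>
      #include <string>
      
      struct KeyRange {
        std::string smallest;
        std::string largest;
      };
      
      // e = min(a, c), f = max(b, d), comparing keys bytewise.
      KeyRange UnionRange(const KeyRange& input, const KeyRange& output) {
        return KeyRange{std::min(input.smallest, output.smallest),
                        std::max(input.largest, output.largest)};
      }
      ```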
      Closes https://github.com/facebook/rocksdb/pull/1760
      
      Differential Revision: D4395564
      
      Pulled By: lightmark
      
      fbshipit-source-id: 2dc2c5c
  9. 17 Feb, 2017 1 commit
  10. 14 Feb, 2017 1 commit
    • Remove disableDataSync option · eb912a92
      Sagar Vemuri authored
      Summary:
      Remove the disableDataSync option, along with the similarly named disable_data_sync option.
      This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
      Closes https://github.com/facebook/rocksdb/pull/1859
      
      Differential Revision: D4541292
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5b3a6ca
  11. 03 Feb, 2017 1 commit
  12. 09 Jan, 2017 2 commits
    • Revert "PinnableSlice" · d0ba8ec8
      Maysam Yabandeh authored
      Summary:
      This reverts commit 54d94e9c.
      
      The pull request was landed by mistake.
      Closes https://github.com/facebook/rocksdb/pull/1755
      
      Differential Revision: D4391678
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 36d5149
    • PinnableSlice · 54d94e9c
      Maysam Yabandeh authored
      Summary:
      Currently the point lookup values are copied to a string provided by the user.
      This incurs an extra memcpy cost. This patch allows doing point lookup
      via a PinnableSlice which pins the source memory location (instead of
      copying their content) and releases them after the content is consumed
      by the user. The old API of Get(string) is translated to the new API
      underneath.
      
       Here is the summary of improvements:
       1. value 100 byte: 1.8%  regular, 1.2% merge values
       2. value 1k   byte: 11.5% regular, 7.5% merge values
       3. value 10k byte: 26% regular,    29.9% merge values
      
       The improvement for merge could be more if we extend this approach to
       pin the merge output and delay the full merge operation until the user
       actually needs it. We have put that for future work.
      
      PS:
      Sometimes we observe a small decrease in performance when switching from
      t5452014 to this patch but with the old Get(string) API. The difference
      is small and could be noise. More importantly it is safely
      cancelled
      Closes https://github.com/facebook/rocksdb/pull/1732
      
      Differential Revision: D4374613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a077f1a
  13. 16 Nov, 2016 1 commit
  14. 05 Nov, 2016 1 commit
    • DeleteRange user iterator support · 9e7cf346
      Andrew Kryczka authored
      Summary:
      Note: reviewed in https://reviews.facebook.net/D65115
      
      - DBIter maintains a range tombstone accumulator. We don't cleanup obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times.
      - DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
      - DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones (a usage sketch follows)
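      
      A usage sketch (DeleteRange, NewIterator, and DefaultColumnFamily are real RocksDB APIs; the path and keys are illustrative):
      ```
      #include <rocksdb/db.h>
      
      #include <cassert>
      
      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/delete_range_demo", &db).ok());
      
        db->Put(rocksdb::WriteOptions(), "a", "1");
        db->Put(rocksdb::WriteOptions(), "b", "2");
        // Deletes all keys in [begin, end): "a" is covered, "b" is not.
        db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(), "a",
                        "b");
      
        rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
        it->SeekToFirst();
        assert(it->Valid() && it->key() == "b");  // "a" is hidden by the tombstone
        it->Next();
        assert(!it->Valid());
      
        delete it;
        delete db;
        return 0;
      }
      ```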
      Closes https://github.com/facebook/rocksdb/pull/1464
      
      Differential Revision: D4131753
      
      Pulled By: ajkr
      
      fbshipit-source-id: be86559
  15. 04 Nov, 2016 1 commit
    • DeleteRange Get support · f998c979
      Andrew Kryczka authored
      Summary:
      During Get()/MultiGet(), build up a RangeDelAggregator with range
      tombstones as we search through live memtable, immutable memtables, and
      SST files. This aggregator is then used by memtable.cc's SaveValue() and
      GetContext::SaveValue() to check whether keys are covered.
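      
      A sketch of the user-visible effect on point lookups (real APIs; the path and keys are illustrative): a key covered by a range tombstone reads as NotFound.
      ```
      #include <rocksdb/db.h>
      
      #include <cassert>
      #include <string>
      
      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/delete_range_get_demo", &db).ok());
      
        db->Put(rocksdb::WriteOptions(), "k1", "v1");
        db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(), "k0",
                        "k2");
      
        std::string value;
        rocksdb::Status s = db->Get(rocksdb::ReadOptions(), "k1", &value);
        assert(s.IsNotFound());  // covered by the ["k0", "k2") tombstone
      
        delete db;
        return 0;
      }
      ```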
      
      Added tests for Get on memtables/files; end-to-end tests are mainly in https://reviews.facebook.net/D64761
      Closes https://github.com/facebook/rocksdb/pull/1456
      
      Differential Revision: D4111271
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6e388d4
  16. 02 Nov, 2016 1 commit
  17. 21 Oct, 2016 1 commit
    • Support IngestExternalFile (remove AddFile restrictions) · 869ae5d7
      Islam AbdelRahman authored
      Summary:
      Changes in the diff
      
      API changes:
      - Introduce IngestExternalFile to replace AddFile (I think this makes the API clearer)
      - Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file)
      - Deprecate AddFile() API
      
      Logic changes:
      - If our file overlaps with the memtable, we will flush the memtable
      - We will find the first level in the LSM tree whose keys overlap with our file's key range
      - We will find the lowest level in the LSM tree above the level we found in step 2 that our file can fit in, and ingest our file into it
      - We will assign a global sequence number to our new file
      - Remove AddFile restrictions by using global sequence numbers
      
      Other changes:
      - Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob (a usage sketch follows)
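      
      A usage sketch (SstFileWriter, IngestExternalFile, and IngestExternalFileOptions are real RocksDB APIs; the paths are illustrative):
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/sst_file_writer.h>
      
      #include <cassert>
      
      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
      
        // 1. Write sorted key/value pairs into a standalone .sst file.
        rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
        assert(writer.Open("/tmp/file1.sst").ok());
        assert(writer.Add("k1", "v1").ok());
        assert(writer.Add("k2", "v2").ok());
        assert(writer.Finish().ok());
      
        // 2. Ingest it; RocksDB assigns a global sequence number, so the file
        //    no longer needs to be older than all existing data.
        rocksdb::DB* db = nullptr;
        assert(rocksdb::DB::Open(options, "/tmp/ingest_demo", &db).ok());
        rocksdb::IngestExternalFileOptions ifo;
        assert(db->IngestExternalFile({"/tmp/file1.sst"}, ifo).ok());
      
        delete db;
        return 0;
      }
      ```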
      
      Test Plan:
      unit tests (still need to add more)
      addfile_stress (https://reviews.facebook.net/D65037)
      
      Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: jkedgar, hcz, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D65061
  18. 19 Oct, 2016 1 commit
    • Compaction Support for Range Deletion · 6fbe96ba
      Andrew Kryczka authored
      Summary:
      This diff introduces RangeDelAggregator, which takes ownership of iterators
      provided to it via AddTombstones(). The tombstones are organized in a two-level
      map (snapshot stripe -> begin key -> tombstone). Tombstone creation avoids data
      copy by holding Slices returned by the iterator, which remain valid thanks to pinning.
      
      For compaction, we create a hierarchical range tombstone iterator with structure
      matching the iterator over compaction input data. An aggregator based on that
      iterator is used by CompactionIterator to determine which keys are covered by
      range tombstones. In case of merge operand, the same aggregator is used by
      MergeHelper. Upon finishing each file in the compaction, relevant range tombstones
      are added to the output file's range tombstone metablock and file boundaries are
      updated accordingly.
      
      To check whether a key is covered by range tombstone, RangeDelAggregator::ShouldDelete()
      considers tombstones in the key's snapshot stripe. When this function is used outside of
      compaction, it also checks newer stripes, which can contain covering tombstones. Currently
      the intra-stripe check involves a linear scan; however, in the future we plan to collapse ranges
      within a stripe such that binary search can be used.
      
      RangeDelAggregator::AddToBuilder() adds all range tombstones in the table's key-range
      to a new table's range tombstone meta-block. Since range tombstones may fall in the gap
      between files, we may need to extend some files' key-ranges. The strategy is (1) first file
      extends as far left as possible and other files do not extend left, (2) all files extend right
      until either the start of the next file or the end of the last range tombstone in the gap,
      whichever comes first.
      
      One other notable change is adding release/move semantics to ScopedArenaIterator
      such that it can be used to transfer ownership of an arena-allocated iterator, similar to
      how unique_ptr is used for malloc'd data.
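      
      An illustrative sketch of the two-level organization described above (hypothetical stand-in types, not RocksDB's internals):
      ```
      #include <cstdint>
      #include <map>
      #include <string>
      
      struct RangeTombstone {
        std::string begin_key;
        std::string end_key;
        uint64_t seqnum;  // sequence number of the DeleteRange
      };
      
      // Outer key: snapshot stripe index (the gap between two snapshots).
      // Inner map: tombstones in that stripe, ordered by begin key.
      using StripeMap = std::map<uint64_t, std::map<std::string, RangeTombstone>>;
      
      // ShouldDelete-style check: a linear scan within the key's stripe, as
      // the summary notes (collapsing ranges for binary search is future work).
      bool ShouldDelete(const StripeMap& stripes, uint64_t stripe_idx,
                        const std::string& user_key, uint64_t key_seqnum) {
        auto it = stripes.find(stripe_idx);
        if (it == stripes.end()) return false;
        for (const auto& kv : it->second) {
          const RangeTombstone& t = kv.second;
          if (user_key >= t.begin_key && user_key < t.end_key &&
              t.seqnum > key_seqnum) {
            return true;
          }
        }
        return false;
      }
      ```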
      
      Depends on D61473
      
      Test Plan: compaction_iterator_test, mock_table, end-to-end tests in D63927
      
      Reviewers: sdong, IslamAbdelRahman, wanning, yhchiang, lightmark
      
      Reviewed By: lightmark
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D62205
  19. 08 Oct, 2016 1 commit
    • Support running consistency checks in release mode · 2ad68b97
      Islam AbdelRahman authored
      Summary:
      We always run consistency checks when compiling in debug mode.
      This change allows users to set Options::force_consistency_checks to true to run such checks even when compiling in release mode, as sketched below.
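      
      Usage sketch (force_consistency_checks is the real option added here):
      ```
      #include <rocksdb/options.h>
      
      int main() {
        rocksdb::Options options;
        // Run the debug-build consistency checks in release builds too.
        options.force_consistency_checks = true;
        // ... open the DB with these options as usual ...
        return 0;
      }
      ```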
      
      Test Plan:
      make check -j64
      make release
      
      Reviewers: lightmark, sdong, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64701
  20. 28 Sep, 2016 1 commit
    • Add SeekForPrev() to Iterator · f517d9dd
      Aaron Gao authored
      Summary:
      Add a new Iterator API, `SeekForPrev`: find the last key that is <= the target key.
      - supports prefix_extractor
      - supports prefix_same_as_start
      - supports upper_bound
      - not supported in iterators without Prev()
      
      Also add tests in db_iter_test and db_iterator_test
      
      Pass all tests
      Cheers!
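      
      A usage sketch (SeekForPrev is the real API; the path and keys are illustrative):
      ```
      #include <rocksdb/db.h>
      
      #include <cassert>
      
      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/seekforprev_demo", &db).ok());
      
        db->Put(rocksdb::WriteOptions(), "a", "1");
        db->Put(rocksdb::WriteOptions(), "c", "3");
      
        rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
        it->SeekForPrev("b");  // no exact match: lands on "a", the last key <= "b"
        assert(it->Valid() && it->key() == "a");
      
        delete it;
        delete db;
        return 0;
      }
      ```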
      
      Test Plan: make all check -j64
      
      Reviewers: andrewkr, yiwu, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64149
  21. 24 Sep, 2016 1 commit
    • Split DBOptions into ImmutableDBOptions and MutableDBOptions · 9ed928e7
      Yi Wu authored
      Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is merely a placeholder for now. I'll start moving options to MutableDBOptions in following diffs.
      
      Test Plan:
        make all check
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64065
  22. 14 Sep, 2016 1 commit
    • Refactor MutableCFOptions · 81747f1b
      Yi Wu authored
      Summary:
      * Change the constructor of MutableCFOptions to depend only on ColumnFamilyOptions.
      * Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable.
      
      Test Plan: existing unit tests.
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63945
  23. 21 Jul, 2016 2 commits
    • Only cache level 0 indexes and filter when opening table reader · e70020e4
      omegaga authored
      Summary: In T8216281 we decided to disable prefetching the index and filter blocks when opening table readers during startup (max_open_files = -1).
      
      Test Plan: Rely on `IndexAndFilterBlocksOfNewTableAddedToCache` to guarantee L0 indexes and filters are still cached and change `PinL0IndexAndFilterBlocksTest` to make sure other levels are not cached (maybe add one more test to test we don't cache other levels?)
      
      Reviewers: sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59913
    • Introduce FullMergeV2 (eliminate memcpy from merge operators) · 68a8e6b8
      Islam AbdelRahman authored
      Summary:
      This diff updates the code to pin the merge operator operands while the merge operation is performed, so that we can eliminate the memcpy cost. To do that we need a new public API for FullMerge that replaces the std::deque<std::string> with std::vector<Slice>.
      
      This diff is stacked on top of D56493 and D56511
      
      In this diff we
      - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
      - Replace std::deque<std::string> with std::vector<Slice> to pass operands
      - Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
      - Allow FullMergeV2 output to be an existing operand (a sketch of an operator on the new interface follows)
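      
      A sketch of a merge operator on the new interface (MergeOperator, MergeOperationInput, and MergeOperationOutput are real RocksDB types; this toy "max" operator mirrors the one used in the benchmarks below):
      ```
      #include <rocksdb/merge_operator.h>
      
      class MaxOperator : public rocksdb::MergeOperator {
       public:
        bool FullMergeV2(const MergeOperationInput& merge_in,
                         MergeOperationOutput* merge_out) const override {
          // Start from the existing value (if any), then keep the max operand.
          rocksdb::Slice max;
          if (merge_in.existing_value != nullptr) {
            max = *merge_in.existing_value;
          }
          for (const rocksdb::Slice& operand : merge_in.operand_list) {
            if (max.compare(operand) < 0) {
              max = operand;
            }
          }
          // Point at an existing operand instead of copying into new_value --
          // this is the memcpy elimination the diff is about.
          merge_out->existing_operand = max;
          return true;
        }
      
        const char* Name() const override { return "MaxOperator"; }
      };
      ```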
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
      readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
      readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
      readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
      readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s
      
      [master]
      readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
      readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
      readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
      readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
      readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
      ```
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
      readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
      readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
      readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
      readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s
      
      [master]
      readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
      readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
      readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
      readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
      readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s
      
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]
      
      [FullMergeV2]
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
      readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
      readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
      readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
      readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s
      
      [master]
      readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
      readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
      readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
      readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
      readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      
      [FullMergeV2]
      readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
      readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
      readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
      readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
      readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s
      
      [master]
      readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
      readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
      readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
      readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
      readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: lovro, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57075
  24. 20 Jul, 2016 1 commit
    • New Statistics to track Compression/Decompression (#1197) · 9430333f
      John Alexander authored
      * Added new statistics and refactored to allow ioptions to be passed around as required to access environment and statistics pointers (and, as a convenient side effect, info_log pointer).
      
      * Prevent incrementing compression counter when compression is turned off in options.
      
      * Added two more supported compression types to test code in db_test.cc
      
      * Added new StatsLevel that excludes compression timing.
      
      * Fixed casting error in coding.h
      
      * Fixed CompressionStatsTest for new StatsLevel.
      
      * Removed unused variable that was breaking the Linux build
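      
      A hedged usage sketch: Options::statistics and getTickerCount() are real APIs; the exact ticker and StatsLevel names for the new compression counters follow my reading of statistics.h from this era and may differ in detail.
      ```
      #include <rocksdb/options.h>
      #include <rocksdb/statistics.h>
      
      #include <iostream>
      
      int main() {
        rocksdb::Options options;
        options.statistics = rocksdb::CreateDBStatistics();
        // Count compressions but skip timing them (the new StatsLevel).
        options.statistics->stats_level_ = rocksdb::kExceptDetailedTimers;
      
        // ... run a workload, then read the new counters ...
        std::cout << options.statistics->getTickerCount(
                         rocksdb::NUMBER_BLOCK_COMPRESSED)
                  << " blocks compressed\n";
        return 0;
      }
      ```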
  25. 09 Jul, 2016 1 commit
    • Fix clang analyzer errors · 296545a2
      Yi Wu authored
      Summary:
      Fixing errors reported by the clang static analyzer.
      * Removing some unused variables.
      * Adding assertions to fix false positives reported by clang analyzer.
      * Adding `__clang_analyzer__` macro to suppress false positive warnings.
      
      Test Plan:
          USE_CLANG=1 OPT=-g make analyze -j64
      
      Reviewers: andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D60549
  26. 07 Jul, 2016 1 commit
    • Fix release build for MyRocks by using debug-only code only in debug builds · b954847f
      Gunnar Kudrjavets authored
      Summary: The MyRocks release integration build breaks because we treat warnings caused by unused variables as errors. Variable `edit` is only used in debug builds. Therefore we need to guard it with an `#ifndef NDEBUG` check, as illustrated below.
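      
      The pattern, illustrated with a stand-in variable (`edit` here is just an int, standing in for the real debug-only VersionEdit usage):
      ```
      #include <cassert>
      
      void Example() {
      #ifndef NDEBUG
        // Only exists in debug builds, so release builds (-DNDEBUG) never see
        // an unused variable under -Werror.
        int edit = 1;
        assert(edit == 1);
      #endif
      }
      
      int main() {
        Example();
        return 0;
      }
      ```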
      
      Test Plan:
      - `[p]arc diff --preview` for the default validation.
      - Verify that release build fails before this fix and passes after applying it.
      
      Reviewers: andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60423
  27. 06 Jul, 2016 2 commits
    • Add options.write_buffer_manager: control total memtable size across DB instances · 32df9733
      sdong authored
      Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.
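      
      Usage sketch (WriteBufferManager and Options::write_buffer_manager are real APIs; the paths and the 512 MB cap are illustrative):
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/write_buffer_manager.h>
      
      #include <memory>
      
      int main() {
        // Cap total memtable memory across both DBs at 512 MB.
        auto wbm = std::make_shared<rocksdb::WriteBufferManager>(512 << 20);
      
        rocksdb::Options options;
        options.create_if_missing = true;
        options.write_buffer_manager = wbm;
      
        rocksdb::DB *db1 = nullptr, *db2 = nullptr;
        rocksdb::DB::Open(options, "/tmp/wbm_db1", &db1);
        rocksdb::DB::Open(options, "/tmp/wbm_db2", &db2);
      
        // Flushes are triggered when the *sum* of memtable usage nears the cap.
        delete db1;
        delete db2;
        return 0;
      }
      ```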
      
      Test Plan: Add a new unit test.
      
      Reviewers: yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59925
    • group multiple batch of flush into one manifest file (one call to LogAndApply) · 5aaef91d
      Aaron Gao authored
      Summary: Currently, if several flush outputs are committed together, we issue one manifest write per batch (1 batch = 1 flush = 1 sst file = 1+ contiguous memtables). Each manifest write requires one fsync plus one fsync of the parent directory. In some cases this becomes the write bottleneck. We should batch them into one manifest write when possible.
      
      Test Plan:
      ` ./db_bench -benchmarks="fillseq" -max_write_buffer_number=16 -max_background_flushes=16 -disable_auto_compactions=true -min_write_buffer_number_to_merge=1 -write_buffer_size=65536 -level0_stop_writes_trigger=10000 -level0_slowdown_writes_trigger=10000`
      **Before**
      ```
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 4.9
      Date:       Fri Jul  1 15:38:17 2016
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Compression: Snappy
      Memtablerep: skip_list
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      DB path: [/tmp/rocksdbtest-112628/dbbench]
      fillseq      :     166.277 micros/op 6014 ops/sec;    0.7 MB/s
      ```
      **After**
      ```
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 4.9
      Date:       Fri Jul  1 15:35:05 2016
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Compression: Snappy
      Memtablerep: skip_list
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      DB path: [/tmp/rocksdbtest-112628/dbbench]
      fillseq      :      52.328 micros/op 19110 ops/sec;    2.1 MB/s
      ```
      
      Reviewers: andrewkr, IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: igor, andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D60075
  28. 14 Jun, 2016 1 commit
  29. 24 May, 2016 1 commit
  30. 20 May, 2016 1 commit
  31. 11 May, 2016 1 commit
    • Fix data race in GetObsoleteFiles() · 560358dc
      Islam AbdelRahman authored
      Summary:
      GetObsoleteFiles() and LogAndApply() both modify the obsolete_manifests_ vector,
      so we need to make sure that the mutex is held when we modify obsolete_manifests_ (a sketch of the pattern follows).
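      
      An illustrative sketch of the locking pattern (hypothetical stand-ins, not the actual VersionSet code):
      ```
      #include <mutex>
      #include <string>
      #include <vector>
      
      std::mutex db_mutex;
      std::vector<std::string> obsolete_manifests;
      
      void AddObsoleteManifest(std::string name) {
        std::lock_guard<std::mutex> guard(db_mutex);  // held for the mutation
        obsolete_manifests.push_back(std::move(name));
      }
      
      void GetObsoleteFiles(std::vector<std::string>* out) {
        std::lock_guard<std::mutex> guard(db_mutex);
        out->swap(obsolete_manifests);  // hand the list over under the lock
        obsolete_manifests.clear();     // ensure the shared vector is empty
      }
      ```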
      
      Test Plan: run the test under TSAN
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58011
  32. 10 May, 2016 1 commit
    • Estimate pending compaction bytes more accurately · bfb6b1b8
      sdong authored
      Summary: Currently we estimate the bytes needed for compaction by assuming the fanout value equals the level multiplier. This overestimates when a level's size exceeds its target by a large margin. Instead, we estimate using the ratio of actual sizes between levels.
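      
      An illustrative calculation (numbers are made up): suppose the configured level multiplier is 10, but L1 actually holds 1 GB while L2 holds 20 GB. Under the old estimate, compacting 100 MB out of L1 was assumed to rewrite about 100 MB * 10 = 1 GB in L2; using the observed ratio 20 GB / 1 GB = 20, the estimate becomes about 100 MB * 20 = 2 GB, which better reflects the real overlap when a level is oversized.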
      
      Test Plan: Fix existing test cases and add a new one.
      
      Reviewers: IslamAbdelRahman, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57789
  33. 21 Apr, 2016 1 commit
    • Add per-level compression ratio property · 73a847ef
      Andrew Kryczka authored
      Summary:
      This is needed so we can measure compression ratio improvements
      achieved by D52287.
      
      The property compares raw data size against the total file size for a given
      level. If the level is empty it should return 0.0.
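      
      Usage sketch (GetProperty is a real API; the per-level property name follows the "rocksdb.compression-ratio-at-levelN" pattern added by this diff, to the best of my reading):
      ```
      #include <rocksdb/db.h>
      
      #include <iostream>
      #include <string>
      
      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB::Open(options, "/tmp/ratio_demo", &db);
      
        std::string ratio;
        if (db->GetProperty("rocksdb.compression-ratio-at-level0", &ratio)) {
          // Raw data size / total file size for L0; "0.0" if the level is empty.
          std::cout << "L0 compression ratio: " << ratio << "\n";
        }
        delete db;
        return 0;
      }
      ```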
      
      Test Plan: new unit test
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56967
  34. 07 Apr, 2016 1 commit
    • Don't use version in the error message · ab4c6233
      Igor Canadi authored
      Summary: We use the object `v` in the error message, but it is not initialized if the edit is a column family manipulation. This doesn't provide much useful info, so this diff removes it. Instead, it dumps the actual VersionEdit contents.
      
      Test Plan: compiles. would be great to get tests in version_set_test.cc that cover cases where a file write fails
      
      Reviewers: sdong, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56349
  35. 02 Apr, 2016 2 commits
    • No need to limit to 20 files in UpdateAccumulatedStats() if options.max_open_files=-1 · cc87075d
      Aaron Gao authored
      Summary:
      There is a hardcoded constraint in our statistics collection that prevents reading properties from more than 20 SST files. This means our statistics will be very inaccurate for databases with > 20 files since additional files are just ignored. The purpose of constraining the number of files used is to bound the I/O performed during statistics collection, since these statistics need to be recomputed every time the database is reopened.
      
      However, this constraint doesn't take into account the case where the option "max_open_files" is -1. In that case, all the file metadata has already been read, so MaybeInitializeFileMetaData() won't incur any I/O cost. So this diff gets rid of the 20-file constraint in the case max_open_files == -1.
      
      Test Plan:
      Wrote a unit test in db/db_properties_test.cc - "ValidateSampleNumber".
      We generate 20 files with 2 rows and 10 files with 1 row.
      If max_open_files !=-1, the `rocksdb.estimate-num-keys` should be (10*1 + 10*2)/20 * 30 = 45. Otherwise, it should be the ground truth, 50.
      {F1089153}
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56253
    • Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. · 9b519875
      Marton Trencseni authored
      Summary:
      When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, i.e. the LRU cache will never push them out. Meanwhile, further accesses in the table reader will not hit the block cache, thus avoiding lock contention. A usage sketch follows.
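      
      Usage sketch: both flags are real BlockBasedTableOptions fields; treating cache_index_and_filter_blocks as the user-facing switch for the prefetch-at-open behavior described above is my assumption here.
      ```
      #include <rocksdb/options.h>
      #include <rocksdb/table.h>
      
      int main() {
        rocksdb::BlockBasedTableOptions table_options;
        table_options.cache_index_and_filter_blocks = true;            // cache at Open()
        table_options.pin_l0_filter_and_index_blocks_in_cache = true;  // pin L0's blocks
      
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        // ... open the DB with `options` ...
        return 0;
      }
      ```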
      
      Test Plan:
      'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
      I didn't run the Java tests, I don't have Java set up on my devserver.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56133
  36. 26 Mar, 2016 1 commit