1. 23 Mar, 2017 1 commit
  2. 22 Mar, 2017 1 commit
  3. 08 Mar, 2017 1 commit
  4. 23 Feb, 2017 1 commit
  5. 07 Feb, 2017 1 commit
    • Windows thread · 0a4cdde5
      Committed by Dmitri Smirnov
      Summary:
      Introduce new methods into the public threadpool interface:
      - Allow submission of std::functions, as they offer greater flexibility.
      - Add joining methods to the implementation to join scheduled and submitted jobs, with
        an option to cancel jobs that have not started executing.
      - Remove the ugly `#ifdefs` between the pthread and std implementations; make it uniform.
      - Introduce a pimpl for a drop-in replacement of the implementation.
      - Introduce the rocksdb::port::Thread typedef as a replacement for std::thread. On POSIX, Thread defaults to std::thread as before.
      - Implement WindowsThread, which allocates memory in a more controllable manner than the Windows std::thread, with a replaceable implementation.
      - There should be no functionality changes.
      (A rough sketch of the interface shape described here follows this entry.)
      Closes https://github.com/facebook/rocksdb/pull/1823
      
      Differential Revision: D4492902
      
      Pulled By: siying
      
      fbshipit-source-id: c74cb11
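
      As a rough illustration of the interface shape described above (SimpleThreadPool, SubmitJob and JoinAll are illustrative names, not RocksDB's actual ThreadPool class), a minimal std::function-based pool with join-and-optionally-cancel semantics might look like:

      ```
      #include <condition_variable>
      #include <cstddef>
      #include <deque>
      #include <functional>
      #include <mutex>
      #include <thread>
      #include <vector>

      // Hypothetical sketch: accepts std::function jobs and can join all
      // threads, optionally dropping jobs that never started executing.
      // JoinAll() must be called before destruction.
      class SimpleThreadPool {
       public:
        explicit SimpleThreadPool(size_t n) {
          for (size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { WorkerLoop(); });
          }
        }

        void SubmitJob(std::function<void()> job) {
          {
            std::lock_guard<std::mutex> lk(mu_);
            queue_.push_back(std::move(job));
          }
          cv_.notify_one();
        }

        // Join all threads; if cancel_pending, jobs that have not started are dropped.
        void JoinAll(bool cancel_pending) {
          {
            std::lock_guard<std::mutex> lk(mu_);
            shutting_down_ = true;
            if (cancel_pending) queue_.clear();
          }
          cv_.notify_all();
          for (auto& t : workers_) t.join();
        }

       private:
        void WorkerLoop() {
          for (;;) {
            std::function<void()> job;
            {
              std::unique_lock<std::mutex> lk(mu_);
              cv_.wait(lk, [this] { return shutting_down_ || !queue_.empty(); });
              if (queue_.empty()) return;  // shutting down, nothing left to run
              job = std::move(queue_.front());
              queue_.pop_front();
            }
            job();  // run outside the lock
          }
        }

        std::mutex mu_;
        std::condition_variable cv_;
        std::deque<std::function<void()>> queue_;
        std::vector<std::thread> workers_;
        bool shutting_down_ = false;
      };
      ```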
  6. 04 Feb, 2017 1 commit
  7. 27 Jan, 2017 1 commit
  8. 21 Jan, 2017 1 commit
  9. 20 Dec, 2016 1 commit
  10. 22 Nov, 2016 1 commit
  11. 17 Nov, 2016 1 commit
  12. 12 Nov, 2016 1 commit
  13. 14 Oct, 2016 1 commit
    • Fix compaction conflict with running compaction · 5691a1d8
      Committed by Islam AbdelRahman
      Summary:
      Issue scenario:
      (1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2.
      (2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decides to move this file to L1 and then move it again to L2 (this file does not overlap with any other files).
      (3) Compaction (1) finishes and installs the file it generated in L2, but this file overlaps with the file we generated in (2), so we break LSM consistency.

      It looks like this issue can be triggered by using non-exclusive manual compaction or AddFile().
      
      Test Plan: unit tests
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D64947
  14. 22 Sep, 2016 1 commit
  15. 20 Sep, 2016 1 commit
  16. 07 Sep, 2016 1 commit
    • Support ZSTD with finalized format · 607628d3
      Committed by sdong
      Summary:
      ZSTD 1.0.0 is coming. We can finally add support for ZSTD without worrying about compatibility.
      Still keep ZSTDNotFinal for compatibility reasons.
      
      Test Plan: Run all tests. Run db_bench with the ZSTD compression type, with RocksDB built against ZSTD 1.0 and against older versions.
      
      Reviewers: andrewkr, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D63141
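
      As a usage illustration (a minimal sketch; the path is an arbitrary example):

      ```
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // kZSTD requires the finalized (>= 1.0) format; kZSTDNotFinalCompression
        // remains available for data written with pre-1.0 ZSTD.
        options.compression = rocksdb::kZSTD;

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/zstd_db", &db);
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```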
  17. 02 Sep, 2016 1 commit
    • Merge options source_compaction_factor, max_grandparent_overlap_bytes and... · 32149059
      Committed by sdong
      Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
      
      Summary: To reduce the number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
      
      Test Plan: Add two new unit tests. Run all existing tests, including jtest.
      
      Reviewers: yhchiang, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59829
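
      As a usage illustration, the merged option is a single knob on Options (a minimal sketch; the value is an arbitrary example, not a recommendation):

      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeOptions() {
        rocksdb::Options options;
        // Replaces source_compaction_factor, max_grandparent_overlap_bytes and
        // expanded_compaction_factor: caps the total bytes a single compaction
        // may touch.
        options.max_compaction_bytes = 64ull * 1024 * 1024;  // 64 MB
        return options;
      }
      ```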
  18. 27 Aug, 2016 1 commit
    • Introduce Read amplification bitmap (read amp statistics) · b49b92cf
      Committed by Islam AbdelRahman
      Summary:
      Add the ReadOptions::read_amp_bytes_per_bit option, which allows us to create a bitmap for every data block we read;
      the bitmap will contain (block_size / read_amp_bytes_per_bit) bits.

      We will use this bitmap to mark which bytes of the block have been used, so we can calculate the read amplification.
      
      Test Plan: added new tests
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: yiwu, leveldb, march, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58707
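
      A hedged sketch of enabling the bitmap and deriving the statistic. Note an assumption: in released RocksDB the knob ended up on BlockBasedTableOptions rather than ReadOptions; this sketch assumes that placement.

      ```
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"
      #include "rocksdb/table.h"

      rocksdb::Options MakeReadAmpOptions() {
        rocksdb::Options options;
        options.statistics = rocksdb::CreateDBStatistics();

        rocksdb::BlockBasedTableOptions table_options;
        table_options.read_amp_bytes_per_bit = 32;  // one bitmap bit per 32 bytes
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }

      // Estimated read amplification is then roughly
      //   READ_AMP_TOTAL_READ_BYTES / READ_AMP_ESTIMATE_USEFUL_BYTES,
      // read from the statistics object via getTickerCount().
      ```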
  19. 26 Aug, 2016 1 commit
    • Mitigate regression bug of options.max_successive_merges hit during DB Recovery · dade61ac
      Committed by sdong
      Summary:
      After 1b8a2e8f, the DB pointer is passed to WriteBatchInternal::InsertInto() during DB recovery. This can cause a deadlock if options.max_successive_merges is hit: in that case DB::Get() will be called, and Get() will try to acquire the DB mutex, which is already held by DB::Open(), causing a deadlock.

      This commit mitigates the problem by not passing the DB pointer unless 2PC is allowed.
      
      Test Plan: Add a new test and run it.
      
      Reviewers: IslamAbdelRahman, andrewkr, kradhakrishnan, horuff
      
      Reviewed By: kradhakrishnan
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D62625
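
      For context, the knob involved is a plain option; a minimal sketch of a merge-heavy configuration that exercises this path (the value and helper name are arbitrary examples):

      ```
      #include <memory>

      #include "rocksdb/merge_operator.h"
      #include "rocksdb/options.h"

      rocksdb::Options MakeMergeHeavyOptions(
          std::shared_ptr<rocksdb::MergeOperator> merge_op) {
        rocksdb::Options options;
        options.merge_operator = std::move(merge_op);
        // Once a key accumulates this many consecutive merge operands, the write
        // path calls DB::Get() to collapse them -- the read that could deadlock
        // during recovery before this fix.
        options.max_successive_merges = 100;
        return options;
      }
      ```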
  20. 12 Aug, 2016 1 commit
    • Eliminate memcpy from ForwardIterator · d11c09d9
      Committed by Islam AbdelRahman
      Summary:
      This diff updates ForwardIterator to support pinning keys and values, which will allow DBIter to take advantage of that and eliminate memcpy when executing merge operators.
      This diff is stacked on D61305.
      
      Test Plan:
      existing tests (updated to exercise the tailing iterator)
      new test
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60009
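
      ForwardIterator backs the tailing iterator; a minimal sketch of exercising it (the loop body is illustrative):

      ```
      #include <memory>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      void TailNewWrites(rocksdb::DB* db) {
        rocksdb::ReadOptions read_options;
        read_options.tailing = true;  // served by ForwardIterator internally
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // With pinning, these Slices can point at pinned data rather than copies.
          rocksdb::Slice key = it->key();
          rocksdb::Slice value = it->value();
          (void)key;
          (void)value;
        }
      }
      ```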
  21. 21 Jul, 2016 2 commits
    • Only cache level 0 indexes and filter when opening table reader · e70020e4
      Committed by omegaga
      Summary: In T8216281 we decided to disable prefetching the index and filter blocks when opening table readers during startup (max_open_files = -1).
      
      Test Plan: Rely on `IndexAndFilterBlocksOfNewTableAddedToCache` to guarantee that L0 indexes and filters are still cached, and change `PinL0IndexAndFilterBlocksTest` to make sure other levels are not cached (maybe add one more test to verify that we don't cache other levels?)
      
      Reviewers: sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59913
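
      The behavior matters when table readers are opened up front and kept open; a minimal sketch of that configuration:

      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeAlwaysOpenOptions() {
        rocksdb::Options options;
        // -1 keeps every table reader open, so index/filter prefetch at open
        // time is the relevant path; with this change only L0 files keep their
        // index and filter blocks cached on open.
        options.max_open_files = -1;
        return options;
      }
      ```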
    • Introduce FullMergeV2 (eliminate memcpy from merge operators) · 68a8e6b8
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the code to pin the merge operator operands while the merge operation is performed, so that we can eliminate the memcpy cost. To do that we need a new public API for FullMerge that replaces the std::deque<std::string> with std::vector<Slice>.

      This diff is stacked on top of D56493 and D56511.

      In this diff we:
      - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput, which will make it easier to add new arguments in the future.
      - Replace std::deque<std::string> with std::vector<Slice> to pass operands.
      - Replace the MergeContext std::deque with std::vector (based on a simple benchmark I ran: https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187).
      - Allow FullMergeV2 output to be an existing operand.
      (A hedged sketch of the new API shape appears after the benchmark numbers below.)
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
      readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
      readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
      readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
      readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s
      
      [master]
      readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
      readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
      readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
      readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
      readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
      ```
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
      readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
      readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
      readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
      readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s
      
      [master]
      readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
      readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
      readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
      readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
      readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s
      
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]
      
      [FullMergeV2]
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
      readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
      readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
      readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
      readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s
      
      [master]
      readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
      readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
      readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
      readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
      readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      
      [FullMergeV2]
      readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
      readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
      readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
      readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
      readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s
      
      [master]
      readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
      readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
      readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
      readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
      readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: lovro, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57075
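
      A minimal sketch of a merge operator written against the new API, assuming the MergeOperationInput/MergeOperationOutput shapes described above (a "max" operator, mirroring the --merge_operator="max" used in the benchmarks):

      ```
      #include "rocksdb/merge_operator.h"
      #include "rocksdb/slice.h"

      class MaxOperator : public rocksdb::MergeOperator {
       public:
        bool FullMergeV2(const MergeOperationInput& merge_in,
                         MergeOperationOutput* merge_out) const override {
          const rocksdb::Slice* max = merge_in.existing_value;
          for (const rocksdb::Slice& op : merge_in.operand_list) {
            if (max == nullptr || op.compare(*max) > 0) {
              max = &op;
            }
          }
          if (max == nullptr) {
            return false;  // nothing to merge; should not happen in practice
          }
          // Point the output at an existing operand instead of copying into
          // new_value -- the memcpy elimination this diff enables.
          merge_out->existing_operand = *max;
          return true;
        }

        const char* Name() const override { return "MaxOperator"; }
      };
      ```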
  22. 08 Jul, 2016 1 commit
  23. 06 Jul, 2016 1 commit
  24. 19 May, 2016 1 commit
  25. 16 May, 2016 1 commit
    • Added PersistentCache abstraction · a08c8c85
      Committed by krad
      Summary:
      Added a new abstraction for caching pages to RocksDB, designed for the
      read-cache use case.

      RocksDB's current block cache is more of an object cache. For the persistent read cache
      project, what we need is a page cache equivalent. This change adds a cache
      abstraction to RocksDB, called PersistentCache, to cache pages. PersistentCache can cache
      uncompressed pages or raw pages (content as in the filesystem). The user can
      choose to operate PersistentCache in either COMPRESSED or UNCOMPRESSED mode.
      
      Test Plan: Run unit tests
      
      Reviewers: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55707
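
      A hedged sketch of what a page-cache-style interface looks like (PageCacheSketch and its method shapes are assumptions for illustration; they are close to, but not guaranteed to match, RocksDB's actual persistent_cache.h):

      ```
      #include <cstddef>
      #include <memory>

      #include "rocksdb/slice.h"
      #include "rocksdb/status.h"

      // Page-oriented cache: stores raw or uncompressed pages keyed by Slice,
      // unlike the block cache, which holds in-memory objects.
      class PageCacheSketch {
       public:
        virtual ~PageCacheSketch() = default;

        // Insert a page (raw or uncompressed, per the chosen mode).
        virtual rocksdb::Status Insert(const rocksdb::Slice& key,
                                       const char* data, size_t size) = 0;

        // Look up a page; on success *data owns the page contents.
        virtual rocksdb::Status Lookup(const rocksdb::Slice& key,
                                       std::unique_ptr<char[]>* data,
                                       size_t* size) = 0;

        // True if the cache stores raw (compressed) pages.
        virtual bool IsCompressed() = 0;
      };
      ```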
  26. 10 May, 2016 2 commits
    • Fix lite build · d86f9b9c
      Committed by Islam AbdelRahman
      Summary: Fix lite build
      
      Test Plan: run under lite
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57945
    • Add bottommost_compression option · 4b317234
      Committed by Islam AbdelRahman
      Summary:
      Add a new option that can be used to set a specific compression algorithm for the bottommost level.
      This option will only affect levels larger than the base level.

      I have also updated CompactionJobInfo to include the compression algorithm used in the compaction.
      
      Test Plan:
      added a new unit test
      existing unit tests
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: lightmark, andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D57669
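
      As a usage illustration (a minimal sketch; the particular algorithms are arbitrary examples):

      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeTieredCompressionOptions() {
        rocksdb::Options options;
        options.compression = rocksdb::kLZ4Compression;  // upper levels: cheap/fast
        // The bottommost level holds most of the data and is rewritten rarely,
        // so spend more CPU there for a better compression ratio.
        options.bottommost_compression = rocksdb::kZSTD;
        return options;
      }
      ```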
  27. 29 Apr, 2016 2 commits
    • Skip PresetCompressionDict test for lite · a06faa63
      Committed by Andrew Kryczka
      Summary:
      This test relies on the "rocksdb.num-files-at-levelN" property, which isn't
      implemented in RocksDB lite, so we compile it only for non-lite builds.
      
      Test Plan:
        $ make -j40 check 'OPT=-g -DROCKSDB_LITE'
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57387
    • Fix compression dictionary clang osx error · 0f428c56
      Committed by Andrew Kryczka
      Summary:
      There was one narrowing conversion in D52287 that only showed up with
      clang on OS X.
      
      Test Plan:
        $ make clean && USE_CLANG=1 DISABLE_JEMALLOC=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j32 check
      
      Reviewers: sdong, lightmark, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57357
  28. 28 Apr, 2016 1 commit
    • Shared dictionary compression using reference block · 843d2e31
      Committed by Andrew Kryczka
      Summary:
      This adds a new metablock containing a shared dictionary that is used
      to compress all data blocks in the SST file. The size of the shared dictionary
      is configurable in CompressionOptions and defaults to 0. It's currently only
      used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
      the compression type if the user chooses a nonzero dictionary size.
      
      During compaction, the dictionary is computed by randomly sampling the first
      output file in each subcompaction. The sampling intervals are pre-computed
      by assuming the output file will have the maximum allowable length. In case
      the file is smaller, some of the pre-computed sampling intervals can be beyond
      the end of the file, in which case we skip over those samples and the dictionary will
      be a bit smaller. After the dictionary is generated using the first file in a
      subcompaction, it is loaded into the compression library before writing each
      block in each subsequent file of that subcompaction.

      On the read path, the reader gets the dictionary from the metablock, if it exists, and then
      loads it into the compression library before reading each block.
      
      Test Plan: new unit test
      
      Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D52287
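
      As a usage illustration, the dictionary is enabled through CompressionOptions (a minimal sketch; the size is an arbitrary example):

      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeDictCompressionOptions() {
        rocksdb::Options options;
        options.compression = rocksdb::kZlibCompression;  // zlib/lz4/lz4hc use the dict
        // A nonzero size enables the shared-dictionary metablock; 0 (the
        // default) disables it.
        options.compression_opts.max_dict_bytes = 16 * 1024;
        return options;
      }
      ```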
  29. 19 Apr, 2016 1 commit
    • Split db_test.cc · 792762c4
      Committed by Yi Wu
      Summary: Split db_test.cc into several files, moving several helper functions into DBTestBase.
      
      Test Plan: make check
      
      Reviewers: sdong, yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, andrewkr, kradhakrishnan, yhchiang, leveldb, sdong
      
      Differential Revision: https://reviews.facebook.net/D56715
  30. 16 Apr, 2016 2 commits
    • Fixing snapshot 0 assertion · e5c614e1
      Committed by Victor Tyutyunov
      Summary:
      The solution is not to change the DB sequence number to start from 1, because the 0 value is used in multiple other places.
      The fix covers only compact_iterator::findEarliestVisibleSnapshot, with updated logic to support snapshot numbering starting from 0.
      
      Test Plan:
      run:
        make all check

      It should pass all tests.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: lgalanis, mgalushka, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56601
    • Fixing snapshot 0 assertion · 1b1adebe
      Committed by Victor Tyutyunov
      Test Plan: TBD
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56601
      
      Cosmetic changes and comment update
  31. 14 Apr, 2016 1 commit
    • BlockBasedTable::PrefixMayMatch() to skip index checking if we can't find a filter block. · 535af525
      Committed by sdong
      Summary:
      In the case where we can't find a filter block, there is not much benefit in doing the binary search to see whether the index key has the prefix. With this change, we blindly return true if we can't get the filter.
      It also fixes missing-row cases for the reverse comparator with full bloom.
      
      Test Plan: Add a test case that used to fail.
      
      Reviewers: yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: kradhakrishnan, yiwu, hermanlee4, yoshinorim, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56697
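
      The path in question is exercised when a prefix extractor is combined with a full (not block-based) bloom filter; a minimal sketch of such a setup (parameter values are arbitrary examples):

      ```
      #include "rocksdb/filter_policy.h"
      #include "rocksdb/options.h"
      #include "rocksdb/slice_transform.h"
      #include "rocksdb/table.h"

      rocksdb::Options MakePrefixBloomOptions() {
        rocksdb::Options options;
        // Prefix seeks consult BlockBasedTable::PrefixMayMatch().
        options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));

        rocksdb::BlockBasedTableOptions table_options;
        table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(
            10 /* bits_per_key */, false /* use_block_based_builder: full filter */));
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```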
  32. 13 Apr, 2016 1 commit
  33. 02 Apr, 2016 1 commit
    • Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. · 9b519875
      Committed by Marton Trencseni
      Summary:
      When a block-based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block-based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead, the table reader takes ownership of them, hence pinning them, i.e. the LRU cache will never push them out. Meanwhile, in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
      
      Test Plan:
      'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
      I didn't run the Java tests; I don't have Java set up on my devserver.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56133
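
      Enabling the feature is a table-options flag; a minimal sketch (the cache size is an arbitrary example):

      ```
      #include "rocksdb/cache.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options MakePinL0Options() {
        rocksdb::Options options;
        rocksdb::BlockBasedTableOptions table_options;
        table_options.block_cache = rocksdb::NewLRUCache(512ull << 20);  // 512 MB
        table_options.cache_index_and_filter_blocks = true;
        // Keep L0 index/filter blocks owned by their table readers, so hot
        // reads skip the block-cache lookup (and its lock) entirely.
        table_options.pin_l0_filter_and_index_blocks_in_cache = true;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```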
  34. 23 Mar, 2016 2 commits
  35. 19 Mar, 2016 1 commit
    • Add test for Snapshot 0 · fbbb8a61
      Committed by agiardullo
      Summary:
      I ran into this assert when stress testing transactions. It's pretty easy to repro.

      Changing VersionSet::last_sequence_ to start at 1 seems pretty straightforward; we would just need to change the 4 callers of SetLastSequence(), including recovery code. I'd make this change myself, but I do not have enough time to test changes to recovery code paths this week, so I am checking in this test case (disabled) for future fixing.
      
      Test Plan: n/a
      
      Reviewers: yhchiang, kradhakrishnan, andrewkr, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55311