1. 16 Apr 2016, 5 commits
  2. 15 Apr 2016, 1 commit
  3. 14 Apr 2016, 2 commits
    • BlockBasedTable::PrefixMayMatch() to skip index checking if we can't find a filter block. · 535af525
      Committed by sdong
      Summary:
      In the case where we can't find a filter block, there is not much benefit in doing the binary search and checking whether the index key has the prefix. With this change, we blindly return true if we can't get the filter.
      It also fixes missing-row cases for reverse comparators with full bloom.
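
      A minimal sketch of the new control flow, with illustrative names (not the actual RocksDB code):

      ```
      // Hedged sketch: with no filter block available, skip the index
      // lookup entirely and conservatively report that the prefix may match.
      bool PrefixMayMatchSketch(const Slice& prefix,
                                FilterBlockReader* filter /* may be null */) {
        if (filter == nullptr) {
          return true;  // no filter: blindly assume a possible match
        }
        return filter->PrefixMayMatch(prefix);
      }
      ```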
      
      Test Plan: Add a test case that used to fail.
      
      Reviewers: yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: kradhakrishnan, yiwu, hermanlee4, yoshinorim, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56697
      535af525
    • Fix ManualCompactionPartial test flakiness · 19ef3de5
      Committed by Islam AbdelRahman
      Summary: The reason for this test's flakiness is that we try to verify that the number of files in L0 is 3 after flushing the 3rd file, although a compaction may be running in the background and may finish before we do the check, converting the 3 L0 files into 1 L1 file.
      
      Test Plan: Run a modified version of the test that sleeps before doing the check
      
      Reviewers: sdong, andrewkr, kradhakrishnan, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56643
      19ef3de5
  4. 13 Apr 2016, 1 commit
  5. 12 Apr 2016, 2 commits
  6. 09 Apr 2016, 1 commit
    • Make sure that if use_mmap_reads is on use_os_buffer is also on · 2448f803
      Committed by Jay Edgar
      Summary: The code assumes that if use_mmap_reads is on, then use_os_buffer is also on. This makes sense: by using memory-mapped files for reading, you are expecting the OS to cache what it needs. Add code to make sure the user does not turn off use_os_buffer when they turn on use_mmap_reads.
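
      A minimal sketch of the sanitization rule, using the option names as written above (the shipped check and the exact option fields may differ):

      ```
      // Hedged sketch: reject the inconsistent combination up front.
      rocksdb::Status CheckMmapOptions(bool use_mmap_reads, bool use_os_buffer) {
        if (use_mmap_reads && !use_os_buffer) {
          return rocksdb::Status::InvalidArgument(
              "use_os_buffer must be enabled when use_mmap_reads is enabled");
        }
        return rocksdb::Status::OK();
      }
      ```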
      
      Test Plan: New test: DBTest.MMapAndBufferOptions
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56397
      2448f803
  7. 07 Apr 2016, 2 commits
    • Embed column family name in SST file · 2391ef72
      Committed by Andrew Kryczka
      Summary:
      Added the column family name to the properties block. This property
      is omitted only when it is unavailable, such as when RepairDB()
      writes SST files.
      
      In a follow-up diff, I will change RepairDB to use this new property
      to decide which column family an existing SST file belongs to. If this
      property is missing, it will add the file to the "unknown" column family
      (same as its existing behavior).
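
      A rough sketch of how a caller could read the property back; treating column_family_name as a TableProperties field is my assumption about where the value surfaces:

      ```
      // Hedged sketch: inspect the embedded column family name of live SST files.
      rocksdb::TablePropertiesCollection props;
      rocksdb::Status s = db->GetPropertiesOfAllTables(cf_handle, &props);
      if (s.ok()) {
        for (const auto& file_and_props : props) {
          // Empty when the property was unavailable at write time (e.g. RepairDB).
          const std::string& cf_name = file_and_props.second->column_family_name;
        }
      }
      ```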
      
      Test Plan:
      New unit test:
      
        $ ./db_table_properties_test --gtest_filter=DBTablePropertiesTest.GetColumnFamilyNameProperty
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55605
      2391ef72
    • Don't use version in the error message · ab4c6233
      Committed by Igor Canadi
      Summary: We use object `v` in the error message, which is not initialized if the edit is a column family manipulation. This doesn't provide much useful info, so this diff removes it. Instead, it dumps the actual VersionEdit contents.
      
      Test Plan: Compiles. It would be great to get tests in version_set_test.cc that cover cases where a file write fails.
      
      Reviewers: sdong, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56349
      ab4c6233
  8. 02 Apr 2016, 4 commits
    • No need to limit to 20 files in UpdateAccumulatedStats() if options.max_open_files=-1 · cc87075d
      Committed by Aaron Gao
      Summary:
      There is a hardcoded constraint in our statistics collection that prevents reading properties from more than 20 SST files. This means our statistics will be very inaccurate for databases with more than 20 files, since the additional files are simply ignored. The purpose of constraining the number of files used is to bound the I/O performed during statistics collection, since these statistics need to be recomputed every time the database is reopened.
      
      However, this constraint doesn't take into account the case where the option max_open_files is -1. In that case, all the file metadata has already been read, so MaybeInitializeFileMetaData() won't incur any I/O cost. So this diff gets rid of the 20-file constraint when max_open_files == -1.
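
      A hedged sketch of the relaxed rule; kMaxInitSamples and the surrounding names are illustrative, not the actual identifiers:

      ```
      // Only cap the sample when table metadata is loaded lazily; with
      // max_open_files == -1 it is all in memory already, so sampling
      // every file costs no extra I/O.
      const size_t kMaxInitSamples = 20;
      size_t files_to_sample = (max_open_files == -1)
                                   ? num_files
                                   : std::min(num_files, kMaxInitSamples);
      ```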
      
      Test Plan:
      Wrote a unit test in db/db_properties_test.cc - "ValidateSampleNumber".
      We generate 20 files with 2 rows and 10 files with 1 row.
      If max_open_files != -1, `rocksdb.estimate-num-keys` should be (10*1 + 10*2)/20 * 30 = 45. Otherwise, it should be the ground truth, 50.
      {F1089153}
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56253
      cc87075d
    • Eliminate std::deque initialization while iterating over merge operands · 8a1a603f
      Committed by Islam AbdelRahman
      Summary:
      This patch is similar to D52563. When we iterate over a DB with merge operands, we keep creating a std::deque to store the operands; optimize this by reusing the merge_operands_ data member.
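
      A sketch of the reuse pattern, with illustrative names:

      ```
      // Hedged sketch: keep one container alive across keys instead of
      // constructing a fresh std::deque on every iteration step.
      class MergeOperandsSketch {
       public:
        void StartNewKey() { operands_.clear(); }  // reuses the allocation
        void Add(const Slice& op) { operands_.emplace_back(op.ToString()); }
       private:
        std::deque<std::string> operands_;  // the reused merge_operands_ member
      };
      ```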
      
      Before the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.757 micros/op 266141 ops/sec;   29.4 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.413 micros/op 2423538 ops/sec;  268.1 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.451 micros/op 2219071 ops/sec;  245.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.420 micros/op 2382039 ops/sec;  263.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.408 micros/op 2452017 ops/sec;  271.3 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.947 micros/op 253376 ops/sec;   28.0 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.441 micros/op 2266473 ops/sec;  250.7 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.471 micros/op 2122033 ops/sec;  234.8 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.440 micros/op 2271407 ops/sec;  251.3 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.429 micros/op 2331471 ops/sec;  257.9 MB/s
      ```
      
      with the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       4.080 micros/op 245092 ops/sec;   27.1 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.308 micros/op 3241843 ops/sec;  358.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.312 micros/op 3200408 ops/sec;  354.0 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.332 micros/op 3013962 ops/sec;  333.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.300 micros/op 3328017 ops/sec;  368.2 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.973 micros/op 251705 ops/sec;   27.8 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.320 micros/op 3123752 ops/sec;  345.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.335 micros/op 2986641 ops/sec;  330.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.339 micros/op 2950047 ops/sec;  326.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.319 micros/op 3131565 ops/sec;  346.4 MB/s
      ```
      
      Test Plan: make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56031
      8a1a603f
    • WriteBatchWithIndex micro optimization · f38540b1
      Committed by Islam AbdelRahman
      Summary:
        - Put the key offset and key size in WriteBatchIndexEntry (see the sketch below)
        - Use a vector for the comparators in WriteBatchEntryComparator
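
      A hedged sketch of both changes (member names approximate, not the exact RocksDB layout):

      ```
      // Storing the key's offset/size lets the comparator read the key
      // directly out of the write batch buffer instead of re-parsing the
      // record on every comparison.
      struct WriteBatchIndexEntrySketch {
        size_t offset;      // offset of the record in the write batch
        size_t key_offset;  // offset of the user key in the write batch
        size_t key_size;    // length of the user key
        uint32_t column_family;
      };

      // Comparator lookup by column family ID becomes an O(1) vector
      // index instead of a map lookup on the hot path.
      std::vector<const rocksdb::Comparator*> cf_comparators_;
      ```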
      
      I use a slightly modified version of @yoshinorim's code to benchmark:
      https://gist.github.com/IslamAbdelRahman/b120f4fba8d6ff7d58d2
      
      For Put, I create a transaction that puts 1,000,000 keys and measure the time spent without committing.
      For GetForUpdate, I read the keys that I added in the Put transaction.
      
      Original time:
      
      ```
       rm -rf /dev/shm/rocksdb-example/
       ./txn_bench put 1000000
       1000000 OK Ops | took      3.679 seconds
       ./txn_bench get_for_update 1000000
       1000000 OK Ops | took      3.940 seconds
      ```
      
      New Time
      
      ```
        rm -rf /dev/shm/rocksdb-example/
       ./txn_bench put 1000000
       1000000 OK Ops | took      2.727 seconds
       ./txn_bench get_for_update 1000000
       1000000 OK Ops | took      3.880 seconds
      ```
      
      It looks like there is no significant improvement in GetForUpdate(), but we can see a ~30% improvement in Put().
      
      Test Plan: unittests
      
      Reviewers: yhchiang, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D55539
      f38540b1
    • Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. · 9b519875
      Committed by Marton Trencseni
      Summary:
      When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead, the table reader takes ownership of them, pinning them, i.e. the LRU cache will never push them out. Meanwhile, further accesses in the table reader will not hit the block cache, thus avoiding lock contention.
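
      For reference, enabling the feature looks roughly like this (pin_l0_filter_and_index_blocks_in_cache is the option added here; using cache_index_and_filter_blocks as the prefetch switch is my assumption):

      ```
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::BlockBasedTableOptions table_options;
      table_options.cache_index_and_filter_blocks = true;            // prefetch on open
      table_options.pin_l0_filter_and_index_blocks_in_cache = true;  // pin L0 blocks

      rocksdb::Options options;
      options.table_factory.reset(
          rocksdb::NewBlockBasedTableFactory(table_options));
      ```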
      
      Test Plan:
      'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
      I didn't run the Java tests, I don't have Java set up on my devserver.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56133
      9b519875
  9. 01 Apr 2016, 1 commit
    • Change some RocksDB default options · 2feafa3d
      Committed by sdong
      Summary: Change some RocksDB default options to make them more friendly to server workloads.
      
      Test Plan: Run all existing tests
      
      Reviewers: yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55941
      2feafa3d
  10. 31 Mar 2016, 3 commits
  11. 26 Mar 2016, 1 commit
  12. 25 Mar 2016, 2 commits
    • Correct a typo in a comment · ad2fdaa8
      Committed by Yueh-Hsuan Chiang
      Summary: Correct a typo in a comment
      
      Test Plan: No code change.
      
      Reviewers: sdong, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: kradhakrishnan, IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55803
      ad2fdaa8
    • Fix data race issue when sub-compaction is used in CompactionJob · be9816b3
      Committed by Yueh-Hsuan Chiang
      Summary:
      When subcompaction is used, all subcompactions share the same Compaction
      pointer in CompactionJob, while each subcompaction keeps its mutable
      stats in SubcompactionState. However, there is still some mutable state
      that is currently stored in the shared Compaction pointer.
      
      This patch makes three changes (a sketch of the resulting layout follows the list):
      
      1. Make the shared Compaction pointer const so that it can never be modified
         during the compaction.
      2. Move necessary states from Compaction to SubcompactionState.
      3. Make functions of Compaction const if the function does not modify
         its internal state.
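
      A hedged sketch of the resulting layout (names simplified):

      ```
      struct SubcompactionStateSketch {
        const Compaction* compaction;  // shared and immutable during the run
        // Mutable, per-subcompaction state lives here instead of in Compaction:
        uint64_t num_output_records = 0;
        rocksdb::Status status;
      };
      ```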
      
      Test Plan: rocksdb and MyRocks test
      
      Reviewers: sdong, kradhakrishnan, andrewkr, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, yoshinorim, gunnarku, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55923
      be9816b3
  13. 23 Mar 2016, 2 commits
  14. 22 Mar 2016, 1 commit
  15. 19 Mar 2016, 4 commits
    • Add test for Snapshot 0 · fbbb8a61
      Committed by agiardullo
      Summary:
      I ran into this assert when stress testing transactions.  It's pretty easy to repro.
      
      Changing VersionSet::last_sequence_ to start at 1 seems pretty straightforward.  We would just need to change the 4 callers of SetLastSequence(), including recovery code.  I'd make this change myself, but I do not have enough time to test changes to recovery code paths this week, so I'm checking in this test case (disabled) for future fixing.
      
      Test Plan: n/a
      
      Reviewers: yhchiang, kradhakrishnan, andrewkr, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55311
      fbbb8a61
    • Add unit tests for RepairDB · e182f03c
      Committed by Andrew Kryczka
      Summary:
      Basic test cases:
      
      - Manifest is lost or corrupt
      - Manifest refers to too many or too few SST files
      - SST file is corrupt
      - Unflushed data is present when RepairDB is called
      
      Depends on D55065 for its CreateFile() function in file_utils
      
      Test Plan: Ran the tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, yoshinorim, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55485
      e182f03c
    • travis build fixes · 7d371863
      Committed by Praveen Rao
      7d371863
  16. 18 Mar 2016, 2 commits
    • Reset block cache in failing unit test. · 44756260
      Committed by Marton Trencseni
      Test Plan: make -j40 check OPT=-g, on both /tmp and /dev/shm
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55701
      44756260
    • Adding pin_l0_filter_and_index_blocks_in_cache feature. · 522de4f5
      Committed by Marton Trencseni
      Summary:
      When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead, the table reader takes ownership of them, pinning them, i.e. the LRU cache will never push them out. Meanwhile, further accesses in the table reader will not hit the block cache, thus avoiding lock contention.
      When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce TableReader::Close() to guarantee the order of destruction.
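
      A hedged sketch of the destruction-ordering idea; pinned_handles_ is an illustrative name:

      ```
      // Pinned cache handles must be released before the block cache itself
      // is destroyed, hence an explicit Close() rather than relying on the
      // table reader's destructor running at the right time.
      void TableReaderSketch::Close() {
        for (rocksdb::Cache::Handle* handle : pinned_handles_) {
          block_cache_->Release(handle);  // hand pinned blocks back to the LRU
        }
        pinned_handles_.clear();
      }
      ```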
      
      Test Plan:
      Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).
      
      DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
        Mac: OK.
        Linux: with D55287 patched in, it's OK.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54801
      522de4f5
  17. 17 Mar 2016, 1 commit
  18. 16 Mar 2016, 1 commit
  19. 15 Mar 2016, 2 commits
    • ColumnFamilyOptions SanitizeOptions is buggy on 32-bit platforms. · 1a2cc27e
      Committed by Dhruba Borthakur
      Summary:
      The pre-existing code tries to clamp between 65,536 and 0,
      resulting in clamping to 65,536, resulting in very small buffers,
      resulting in ShouldFlushNow() being true quite easily,
      resulting in assertions failing and database performance
      being "not what it should be".
      
      https://github.com/facebook/rocksdb/issues/1018
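
      An illustrative reproduction of the arithmetic (not the actual RocksDB code): a large constant wraps to 0 in a 32-bit size_t, so the clamp pins everything to the lower bound.

      ```
      #include <algorithm>
      #include <cstddef>

      size_t ClampSketch(size_t value) {
        size_t lo = static_cast<size_t>(64) << 10;  // 65,536
        size_t hi = static_cast<size_t>(64) << 30;  // wraps to 0 if size_t is 32-bit
        // With hi == 0, std::min(value, hi) is always 0, so every input
        // clamps to lo: tiny 65,536-byte buffers everywhere.
        return std::max(std::min(value, hi), lo);
      }
      ```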
      
      Test Plan: make check
      
      Reviewers: sdong, andrewkr, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55455
      1a2cc27e
    • Index Reader should not be reused after DB restart · b2ae5950
      Committed by sdong
      Summary:
      In the block based table reader, we now put the index reader into the block cache, where it can be retrieved after a DB restart. However, the index reader may reference an internal comparator, which can be destroyed when the DB restarts, causing problems.
      Fix it by making the cache key unique per table reader.
      
      Test Plan: Add a new test which failed without the commit but now passes.
      
      Reviewers: IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: maro, yhchiang, kradhakrishnan, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55287
      b2ae5950
  20. 12 Mar 2016, 2 commits
    • Aggregate hot Iterator counters in LocalStatistics (DBIter::Next perf regression) · 580fede3
      Committed by Islam AbdelRahman
      Summary:
      This patch bumps the counters in the frequent code paths DBIter::Next() / DBIter::Prev() in local data members and sends them to Statistics when the iterator is destroyed.
      A better solution would be a thread_local implementation of Statistics.
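
      A hedged sketch of the pattern (ticker and member names illustrative):

      ```
      // Bump plain local counters on the hot path; report them to the
      // shared Statistics object once, when the iterator goes away.
      class LocalIterStatsSketch {
       public:
        explicit LocalIterStatsSketch(rocksdb::Statistics* stats) : stats_(stats) {}
        ~LocalIterStatsSketch() {
          if (stats_ != nullptr && next_count_ > 0) {
            stats_->recordTick(rocksdb::NUMBER_DB_NEXT, next_count_);
          }
        }
        void BumpNext() { ++next_count_; }  // no shared-counter contention
       private:
        rocksdb::Statistics* stats_;
        uint64_t next_count_ = 0;
      };
      ```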
      
      New performance
      ```
      readseq      :       0.035 micros/op 28597881 ops/sec; 3163.7 MB/s
           1,851,568,819      stalled-cycles-frontend   #   31.29% frontend cycles idle    [49.86%]
             884,929,823      stalled-cycles-backend    #   14.95% backend  cycles idle    [50.21%]
      readreverse  :       0.071 micros/op 14077393 ops/sec; 1557.3 MB/s
           3,239,575,993      stalled-cycles-frontend   #   27.36% frontend cycles idle    [49.96%]
           1,558,253,983      stalled-cycles-backend    #   13.16% backend  cycles idle    [50.14%]
      
      ```
      
      Existing performance
      
      ```
      readreverse  :       0.174 micros/op 5732342 ops/sec;  634.1 MB/s
          20,570,209,389      stalled-cycles-frontend   #   70.71% frontend cycles idle    [50.01%]
          18,422,816,837      stalled-cycles-backend    #   63.33% backend  cycles idle    [50.04%]
      
      readseq      :       0.119 micros/op 8400537 ops/sec;  929.3 MB/s
          15,634,225,844      stalled-cycles-frontend   #   79.07% frontend cycles idle    [49.96%]
          14,227,427,453      stalled-cycles-backend    #   71.95% backend  cycles idle    [50.09%]
      ```
      
      Test Plan: unit tests
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55107
      580fede3
    • fix: handle_fatal_signal (sig=6) in std::vector<std::string, std::allocator<std::string> >::_M_range_check | c++/4.8.2/bits/stl_vector.h:794 #174 · e8e6cf01
      Committed by Baris Yazici
      
      Summary:
      Fix for https://github.com/facebook/mysql-5.6/issues/174
      
      When there are no old info log files to purge, the vector.at(i) call below was crashing. The fixed code (the old_info_log_file_count != 0 check is the addition):

      ```
      if (old_info_log_file_count != 0 &&
          old_info_log_file_count >= db_options_.keep_log_file_num) {
        std::sort(old_info_log_files.begin(), old_info_log_files.end());
        size_t end = old_info_log_file_count - db_options_.keep_log_file_num;
        for (unsigned int i = 0; i <= end; i++) {
          std::string& to_delete = old_info_log_files.at(i);
          // ... delete to_delete ...
        }
      }
      ```

      Added a check that old_info_log_file_count is non-zero.
      
      Test Plan: run existing tests
      
      Reviewers: gunnarku, vasilep, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: andrewkr, webscalesql-eng, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55245
      e8e6cf01