1. 13 Jan, 2018 4 commits
  2. 12 Jan, 2018 7 commits
    • A
      fix Gemfile.lock nokogiri dependencies · 6d7e3b9f
      Andrew Kryczka committed
      Summary:
      I installed the Ruby dependencies and ran `bundle update nokogiri`. The updated nokogiri depends on a newer version of "mini_portile2", which I missed in 9c2f64e1. Now `bundle install` works again.
      Closes https://github.com/facebook/rocksdb/pull/3361
      
      Differential Revision: D6710164
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9a08d6cc6400ef495b715b3d68b04ce3f3367031
      6d7e3b9f
    • P
      Consider an increase to buffer size when reading option file, from 4K to 8K. · 45828c72
      Peter (Stig) Edwards committed
      Summary:
      Hello and thank you for RocksDB,
      
      While looking into the buffered I/O used when an `OPTIONS` file is read, I noticed that the `OPTIONS` files produced by RocksDB 5.8.8 (and head of master) were just over 4096 bytes in size. With the version of glibc I am using (glibc-2.17-196.el7), on the filesystem used, this results in a 4K buffer being passed to the `fread_unlocked` call and in 2 read system calls, each with a 4096-byte buffer, being used to read the contents of the `OPTIONS` file.
      
        If the buffer size is increased to 8192 then 1 system call read is used to read the contents.
      
        As I think the buffer size is just used for reading `OPTIONS` files, and I thought it likely that `OPTIONS` files have increased in size (as more options are added), I thought I would suggest an increase.
      
      [  If the comments from the top of the `OPTIONS` file are removed, and white space from the start of lines is removed then the size can be reduced to be under 4K, but as more options are added the size seems likely to grow again. ]
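      
      The relationship between buffer size and the number of reads can be illustrated with a small model. This is only a sketch: `CountDataReads` is a hypothetical helper, not part of RocksDB or glibc, and it assumes each read fills the buffer whenever enough data remains.
      
      ```cpp
      #include <algorithm>
      #include <cstddef>
      
      // Hypothetical helper: counts how many reads return data when a file of
      // `file_size` bytes is consumed through a buffer of `buffer_size` bytes.
      // The final zero-byte read that signals end of file is not counted.
      std::size_t CountDataReads(std::size_t file_size, std::size_t buffer_size) {
        if (buffer_size == 0) {
          return 0;  // Nothing can be transferred with an empty buffer.
        }
        std::size_t reads = 0;
        std::size_t offset = 0;
        while (offset < file_size) {
          // Each read transfers at most one buffer's worth of bytes.
          offset += std::min(buffer_size, file_size - offset);
          ++reads;
        }
        return reads;
      }
      ```
      
      For the 4252-byte file in this report, `CountDataReads(4252, 4096)` gives 2 and `CountDataReads(4252, 8192)` gives 1.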
      
      Create a new database:
      
      ```
      > ./ldb --create_if_missing --db=/tmp/rdb_tmp put 1 1
      OK
      ```
      
      The OPTIONS file is 4252 bytes:
      
      ```
      > stat /tmp/rdb_tmp/OPTIONS* | head -n 2
        File: ‘/tmp/rdb_tmp/OPTIONS-000005’
        Size: 4252            Blocks: 16         IO Block: 4096   regular file
      ```
      
      Before, with the 4096-byte buffer, 2 read system calls are needed to read the contents:
      
      ```
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 4096) = 4096
      read(3, "e\n  metadata_block_size=4096\n  c"..., 4096) = 156
      ```
      
      ltrace shows 4096 passed to fread_unlocked:
      
      ```
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -C 3 'RocksDB option file'
      [pid 51013] fread_unlocked(0x7ffd5fbf2d50, 1, 4096, 0x7fd2e084e780 <unfinished ...>
      [pid 51013] fstat@SYS(3, 0x7ffd5fbf28f0)         = 0
      [pid 51013] mmap@SYS(nil, 4096, 3, 34, -1, 0)    = 0x7fd2e318c000
      [pid 51013] read@SYS(3, "# This is a RocksDB option file."..., 4096) = 4096
      [pid 51013] <... fread_unlocked resumed> )       = 4096
      ...
      ```
      
      After, with the 8192-byte buffer, a single read system call reads the contents (followed by a zero-byte read at end of file):
      
      ```
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 8192) = 4252
      read(3, "", 4096)                       = 0
      ```
      
      ltrace shows 8192 passed to fread_unlocked:
      
      ```
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -C 3 'RocksDB option file'
      [pid 146611] fread_unlocked(0x7ffcfba382f0, 1, 8192, 0x7fc4e844e780 <unfinished ...>
      [pid 146611] fstat@SYS(3, 0x7ffcfba380f0)        = 0
      [pid 146611] mmap@SYS(nil, 4096, 3, 34, -1, 0)   = 0x7fc4eaee0000
      [pid 146611] read@SYS(3, "# This is a RocksDB option file."..., 8192) = 4252
      [pid 146611] read@SYS(3, "", 4096)               = 0
      [pid 146611] <... fread_unlocked resumed> )      = 4252
      [pid 146611] feof(0x7fc4e844e780)                = 1
      ```
      Closes https://github.com/facebook/rocksdb/pull/3294
      
      Differential Revision: D6653684
      
      Pulled By: ajkr
      
      fbshipit-source-id: 222f25f5442fefe1dcec18c700bd9e235bb63491
      45828c72
    • C
      Fix memleak when DB::DeleteFile() · 0a7ba0e5
      Changli Gao committed
      Summary:
      Because the corresponding read_first_record_cache_ item wasn't
      erased, memory leaked.
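      
      The general pattern behind this kind of leak can be sketched as follows. `LogFileCache` and its methods are hypothetical names used only for illustration; the actual RocksDB code differs.
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <unordered_map>
      
      // Hypothetical stand-in for a per-file cache such as
      // read_first_record_cache_; all names here are illustrative.
      class LogFileCache {
       public:
        void Remember(std::uint64_t file_number, std::uint64_t first_sequence) {
          first_record_[file_number] = first_sequence;
        }
      
        // Deleting a file must also drop its cache entry; otherwise the entry
        // stays in the map for as long as the cache itself lives.
        void DeleteFile(std::uint64_t file_number) {
          // ... remove the file from storage here ...
          first_record_.erase(file_number);
        }
      
        std::size_t size() const { return first_record_.size(); }
      
       private:
        std::unordered_map<std::uint64_t, std::uint64_t> first_record_;
      };
      ```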
      Closes https://github.com/facebook/rocksdb/pull/1712
      
      Differential Revision: D4363654
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7da1adcfc8c380e4ffe05b8769fc2221ad17a225
      0a7ba0e5
    • A
      Update Gemfile.lock · 9c2f64e1
      Andrew Kryczka committed
      Summary:
      Bump nokogiri version number.
      Closes https://github.com/facebook/rocksdb/pull/3358
      
      Differential Revision: D6708596
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6662c3ba4994374ecf8a13928e915b655a980b70
      9c2f64e1
    • B
      add WriteBatch::WriteBatch(std::string&&) · 204af1ec
      Bo Liu committed
      Summary:
      to save a string copy for some use cases.
      
      The change is pretty straightforward, please feel free to let me know if you want to suggest any tests for it.
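      
      A minimal sketch of how an rvalue-reference constructor avoids the copy. `Batch` is a hypothetical, simplified type, not the real rocksdb::WriteBatch.
      
      ```cpp
      #include <string>
      #include <utility>
      
      // Hypothetical simplified batch type; the real class carries more state.
      class Batch {
       public:
        // Copies the serialized representation.
        explicit Batch(const std::string& rep) : rep_(rep) {}
        // Takes over the caller's buffer instead of copying it.
        explicit Batch(std::string&& rep) : rep_(std::move(rep)) {}
      
        const std::string& Data() const { return rep_; }
      
       private:
        std::string rep_;
      };
      ```
      
      A caller that no longer needs its string can write `Batch b(std::move(serialized));` to select the second constructor.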
      Closes https://github.com/facebook/rocksdb/pull/3349
      
      Differential Revision: D6706828
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 873ce4442937bdc030b395c7f99228eda7f59eb7
      204af1ec
    • A
      Add Jenkins for PPC64le build status badge · d4da02d1
      Adam Retter committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3356
      
      Differential Revision: D6706909
      
      Pulled By: sagar0
      
      fbshipit-source-id: 6e4757d9eceab3e8a6c1b83c1be4108e86576cb2
      d4da02d1
    • A
      FreeBSD build support for RocksDB and RocksJava · a53c571d
      Adam Retter committed
      Summary:
      Tested on a clean FreeBSD 11.01 x64.
      
      Closes https://github.com/facebook/rocksdb/pull/1423
      Closes https://github.com/facebook/rocksdb/pull/3357
      
      Differential Revision: D6705868
      
      Pulled By: sagar0
      
      fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
      a53c571d
  3. 11 Jan, 2018 4 commits
  4. 10 Jan, 2018 6 commits
  5. 09 Jan, 2018 2 commits
  6. 06 Jan, 2018 4 commits
  7. 05 Jan, 2018 1 commit
    • M
      Remove assert(s.ok()) from ::DeleteFile · 1c9ada59
      Maysam Yabandeh committed
      Summary:
      DestroyDB, which is used in tests, loops over the files returned by ::GetChildren and deletes them one by one. Such files might already be deleted in the file system (during DeleteObsoleteFileImpl, for example): the actual deletion happens with a delay, sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
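      
      One way to tolerate a file that has already disappeared is sketched below. `DeleteFileIfExists` is a hypothetical helper, not the RocksDB Env API, and it assumes a POSIX-like platform where `std::remove` sets `errno` to `ENOENT` for a missing file.
      
      ```cpp
      #include <cerrno>
      #include <cstdio>
      #include <string>
      
      // Hypothetical helper: deletes `path` and treats "already gone" as success.
      // Assumes std::remove reports a missing file through errno == ENOENT.
      bool DeleteFileIfExists(const std::string& path) {
        if (std::remove(path.c_str()) == 0) {
          return true;  // The file was deleted by this call.
        }
        // Another component may have removed the file first; not an error here.
        return errno == ENOENT;
      }
      ```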
      Closes https://github.com/facebook/rocksdb/pull/3324
      
      Differential Revision: D6659545
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
      1c9ada59
  8. 04 Jan, 2018 2 commits
  9. 03 Jan, 2018 1 commit
    • S
      Speed up BlockTest.BlockReadAmpBitmap · ccc095a0
      Siying Dong committed
      Summary:
      BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
      (1) improving the way the verification is done, which makes it 5 times faster;
      (2) running fewer tests for large blocks, which cuts the run time by another 10 times.
      Now it finishes in a similar time to other tests.
      Closes https://github.com/facebook/rocksdb/pull/3313
      
      Differential Revision: D6643711
      
      Pulled By: siying
      
      fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
      ccc095a0
  10. 22 Dec, 2017 1 commit
    • B
      Disable onboard cache for compaction output · b5c99cc9
      burtonli committed
      Summary:
      FILE_FLAG_WRITE_THROUGH disables the device on-board cache in the Windows API; that cache should be disabled if the user doesn't need the system cache.
      There was a perf issue related to this: during memtable flush, the high-percentile latency jumps significantly. During profiling, we found that those high-latency (P99.9) read requests got queue-jumped by write requests from the memtable flush and waited 80 ms or even longer, even when the SSD's overall IO throughput was relatively low.
      
      After enabling FILE_FLAG_WRITE_THROUGH, we reran the test and found that high-percentile latency drops a lot, without observable impact on writes.
      
      Scenario 1: 40MB/s + 40MB/s R/W compaction throughput
      
      Percentile | Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      ---------------------------------------------------------------
      P99.9 | 56.897 ms | 35.593 ms | -37.4%
      P99 | 3.905 ms | 3.896 ms | -2.8%
      
      Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other RocksDB instances that have manually triggered memtable flush operations (the memtable is tiny), creating a lot of randomized small-file write operations during the test.
      
      Percentile | Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      ---------------------------------------------------------------
      P99.9 | 86.227 ms | 50.436 ms | -41.5%
      P99 | 8.415 ms | 3.356 ms | -60.1%
      Closes https://github.com/facebook/rocksdb/pull/3225
      
      Differential Revision: D6624174
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
      b5c99cc9
  11. 21 Dec, 2017 4 commits
    • A
      fix ForwardIterator reference to temporary object · f00e176c
      Andrew Kryczka committed
      Summary:
      Fixes the following ASAN error:
      
      ```
      ==2108042==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc50ae9b868 at pc 0x7fc5112aff55 bp 0x7fff9eb9dc10 sp 0x7fff9eb9dc08
      === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
      READ of size 8 at 0x7fc50ae9b868 thread T0
      SCARINESS: 23 (8-byte-read-stack-use-after-scope)
           #0 rocksdb/dbformat.h:164                   rocksdb::InternalKeyComparator::user_comparator() const
           #1 librocksdb_src_rocksdb_lib.so+0x1429a7d  rocksdb::RangeDelAggregator::InitRep(std::vector<...> const&)
           #2 librocksdb_src_rocksdb_lib.so+0x142ceae  rocksdb::RangeDelAggregator::AddTombstones(std::unique_ptr<...>)
           #3 librocksdb_src_rocksdb_lib.so+0x1382d88  rocksdb::ForwardIterator::RebuildIterators(bool)
           #4 librocksdb_src_rocksdb_lib.so+0x1382362  rocksdb::ForwardIterator::ForwardIterator(rocksdb::DBImpl*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*)
           #5 librocksdb_src_rocksdb_lib.so+0x11f433f  rocksdb::DBImpl::NewIterator(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*)
           #6 rocksdb/src/include/rocksdb/db.h:382     rocksdb::DB::NewIterator(rocksdb::ReadOptions const&)
           #7 rocksdb/db_range_del_test.cc:807         rocksdb::DBRangeDelTest_TailingIteratorRangeTombstoneUnsupported_Test::TestBody()
          #18 rocksdb/db_range_del_test.cc:1006        main
      
      Address 0x7fc50ae9b868 is located in stack of thread T0 at offset 104 in frame
           #0 librocksdb_src_rocksdb_lib.so+0x13825af  rocksdb::ForwardIterator::RebuildIterators(bool)
      ```
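      
      The report above is a stack-use-after-scope: an object kept a reference to something whose lifetime had already ended. The sketch below shows the general fix pattern of giving the referenced object an owner that outlives the user. All type names are hypothetical and only loosely modeled on a comparator and an aggregator; this is not the actual patch.
      
      ```cpp
      #include <string>
      
      // Hypothetical comparator type.
      struct KeyComparator {
        std::string name;
      };
      
      // Keeps a reference, so the comparator must outlive the aggregator.
      class Aggregator {
       public:
        explicit Aggregator(const KeyComparator& cmp) : cmp_(cmp) {}
        const std::string& ComparatorName() const { return cmp_.name; }
      
       private:
        const KeyComparator& cmp_;
      };
      
      class Owner {
       public:
        // Safe: cmp_ is a member, so it lives as long as agg_ does.
        // Unsafe alternative: agg_(KeyComparator{"bytewise"}) would bind the
        // reference to a temporary that is destroyed right after agg_ is built.
        Owner() : cmp_{"bytewise"}, agg_(cmp_) {}
        Owner(const Owner&) = delete;             // A copy would alias the original's cmp_.
        Owner& operator=(const Owner&) = delete;
      
        const std::string& ComparatorName() const { return agg_.ComparatorName(); }
      
       private:
        KeyComparator cmp_;  // Declared before agg_ so it is initialized first.
        Aggregator agg_;
      };
      ```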
      Closes https://github.com/facebook/rocksdb/pull/3300
      
      Differential Revision: D6612989
      
      Pulled By: ajkr
      
      fbshipit-source-id: e7ea2ed914c1b80a8a29d71d92440a6bd9cbcc80
      f00e176c
    • M
      Blog post for WritePrepared Txn · 02a2c117
      Maysam Yabandeh committed
      Summary:
      Blog post to introduce the next generation of transaction engine at RocksDB.
      Closes https://github.com/facebook/rocksdb/pull/3296
      
      Differential Revision: D6612932
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5bfa91ce84e937f5e4346bbda5a4725d0a7fd131
      02a2c117
    • M
      Disable need_log_sync on bg err · 0ef3fdd7
      Maysam Yabandeh committed
      Summary:
      When there is a background error, PreprocessWrite returns without marking the logs synced. If we keep need_log_sync set to true, it would try to sync them at the end, which would break the logic. The patch unsets need_log_sync if the logs end up not being marked for sync in PreprocessWrite.
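      
      The control flow described above can be modeled roughly as follows. This is a heavily simplified sketch: `WriteState` and this free-standing `PreprocessWrite` are hypothetical and do not match the real DBImpl signatures.
      
      ```cpp
      // Hypothetical, simplified model of the write path state.
      struct WriteState {
        bool bg_error = false;              // A background error has been recorded.
        bool logs_marked_for_sync = false;  // Logs were marked as being synced.
      };
      
      // Returns false on a background error. In that case the logs are never
      // marked for sync, so need_log_sync is cleared to stop a later sync attempt.
      bool PreprocessWrite(WriteState& state, bool& need_log_sync) {
        if (state.bg_error) {
          need_log_sync = false;
          return false;
        }
        if (need_log_sync) {
          state.logs_marked_for_sync = true;
        }
        return true;
      }
      ```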
      Closes https://github.com/facebook/rocksdb/pull/3293
      
      Differential Revision: D6602347
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 37ee04209e8dcfd78de891654ce50d0954abeb38
      0ef3fdd7
    • W
      FIXED: string buffers potentially too small to fit formatted write · 58b841b3
      Wouter Beek committed
      Summary:
      This fixes the following warnings when compiled with GCC7:
      
      util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::DBGet(rocksdb::DB*, rocksdb::Transaction*, rocksdb::ReadOptions&, uint16_t, uint64_t, bool, uint64_t*, std::__cxx11::string*, bool*)’:
      util/transaction_test_util.cc:75:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
       Status RandomTransactionInserter::DBGet(
              ^~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc:84:11: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
         snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
         ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::Verify(rocksdb::DB*, uint16_t, uint64_t, bool, rocksdb::Random64*)’:
      util/transaction_test_util.cc:245:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
       Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets,
              ^~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc:268:13: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
           snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
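      
      The warning arises because a `uint16_t` plus one can reach 65536, which needs five digits plus a terminating NUL, one byte more than a 5-byte buffer holds. One way to avoid it is sketched below; `FormatPrefix` is a hypothetical helper and the buffer size of 16 is an arbitrary choice, not necessarily what the patch uses.
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <cstdio>
      #include <string>
      
      // Formats set_i + 1 with at least four digits, like the "%.4u" calls above.
      // The largest value, 65536, needs six bytes including the NUL terminator.
      std::string FormatPrefix(std::uint16_t set_i) {
        char prefix_buf[16];
        const unsigned value = static_cast<unsigned>(set_i) + 1u;
        const int written =
            std::snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", value);
        // snprintf returns the length it needed; use that to detect truncation.
        if (written < 0 || static_cast<std::size_t>(written) >= sizeof(prefix_buf)) {
          return std::string();
        }
        return std::string(prefix_buf, static_cast<std::size_t>(written));
      }
      ```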
      Closes https://github.com/facebook/rocksdb/pull/3295
      
      Differential Revision: D6609411
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 33f0add471056eb59db2f8bd4366e6dfbb1a187d
      58b841b3
  12. 20 Dec, 2017 3 commits
    • Y
      Port 3 way SSE4.2 crc32c implementation from Folly · f54d7f5f
      yingsu00 committed
      Summary:
      **# Summary**
      
      RocksDB uses SSE crc32 intrinsics to calculate crc32 values, but it does so in a single-way fashion (not pipelined on a single CPU core). Intel's whitepaper published an algorithm that uses 3-way pipelining for the crc32 intrinsics and then uses the pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead of its own, this algorithm shows perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB calls crc32c on are over 4KB. Initial db_bench runs show a tremendous CPU gain.
      
      This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag, NO_THREEWAY_CRC32C. If the user compiles the code with NO_THREEWAY_CRC32C=1, the old SSE crc32c algorithm is used. If the server does not have SSE4.2 at run time, the slow (non-SSE) implementation is used.
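      
      For reference, the checksum that all of these code paths compute can be written as a portable bitwise loop. The sketch below is not the 3-way SSE4.2 algorithm and not RocksDB's own fallback; it is a plain CRC32C (Castagnoli) implementation, and `Crc32cPortable` is a hypothetical name.
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      
      // Portable bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
      // Processes one bit at a time, so it is slow but has no CPU requirements.
      std::uint32_t Crc32cPortable(const void* data, std::size_t size) {
        const unsigned char* bytes = static_cast<const unsigned char*>(data);
        std::uint32_t crc = 0xFFFFFFFFu;
        for (std::size_t i = 0; i < size; ++i) {
          crc ^= bytes[i];
          for (int bit = 0; bit < 8; ++bit) {
            // mask is all ones when the low bit is set, otherwise zero.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
          }
        }
        return crc ^ 0xFFFFFFFFu;
      }
      ```
      
      The standard check input `"123456789"` (9 bytes) yields `0xE3069283` for CRC32C.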
      
      **# Performance Test Results**
      We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
      
      Before this change, the CRC32 value computation took about 11.5% of total CPU cost; with the new 3-way algorithm it is reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
      
      1) ReadRandom in db_bench overall metrics
      
          PER RUN
          Algorithm  | run | micros/op | ops/sec | Throughput (MB/s)
          3-way      | 1   | 4.143     | 241387  | 26.7
          3-way      | 2   | 3.775     | 264872  | 29.3
          3-way      | 3   | 4.116     | 242929  | 26.9
          FastCrc32c | 1   | 4.037     | 247727  | 27.4
          FastCrc32c | 2   | 4.648     | 215166  | 23.8
          FastCrc32c | 3   | 4.352     | 229799  | 25.4
      
          AVG
          Algorithm  | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
          3-way      | 4.01                 | 249,729            | 27.63
          FastCrc32c | 4.35                 | 230,897            | 25.53
      
      2) Crc32c computation CPU cost (inclusive samples percentage)
      
          PER RUN
          Implementation | run | TotalSamples  | Crc32c percentage
          3-way          | 1   | 4,572,250,000 | 4.37%
          3-way          | 2   | 3,779,250,000 | 4.62%
          3-way          | 3   | 4,129,500,000 | 4.48%
          FastCrc32c     | 1   | 4,663,500,000 | 11.24%
          FastCrc32c     | 2   | 4,047,500,000 | 12.34%
          FastCrc32c     | 3   | 4,366,750,000 | 11.68%
      
      **# Test Plan**
          make -j64 corruption_test && ./corruption_test
          (by default it uses the 3-way SSE algorithm)
      
          NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
      
          make clean && DEBUG_LEVEL=0 make -j64 db_bench
          make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
      Closes https://github.com/facebook/rocksdb/pull/3173
      
      Differential Revision: D6330882
      
      Pulled By: yingsu00
      
      fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
      f54d7f5f
    • Y
      BlobDB: dump blob db options on open · e763e1b6
      Yi Wu committed
      Summary:
      We dump blob db options on blob db open, but the dump was removed by mistake in #3246. Adding it back.
      Closes https://github.com/facebook/rocksdb/pull/3298
      
      Differential Revision: D6607177
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
      e763e1b6
    • Y
      BlobDB: update blob_db_options.bytes_per_sync behavior · 48cf8da2
      Yi Wu committed
      Summary:
      Previously, if blob_db_options.bytes_per_sync was set, a background job called fsync() for every bytes_per_sync bytes written to a blob file. With this change we simply pass bytes_per_sync via env_options_ to blob files so that sync_file_range() is used instead.
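      
      The idea of syncing once per bytes_per_sync bytes written can be sketched with a small counter. `RangeSyncTracker` is a hypothetical class for illustration only; it assumes a value of 0 disables range syncing and leaves the actual sync call (for example `sync_file_range()` on Linux) to the caller.
      
      ```cpp
      #include <cstdint>
      
      // Hypothetical tracker: reports when at least bytes_per_sync bytes have
      // been written since the last sync request. Assumes 0 means "disabled".
      class RangeSyncTracker {
       public:
        explicit RangeSyncTracker(std::uint64_t bytes_per_sync)
            : bytes_per_sync_(bytes_per_sync) {}
      
        // Records a write and returns true when the caller should issue a
        // range sync for the data written since the previous request.
        bool OnWrite(std::uint64_t bytes) {
          if (bytes_per_sync_ == 0) return false;
          unsynced_ += bytes;
          if (unsynced_ < bytes_per_sync_) return false;
          unsynced_ = 0;
          return true;
        }
      
       private:
        std::uint64_t bytes_per_sync_;
        std::uint64_t unsynced_ = 0;
      };
      ```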
      Closes https://github.com/facebook/rocksdb/pull/3297
      
      Differential Revision: D6606994
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
      48cf8da2
  13. 19 Dec, 2017 1 commit