1. 10 Jan, 2018 5 commits
  2. 09 Jan, 2018 2 commits
  3. 06 Jan, 2018 4 commits
  4. 05 Jan, 2018 1 commit
    • M
      Remove assert(s.ok()) from ::DeleteFile · 1c9ada59
      Maysam Yabandeh committed
      Summary:
      DestroyDB, which is used in tests, loops over the files returned by ::GetChildren and deletes them one by one. Such files might already be deleted in the file system (during DeleteObsoleteFileImpl, for example) but actually get deleted with a delay, sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
      Closes https://github.com/facebook/rocksdb/pull/3324
      
      Differential Revision: D6659545
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
      1c9ada59
  5. 04 Jan, 2018 2 commits
  6. 03 Jan, 2018 1 commit
    • S
      Speed up BlockTest.BlockReadAmpBitmap · ccc095a0
      Siying Dong committed
      Summary:
      BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
      (1) improving the way the verification is done, which makes it 5 times faster;
      (2) running fewer tests for large blocks, which cuts the time down by another 10 times.
      Now it finishes in a time similar to other tests.
      Closes https://github.com/facebook/rocksdb/pull/3313
      
      Differential Revision: D6643711
      
      Pulled By: siying
      
      fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
      ccc095a0
  7. 22 Dec, 2017 1 commit
    • B
      Disable onboard cache for compaction output · b5c99cc9
      burtonli committed
      Summary:
      FILE_FLAG_WRITE_THROUGH disables the device's on-board cache in the Windows API; that cache should be disabled if the user doesn't need the system cache.
      There was a perf issue related to this: during memtable flush, the high-percentile latency jumps significantly. During profiling, we found that those high-latency (P99.9) read requests got queue-jumped by write requests from memtable flush and waited 80 ms or more, even when the SSD's overall IO throughput was relatively low.

      After enabling FILE_FLAG_WRITE_THROUGH, we reran the test and found that high-percentile latency drops a lot, without observable impact on writes.
      
      Scenario 1: 40MB/s + 40MB/s R/W compaction throughput

      Percentile | Original  | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      ------------------------------------------------------------------------
      P99.9      | 56.897 ms | 35.593 ms               | -37.4%
      P99        | 3.905 ms  | 3.896 ms                | -2.8%
      
      Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances that have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized small-file write operations during the test.

      Percentile | Original  | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      ------------------------------------------------------------------------
      P99.9      | 86.227 ms | 50.436 ms               | -41.5%
      P99        | 8.415 ms  | 3.356 ms                | -60.1%
      Closes https://github.com/facebook/rocksdb/pull/3225
      
      Differential Revision: D6624174
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
      b5c99cc9
  8. 21 Dec, 2017 4 commits
    • A
      fix ForwardIterator reference to temporary object · f00e176c
      Andrew Kryczka committed
      Summary:
      Fixes the following ASAN error:
      
      ```
      ==2108042==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc50ae9b868 at pc 0x7fc5112aff55 bp 0x7fff9eb9dc10 sp 0x7fff9eb9dc08
      === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
      READ of size 8 at 0x7fc50ae9b868 thread T0
      SCARINESS: 23 (8-byte-read-stack-use-after-scope)
           #0 rocksdb/dbformat.h:164                   rocksdb::InternalKeyComparator::user_comparator() const
           #1 librocksdb_src_rocksdb_lib.so+0x1429a7d  rocksdb::RangeDelAggregator::InitRep(std::vector<...> const&)
           #2 librocksdb_src_rocksdb_lib.so+0x142ceae  rocksdb::RangeDelAggregator::AddTombstones(std::unique_ptr<...>)
           #3 librocksdb_src_rocksdb_lib.so+0x1382d88  rocksdb::ForwardIterator::RebuildIterators(bool)
           #4 librocksdb_src_rocksdb_lib.so+0x1382362  rocksdb::ForwardIterator::ForwardIterator(rocksdb::DBImpl*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*)
           #5 librocksdb_src_rocksdb_lib.so+0x11f433f  rocksdb::DBImpl::NewIterator(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*)
           #6 rocksdb/src/include/rocksdb/db.h:382     rocksdb::DB::NewIterator(rocksdb::ReadOptions const&)
           #7 rocksdb/db_range_del_test.cc:807         rocksdb::DBRangeDelTest_TailingIteratorRangeTombstoneUnsupported_Test::TestBody()
          #18 rocksdb/db_range_del_test.cc:1006        main
      
      Address 0x7fc50ae9b868 is located in stack of thread T0 at offset 104 in frame
           #0 librocksdb_src_rocksdb_lib.so+0x13825af  rocksdb::ForwardIterator::RebuildIterators(bool)
      ```
      Closes https://github.com/facebook/rocksdb/pull/3300
      
      Differential Revision: D6612989
      
      Pulled By: ajkr
      
      fbshipit-source-id: e7ea2ed914c1b80a8a29d71d92440a6bd9cbcc80
      f00e176c
    • M
      Blog post for WritePrepared Txn · 02a2c117
      Maysam Yabandeh committed
      Summary:
      Blog post to introduce the next generation of transaction engine at RocksDB.
      Closes https://github.com/facebook/rocksdb/pull/3296
      
      Differential Revision: D6612932
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5bfa91ce84e937f5e4346bbda5a4725d0a7fd131
      02a2c117
    • M
      Disable need_log_sync on bg err · 0ef3fdd7
      Maysam Yabandeh committed
      Summary:
      When there is a background error, PreprocessWrite returns without marking the logs synced. If we keep need_log_sync set to true, it would try to sync them at the end, which would break the logic. The patch unsets need_log_sync if the logs end up not being marked for sync in PreprocessWrite.
      Closes https://github.com/facebook/rocksdb/pull/3293
      
      Differential Revision: D6602347
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 37ee04209e8dcfd78de891654ce50d0954abeb38
      0ef3fdd7
    • W
      FIXED: string buffers potentially too small to fit formatted write · 58b841b3
      Wouter Beek committed
      Summary:
      This fixes the following warnings when compiled with GCC7:
      
      util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::DBGet(rocksdb::DB*, rocksdb::Transaction*, rocksdb::ReadOptions&, uint16_t, uint64_t, bool, uint64_t*, std::__cxx11::string*, bool*)’:
      util/transaction_test_util.cc:75:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
       Status RandomTransactionInserter::DBGet(
              ^~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc:84:11: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
         snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
         ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::Verify(rocksdb::DB*, uint16_t, uint64_t, bool, rocksdb::Random64*)’:
      util/transaction_test_util.cc:245:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
       Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets,
              ^~~~~~~~~~~~~~~~~~~~~~~~~
      util/transaction_test_util.cc:268:13: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
           snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Closes https://github.com/facebook/rocksdb/pull/3295
      
      Differential Revision: D6609411
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 33f0add471056eb59db2f8bd4366e6dfbb1a187d
      58b841b3
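The warning above comes from formatting `set_i + 1` with `"%.4u"` into a 5-byte buffer: a `uint16_t` plus one can reach 65536, which needs 5 digits plus the terminating NUL. The following is a minimal sketch of that arithmetic, not the code from the patch; `NeededBytes` and `FormatPrefix` are hypothetical helper names, and the 6-byte buffer is only one possible fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: number of bytes (including the NUL terminator) that
// "%.4u" needs for set_i + 1. snprintf with a null buffer and size 0 only
// reports the length it would have written.
static size_t NeededBytes(uint16_t set_i) {
  int n = std::snprintf(nullptr, 0, "%.4u", static_cast<unsigned>(set_i) + 1u);
  assert(n >= 0);
  return static_cast<size_t>(n) + 1;
}

// Hypothetical helper: formats the prefix into a buffer sized for the worst
// case (5 digits + NUL), so the output is never truncated.
static std::string FormatPrefix(uint16_t set_i) {
  char prefix_buf[6];
  std::snprintf(prefix_buf, sizeof(prefix_buf), "%.4u",
                static_cast<unsigned>(set_i) + 1u);
  return std::string(prefix_buf);
}
```

With these helpers, `NeededBytes(0)` fits a 5-byte buffer while `NeededBytes(65535)` does not, which matches the "between 5 and 6 bytes into a destination of size 5" note from GCC.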
  9. 20 Dec, 2017 3 commits
    • Y
      Port 3 way SSE4.2 crc32c implementation from Folly · f54d7f5f
      yingsu00 committed
      Summary:
      **# Summary**
      
      RocksDB uses SSE crc32 intrinsics to calculate the crc32 values, but it does so in a single-way fashion (not pipelined on a single CPU core). Intel's whitepaper published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then uses the pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead of its own, this algorithm shows perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB calls crc32c on are over 4KB. Initial db_bench runs show a tremendous CPU gain.

      This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If the user compiles the code with NO_THREEWAY_CRC32C=1, the old SSE Crc32c algorithm is used. If the server does not have SSE4.2 at run time, the slow way (non-SSE) is used.
      
      **# Performance Test Results**
      We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
      
      Before this change the CRC32 value computation took about 11.5% of total CPU cost; with the new 3-way algorithm it is reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
      
      1) ReadRandom in db_bench overall metrics

          PER RUN
          Algorithm  | run | micros/op | ops/sec | Throughput (MB/s)
          3-way      | 1   | 4.143     | 241387  | 26.7
          3-way      | 2   | 3.775     | 264872  | 29.3
          3-way      | 3   | 4.116     | 242929  | 26.9
          FastCrc32c | 1   | 4.037     | 247727  | 27.4
          FastCrc32c | 2   | 4.648     | 215166  | 23.8
          FastCrc32c | 3   | 4.352     | 229799  | 25.4

          AVG
          Algorithm  | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
          3-way      | 4.01                 | 249,729            | 27.63
          FastCrc32c | 4.35                 | 230,897            | 25.53

      2) Crc32c computation CPU cost (inclusive samples percentage)

          PER RUN
          Implementation | run | TotalSamples  | Crc32c percentage
          3-way          | 1   | 4,572,250,000 | 4.37%
          3-way          | 2   | 3,779,250,000 | 4.62%
          3-way          | 3   | 4,129,500,000 | 4.48%
          FastCrc32c     | 1   | 4,663,500,000 | 11.24%
          FastCrc32c     | 2   | 4,047,500,000 | 12.34%
          FastCrc32c     | 3   | 4,366,750,000 | 11.68%
      
      **# Test Plan**
          make -j64 corruption_test && ./corruption_test
          (by default this uses the 3-way SSE algorithm)

          NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

          make clean && DEBUG_LEVEL=0 make -j64 db_bench
          make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
      Closes https://github.com/facebook/rocksdb/pull/3173
      
      Differential Revision: D6330882
      
      Pulled By: yingsu00
      
      fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
      f54d7f5f
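The idea behind the 3-way algorithm described above is that a CRC is linear over GF(2), so three parts of a buffer can be checksummed independently and then combined. The sketch below illustrates only that combining property in portable C++; it is not the Folly or RocksDB code. It uses a bitwise CRC32C instead of the SSE4.2 crc32 intrinsic, and it shifts a lane's state by feeding zero bytes where the real algorithm uses pclmulqdq. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advances a raw CRC32C register (reflected polynomial 0x82F63B78) over n
// bytes, without the initial or final XOR. A null data pointer means
// "feed n zero bytes", which shifts the register past n bytes of input.
static uint32_t RawCrc32c(uint32_t state, const unsigned char* data, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    state ^= data ? data[i] : 0u;
    for (int bit = 0; bit < 8; ++bit) {
      state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
    }
  }
  return state;
}

// Conventional single-pass CRC32C.
static uint32_t Crc32cOnePass(const unsigned char* data, size_t n) {
  return RawCrc32c(0xFFFFFFFFu, data, n) ^ 0xFFFFFFFFu;
}

// Splits the buffer into three lanes, checksums each lane independently
// (the part that hardware can pipeline), then combines the lane states.
static uint32_t Crc32cThreeLanes(const unsigned char* data, size_t n) {
  const size_t a = n / 3, b = n / 3, c = n - a - b;
  const uint32_t s1 = RawCrc32c(0xFFFFFFFFu, data, a);  // carries the init value
  const uint32_t s2 = RawCrc32c(0u, data + a, b);
  const uint32_t s3 = RawCrc32c(0u, data + a + b, c);
  // Shift lane 1 past lane 2's bytes, fold in lane 2, then repeat for lane 3.
  uint32_t s = RawCrc32c(s1, nullptr, b) ^ s2;
  s = RawCrc32c(s, nullptr, c) ^ s3;
  return s ^ 0xFFFFFFFFu;
}
```

Both functions return the same value for any input. The zero-byte shift used here costs as much as checksumming the data again, which is why a constant-time combine step is needed before the 3-way split pays off.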
    • Y
      BlobDB: dump blob db options on open · e763e1b6
      Yi Wu committed
      Summary:
      We dump blob db options on blob db open, but it was removed by mistake in #3246. Adding it back.
      Closes https://github.com/facebook/rocksdb/pull/3298
      
      Differential Revision: D6607177
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
      e763e1b6
    • Y
      BlobDB: update blob_db_options.bytes_per_sync behavior · 48cf8da2
      Yi Wu committed
      Summary:
      Previously, if blob_db_options.bytes_per_sync was set, a background job called fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync through env_options_ to blob files so that sync_file_range() is used instead.
      Closes https://github.com/facebook/rocksdb/pull/3297
      
      Differential Revision: D6606994
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
      48cf8da2
  10. 19 Dec, 2017 7 commits
    • Y
      WritePrepared Txn: Return NotSupported on iterator refresh · 06149429
      Yi Wu committed
      Summary:
      A proper implementation of Iterator::Refresh() for WritePreparedTxnDB would require releasing the current snapshot and acquiring another one. Since MyRocks doesn't make use of Iterator::Refresh(), we simply mark it as not supported.
      Closes https://github.com/facebook/rocksdb/pull/3290
      
      Differential Revision: D6599931
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 4e1632d967316431424f6e458254ecf9a97567cf
      06149429
    • A
      blog post for auto-tuned rate limiter · 1563801b
      Andrew Kryczka committed
      Summary:
      Wrote the blog post.
      Closes https://github.com/facebook/rocksdb/pull/3289
      
      Differential Revision: D6599031
      
      Pulled By: ajkr
      
      fbshipit-source-id: 77ee553196f225f20c56112d2c015b6fa14f1b83
      1563801b
    • Y
      Remove incorrect comment · 2190e967
      Yi Wu committed
      Summary:
      We actually create an individual compaction filter from the compaction filter factory per sub-compaction in `CompactionJob::ProcessKeyValueCompaction`: https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L742
      The comment seems incorrect.
      Closes https://github.com/facebook/rocksdb/pull/3288
      
      Differential Revision: D6598455
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: a6bc059a9103b87a73ae6ec4bb01ca33f5d48cf5
      2190e967
    • M
      WritePrepared Txn: make buck tests parallel · 0faa026d
      Maysam Yabandeh committed
      Summary:
      The TSAN version of the tests can take quite a long time. Make the buck tests parallel to avoid timeouts.
      Closes https://github.com/facebook/rocksdb/pull/3280
      
      Differential Revision: D6581594
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3f8476d8c69f0183e394fa8a2089dd8d4e90c90c
      0faa026d
    • M
      fix release order in validateNumberOfEntries · 78c2eedb
      Maysam Yabandeh committed
      Summary:
      ScopedArenaIterator should be defined after range_del_agg so that it destructs the assigned iterator, which depends on range_del_agg, before range_del_agg itself is destructed.
      Closes https://github.com/facebook/rocksdb/pull/3281
      
      Differential Revision: D6592332
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 89a15d8ed13d0fc856b0c47dce3d91778738dbac
      78c2eedb
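The fix above relies on the C++ rule that local objects are destroyed in reverse order of declaration. A minimal sketch of that rule follows; `FakeRangeDelAgg` and `FakeScopedIter` are hypothetical stand-ins for illustration, not RocksDB types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destruction events so the order can be inspected afterwards.
static std::vector<std::string>& EventLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for range_del_agg.
struct FakeRangeDelAgg {
  bool alive = true;
  ~FakeRangeDelAgg() {
    alive = false;
    EventLog().push_back("agg destroyed");
  }
};

// Hypothetical stand-in for ScopedArenaIterator: its destructor still needs
// the aggregator it points to.
struct FakeScopedIter {
  const FakeRangeDelAgg* agg = nullptr;
  ~FakeScopedIter() {
    assert(agg != nullptr && agg->alive);  // safe only if agg outlives iter
    EventLog().push_back("iter destroyed");
  }
};

static std::vector<std::string> RunCorrectOrder() {
  EventLog().clear();
  {
    FakeRangeDelAgg agg;  // declared first, so destroyed last
    FakeScopedIter iter;  // declared second, so destroyed first
    iter.agg = &agg;
  }
  return EventLog();
}
```

Swapping the two declarations would destroy the aggregator first and leave the iterator's destructor reading a dead object, which is the kind of bug the commit describes.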
    • G
      Fix build for linux · aa6509d8
      Guo Xiao committed
      Summary:
      * Include `unistd.h` for `sleep(3)`
      * Include `sys/time.h` for `gettimeofday(3)`
      * Include `utils/random.h` for `Random64`
      
      Error messages:
      
      utilities/persistent_cache/hash_table_bench.cc: In constructor ‘rocksdb::HashTableBenchmark::HashTableBenchmark(rocksdb::HashTableImpl<long unsigned int, std::__cxx11::basic_string<char> >*, size_t, size_t, size_t, size_t)’:
      utilities/persistent_cache/hash_table_bench.cc:76:28: error: ‘sleep’ was not declared in this scope
             /* sleep override */ sleep(1);
                                  ^~~~~
      utilities/persistent_cache/hash_table_bench.cc:76:28: note: suggested alternative: ‘strsep’
             /* sleep override */ sleep(1);
                                  ^~~~~
                                  strsep
      utilities/persistent_cache/hash_table_bench.cc: In member function ‘void rocksdb::HashTableBenchmark::RunRead()’:
      utilities/persistent_cache/hash_table_bench.cc:107:5: error: ‘Random64’ was not declared in this scope
           Random64 rgen(time(nullptr));
           ^~~~~~~~
      utilities/persistent_cache/hash_table_bench.cc:107:5: note: suggested alternative: ‘random_r’
           Random64 rgen(time(nullptr));
           ^~~~~~~~
           random_r
      utilities/persistent_cache/hash_table_bench.cc:110:18: error: ‘rgen’ was not declared in this scope
             size_t k = rgen.Next() % max_prepop_key;
                        ^~~~
      utilities/persistent_cache/hash_table_bench.cc: In static member function ‘static uint64_t rocksdb::HashTableBenchmark::NowInMillSec()’:
      utilities/persistent_cache/hash_table_bench.cc:153:5: error: ‘gettimeofday’ was not declared in this scope
           gettimeofday(&tv, /*tz=*/nullptr);
           ^~~~~~~~~~~~
      make[2]: *** [CMakeFiles/hash_table_bench.dir/build.make:63: CMakeFiles/hash_table_bench.dir/utilities/persistent_cache/hash_table_bench.cc.o] Error 1
      make[1]: *** [CMakeFiles/Makefile2:3346: CMakeFiles/hash_table_bench.dir/all] Error 2
      make[1]: *** Waiting for unfinished jobs....
      Closes https://github.com/facebook/rocksdb/pull/3283
      
      Differential Revision: D6594850
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd83957338c210cdfd253763347aafd39476824f
      aa6509d8
    • M
      WritePrepared Txn: non-2pc write in one round · a6d3c762
      Maysam Yabandeh committed
      Summary:
      Currently non-2pc writes do a 2nd dummy write to actually commit the transaction. This was necessary to ensure that publishing the commit sequence number is done from only one queue (the queue that does not write to memtable). This is, however, not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues is disabled by updating the commit map in the 1st write.
      Closes https://github.com/facebook/rocksdb/pull/3277
      
      Differential Revision: D6575392
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
      a6d3c762
  11. 16 Dec, 2017 4 commits
    • A
      Add a histogram stat for memtable flush · fccc12f3
      Anand Ananthabhotla committed
      Summary:
      Add a new histogram stat called rocksdb.db.flush.micros for memtable flush.
      Closes https://github.com/facebook/rocksdb/pull/3269
      
      Differential Revision: D6559496
      
      Pulled By: anand1976
      
      fbshipit-source-id: f5c771ba2568630458751795e8c37a493ff9b14d
      fccc12f3
    • M
      db_stress: skip snapshot check if cf is dropped · 95583e15
      Maysam Yabandeh committed
      Summary:
      We added a new verification that ensures the value a snapshot reads when it is released is the same as when it was created. This test, however, fails when the cf is dropped in between. The patch skips the test in that case.
      Closes https://github.com/facebook/rocksdb/pull/3279
      
      Differential Revision: D6581584
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: afe37d371c0f91818d2e279b3949b810e112e8eb
      95583e15
    • Y
      BlobDB: Remove the need to get sequence number per write · 237b2925
      Yi Wu committed
      Summary:
      Previously we stored the sequence number range of each blob file and used that range to check whether the file could possibly be visible to a snapshot. But this adds complexity to the code, since the sequence number is only available after a write. (The current implementation gets the sequence number by calling GetLatestSequenceNumber(), which is wrong.) With the patch, we no longer store the sequence number range, and we check snapshot_sequence < obsolete_sequence to decide whether the file is visible to a snapshot (previously we checked first_sequence <= snapshot_sequence < obsolete_sequence).
      Closes https://github.com/facebook/rocksdb/pull/3274
      
      Differential Revision: D6571497
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ca06479dc1fcd8782f6525b62b7762cd47d61909
      237b2925
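The two visibility predicates named in the summary above can be written side by side. This is a minimal sketch under the assumption that sequence numbers are plain 64-bit integers; the function names are hypothetical and are not taken from the BlobDB source.

```cpp
#include <cassert>
#include <cstdint>

using SequenceNumber = uint64_t;  // assumption: a plain 64-bit counter

// Old check from the summary: first_sequence <= snapshot_sequence < obsolete_sequence.
static bool VisibleToSnapshotOld(SequenceNumber first_sequence,
                                 SequenceNumber obsolete_sequence,
                                 SequenceNumber snapshot_sequence) {
  return first_sequence <= snapshot_sequence &&
         snapshot_sequence < obsolete_sequence;
}

// New check from the summary: snapshot_sequence < obsolete_sequence.
static bool VisibleToSnapshotNew(SequenceNumber obsolete_sequence,
                                 SequenceNumber snapshot_sequence) {
  return snapshot_sequence < obsolete_sequence;
}
```

The new predicate is the more conservative of the two: whenever the old check reports a file as visible, the new one does too, and it may additionally keep a file for a snapshot taken before the file's first write.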
    • A
      fix backup meta-file buffer overrun · a79c7c05
      Andrew Kryczka committed
      Summary:
      - check most times after calling snprintf that the buffer didn't fill up. Previously we'd proceed and use `buf_size - len` as the length in subsequent calls, which underflowed as those are unsigned size_t.
      - replace some memcpys with snprintf for consistency
      Closes https://github.com/facebook/rocksdb/pull/3255
      
      Differential Revision: D6541464
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8610ea6a24f38e0a37c6d17bc65b7c712da6d932
      a79c7c05
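The underflow described above happens when `len` grows past `buf_size` and `buf_size - len` wraps around as an unsigned `size_t`. The following is a minimal sketch of a checked append, assuming a hypothetical helper named `AppendFormatted`; it is not the code from the backup engine.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: appends formatted text at buf + *len and advances *len
// only if the whole output (plus NUL) fit. Returns false otherwise, so the
// caller never computes buf_size - *len with *len > buf_size.
static bool AppendFormatted(char* buf, size_t buf_size, size_t* len,
                            const char* fmt, ...) {
  if (*len >= buf_size) {
    return false;  // no room left; avoids the unsigned wrap-around
  }
  const size_t remaining = buf_size - *len;
  va_list ap;
  va_start(ap, fmt);
  const int n = std::vsnprintf(buf + *len, remaining, fmt, ap);
  va_end(ap);
  // vsnprintf returns the length it wanted to write; a value >= remaining
  // means the output was truncated.
  if (n < 0 || static_cast<size_t>(n) >= remaining) {
    return false;
  }
  *len += static_cast<size_t>(n);
  return true;
}
```

A caller would stop and report an error on the first `false` instead of continuing to write into the buffer.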
  12. 15 Dec, 2017 4 commits
  13. 14 Dec, 2017 1 commit
    • M
      WritePrepared Txn: make db_stress transactional · cd2e5cae
      Maysam Yabandeh committed
      Summary:
      Add a "--use_txn" option to use the transactional API in db_stress, with the default being the WRITE_PREPARED policy, which is the main intention of modifying db_stress. It also extends the existing snapshots to verify that, before releasing a snapshot, a read from it returns the same value as before.
      Closes https://github.com/facebook/rocksdb/pull/3243
      
      Differential Revision: D6556912
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ae31465be362d44bd06e635e2e9e49a1da11268
      cd2e5cae
  14. 13 Dec, 2017 1 commit
    • M
      disableWAL with WriteImplWALOnly · 546a6327
      Maysam Yabandeh committed
      Summary:
      Currently WriteImplWALOnly simply returns when disableWAL is set. This is incorrect behavior since it does not allocate the sequence number, which is a side effect of writing to the WAL. This patch fixes the issue.
      Closes https://github.com/facebook/rocksdb/pull/3262
      
      Differential Revision: D6550974
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 745a83ae8f04e7ca6c8ffb247d6ef16c287c52e7
      546a6327