1. 15 Dec, 2018 (2 commits)
    • Fix flaky test DeleteFileRange (#4784) · 4ed3c1eb
      Maysam Yabandeh committed
      Summary:
      The test fails sporadically because it expects the DB to be empty after a DeleteFilesInRange(..., nullptr, nullptr) call, which is not always the case. Debugging shows cases where files are skipped because they are being compacted. The patch fixes the test by waiting for the last CompactRange to finish before calling DeleteFilesInRange.
      Verified by
      ```
      ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.DeleteFileRange --repeat=10000
      ```
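      The failure mode can be shown with a toy model. This is a hedged sketch, not RocksDB code: `FileMeta` and `DeleteFilesInRangeModel` are hypothetical names, and empty strings stand in for the `nullptr` (unbounded) range arguments. The only behavior modeled is the one described above: files that are being compacted are skipped, so the DB is not empty until the compaction finishes.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>
      
      // Hypothetical, simplified SST file metadata.
      struct FileMeta {
        std::string smallest;
        std::string largest;
        bool being_compacted;
      };
      
      // Toy model of DeleteFilesInRange: deletes files that lie fully inside
      // [begin, end] (an empty string means "unbounded", like nullptr), but skips
      // files that are currently being compacted. Returns the number deleted.
      std::size_t DeleteFilesInRangeModel(std::vector<FileMeta>& files,
                                          const std::string& begin,
                                          const std::string& end) {
        std::vector<FileMeta> kept;
        std::size_t deleted = 0;
        for (const FileMeta& f : files) {
          const bool in_range = (begin.empty() || f.smallest >= begin) &&
                                (end.empty() || f.largest <= end);
          if (in_range && !f.being_compacted) {
            ++deleted;          // file is dropped
          } else {
            kept.push_back(f);  // out of range or busy: file survives
          }
        }
        files.swap(kept);
        return deleted;
      }
      ```
      
      In this model, calling the deletion again after the compaction flag is cleared removes the remaining file, which mirrors why the test now waits for CompactRange first.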
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4784
      
      Differential Revision: D13469402
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3d8f44abe205b82c69f01e7edf27e1f8098248e1
    • Synchronize ticker and histogram metrics for Java API (#4733) · d6dfe516
      Adam Singer committed
      Summary:
      Update `HistogramType.java` and `TickerType.java` to expose and correct the metrics available to statistics callbacks.
      
      Moved `NO_ITERATOR_CREATED` to the proper stat name and deprecated `NO_ITERATORS`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4733
      
      Differential Revision: D13466936
      
      Pulled By: sagar0
      
      fbshipit-source-id: a58d1edcc07c7b68c3525b1aa05828212c89c6c7
  2. 14 Dec, 2018 (6 commits)
    • Refine db_stress params for atomic flush (#4781) · 8d2b74d2
      Andrew Kryczka committed
      Summary:
      Separate the flag that enables the option from the flag that enables the dedicated atomic stress test. I have found that setting the former without the latter can detect different problems.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781
      
      Differential Revision: D13463211
      
      Pulled By: ajkr
      
      fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
    • Fix race condition on options_file_number_ (#4780) · 34954233
      Maysam Yabandeh committed
      Summary:
      options_file_number_ must be written under db::mutex_ since its read is protected by mutex_ in ::GetLiveFiles(). However, it is currently written in ::RenameTempFileToOptionsFile(), which according to its contract must be called without holding db::mutex_. The patch fixes the race condition by acquiring mutex_ before writing options_file_number_, and it does so only if the rename of the options file is successful.
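      A minimal sketch of the locking pattern described above, assuming a simplified class: `OptionsFileTracker`, `OnOptionsFileRenamed`, and `LiveOptionsFileNumber` are hypothetical names standing in for the DB, ::RenameTempFileToOptionsFile(), and the read in ::GetLiveFiles(). The writer is entered without the lock held, takes the same mutex the reader uses, and only updates the number when the rename succeeded.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <mutex>
      
      class OptionsFileTracker {
       public:
        // Called WITHOUT mutex_ held, like RenameTempFileToOptionsFile(). The
        // rename result is passed in so the sketch needs no file system.
        bool OnOptionsFileRenamed(bool rename_ok, std::uint64_t file_number) {
          if (!rename_ok) {
            return false;  // keep the previous number if the rename failed
          }
          std::lock_guard<std::mutex> guard(mutex_);  // same lock the reader uses
          options_file_number_ = file_number;
          return true;
        }
      
        // Reader, analogous to the access in GetLiveFiles().
        std::uint64_t LiveOptionsFileNumber() {
          std::lock_guard<std::mutex> guard(mutex_);
          return options_file_number_;
        }
      
       private:
        std::mutex mutex_;
        std::uint64_t options_file_number_ = 0;
      };
      ```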
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4780
      
      Differential Revision: D13461411
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2d5bae96a1f3e969ef2505b737cf2d7ae749787b
    • Improve flushing multiple column families (#4708) · 4fce44fc
      Yanqin Jin committed
      Summary:
      If one column family is dropped, we should simply skip it and continue to flush
      other active ones.
      Currently we use Status::ShutdownInProgress to notify the caller of column
      families being dropped. In the future, we should consider using a different Status code.
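      The skip-and-continue behavior can be sketched as follows. `ColumnFamily` and `FlushAllLive` are hypothetical simplifications, and the sketch only counts the skipped column families instead of returning a Status such as ShutdownInProgress.
      
      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>
      
      // Hypothetical, simplified column family handle.
      struct ColumnFamily {
        std::string name;
        bool dropped;
        bool flushed;
      };
      
      // Flushes every column family that is still alive. A dropped column family
      // is skipped instead of aborting the whole batch; returns how many were skipped.
      int FlushAllLive(std::vector<ColumnFamily>& cfs) {
        int skipped = 0;
        for (ColumnFamily& cf : cfs) {
          if (cf.dropped) {
            ++skipped;
            continue;  // keep going with the remaining active column families
          }
          cf.flushed = true;  // stand-in for the real flush
        }
        return skipped;
      }
      ```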
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4708
      
      Differential Revision: D13378954
      
      Pulled By: riversand963
      
      fbshipit-source-id: 42f248cdf2d32d4c0f677cd39012694b8f1328ca
    • Reduce runtime of compact_on_deletion_collector_test (#4779) · 67e5b542
      Maysam Yabandeh committed
      Summary:
      It sometimes times out when it is run with TSAN. The patch reduces the number of iterations from 50 to 30. This reduces the normal runtime from 5.2 to 3.1 seconds and should similarly address the TSAN timeout problem.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4779
      
      Differential Revision: D13456862
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: fdc0ad7d781b1c33b771d2415ff5fa2f1b5e2537
    • Get `CompactionJobInfo` from CompactFiles · 2670fe8c
      DorianZheng committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716
      
      Differential Revision: D13207677
      
      Pulled By: ajkr
      
      fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b
    • Concurrent task limiter for compaction thread control (#4332) · a8b9891f
      Burton Li committed
      Summary:
      This PR aims to resolve the following issue:
      https://github.com/facebook/rocksdb/issues/3972#issue-330771918
      
      We have a RocksDB instance created with leveled compaction and multiple column families (CFs); some CFs use HDD to store large, less frequently accessed data, and others use SSD.
      When there is continuous write traffic to all CFs, the compaction thread pool is mostly occupied by the slow HDD compactions, which prevents full utilization of SSD bandwidth.
      Since atomic writes and transactions are needed across CFs, splitting the data into multiple RocksDB instances is not an option for us.
      
      With the compaction thread control, we got a 30%+ HDD write throughput gain, and also much smoother SSD writes since fewer write stalls happen.
      
      ConcurrentTaskLimiter can be shared by multiple CFs across RocksDB instances, so the feature works not only for multi-CF scenarios but also for multi-RocksDB scenarios that need per-tenant disk I/O resource control.
      
      The usage is straightforward, e.g.:
      
      //
      // Enable the compaction thread limiter through ColumnFamilyOptions
      //
      std::shared_ptr<ConcurrentTaskLimiter> ctl(NewConcurrentTaskLimiter("foo_limiter", 4));
      Options options;
      ColumnFamilyOptions cf_opt(options);
      cf_opt.compaction_thread_limiter = ctl;
      // ...
      
      //
      // The compaction thread limiter can be tuned or disabled on the fly
      //
      ctl->SetMaxOutstandingTask(12); // enlarge to 12 tasks
      // ...
      ctl->ResetMaxOutstandingTask(); // disable (bypass) the thread limiter
      ctl->SetMaxOutstandingTask(-1); // same as above
      // ...
      ctl->SetMaxOutstandingTask(0);  // fully throttle (allow 0 tasks)
      
      //
      // Share compaction thread limiters among CFs (to resolve multi-storage performance issues)
      //
      std::shared_ptr<ConcurrentTaskLimiter> ctl_ssd(NewConcurrentTaskLimiter("ssd_limiter", 8));
      std::shared_ptr<ConcurrentTaskLimiter> ctl_hdd(NewConcurrentTaskLimiter("hdd_limiter", 4));
      ColumnFamilyOptions cf_opt_ssd1(options);
      ColumnFamilyOptions cf_opt_ssd2(options);
      ColumnFamilyOptions cf_opt_hdd1(options);
      ColumnFamilyOptions cf_opt_hdd2(options);
      ColumnFamilyOptions cf_opt_hdd3(options);
      
      // SSD CFs
      cf_opt_ssd1.compaction_thread_limiter = ctl_ssd;
      cf_opt_ssd2.compaction_thread_limiter = ctl_ssd;
      
      // HDD CFs
      cf_opt_hdd1.compaction_thread_limiter = ctl_hdd;
      cf_opt_hdd2.compaction_thread_limiter = ctl_hdd;
      cf_opt_hdd3.compaction_thread_limiter = ctl_hdd;
      
      // ...
      
      //
      // The limiter is disabled by default (or when explicitly set to nullptr)
      //
      ColumnFamilyOptions cf_opt_default(options);
      cf_opt_default.compaction_thread_limiter = nullptr;
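      To make the limit semantics above concrete, here is a self-contained model of a shared limiter. It is an assumption-laden sketch, not RocksDB's ConcurrentTaskLimiter: `SimpleTaskLimiter`, `TryAcquire`, and `Release` are hypothetical, while `SetMaxOutstandingTask` and `ResetMaxOutstandingTask` mirror the calls in the usage example (a negative limit disables limiting, 0 allows no new tasks).
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <memory>
      #include <mutex>
      #include <string>
      #include <utility>
      
      // Hypothetical limiter: negative limit = unlimited, 0 = no new tasks,
      // N > 0 = at most N outstanding tasks. Not RocksDB's implementation.
      class SimpleTaskLimiter {
       public:
        SimpleTaskLimiter(std::string name, std::int32_t limit)
            : name_(std::move(name)), limit_(limit) {}
      
        const std::string& name() const { return name_; }
      
        void SetMaxOutstandingTask(std::int32_t limit) {
          std::lock_guard<std::mutex> guard(mu_);
          limit_ = limit;
        }
        void ResetMaxOutstandingTask() { SetMaxOutstandingTask(-1); }
      
        // Takes a slot if one is free; a compaction that gets `false` must wait.
        bool TryAcquire() {
          std::lock_guard<std::mutex> guard(mu_);
          if (limit_ >= 0 && outstanding_ >= limit_) {
            return false;
          }
          ++outstanding_;
          return true;
        }
      
        // Gives the slot back when the compaction finishes.
        void Release() {
          std::lock_guard<std::mutex> guard(mu_);
          --outstanding_;
        }
      
        std::int32_t outstanding() {
          std::lock_guard<std::mutex> guard(mu_);
          return outstanding_;
        }
      
       private:
        std::string name_;
        std::mutex mu_;
        std::int32_t limit_;
        std::int32_t outstanding_ = 0;
      };
      ```
      
      Sharing one instance through `std::shared_ptr` among several column families is what makes the limit apply to their combined compactions, as in the SSD/HDD example.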
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4332
      
      Differential Revision: D13226590
      
      Pulled By: siying
      
      fbshipit-source-id: 14307aec55b8bd59c8223d04aa6db3c03d1b0c1d
  3. 13 Dec, 2018 (1 commit)
    • Fix flaky test DBCompactionTest::DeleteFileRange (#4776) · 0aa17c10
      Maysam Yabandeh committed
      Summary:
      The test has been failing sporadically, probably because the configured compaction options were actually unused. Verified the fix with the following:
      ```
      ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.DeleteFileRange --repeat=1000
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4776
      
      Differential Revision: D13441052
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d35075b9e6cef9b9c9d0d571f9cd72ade8eda55d
  4. 12 Dec, 2018 (7 commits)
  5. 11 Dec, 2018 (4 commits)
    • Promote CompactionFilter* accessors to ColumnFamilyOptionsInterface (#3461) · 8261e002
      Ben Clay committed
      Summary:
      When adding CompactionFilter and CompactionFilterFactory settings to the Java layer, ColumnFamilyOptions was modified directly instead of ColumnFamilyOptionsInterface. This meant that the old-style Options monolith was left behind.
      
      This patch fixes that, by:
      - promoting the CompactionFilter + CompactionFilterFactory setters from ColumnFamilyOptions -> ColumnFamilyOptionsInterface
      - adding getters in ColumnFamilyOptionsInterface
      - implementing setters in Options
      - implementing getters in both ColumnFamilyOptions and Options
      - adding testcases
      - reusing a test CompactionFilterFactory by moving it to a common location
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3461
      
      Differential Revision: D13278788
      
      Pulled By: sagar0
      
      fbshipit-source-id: 72602c6eb97dc80734e718abb5e2e9958d3c753b
    • Properly set smallest key of subcompaction output (#4723) · 64aabc91
      Abhishek Madan committed
      Summary:
      It is possible to see a situation like the following when
      subcompactions are enabled:
      1. A subcompaction boundary is set to `[b, e)`.
      2. The first output file in a subcompaction has `c@20` as its smallest key
      3. The range tombstone `[a, d)30` is encountered.
      4. The tombstone is written to the range-del meta block and the new
         smallest key is set to `b@0` (since no keys in this subcompaction's
         output can be smaller than `b`).
      5. A key `b@10` in a lower level will now reappear, since it is not
         covered by the truncated start key `b@0`.
      
      In general, unless the smallest data key in a file has a seqnum of 0, it
      is not safe to truncate a tombstone at the start key to have a seqnum of
      0, since it can expose keys with a seqnum greater than 0 but less than
      the tombstone's actual seqnum.
      
      To fix this, when the lower bound of a file is from the subcompaction
      boundaries, we now set the seqnum of an artificially extended smallest
      key to the tombstone's seqnum. This is safe because subcompactions
      operate over disjoint sets of keys, and the subcompactions that can
      experience this problem are not the first subcompaction (which is
      unbounded on the left).
      
      Furthermore, there is now an assertion to detect the described anomalous
      case.
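      The scenario above can be checked with a simplified model of internal key ordering (user key ascending, then sequence number descending). This is a sketch under assumptions: `InternalKey`, `InternalLess`, and `Covers` are hypothetical and ignore value types and the packed key encoding. It shows that a start key truncated to `b@0` does not cover `b@10`, while a start key carrying the tombstone's seqnum, `b@30`, does.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      
      // Hypothetical, simplified internal key: user key plus sequence number.
      struct InternalKey {
        std::string user_key;
        std::uint64_t seq;
      };
      
      // Internal key order: user key ascending, then seqnum descending.
      bool InternalLess(const InternalKey& a, const InternalKey& b) {
        if (a.user_key != b.user_key) return a.user_key < b.user_key;
        return a.seq > b.seq;
      }
      
      // Does a range tombstone, truncated to start at internal key `start`,
      // ending at user key `end` (exclusive) and written at `tomb_seq`, cover `k`?
      bool Covers(const InternalKey& start, const std::string& end,
                  std::uint64_t tomb_seq, const InternalKey& k) {
        const bool at_or_after_start = !InternalLess(k, start);
        return at_or_after_start && k.user_key < end && k.seq < tomb_seq;
      }
      ```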
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4723
      
      Differential Revision: D13236188
      
      Pulled By: abhimadan
      
      fbshipit-source-id: a6da6a113f2de1e2ff307ca72e055300c8fe5692
    • Reduce javadoc warnings (#4764) · 10e7de77
      Adam Singer committed
      Summary:
      Compile logs contain some noise due to missing javadoc annotations. This updates the docs to reduce it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4764
      
      Differential Revision: D13400193
      
      Pulled By: sagar0
      
      fbshipit-source-id: 65c7efb70747cc3bb35a336a6881ea6536ae5ff4
    • Fix inline comments for assumed_tracked (#4762) · 21fca397
      Maysam Yabandeh committed
      Summary:
      Fix the definition of assumed_tracked in Transaction that was introduced in #4680
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4762
      
      Differential Revision: D13399150
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2a30fe49e3c44adacd7e45cd48eae95023ca9dca
  6. 08 Dec, 2018 (5 commits)
  7. 07 Dec, 2018 (1 commit)
    • Extend Transaction::GetForUpdate with do_validate (#4680) · b878f93c
      Maysam Yabandeh committed
      Summary:
      Transaction::GetForUpdate is extended with a do_validate parameter with a default value of true. If false, it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (it expects ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly, ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with a default value of false. If true, it indicates that the call is assumed to come after a ::GetForUpdate(do_validate=false).
      The Java APIs are updated accordingly.
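      A toy model of the validation step that `do_validate=false` skips, under stated assumptions: `ToyStore`, `ToyStatus`, and `GetForUpdateModel` are hypothetical, each key records only the seqnum of its last write, and a write newer than the snapshot is reported as a conflict. It does not model locking or the value returned by the read.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>
      
      // Hypothetical toy store tracking, per key, the seqnum of its latest write.
      struct ToyStore {
        std::map<std::string, std::uint64_t> last_write_seq;
      };
      
      enum class ToyStatus { kOk, kBusy };
      
      // With do_validate=true, a key written after the snapshot is a conflict.
      // With do_validate=false the check is skipped and the read simply proceeds.
      ToyStatus GetForUpdateModel(const ToyStore& store, const std::string& key,
                                  std::uint64_t snapshot_seq, bool do_validate) {
        if (do_validate) {
          auto it = store.last_write_seq.find(key);
          if (it != store.last_write_seq.end() && it->second > snapshot_seq) {
            return ToyStatus::kBusy;  // written after the snapshot: conflict
          }
        }
        return ToyStatus::kOk;
      }
      ```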
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680
      
      Differential Revision: D13068508
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
  8. 06 Dec, 2018 (4 commits)
    • Update HISTORY.md (#4753) · 1d679e35
      Yanqin Jin committed
      Summary:
      As titled. Update history to include a recent bug fix in
      9be3e6b4.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4753
      
      Differential Revision: D13350286
      
      Pulled By: riversand963
      
      fbshipit-source-id: b6324780dee4cb1757bc2209403a08531c150c08
    • Allow file-ingest-triggered flush to skip waiting for write-stall clear (#4751) · 9be3e6b4
      Yanqin Jin committed
      Summary:
      When a write stall has already been triggered because the number of L0 files
      reached the threshold, file ingestion must proceed with its flush without
      waiting for the write stall condition to be cleared by compaction, because
      compaction can wait for ingestion to finish (circular wait).
      
      In order to avoid this wait, we can set `FlushOptions.allow_write_stall` to
      true (the default is false). Leaving it false can cause a deadlock.
      
      This can happen when the number of compaction threads is low.
      
      Consider the following:
      ```
      Time  compaction_thread                        ingestion_thread
       |                                             num_running_ingest_file_++
       |    while(num_running_ingest_file_>0){wait}
       |                                             flush
       V
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4751
      
      Differential Revision: D13343037
      
      Pulled By: riversand963
      
      fbshipit-source-id: d3b95938814af46ec4c463feff0b50c70bd8b23f
    • Move a function to critical section (#4752) · b96fccb1
      Yanqin Jin committed
      Summary:
      Test plan
      ```
      $make clean && make -j32 all check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4752
      
      Differential Revision: D13344705
      
      Pulled By: riversand963
      
      fbshipit-source-id: fc3a43174d09d70ccc2b09decd78e1da1b6ba9d1
    • Fix buck dev mode fbcode builds (#4747) · e58d7695
      anand76 committed
      Summary:
      Don't enable ROCKSDB_JEMALLOC unless the build mode is opt and the default
      allocator is jemalloc. In dev mode, this causes compile/link errors such as:
      ```
      stderr: buck-out/dev/gen/rocksdb/src/rocksdb_lib#compile-pic-malloc_stats.cc.o4768b59e,gcc-5-glibc-2.23-clang/db/malloc_stats.cc.o:malloc_stats.cc:function rocksdb::DumpMallocStats(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*): error: undefined reference to 'malloc_stats_print'
      clang-7.0: error: linker command failed with exit code 1
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4747
      
      Differential Revision: D13324840
      
      Pulled By: anand1976
      
      fbshipit-source-id: 45ffbd4f63fe4d9e8a0473d8f066155e4ef64a14
  9. 04 Dec, 2018 (1 commit)
  10. 01 Dec, 2018 (4 commits)
  11. 30 Nov, 2018 (5 commits)
    • WritePrepared: followup fix for snapshot double release issue (#4734) · f1b0841f
      Maysam Yabandeh committed
      Summary:
      The fix in #4727 for double snapshot release was incomplete since it did not properly remove the duplicate entries in the snapshot list after finding that a snapshot is still valid. The patch does that and also improves the unit test to show the issue.
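      For illustration only, removing duplicate entries from a list of snapshot sequence numbers can be expressed with the standard sort/unique/erase idiom. `RemoveDuplicateSnapshots` is a hypothetical helper; this is not the WritePrepared implementation, which the summary does not show.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <vector>
      
      // Hypothetical helper: sorts the snapshot seqnums and erases adjacent
      // duplicates so each snapshot appears exactly once.
      void RemoveDuplicateSnapshots(std::vector<std::uint64_t>& snapshots) {
        std::sort(snapshots.begin(), snapshots.end());
        snapshots.erase(std::unique(snapshots.begin(), snapshots.end()),
                        snapshots.end());
      }
      ```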
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4734
      
      Differential Revision: D13266260
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 351e2c40cca45a87b757774c11af74182314911e
    • JemallocNodumpAllocator: option to limit tcache memory usage (#4736) · cf1df5d3
      Yi Wu committed
      Summary:
      Add an option to limit tcache usage by allocation size. This reduces the total tcache size when many user threads access the allocator and incur non-trivial memory usage.
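      A sketch of a size-based tcache policy, assuming the limit takes the form of lower and upper allocation-size bounds (an assumption; the summary only says usage is limited by allocation size). `UseThreadCache` is hypothetical. In jemalloc, an individual allocation can bypass the thread cache by passing the `MALLOCX_TCACHE_NONE` flag to `mallocx`.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      
      // Hypothetical policy: only allocations whose size is within
      // [lower_bound, upper_bound] go through the per-thread cache; all other
      // sizes bypass it, which bounds how much memory each thread's tcache holds.
      bool UseThreadCache(std::size_t size, std::size_t lower_bound,
                          std::size_t upper_bound) {
        return size >= lower_bound && size <= upper_bound;
      }
      ```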
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4736
      
      Differential Revision: D13269305
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 95a9b7fc67facd66837c849137e30e137112e19d
    • Move FIFOCompactionPicker to a separate file (#4724) · 70645355
      Sagar Vemuri committed
      Summary:
      **Summary:**
      Simplified the code layout by moving FIFOCompactionPicker to a separate file.
      **Why?:**
      While trying to add ttl functionality to universal compaction, I found the `FIFOCompactionPicker` class and its implementation methods interspersed between `LevelCompactionPicker` methods, which made the code somewhat hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724
      
      Differential Revision: D13227914
      
      Pulled By: sagar0
      
      fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
    • Fix a flaky test DBFlushTest.SyncFail (#4633) · 8d7bc76f
      Yanqin Jin committed
      Summary:
      There is a race condition in DBFlushTest.SyncFail, as illustrated below.
      ```
      time         thread1                             bg_flush_thread
        |     Flush(wait=false, cfd)
        |     refs_before=cfd->current()->TEST_refs()   PickMemtable calls cfd->current()->Ref()
        V
      ```
      The race between thread1 reading the ref count of cfd's current version
      and bg_flush_thread incrementing that ref count makes it possible for a
      later assertion on refs_before to fail. Therefore, we add test sync
      points to enforce the order and assert on the ref count before and after
      PickMemtable is called in bg_flush_thread.
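      The ordering fix can be sketched with standard C++ primitives. RocksDB tests use their own sync point facility (`TEST_SYNC_POINT` markers plus `SyncPoint::GetInstance()->LoadDependency(...)`); this sketch substitutes a `std::promise`/`std::future` pair, and `Version` and `OrderedRefCheck` are hypothetical. The background thread may only take its reference after the test thread has read `refs_before`.
      
      ```cpp
      #include <cassert>
      #include <future>
      #include <thread>
      #include <utility>
      
      // Hypothetical stand-in for a ref-counted Version.
      struct Version {
        int refs = 1;
      };
      
      // Reads the ref count, then lets the background "flush" thread take its
      // reference, in that order. Returns {refs_before, refs_after}.
      std::pair<int, int> OrderedRefCheck() {
        Version v;
        std::promise<void> refs_read;  // fulfilled by this (test) thread
        std::future<void> may_ref = refs_read.get_future();
      
        std::thread bg_flush([&v, &may_ref] {
          may_ref.wait();  // the "sync point": wait until refs_before was read
          ++v.refs;        // stand-in for PickMemtable calling current()->Ref()
        });
      
        const int refs_before = v.refs;  // no longer races with the increment
        refs_read.set_value();
        bg_flush.join();
        return {refs_before, v.refs};
      }
      ```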
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4633
      
      Differential Revision: D12967131
      
      Pulled By: riversand963
      
      fbshipit-source-id: a99d2bacb7869ec5d8d03b24ef2babc0e6ae1a3b
    • db/repair: reset Repair::db_lock_ in ctor (#4683) · 7dbee387
      Kefu Chai committed
      Summary:
      There is a chance that
      
      * the caller tries to repair the DB while holding the db_lock, in
        which case the env implementation might not set the `lock`
        parameter in Repairer::Run(), or
      * the caller somehow never calls Repairer::Run().
      
      Either way, Repairer's destructor compares the uninitialized
      db_lock_ with nullptr and tries to unlock it. There is a good chance
      that db_lock_ is not nullptr, and then the process crashes.
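      A minimal sketch of the fix, assuming simplified types: `FileLock`, `UnlockFile`, and `RepairerModel` are hypothetical stand-ins for the Env lock handle, the unlock call, and Repairer. Initializing the pointer in the constructor means the destructor only unlocks a lock that was actually acquired, even if `Run()` is never called.
      
      ```cpp
      #include <cassert>
      
      // Hypothetical stand-ins for the Env lock handle and unlock call.
      struct FileLock {};
      int g_unlock_calls = 0;
      void UnlockFile(FileLock* /*lock*/) { ++g_unlock_calls; }
      
      class RepairerModel {
       public:
        // The fix: give db_lock_ a defined value before anything can read it.
        RepairerModel() : db_lock_(nullptr) {}
      
        ~RepairerModel() {
          if (db_lock_ != nullptr) {  // only unlock a lock that was acquired
            UnlockFile(db_lock_);
          }
        }
      
        // Stand-in for Run(): records the lock if one was obtained.
        void Run(FileLock* acquired) { db_lock_ = acquired; }
      
       private:
        FileLock* db_lock_;
      };
      ```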
      Signed-off-by: Kefu Chai <tchaikov@gmail.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4683
      
      Differential Revision: D13260287
      
      Pulled By: riversand963
      
      fbshipit-source-id: 878a119d2e9f10a0fa17ee62cf3fb24b33d49fa5