1. 05 Aug, 2017 (1 commit)
  2. 04 Aug, 2017 (8 commits)
    • db_bench background work thread pool size arguments · dce6d5a8
      Andrew Kryczka committed
      Summary:
      The background thread pools' sizes weren't easily configurable by `max_background_compactions` and `max_background_flushes` in multi-instance setups. Introduced separate arguments for their sizes.
      Closes https://github.com/facebook/rocksdb/pull/2680
      
      Differential Revision: D5550675
      
      Pulled By: ajkr
      
      fbshipit-source-id: bab5f0a7bc5db63bb084d0c10facbe437096367d
    • Makefile: fix for GCC 7+ and clang 4+ · 4f81ab38
      Cholerae Hu committed
      Summary:
      maysamyabandeh IslamAbdelRahman PTAL
      
      Fix https://github.com/facebook/rocksdb/issues/2672
      Signed-off-by: Cholerae Hu <huyingqian@pingcap.com>
      Closes https://github.com/facebook/rocksdb/pull/2681
      
      Differential Revision: D5561515
      
      Pulled By: ajkr
      
      fbshipit-source-id: 676187802ebd8a87a6c051bb565818a1bf89d0a9
    • Update all blob db TTL and timestamps to uint64_t · 92afe830
      Yi Wu committed
      Summary:
      The current blob db implementation uses a mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency.
      Closes https://github.com/facebook/rocksdb/pull/2683
      
      Differential Revision: D5557103
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42
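      Consistency has a concrete payoff here. A Unix timestamp in seconds stored as int32_t stops fitting after 2147483647 (January 2038), and `now + ttl` computed in 32 bits can wrap even sooner. The sketch below computes expiration entirely in uint64_t. `ComputeExpiration` and `OverflowsInt32` are hypothetical helpers, not blob db functions. The saturating add is an assumption of the sketch, not documented blob db behavior.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <limits>

      // Hypothetical helper: expiration = now + ttl, both in seconds, kept in
      // uint64_t. Saturates at the maximum value instead of wrapping around.
      uint64_t ComputeExpiration(uint64_t now, uint64_t ttl) {
        const uint64_t kMax = std::numeric_limits<uint64_t>::max();
        return ttl > kMax - now ? kMax : now + ttl;
      }

      // True if the expiration would not fit in an int32_t field, i.e. a
      // narrower type could not have represented it.
      bool OverflowsInt32(uint64_t now, uint64_t ttl) {
        return ComputeExpiration(now, ttl) >
               static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
      }
      ```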
    • Fix /bin/bash shebangs · 5883a1ae
      Alan Somers committed
      Summary:
      "/bin/bash" is a Linuxism.  "/usr/bin/env bash" is portable.
      Closes https://github.com/facebook/rocksdb/pull/2646
      
      Differential Revision: D5556259
      
      Pulled By: ajkr
      
      fbshipit-source-id: cbffd38ecdbfffb2438969ec007ab345ed893ccb
    • Introduce bottom-pri thread pool for large universal compactions · cc01985d
      Andrew Kryczka committed
      Summary:
      When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical for preventing compaction stalls, as they can quickly reduce the number of L0 files / sorted runs.
      
      This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.
      
      - Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if the user explicitly configures it to have a positive number of threads.
      - Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
      - Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
      - Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
      - Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits, so users of this feature will get an extra compaction for free.
      Closes https://github.com/facebook/rocksdb/pull/2580
      
      Differential Revision: D5422916
      
      Pulled By: ajkr
      
      fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
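      The routing rule above can be modeled in a few lines. This is not RocksDB's scheduler: `Pool`, `Scheduler`, and `ScheduleCompaction` are hypothetical names. The sketch also simplifies one step: it routes at scheduling time, whereas the commit picks the compaction first and then forwards it to the bottom pool. It captures only two decisions from the summary:
      - a compaction that includes the last level goes to the bottom pool only when that pool has a positive thread count (pools start at zero threads);
      - bottom-pool jobs are tracked separately, so they do not count against the regular limit.

      In the public API, a pool is sized with `Env::SetBackgroundThreads(n, Env::Priority::BOTTOM)`.

      ```cpp
      #include <cassert>
      #include <deque>
      #include <string>

      enum class Priority { kBottom, kLow, kHigh };

      // Hypothetical pool model: a thread limit plus a queue of pending jobs.
      struct Pool {
        int num_threads = 0;  // zero by default, so an unsized pool is disabled
        std::deque<std::string> queue;
      };

      // Hypothetical scheduler that models only the routing decision.
      struct Scheduler {
        Pool low, bottom;
        int bg_compaction_scheduled = 0;         // counted against the job limit
        int bg_bottom_compaction_scheduled = 0;  // tracked separately

        // Queues a universal compaction and returns the priority it was given.
        Priority ScheduleCompaction(const std::string& job, bool includes_last_level) {
          if (includes_last_level && bottom.num_threads > 0) {
            bottom.queue.push_back(job);
            ++bg_bottom_compaction_scheduled;
            return Priority::kBottom;
          }
          low.queue.push_back(job);
          ++bg_compaction_scheduled;
          return Priority::kLow;
        }
      };
      ```

      With the bottom pool left at zero threads, every compaction stays in the low pool. This mirrors why the default thread limit had to become zero.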
    • Allow concurrent writes to blob db · 0b814ba9
      Yi Wu committed
      Summary:
      I'm going with a brute-force solution, just letting Put() and Write() hold a mutex before writing. We may improve concurrent writing with finer-grained locking later.
      Closes https://github.com/facebook/rocksdb/pull/2682
      
      Differential Revision: D5552690
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c
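      The brute-force approach amounts to one mutex shared by every write entry point. A minimal sketch follows. It assumes a `std::map` as a stand-in for the underlying storage, and `CoarseLockedStore` is a hypothetical class, not blob db's implementation.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <map>
      #include <mutex>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical write path: a single mutex serializes every Put() and
      // Write(), so concurrent writers never interleave.
      class CoarseLockedStore {
       public:
        void Put(const std::string& key, const std::string& value) {
          std::lock_guard<std::mutex> lock(write_mutex_);
          data_[key] = value;
        }

        // A whole batch is applied while holding the same mutex.
        void Write(const std::vector<std::pair<std::string, std::string>>& batch) {
          std::lock_guard<std::mutex> lock(write_mutex_);
          for (const auto& kv : batch) data_[kv.first] = kv.second;
        }

        size_t Size() {
          std::lock_guard<std::mutex> lock(write_mutex_);
          return data_.size();
        }

       private:
        std::mutex write_mutex_;
        std::map<std::string, std::string> data_;
      };
      ```

      The single mutex is simple, but only one writer makes progress at a time. Finer-grained locking would split `write_mutex_` into narrower locks.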
    • Blob DB garbage collection should keep keys with newer version · 2c45ada4
      Yi Wu committed
      Summary:
      Fix the bug where blob db garbage collection removed keys that had a newer version. GC shouldn't delete the key from the base db when the sequence number in the base db is not equal to the one in the blob log.
      Closes https://github.com/facebook/rocksdb/pull/2678
      
      Differential Revision: D5549752
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: abb8649260963b5c389748023970fd746279d227
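      The fix boils down to one predicate: GC may delete a key from the base db only if the base db still points at the exact write recorded in the blob log. The sketch below is a minimal illustration built on these assumptions:
      - the base db is modeled as a map from key to the sequence number of its latest write;
      - GC has already decided to drop the blob record (for example, because it expired);
      - `BlobRecord` and `GcShouldDeleteFromBase` are hypothetical names.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>

      // Hypothetical blob log record: key plus the sequence number it was written at.
      struct BlobRecord {
        std::string key;
        uint64_t sequence;
      };

      // Delete from the base db only when its sequence number matches the blob
      // record. A different sequence number means a newer version exists and
      // the key must be kept.
      bool GcShouldDeleteFromBase(const std::map<std::string, uint64_t>& base,
                                  const BlobRecord& rec) {
        auto it = base.find(rec.key);
        return it != base.end() && it->second == rec.sequence;
      }
      ```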
    • Fix the overflow bug in AwaitState · 58410aee
      Maysam Yabandeh committed
      Summary:
      https://github.com/facebook/rocksdb/issues/2559 reports an overflow in AwaitState. nbronson debugged the issue and presented the fix, which this patch applies. Moreover, this patch adds more comments to clarify the logic in AwaitState.
      
      I tried both 16 and 64 threads on the update benchmark. The fix lowers CPU usage by about 1.7% but also lowers throughput by 1.6% and 2%, respectively. Apparently the bug had favored spinning more often.
      
      Benchmarks:
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --benchmarks="fillrandom" --threads=16 --num=2000000
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --use_existing_db=1 --benchmarks="updaterandom[X3]" --threads=16 --num=2000000
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --use_existing_db=1 --benchmarks="updaterandom[X3]" --threads=64 --num=200000
      
      Results
      $ cat update-16t-bug.txt | tail -4
      updaterandom [AVG    3 runs] : 234117 ops/sec;   51.8 MB/sec
      updaterandom [MEDIAN 3 runs] : 233581 ops/sec;   51.7 MB/sec
      3896.42user 1539.12system 6:50.61elapsed 1323%CPU (0avgtext+0avgdata 331308maxresident)k
      0inputs+0outputs (0major+1281001minor)pagefaults 0swaps
      $ cat update-16t-fixed.txt | tail -4
      updaterandom [AVG    3 runs] : 230364 ops/sec;   51.0 MB/sec
      updaterandom [MEDIAN 3 runs] : 226169 ops/sec;   50.0 MB/sec
      3865.46user 1568.32system 6:57.63elapsed 1301%CPU (0avgtext+0avgdata 315012maxresident)k
      0inputs+0outputs (0major+1342568minor)pagefaults 0swaps
      
      $ cat update-64t-bug.txt | tail -4
      updaterandom [AVG    3 runs] : 261878 ops/sec;   57.9 MB/sec
      updaterandom [MEDIAN 3 runs] : 262859 ops/sec;   58.2 MB/sec
      926.27user 578.06system 2:27.46elapsed 1020%CPU (0avgtext+0avgdata 475480maxresident)k
      0inputs+0outputs (0major+1058728minor)pagefaults 0swaps
      $ cat update-64t-fixed.txt | tail -4
      updaterandom [AVG    3 runs] : 256699 ops/sec;   56.8 MB/sec
      updaterandom [MEDIAN 3 runs] : 256380 ops/sec;   56.7 MB/sec
      933.47user 575.37system 2:30.41elapsed 1003%CPU (0avgtext+0avgdata 482340maxresident)k
      0inputs+0outputs (0major+1078557minor)pagefaults 0swaps
      Closes https://github.com/facebook/rocksdb/pull/2679
      
      Differential Revision: D5553732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 98b72dc3a8e0f22ea29d4f7c7790af10c369c5bb
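      The percentages in the summary can be rechecked against the AVG lines and the %CPU column of the results. The helper below assumes "lowers by X%" means the relative change from the bug run to the fixed run.

      ```cpp
      #include <cassert>
      #include <cmath>

      // Relative change from `before` to `after`, in percent (negative = decrease).
      double PercentChange(double before, double after) {
        return (after - before) / before * 100.0;
      }
      ```

      Results from the runs above:

      | Metric | Bug run | Fixed run | Change |
      | --- | --- | --- | --- |
      | Throughput, 16 threads (ops/sec) | 234117 | 230364 | about -1.60% |
      | Throughput, 64 threads (ops/sec) | 261878 | 256699 | about -1.98% |
      | %CPU, 16 threads | 1323 | 1301 | about -1.66% |
      | %CPU, 64 threads | 1020 | 1003 | about -1.67% |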
  3. 03 Aug, 2017 (2 commits)
    • Refactor TransactionImpl · c3d5c4d3
      Maysam Yabandeh committed
      Summary:
      This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how data is written to rocksdb. The existing implementation is named WriteCommittedTxnImpl, as it writes committed data to the db. A template named WritePreparedTxnImpl is also added, which will later be completed to provide an alternative implementation.
      Closes https://github.com/facebook/rocksdb/pull/2676
      
      Differential Revision: D5549998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
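      The split between concurrency control and write policy can be pictured as two layers: a base class that handles locking, and a subclass that decides how buffered data reaches the db. This is a minimal sketch, not RocksDB's TransactionImpl API. The `*Sketch` classes are hypothetical, with a `std::set` standing in for the lock table and a `std::map` for the db. A write-prepared variant would plug in a different `CommitInternal`.

      ```cpp
      #include <cassert>
      #include <map>
      #include <set>
      #include <string>

      using LockTable = std::set<std::string>;        // keys currently locked
      using Db = std::map<std::string, std::string>;  // stand-in for the database

      // Base class: owns pessimistic concurrency control (per-key locks).
      class PessimisticTxnSketch {
       public:
        explicit PessimisticTxnSketch(LockTable* locks) : locks_(locks) {}
        virtual ~PessimisticTxnSketch() { ReleaseLocks(); }

        // Lock the key before buffering the write; fail if another txn holds it.
        bool Put(const std::string& key, const std::string& value) {
          if (held_.count(key) == 0) {
            if (!locks_->insert(key).second) return false;
            held_.insert(key);
          }
          buffer_[key] = value;
          return true;
        }

        void Commit(Db* db) {
          CommitInternal(db);  // how data reaches the db is the subclass's job
          ReleaseLocks();
        }

       protected:
        virtual void CommitInternal(Db* db) = 0;
        Db buffer_;

       private:
        void ReleaseLocks() {
          for (const auto& key : held_) locks_->erase(key);
          held_.clear();
        }
        LockTable* locks_;
        LockTable held_;
      };

      // Write policy of the existing implementation: data reaches the db only
      // at commit time.
      class WriteCommittedTxnSketch : public PessimisticTxnSketch {
       public:
        using PessimisticTxnSketch::PessimisticTxnSketch;

       protected:
        void CommitInternal(Db* db) override {
          for (const auto& kv : buffer_) (*db)[kv.first] = kv.second;
          buffer_.clear();
        }
      };
      ```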
    • support multiple CFs with OPTIONS file · 060ccd4f
      Andrew Kryczka committed
      Summary:
      Move an option necessary for running db_bench on multiple CFs into the general initialization area, so it works with both flag-based init and OPTIONS-based init.
      Closes https://github.com/facebook/rocksdb/pull/2675
      
      Differential Revision: D5541378
      
      Pulled By: ajkr
      
      fbshipit-source-id: 169926cb4ae95c17974f744faf7cc794d41e5c0a
  4. 02 Aug, 2017 (2 commits)
    • Fix statistics in RocksJava sample · 34538706
      Sagar Vemuri committed
      Summary:
      While running `make jtest`, I observed that the Java sample was broken due to the changes in #2551.
      Closes https://github.com/facebook/rocksdb/pull/2674
      
      Differential Revision: D5539807
      
      Pulled By: sagar0
      
      fbshipit-source-id: 2c7e9d84778099dfa1c611996b444efe3c9fd466
    • Dump Blob DB options to info log · 1900771b
      Yi Wu committed
      Summary:
      * Dump blob db options to info log
      * Remove BlobDBOptionsImpl to disallow dynamically casting *BlobDBOptions into *BlobDBOptionsImpl. Its options become constants or move into BlobDBOptions. The dynamic cast is broken after #2645.
      * Change some of the default options
      * Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
      Closes https://github.com/facebook/rocksdb/pull/2671
      
      Differential Revision: D5529912
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
  5. 01 Aug, 2017 (3 commits)
  6. 29 Jul, 2017 (6 commits)
    • Replace dynamic_cast<> · 21696ba5
      Siying Dong committed
      Summary:
      Replace dynamic_cast<> so that users can choose to build with RTTI off, saving several bytes per object and leaving slightly more memory available.
      Some nontrivial changes:
      1. Add Comparator::GetRootComparator() to get around the internal comparator hack
      2. Add the two experimental functions to DB
      3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
      4. Since 3 is done, move the option-parsing functions for table factories into the table factory files too, for symmetry.
      Closes https://github.com/facebook/rocksdb/pull/2645
      
      Differential Revision: D5502723
      
      Pulled By: siying
      
      fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
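      Item 1 replaces a runtime type check with a virtual accessor, so a wrapper can report what it wraps without RTTI. The sketch below shows that pattern. It uses simplified hypothetical classes rather than RocksDB's real `Comparator` interface, which has many more methods.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical, simplified comparator base class.
      class ComparatorSketch {
       public:
        virtual ~ComparatorSketch() = default;
        virtual std::string Name() const = 0;
        // By default a comparator is its own root; no dynamic_cast needed.
        virtual const ComparatorSketch* GetRootComparator() const { return this; }
      };

      class BytewiseSketch : public ComparatorSketch {
       public:
        std::string Name() const override { return "bytewise"; }
      };

      // A wrapper (in the spirit of an internal-key comparator) reports the
      // user comparator it wraps as its root.
      class InternalKeySketch : public ComparatorSketch {
       public:
        explicit InternalKeySketch(const ComparatorSketch* user) : user_(user) {}
        std::string Name() const override { return "internal:" + user_->Name(); }
        const ComparatorSketch* GetRootComparator() const override {
          return user_->GetRootComparator();
        }

       private:
        const ComparatorSketch* user_;
      };
      ```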
    • Prevent empty memtables from using a lot of memory · e85f2c64
      Mike Kolupaev committed
      Summary:
      This fixes OOMs that we (logdevice) are currently having in production.
      
      The SkipListRep constructor does a couple of small allocations from ConcurrentArena (see the InlineSkipList constructor). ConcurrentArena would sometimes allocate an entire block for that, which is a few megabytes (we use Options::arena_block_size = 4 MB). So an empty memtable can take 4 MB of memory. We have ~40k column families (spread across 15 DB instances), so 4 MB per empty memtable easily OOMs a machine for us.
      
      This PR makes ConcurrentArena always allocate from Arena's inline block when possible. As long as InlineSkipList's initial allocations are below 2 KB, no blocks are allocated for empty memtables.
      Closes https://github.com/facebook/rocksdb/pull/2569
      
      Differential Revision: D5404029
      
      Pulled By: al13n321
      
      fbshipit-source-id: 568ec22a3fd1a485c06123f6b2dfc5e9ef67cd23
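      The fix is essentially "try the inline block first". The sketch below illustrates that policy under these assumptions:
      - the arena is single-threaded;
      - it has a 2 KB inline buffer, matching the figure above;
      - alignment is ignored;
      - `InlineFirstArena` is a hypothetical class, not RocksDB's `Arena` or `ConcurrentArena`.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <vector>

      // Hypothetical arena: small requests come from an inline buffer inside the
      // object; only when it is exhausted is a (large) heap block allocated.
      class InlineFirstArena {
       public:
        static constexpr size_t kInlineSize = 2048;

        explicit InlineFirstArena(size_t block_size) : block_size_(block_size) {}

        char* Allocate(size_t bytes) {
          if (bytes <= kInlineSize - inline_used_) {  // fits in the inline block
            char* p = inline_block_ + inline_used_;
            inline_used_ += bytes;
            return p;
          }
          if (bytes > block_remaining_) {  // start a new heap block
            size_t size = bytes > block_size_ ? bytes : block_size_;
            blocks_.emplace_back(new char[size]);
            block_ptr_ = blocks_.back().get();
            block_remaining_ = size;
          }
          char* p = block_ptr_;
          block_ptr_ += bytes;
          block_remaining_ -= bytes;
          return p;
        }

        size_t NumHeapBlocks() const { return blocks_.size(); }

       private:
        char inline_block_[kInlineSize];
        size_t inline_used_ = 0;
        size_t block_size_;
        std::vector<std::unique_ptr<char[]>> blocks_;
        char* block_ptr_ = nullptr;
        size_t block_remaining_ = 0;
      };
      ```

      With a 4 MB block size, a couple of small constructor-time allocations cost no heap block at all. The first 4 MB block appears only once real data overflows the inline buffer.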
    • Fix FIFO Compaction with TTL tests · ac748c57
      Sagar Vemuri committed
      Summary:
      - FIFOCompactionWithTTLTest was flaky when run in parallel, and hence it had been disabled. It is fixed now.
      - Also, the tests now fake sleeping instead of really sleeping, which makes them more realistic by allowing TTLs like 1 hour and 1 day.
      Closes https://github.com/facebook/rocksdb/pull/2650
      
      Differential Revision: D5506038
      
      Pulled By: sagar0
      
      fbshipit-source-id: deb429a527f045e3e2c5138b547c3e8ac8586aa2
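      Faking sleep means time advances only when the test says so. The sketch below assumes a hypothetical `Clock` interface in the spirit of `Env::NowMicros()`. `FakeClock`, `SleepForMicros`, and `IsExpired` are illustration-only names, not the test's actual helpers.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical clock interface.
      class Clock {
       public:
        virtual ~Clock() = default;
        virtual uint64_t NowMicros() = 0;
        virtual void SleepForMicros(uint64_t micros) = 0;
      };

      // Test clock: "sleeping" just advances a counter, so a one-day TTL can be
      // exercised instantly.
      class FakeClock : public Clock {
       public:
        uint64_t NowMicros() override { return now_; }
        void SleepForMicros(uint64_t micros) override { now_ += micros; }

       private:
        uint64_t now_ = 0;
      };

      // Hypothetical TTL check: data created at `created_micros` is expired once
      // `ttl_seconds` have elapsed.
      bool IsExpired(Clock* clock, uint64_t created_micros, uint64_t ttl_seconds) {
        return clock->NowMicros() - created_micros >= ttl_seconds * 1000000ULL;
      }
      ```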
    • Move blob_db/ttl_extractor.h into blob_db/blob_db.h · aaf42fe7
      Yi Wu committed
      Summary:
      Move blob_db/ttl_extractor.h into blob_db/blob_db.h
      Also exclude TTLExtractor from LITE build.
      Closes https://github.com/facebook/rocksdb/pull/2665
      
      Differential Revision: D5520009
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 4813dcc272c7cc4bf2cdac285256d9a17d78c7b7
    • Fix license headers in Cassandra related files · aace4651
      Sagar Vemuri committed
      Summary:
      I might have missed these while doing some recent cassandra code reviews.
      Closes https://github.com/facebook/rocksdb/pull/2663
      
      Differential Revision: D5520138
      
      Pulled By: sagar0
      
      fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9
    • CacheActivityLogger, component to log cache activity into a file · 50a96913
      Islam AbdelRahman committed
      Summary:
      A simple component that adds a new entry to a log file every time we look up/insert a key in SimCache.
      API:
      ```
      SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>)
      SimCache::StopActivityLogging()
      ```
      
      Sending for review; I still need to add more comments.
      
      I was thinking about a better approach, but I ended up deciding to use a mutex to synchronize the writes to the file, since this feature should not be heavily used and is only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we look up/add to the SimCache.
      Closes https://github.com/facebook/rocksdb/pull/2295
      
      Differential Revision: D5063826
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c
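      The mutex-per-entry design can be sketched as follows. This is not SimCache's implementation, and the sketch makes several assumptions:
      - `ActivityLoggerSketch` is hypothetical;
      - a `std::ostringstream` stands in for the log file;
      - the `op,key` line format is invented for illustration;
      - a `max_size` of 0 means unlimited, and logging stops once the next line would exceed the limit.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <mutex>
      #include <sstream>
      #include <string>

      // Hypothetical activity logger: one line per lookup/insert, all writes
      // serialized by a single mutex.
      class ActivityLoggerSketch {
       public:
        void Start(size_t max_size) {
          std::lock_guard<std::mutex> lock(mu_);
          out_.str("");
          written_ = 0;
          max_size_ = max_size;
          active_ = true;
        }

        void Stop() {
          std::lock_guard<std::mutex> lock(mu_);
          active_ = false;
        }

        void Log(const char* op, const std::string& key) {
          std::lock_guard<std::mutex> lock(mu_);
          if (!active_) return;
          std::string line = std::string(op) + "," + key + "\n";
          if (max_size_ != 0 && written_ + line.size() > max_size_) {
            active_ = false;  // size limit reached: stop logging
            return;
          }
          out_ << line;
          written_ += line.size();
        }

        std::string Contents() {
          std::lock_guard<std::mutex> lock(mu_);
          return out_.str();
        }

       private:
        std::mutex mu_;
        std::ostringstream out_;
        size_t written_ = 0;
        size_t max_size_ = 0;
        bool active_ = false;
      };
      ```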
  7. 28 Jul, 2017 (8 commits)
    • Blob DB TTL extractor · 6083bc79
      Yi Wu committed
      Summary:
      Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL
      extractor can be used to extract TTL from keys inserted with Put or
      WriteBatch. Changes over the existing extract_ttl_fn are:
      * If the value is changed, it will be returned via std::string* (rather than Slice*). With Slice*, the new value has to be part of the existing value. With std::string*, the limitation is removed.
      * It can optionally return TTL or expiration.
      
      Other changes in this PR:
      * replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests.
      * add several TTL tests.
      * other minor naming changes.
      Closes https://github.com/facebook/rocksdb/pull/2659
      
      Differential Revision: D5512627
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440
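      The two listed properties can be illustrated with a hypothetical interface: a rewritten value returned through `std::string*`, and a choice between a relative TTL and an absolute expiration. The sketch is modeled on the description above and is not the actual `blob_db::TTLExtractor` signature. It uses `std::string` instead of `Slice` to stay self-contained, and the `<ttl>|<payload>` value format is invented for illustration.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // Hypothetical extractor interface.
      class TTLExtractorSketch {
       public:
        virtual ~TTLExtractorSketch() = default;
        // Returns true if a TTL (seconds) was found. If the value is rewritten,
        // the new value goes to *new_value and *value_changed is set.
        virtual bool ExtractTTL(const std::string& key, const std::string& value,
                                uint64_t* ttl, std::string* new_value,
                                bool* value_changed) = 0;
        // Default: derive an absolute expiration from the TTL (no overflow check).
        virtual bool ExtractExpiration(const std::string& key, const std::string& value,
                                       uint64_t now, uint64_t* expiration,
                                       std::string* new_value, bool* value_changed) {
          uint64_t ttl = 0;
          if (!ExtractTTL(key, value, &ttl, new_value, value_changed)) return false;
          *expiration = now + ttl;
          return true;
        }
      };

      // Example policy: "<ttl-seconds>|<payload>" values carry their TTL in a
      // prefix that is stripped before storage. The new value is not a sub-slice
      // of a Slice, which is why std::string* is convenient.
      class PrefixTTLExtractor : public TTLExtractorSketch {
       public:
        bool ExtractTTL(const std::string& /*key*/, const std::string& value,
                        uint64_t* ttl, std::string* new_value,
                        bool* value_changed) override {
          *value_changed = false;
          size_t sep = value.find('|');
          if (sep == std::string::npos || sep == 0) return false;
          uint64_t parsed = 0;  // decimal digits only; no overflow check
          for (size_t i = 0; i < sep; ++i) {
            if (value[i] < '0' || value[i] > '9') return false;
            parsed = parsed * 10 + static_cast<uint64_t>(value[i] - '0');
          }
          *ttl = parsed;
          *new_value = value.substr(sep + 1);
          *value_changed = true;
          return true;
        }
      };
      ```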
    • fix asan/valgrind for TableCache cleanup · 710411ae
      Andrew Kryczka committed
      Summary:
      Breaking commit: d12691b8
      
      In the above commit, I moved the `TableCache` cleanup logic from `Version` destructor into `PurgeObsoleteFiles`. I missed cleaning up `TableCache` entries for the current `Version` during DB destruction.
      
      This PR adds that logic to `VersionSet` destructor. One unfortunate side effect is now we're potentially deleting `TableReader`s after `column_family_set_.reset()`, which means we can't call `BlockBasedTableReader::Close` a second time as the block cache might already be destroyed.
      Closes https://github.com/facebook/rocksdb/pull/2662
      
      Differential Revision: D5515108
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2cb820e19aa813e0d258d17f76b2d7b6b7ee0b18
    • TARGETS file not setting sse explicitly · 3a3fb00b
      Yi Wu committed
      Summary:
      We don't need to set them explicitly.
      Closes https://github.com/facebook/rocksdb/pull/2660
      
      Differential Revision: D5514141
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 10edebfc3cfe0afc00a34519f87fcea4d65069ae
    • Build fewer tests in Travis platform_dependent tests · fca4d6da
      Siying Dong committed
      Summary:
      platform_dependent tests in Travis currently build all tests, which is not needed. Only build the tests we need to run.
      Closes https://github.com/facebook/rocksdb/pull/2647
      
      Differential Revision: D5513954
      
      Pulled By: siying
      
      fbshipit-source-id: 4d540b146124e70dd25586c47939d19f93655b0a
    • remove unnecessary internal_comparator param in newIterator · 8f553d3c
      Aaron Gao committed
      Summary:
      Solves https://github.com/facebook/rocksdb/issues/2604
      Closes https://github.com/facebook/rocksdb/pull/2648
      
      Differential Revision: D5504875
      
      Pulled By: lightmark
      
      fbshipit-source-id: c14bb62ccbdc9e7bda9cd914cae4ea0765d882ee
    • "ccache -C" in Travis · 7f6d012d
      Siying Dong committed
      Summary:
      This is to work around the problem of build error:
      
      util/threadpool_imp.o: file not recognized: File truncated
      
      This is just to make the build go through. We should remove it later once we find a real long-term solution.
      Closes https://github.com/facebook/rocksdb/pull/2657
      
      Differential Revision: D5511034
      
      Pulled By: siying
      
      fbshipit-source-id: 229f024bd78ee96799017d4a89be74253058ec30
    • move TableCache::EraseHandle outside of db mutex · d12691b8
      Andrew Kryczka committed
      Summary:
      Post-compaction work holds onto the db mutex for the longest time (found by tracing lock acquires/releases with LTTng and correlating timestamps with our info log). Further experimentation showed `TableCache::EraseHandle` is responsible for ~86% of the time the mutex is held. We can just release the handle outside the db mutex.
      Closes https://github.com/facebook/rocksdb/pull/2654
      
      Differential Revision: D5507126
      
      Pulled By: ajkr
      
      fbshipit-source-id: 703c01ddf2aea16bc0f9e33c08935d78aa6b781d
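      The pattern is to decide what to drop while holding the lock, then do the expensive release after unlocking. The sketch below is a minimal illustration. `Handle`, `CacheSketch`, and `CleanupObsoleteFiles` are hypothetical names, and `CacheSketch::Release` stands in for the costly `TableCache::EraseHandle` call.

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <vector>

      // Hypothetical stand-in for a table cache handle.
      struct Handle {
        int id;
      };

      struct CacheSketch {
        int released = 0;
        void Release(Handle* h) {  // imagine this being slow
          ++released;
          delete h;
        }
      };

      // Take ownership of the obsolete list under the db mutex (cheap swap),
      // then release the handles after the mutex has been unlocked.
      void CleanupObsoleteFiles(std::mutex* db_mutex, std::vector<Handle*>* obsolete,
                                CacheSketch* cache) {
        std::vector<Handle*> to_release;
        {
          std::lock_guard<std::mutex> lock(*db_mutex);
          to_release.swap(*obsolete);
        }
        for (Handle* h : to_release) cache->Release(h);  // outside the db mutex
      }
      ```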
    • fix db_bench argument type · f33f1136
      Andrew Kryczka committed
      Summary:
      it should be a bool
      Closes https://github.com/facebook/rocksdb/pull/2653
      
      Differential Revision: D5506148
      
      Pulled By: ajkr
      
      fbshipit-source-id: f142f0f3aa8b678c68adef12e5ac6e1e163306f3
  8. 27 Jul, 2017 (5 commits)
  9. 26 Jul, 2017 (5 commits)
    • Remove the orphan assert on !need_log_sync · 30b58cf7
      Maysam Yabandeh committed
      Summary:
      We initially had disabled support for write_options.sync when concurrent_prepare_ is set. We later added this support but the statement that asserts this combination is not used was left there. This patch cleans it up.
      Closes https://github.com/facebook/rocksdb/pull/2642
      
      Differential Revision: D5496101
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: becbc503446f2a51bee24cc861958c090c724ec2
    • Fix flaky write_callback_test · fe1a5559
      Yi Wu committed
      Summary:
      The test is failing occasionally on the assert: `ASSERT_TRUE(writer->state == WriteThread::State::STATE_INIT)`. This is because the test doesn't make the leader wait long enough before updating state for its followers. The patch moves the update to `threads_waiting` to the end of the `WriteThread::JoinBatchGroup:Wait` callback to avoid this.
      
      Also adds `WriteThread::JoinBatchGroup:Start` and has each thread wait there while another thread is linking to the linked list. This makes the check of `is_leader` more deterministic.
      
      Also changes two while-loops of `compare_exchange_strong` to plain `fetch_add`, to make the code cleaner.
      Closes https://github.com/facebook/rocksdb/pull/2640
      
      Differential Revision: D5491525
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 6e897f122082bd6f98e6d51b31a25e5fd0a3fb82
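      The last change works because a retry loop whose only job is to add one to an atomic counter is equivalent to a single `fetch_add`. A minimal sketch follows. The function names are hypothetical, since the summary does not show the test's actual counters.

      ```cpp
      #include <atomic>
      #include <cassert>

      // Increment written as a compare_exchange_strong retry loop.
      void IncrementWithCas(std::atomic<int>* counter) {
        int expected = counter->load();
        while (!counter->compare_exchange_strong(expected, expected + 1)) {
          // On failure, `expected` now holds the current value; just retry.
        }
      }

      // Same effect in one call: fetch_add atomically adds and returns the old value.
      void IncrementWithFetchAdd(std::atomic<int>* counter) {
        counter->fetch_add(1);
      }
      ```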
    • 5.6.1 release blog post · addbd279
      Yi Wu committed
      Summary:
      5.6.1 release blog post
      Closes https://github.com/facebook/rocksdb/pull/2638
      
      Differential Revision: D5491168
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 14e3a92a03684afa4bd19bfb3ffb053cc09f5d4a
    • buckification: remove explicit `-msse*` compiler flags · 30edff30
      Andrew Gallagher committed
      Summary: These are implied by default platform flags, in particular, `-march=corei7`.
      
      Reviewed By: pixelb
      
      Differential Revision: D5485414
      
      fbshipit-source-id: 85f1329c71fa81a604760844187cc73877fb40e9
    • Lower num of iterations in DeadlockCycle test · 2b259c9d
      Maysam Yabandeh committed
      Summary:
      Currently this test times out with tsan, likely due to the slowdown tsan introduces. By lowering the number of iterations we can still catch a bug, as the test is run regularly and multiple runs of the test are equivalent to running the test with more iterations.
      Closes https://github.com/facebook/rocksdb/pull/2639
      
      Differential Revision: D5490549
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bd69c42a9728d337ac95a06a401088384e51731a