1. 14 Aug, 2017 1 commit
    • A
      rocksdb: make buildable on aarch64 · 5449c099
      Andrew Gallagher committed
      Summary:
      - Remove default arch-specific flags.
      - Move non-default arch-specific flags to arch-specific param.
      
      Reviewed By: yiwu-arbug
      
      Differential Revision: D5597499
      
      fbshipit-source-id: c53108ac39c73ac36893d3fd9aaf3b5e3080f1ae
  2. 13 Aug, 2017 1 commit
  3. 12 Aug, 2017 9 commits
    • A
      fix deletion dropping in intra-L0 · acf935e4
      Andrew Kryczka committed
      Summary:
      `KeyNotExistsBeyondOutputLevel` didn't consider L0 files' key-ranges. So if a key was covered only by older L0 files' key-ranges, we would incorrectly drop deletions of that key. This PR just skips the deletion-dropping optimization when the output level is L0.
      Closes https://github.com/facebook/rocksdb/pull/2726
      
      Differential Revision: D5617286
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4bff1396b06d49a828ba4542f249191052915bce
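The fix in acf935e4 can be pictured with a small, self-contained sketch. `FileRange`, `Levels`, and `CanDropDeletion` are hypothetical names invented for illustration, and the `KeyNotExistsBeyondOutputLevel` here is a simplified free function, not RocksDB's actual implementation. Only the idea of skipping the optimization when the output level is L0 comes from the summary above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of an SST file's key range.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// levels[i] holds the key ranges of the files in level i.
using Levels = std::vector<std::vector<FileRange>>;

// True if no file in a level deeper than output_level can contain key.
// Files in output_level itself (e.g. older L0 files) are not examined.
bool KeyNotExistsBeyondOutputLevel(const Levels& levels, int output_level,
                                   const std::string& key) {
  assert(output_level >= 0);
  for (std::size_t lvl = static_cast<std::size_t>(output_level) + 1;
       lvl < levels.size(); ++lvl) {
    for (const FileRange& f : levels[lvl]) {
      if (f.smallest <= key && key <= f.largest) return false;
    }
  }
  return true;
}

// A deletion marker may be dropped only if no older data can hold the key.
// For intra-L0 compactions the check above ignores older L0 files, so the
// optimization is skipped entirely.
bool CanDropDeletion(const Levels& levels, int output_level,
                     const std::string& key) {
  if (output_level == 0) return false;
  return KeyNotExistsBeyondOutputLevel(levels, output_level, key);
}
```

With an older L0 file covering `"a"` to `"m"`, `CanDropDeletion(levels, 0, "b")` returns false, so the deletion of `"b"` survives an intra-L0 compaction.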
    • A
      make sst_dump compression size command consistent · 8254e9b5
      Andrew Kryczka committed
      Summary:
      - like other subcommands, reporting compression sizes should be specified with the `--command` CLI arg.
      - also added `--compression_types` arg as it's useful to restrict the types of compression used, at least in my dictionary compression experiments.
      Closes https://github.com/facebook/rocksdb/pull/2706
      
      Differential Revision: D5589520
      
      Pulled By: ajkr
      
      fbshipit-source-id: 305bb4ebcc95eecc8a85523cd3b1050619c9ddc5
    • A
      db_bench support for non-uniform column family ops · 74f18c13
      Andrew Kryczka committed
      Summary:
      Previously we could only select the CF on which to operate uniformly at random. This is a limitation, e.g., when testing universal compaction as all CFs would need to run full compaction at roughly the same time, which isn't realistic.
      
      This PR allows the user to specify the probability distribution for selecting CFs via the `--column_family_distribution` argument.
      Closes https://github.com/facebook/rocksdb/pull/2677
      
      Differential Revision: D5544436
      
      Pulled By: ajkr
      
      fbshipit-source-id: 478d56260995236ae90895ce5bd51f38882e185a
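As a rough illustration of non-uniform column family selection, the sketch below maps a uniformly drawn number to a column family index using cumulative weights. It assumes the distribution is given as a comma-separated list of integer percentages summing to 100, which the summary above does not state. `ParseDistribution` and `PickColumnFamily` are hypothetical helpers, not db_bench functions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses a comma-separated list of integer percentages, e.g. "50,30,20".
// (Assumed format, for illustration only.)
std::vector<int> ParseDistribution(const std::string& spec) {
  std::vector<int> percents;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ',')) {
    percents.push_back(std::stoi(item));
  }
  return percents;
}

// Maps a value drawn uniformly from [0, 100) to a column family index by
// walking the cumulative distribution.
std::size_t PickColumnFamily(const std::vector<int>& percents,
                             int rand_0_to_99) {
  assert(!percents.empty());
  assert(rand_0_to_99 >= 0 && rand_0_to_99 < 100);
  int cumulative = 0;
  for (std::size_t i = 0; i < percents.size(); ++i) {
    cumulative += percents[i];
    if (rand_0_to_99 < cumulative) return i;
  }
  // Only reached if the percentages sum to less than 100.
  return percents.size() - 1;
}
```

With `"50,30,20"`, values 0 to 49 select the first column family, 50 to 79 the second, and 80 to 99 the third, so one CF can be made much hotter than the others.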
    • A
      approximate histogram stats to save cpu · 5de98f2d
      Andrew Kryczka committed
      Summary:
      It sounds like we're willing to trade off minor inaccuracy in stats for speed. Start with histogram stats. Ticker stats will be harder (and, IMO, we shouldn't change them in this manner) as many test cases rely on them being exactly correct.
      Closes https://github.com/facebook/rocksdb/pull/2720
      
      Differential Revision: D5607884
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1b754cda35ea6b252d1fdd5aa3cfb58866506372
    • Y
      Fix c_test ASAN failure · 3f588843
      yiwu-arbug committed
      Summary:
      Fix c_test missing deletion of write batch pointer.
      Closes https://github.com/facebook/rocksdb/pull/2725
      
      Differential Revision: D5613866
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: bf3f59a6812178577c9c25bae558ef36414a1f51
    • Y
      Fix blob DB transaction usage while GC · e5a1b727
      yiwu-arbug committed
      Summary:
      During GC, blob DB uses an optimistic transaction to delete or replace the index entry in the LSM tree, to guarantee correctness if there's a normal write to the same key. However, the previous implementation neither calls SetSnapshot() nor uses GetForUpdate() of the transaction API; instead it does its own sequence number checking before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and the GC will then delete or relocate the old version of the key by mistake. Update the code to properly use GetForUpdate() to check the existing index entry.
      
      After this patch the sequence number stored with each blob record is useless, so I'm considering removing the sequence number from the blob record in another patch.
      Closes https://github.com/facebook/rocksdb/pull/2703
      
      Differential Revision: D5589178
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24
    • A
      fix corruption_test valgrind · 6f051e0c
      Andrew Kryczka committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/2724
      
      Differential Revision: D5613416
      
      Pulled By: ajkr
      
      fbshipit-source-id: ed55fb66ab1b41dfdfe765fe3264a1c87a8acb00
    • K
      expose set_skip_stats_update_on_db_open to C bindings · ac098a46
      Kent767 committed
      Summary:
      It would be super helpful to not have to recompile rocksdb to get this performance tweak for mechanical disks.
      
      I have signed the CLA.
      Closes https://github.com/facebook/rocksdb/pull/2718
      
      Differential Revision: D5606994
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: c05e92bad0d03bd38211af1e1ced0d0d1e02f634
    • S
      Support prefetch last 512KB with direct I/O in block based file reader · 666a005f
      Siying Dong committed
      Summary:
      Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except for compaction inputs or when readahead is enabled for iterators. This can create a lot of I/O for HDD cases. To solve the problem, the 512KB is prefetched in the block based table if direct I/O is enabled. The prefetched buffer is passed in together with the random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
      Closes https://github.com/facebook/rocksdb/pull/2708
      
      Differential Revision: D5593091
      
      Pulled By: siying
      
      fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
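The "try the buffer first, then fall back to the file" behaviour described in 666a005f might look roughly like the sketch below. `PrefetchBuffer`, `TryRead`, and `ReadWithPrefetch` are hypothetical names, and the "file" is just an in-memory string standing in for a random access file reader. This is not the actual RocksDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a prefetched tail of a file.
struct PrefetchBuffer {
  std::uint64_t offset = 0;  // File offset of data[0].
  std::string data;          // Prefetched bytes.

  // Copies [read_offset, read_offset + n) into *out if the whole range lies
  // inside the buffer; otherwise leaves *out untouched and returns false.
  bool TryRead(std::uint64_t read_offset, std::size_t n,
               std::string* out) const {
    if (read_offset < offset) return false;
    const std::uint64_t rel = read_offset - offset;
    if (rel > data.size() || n > data.size() - rel) return false;
    out->assign(data, static_cast<std::size_t>(rel), n);
    return true;
  }
};

// Serves the read from the prefetch buffer when possible and only touches
// the (simulated) file on a miss. *from_buffer reports which path was taken.
std::string ReadWithPrefetch(const std::string& file,
                             const PrefetchBuffer* buffer,
                             std::uint64_t offset, std::size_t n,
                             bool* from_buffer) {
  assert(offset <= file.size());
  std::string result;
  if (buffer != nullptr && buffer->TryRead(offset, n, &result)) {
    *from_buffer = true;
    return result;
  }
  *from_buffer = false;
  return file.substr(static_cast<std::size_t>(offset), n);
}
```

Reads that fall entirely within the prefetched tail never reach the file, which is where the I/O savings on HDDs would come from.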
  4. 11 Aug, 2017 6 commits
  5. 10 Aug, 2017 3 commits
    • J
      fix comment · 23c7d135
      jimmyway committed
      Summary:
      Signed-off-by: tang.jin <tang.jin@istuary.com>
      Closes https://github.com/facebook/rocksdb/pull/2644
      
      Differential Revision: D5600861
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 9516636cb6e77b09fe0ebef78953adf4b7e88cc8
    • D
      Makefile: correct faligned-new test · 1fbad84b
      Daniel Black committed
      Summary:
      Commit 4f81ab38 has the test wrong.
      
      clang doesn't support a -dumpversion option. By lucky coincidence
      clang/gcc --version both place a version number at the same output location
      when --version is passed.
      
      Example output (1st line only).
      
          $ clang --version
          clang version 3.9.1 (tags/RELEASE_391/final)
      
          $ gcc --version
          gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
      
      During the test of the compiler we ensure that a minimum version is met
      as Makefile doesn't support patterns.
      
      Also xcode9 doesn't seem affected by https://github.com/facebook/rocksdb/issues/2672
      and also doesn't have "clang" as the first part of its output so the
      fix implemented here also is Apple clang friendly.
      
          $ clang --version
          Apple LLVM version 9.0.0 (clang-900.0.31)
      Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
      Closes https://github.com/facebook/rocksdb/pull/2699
      
      Differential Revision: D5600818
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3b0f2751becb53c1c35468bf29f3f828e7cf2c2a
    • A
      add VerifyChecksum() to db.h · 7848f0b2
      Aaron G committed
      Summary:
      We need a tool to check any sst file corruption in the db.
      It will check all the sst files in current version and read all the blocks (data, meta, index) with checksum verification. If any verification fails, the function will return non-OK status.
      Closes https://github.com/facebook/rocksdb/pull/2498
      
      Differential Revision: D5324269
      
      Pulled By: lightmark
      
      fbshipit-source-id: 6f8a272008b722402a772acfc804524c9d1a483b
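The verification loop described in 7848f0b2 (read every block of every file, fail on the first mismatch) can be sketched in a toy form. Everything here is an assumption made for illustration: `Block`, `ToyChecksum`, and `VerifyAll` are hypothetical, the checksum is a simple polynomial hash rather than whatever checksum the SST files use, and a `bool` stands in for the returned status.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical block: payload plus the checksum recorded when it was written.
struct Block {
  std::string payload;
  std::uint32_t stored_checksum;
};

// Toy checksum for illustration only; unsigned arithmetic wraps on overflow.
std::uint32_t ToyChecksum(const std::string& data) {
  std::uint32_t sum = 0;
  for (unsigned char c : data) sum = sum * 31u + c;
  return sum;
}

// Returns true (think "OK status") only if every block in every file still
// matches its stored checksum; stops at the first mismatch.
bool VerifyAll(const std::vector<std::vector<Block>>& files) {
  for (const std::vector<Block>& file : files) {
    for (const Block& block : file) {
      if (ToyChecksum(block.payload) != block.stored_checksum) return false;
    }
  }
  return true;
}
```

Flipping a single byte in any block's payload after its checksum has been recorded makes `VerifyAll` return false.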
  6. 09 Aug, 2017 2 commits
  7. 08 Aug, 2017 2 commits
  8. 07 Aug, 2017 2 commits
  9. 06 Aug, 2017 1 commit
  10. 05 Aug, 2017 3 commits
    • S
      Optimize range-delete aggregator call in merge helper. · 20dc5e74
      Sagar Vemuri committed
      Summary:
      In the condition:
      ```
      if (range_del_agg != nullptr &&
          range_del_agg->ShouldDelete(
              iter->key(),
              RangeDelAggregator::RangePositioningMode::kForwardTraversal) &&
          filter != CompactionFilter::Decision::kRemoveAndSkipUntil) {
      ...
      }
      ```
      it could be possible that all the work done in `range_del_agg->ShouldDelete` is wasted due to not having the right `filter` value later on.
      Instead, check `filter` value before even calling `range_del_agg->ShouldDelete`, which is a much more involved function.
      Closes https://github.com/facebook/rocksdb/pull/2690
      
      Differential Revision: D5568931
      
      Pulled By: sagar0
      
      fbshipit-source-id: 17512d52360425c7ae9de7675383f5d7bc3dad58
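The effect of reordering the condition in 20dc5e74 relies on short-circuit evaluation of `&&`. The sketch below counts calls to a fake aggregator to make that visible. `FakeRangeDelAggregator`, `Decision`, `CheckBefore`, and `CheckAfter` are hypothetical stand-ins, and the real `ShouldDelete` takes different arguments.

```cpp
#include <cassert>

enum class Decision { kKeep, kRemove, kRemoveAndSkipUntil };

// Hypothetical stand-in for an expensive range-deletion lookup; it only
// counts how often it is invoked.
struct FakeRangeDelAggregator {
  int calls = 0;
  bool ShouldDelete(int /*key*/) {
    ++calls;
    return true;
  }
};

// Original order: the expensive call runs even when the filter decision
// makes the overall condition false.
bool CheckBefore(FakeRangeDelAggregator* agg, int key, Decision filter) {
  return agg != nullptr && agg->ShouldDelete(key) &&
         filter != Decision::kRemoveAndSkipUntil;
}

// Reordered: the cheap comparison goes first, so && short-circuits and the
// expensive call is skipped whenever the filter already rules it out.
bool CheckAfter(FakeRangeDelAggregator* agg, int key, Decision filter) {
  return filter != Decision::kRemoveAndSkipUntil && agg != nullptr &&
         agg->ShouldDelete(key);
}
```

Both functions return the same result for every input. They differ only in how many times `ShouldDelete` runs when `filter` is `kRemoveAndSkipUntil`.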
    • Y
      Avoid blob db call Sync() while writing · 0d4a2b73
      Yi Wu committed
      Summary:
      The FsyncFiles background job calls Fsync() periodically for blob files. However, it can access WritableFileWriter concurrently with a Put() or Write(), and WritableFileWriter does not support concurrent access. This can lead to the WritableFileWriter buffer being flushed with the same content twice, leaving the blob file corrupted. Fix by simply letting FsyncFiles hold write_mutex_.
      Closes https://github.com/facebook/rocksdb/pull/2685
      
      Differential Revision: D5561908
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f0bb5bcab0e05694e053b8c49eab43640721e872
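A minimal sketch of the locking pattern described in 0d4a2b73: the foreground write path and the background sync job take the same mutex before touching a writer that is not safe for concurrent use. `BufferedWriter` and `BlobFile` are hypothetical simplifications. Only the name `write_mutex_` and the idea of having `FsyncFiles` hold it come from the summary above.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical writer that is NOT safe for concurrent use: a concurrent
// Flush() could observe the same buffer contents twice.
class BufferedWriter {
 public:
  void Append(const std::string& data) { buffer_ += data; }
  void Flush() {
    file_ += buffer_;  // "Write" the buffered bytes to the file.
    buffer_.clear();
  }
  const std::string& file() const { return file_; }

 private:
  std::string buffer_;
  std::string file_;
};

class BlobFile {
 public:
  // Foreground write path.
  void Put(const std::string& record) {
    std::lock_guard<std::mutex> lock(write_mutex_);
    writer_.Append(record);
  }

  // Background job: holds the same mutex, so it can never interleave with
  // a Put() that is in the middle of using the writer.
  void FsyncFiles() {
    std::lock_guard<std::mutex> lock(write_mutex_);
    writer_.Flush();
  }

  std::string Contents() {
    std::lock_guard<std::mutex> lock(write_mutex_);
    return writer_.file();
  }

 private:
  std::mutex write_mutex_;
  BufferedWriter writer_;
};
```

A single coarse mutex serializes all access to the writer. That is simple and correct, at the cost of blocking writes while a sync is in progress.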
    • M
      Don't add -ljemalloc when DISABLE_JEMALLOC is set · 627c9f1a
      Maysam Yabandeh committed
      Summary:
      fixes #2555
      Closes https://github.com/facebook/rocksdb/pull/2684
      
      Differential Revision: D5560527
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6e1d874ae0b4e699a77203d9d52d0bb8f59013b0
  11. 04 Aug, 2017 8 commits
    • A
      db_bench background work thread pool size arguments · dce6d5a8
      Andrew Kryczka committed
      Summary:
      The background thread pools' sizes weren't easily configurable by `max_background_compactions` and `max_background_flushes` in multi-instance setups. Introduced separate arguments for their sizes.
      Closes https://github.com/facebook/rocksdb/pull/2680
      
      Differential Revision: D5550675
      
      Pulled By: ajkr
      
      fbshipit-source-id: bab5f0a7bc5db63bb084d0c10facbe437096367d
    • C
      Makefile: fix for GCC 7+ and clang 4+ · 4f81ab38
      Cholerae Hu committed
      Summary:
      maysamyabandeh IslamAbdelRahman PTAL
      
      Fix https://github.com/facebook/rocksdb/issues/2672
      Signed-off-by: Cholerae Hu <huyingqian@pingcap.com>
      Closes https://github.com/facebook/rocksdb/pull/2681
      
      Differential Revision: D5561515
      
      Pulled By: ajkr
      
      fbshipit-source-id: 676187802ebd8a87a6c051bb565818a1bf89d0a9
    • Y
      Update all blob db TTL and timestamps to uint64_t · 92afe830
      Yi Wu committed
      Summary:
      The current blob db implementation uses a mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency.
      Closes https://github.com/facebook/rocksdb/pull/2683
      
      Differential Revision: D5557103
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42
    • A
      Fix /bin/bash shebangs · 5883a1ae
      Alan Somers committed
      Summary:
      "/bin/bash" is a Linuxism.  "/usr/bin/env bash" is portable.
      Closes https://github.com/facebook/rocksdb/pull/2646
      
      Differential Revision: D5556259
      
      Pulled By: ajkr
      
      fbshipit-source-id: cbffd38ecdbfffb2438969ec007ab345ed893ccb
    • A
      Introduce bottom-pri thread pool for large universal compactions · cc01985d
      Andrew Kryczka committed
      Summary:
      When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs.
      
      This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.
      
      - Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads.
      - Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
      - Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
      - Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
      - Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free.
      Closes https://github.com/facebook/rocksdb/pull/2580
      
      Differential Revision: D5422916
      
      Pulled By: ajkr
      
      fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
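The routing decision described in cc01985d can be reduced to a small sketch: a compaction that includes the last level goes to the bottom-priority pool, but only when that pool has been given threads. `CompactionInfo` and `ChoosePool` are hypothetical names, and the `Priority` enum merely mirrors the pool names used in the summary. The real scheduling code involves considerably more state.

```cpp
#include <cassert>

// Mirrors the pool names mentioned above; declared locally for illustration.
enum class Priority { BOTTOM, LOW, HIGH };

// Hypothetical description of a picked compaction.
struct CompactionInfo {
  int output_level;  // Level the compaction writes to.
  int num_levels;    // Total number of levels in the LSM tree.
};

// Picks the pool a compaction should run in. With zero bottom-pool threads
// (the default described above) every compaction stays in the low-pri pool,
// so the feature is effectively disabled.
Priority ChoosePool(const CompactionInfo& c, int bottom_pool_threads) {
  assert(c.num_levels > 0);
  const bool includes_last_level = c.output_level == c.num_levels - 1;
  if (includes_last_level && bottom_pool_threads > 0) {
    return Priority::BOTTOM;
  }
  return Priority::LOW;
}
```

Long bottom-level compactions then occupy only the bottom pool, leaving low-pri threads free for short compactions of a few L0 files.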
    • Y
      Allow concurrent writes to blob db · 0b814ba9
      Yi Wu committed
      Summary:
      I'm going with a brute-force solution: just have Put() and Write() hold a mutex before writing. Concurrent writing may be improved later with finer-granularity locking.
      Closes https://github.com/facebook/rocksdb/pull/2682
      
      Differential Revision: D5552690
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c
    • Y
      Blob DB garbage collection should keep keys with newer version · 2c45ada4
      Yi Wu committed
      Summary:
      Fix the bug where blob db garbage collection removes keys with a newer version. It shouldn't delete the key from the base db when the sequence number in the base db is not equal to the one in the blob log.
      Closes https://github.com/facebook/rocksdb/pull/2678
      
      Differential Revision: D5549752
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: abb8649260963b5c389748023970fd746279d227
    • M
      Fix the overflow bug in AwaitState · 58410aee
      Maysam Yabandeh committed
      Summary:
      https://github.com/facebook/rocksdb/issues/2559 reports an overflow in AwaitState. nbronson has debugged the issue and presented the fix, which is applied to this patch. Moreover this patch adds more comments to clarify the logic in AwaitState.
      
      I tried with both 16 and 64 threads on the update benchmark. The fix lowers CPU usage by 1.6% but also lowers the throughput by 1.6% and 2%, respectively. Apparently the bug had favored spinning more often.
      
      Benchmarks:
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --benchmarks="fillrandom" --threads=16 --num=2000000
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --use_existing_db=1 --benchmarks="updaterandom[X3]" --threads=16 --num=2000000
      TEST_TMPDIR=/dev/shm/tmpdb time ./db_bench --use_existing_db=1 --benchmarks="updaterandom[X3]" --threads=64 --num=200000
      
      Results
      $ cat update-16t-bug.txt | tail -4
      updaterandom [AVG    3 runs] : 234117 ops/sec;   51.8 MB/sec
      updaterandom [MEDIAN 3 runs] : 233581 ops/sec;   51.7 MB/sec
      3896.42user 1539.12system 6:50.61elapsed 1323%CPU (0avgtext+0avgdata 331308maxresident)k
      0inputs+0outputs (0major+1281001minor)pagefaults 0swaps
      $ cat update-16t-fixed.txt | tail -4
      updaterandom [AVG    3 runs] : 230364 ops/sec;   51.0 MB/sec
      updaterandom [MEDIAN 3 runs] : 226169 ops/sec;   50.0 MB/sec
      3865.46user 1568.32system 6:57.63elapsed 1301%CPU (0avgtext+0avgdata 315012maxresident)k
      0inputs+0outputs (0major+1342568minor)pagefaults 0swaps
      
      $ cat update-64t-bug.txt | tail -4
      updaterandom [AVG    3 runs] : 261878 ops/sec;   57.9 MB/sec
      updaterandom [MEDIAN 3 runs] : 262859 ops/sec;   58.2 MB/sec
      926.27user 578.06system 2:27.46elapsed 1020%CPU (0avgtext+0avgdata 475480maxresident)k
      0inputs+0outputs (0major+1058728minor)pagefaults 0swaps
      $ cat update-64t-fixed.txt | tail -4
      updaterandom [AVG    3 runs] : 256699 ops/sec;   56.8 MB/sec
      updaterandom [MEDIAN 3 runs] : 256380 ops/sec;   56.7 MB/sec
      933.47user 575.37system 2:30.41elapsed 1003%CPU (0avgtext+0avgdata 482340maxresident)k
      0inputs+0outputs (0major+1078557minor)pagefaults 0swaps
      Closes https://github.com/facebook/rocksdb/pull/2679
      
      Differential Revision: D5553732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 98b72dc3a8e0f22ea29d4f7c7790af10c369c5bb
  12. 03 Aug, 2017 2 commits
    • M
      Refactor TransactionImpl · c3d5c4d3
      Maysam Yabandeh committed
      Summary:
      This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added, which will be completed later to provide an alternative implementation.
      Closes https://github.com/facebook/rocksdb/pull/2676
      
      Differential Revision: D5549998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
    • A
      support multiple CFs with OPTIONS file · 060ccd4f
      Andrew Kryczka committed
      Summary:
      Move an option necessary for running db_bench on multiple CFs into the general initialization area, so it works with both flag-based init and OPTIONS-based init.
      Closes https://github.com/facebook/rocksdb/pull/2675
      
      Differential Revision: D5541378
      
      Pulled By: ajkr
      
      fbshipit-source-id: 169926cb4ae95c17974f744faf7cc794d41e5c0a