1. Aug 19, 2017 (1 commit)
    • Preload l0 index partitions · 1efc600d
      Committed by Maysam Yabandeh
      Summary:
      This fixes the existing logic for pinning L0 index partitions. The patch preloads the partitions into the block cache and pins them if they belong to level 0 and pin_l0 is set.
      
      The drawback is that it issues many small I/Os when preloading all the partitions into the cache if direct I/O is enabled. Working on a solution for that.
      Closes https://github.com/facebook/rocksdb/pull/2661
      
      Differential Revision: D5554010
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1e6f32a3524d71355c77d4138516dcfb601ca7b2
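For context, partitioned-index pinning of this kind is driven by table options along these lines (a configuration sketch only; exact option availability varies by RocksDB version):

```cpp
#include "rocksdb/options.h"
#include "rocksdb/table.h"

rocksdb::BlockBasedTableOptions table_opts;
// Two-level (partitioned) index, served from the block cache.
table_opts.index_type =
    rocksdb::BlockBasedTableOptions::kTwoLevelIndexSearch;
table_opts.cache_index_and_filter_blocks = true;
// The "pin_l0" referenced above: keep L0 index/filter blocks pinned
// in cache so hot point lookups never re-fetch them.
table_opts.pin_l0_filter_and_index_blocks_in_cache = true;

rocksdb::Options options;
options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_opts));
```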
  2. Aug 18, 2017 (4 commits)
  3. Aug 17, 2017 (6 commits)
    • Allow merge operator to be called even with a single operand · 9a44b4c3
      Committed by Sagar Vemuri
      Summary:
      Added a function `MergeOperator::DoesAllowSingleMergeOperand()` to allow invoking a merge operator even with a single merge operand, if overridden.
      
      This is needed for the Cassandra-on-RocksDB work. All Cassandra writes go through merges, and this will allow a single merge value to be updated by the merge operator invoked via a compaction, if needed due to an expired TTL.
      Closes https://github.com/facebook/rocksdb/pull/2721
      
      Differential Revision: D5608706
      
      Pulled By: sagar0
      
      fbshipit-source-id: f299f9f91c4d1ac26e48bd5906e122c1c5e5f3fc
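As a sketch of the behavior this hook enables (illustrative classes only, not the actual RocksDB `MergeOperator` API):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the merge-operator interface described above.
class MergeOperatorSketch {
 public:
  virtual ~MergeOperatorSketch() = default;
  // Opt-in hook: return true to let compaction call Merge() even with a
  // single operand (e.g. to strip an expired TTL from a lone merge value).
  virtual bool DoesAllowSingleMergeOperand() const { return false; }
  virtual std::string Merge(const std::vector<std::string>& operands) const = 0;
};

// Example operator that opts in, as a Cassandra-style TTL handler might.
class TtlStripper : public MergeOperatorSketch {
 public:
  bool DoesAllowSingleMergeOperand() const override { return true; }
  std::string Merge(const std::vector<std::string>& operands) const override {
    return operands.back();  // toy merge: keep the newest value
  }
};

// Compaction-side check: merge when multiple operands exist, or when a
// single operand is present and the operator opted in.
bool ShouldInvokeMerge(const MergeOperatorSketch& op, size_t num_operands) {
  return num_operands > 1 ||
         (num_operands == 1 && op.DoesAllowSingleMergeOperand());
}
```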
    • fix some misspellings · ac8fb77a
      Committed by follitude
      Summary:
      PTAL ajkr
      Closes https://github.com/facebook/rocksdb/pull/2750
      
      Differential Revision: D5648052
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98
    • minor improvements to db_stress · 23593171
      Committed by Andrew Kryczka
      Summary:
      fix some things that made this command hard to use from CLI:
      
      - use default values for `target_file_size_base` and `max_bytes_for_level_base`. Previously we used small values for these but the default value of `write_buffer_size`, which led to an enormous number of L1 files.
      - add a failure message when `value_size_mult` is too big. Previously there was just an assert, so in non-debug mode it would overrun the value buffer and crash mysteriously.
      - only print verification success if there's no failure. Before, it would print both in the failure case.
      - support `memtable_prefix_bloom_size_ratio`
      - support `num_bottom_pri_threads` (universal compaction)
      Closes https://github.com/facebook/rocksdb/pull/2741
      
      Differential Revision: D5629495
      
      Pulled By: ajkr
      
      fbshipit-source-id: ddad97d6d4ba0884e7c0f933b0a359712514fc1d
    • fix deleterange with memtable prefix bloom · af012c0f
      Committed by Andrew Kryczka
      Summary:
      The range deletion tombstones in the memtable should be added to the aggregator even when the memtable's prefix bloom filter tells us the lookup key is not there. This bug could cause deleted data to temporarily reappear until the memtable containing the range deletions is flushed.
      
      Reported in #2743.
      Closes https://github.com/facebook/rocksdb/pull/2745
      
      Differential Revision: D5639007
      
      Pulled By: ajkr
      
      fbshipit-source-id: 04fc6facb6f978340a3f639536f4ca7c0d73dfc9
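The invariant behind the fix can be sketched like this (toy stand-ins, not RocksDB's actual memtable or range-deletion aggregator classes):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy memtable: point entries plus [start, end) range tombstones.
struct MemtableSketch {
  std::set<std::string> keys;
  std::vector<std::pair<std::string, std::string>> range_dels;

  // Stand-in for the prefix bloom check; may rule out the point lookup.
  bool MayContain(const std::string& key) const { return keys.count(key) > 0; }
};

// Collect tombstones covering `key` from every memtable. The bloom check
// may skip the point-key lookup, but never the tombstone scan -- guarding
// this loop with MayContain() was exactly the bug being fixed.
bool KeyCoveredByRangeDel(const std::vector<MemtableSketch>& memtables,
                          const std::string& key) {
  for (const auto& m : memtables) {
    for (const auto& rd : m.range_dels) {
      if (key >= rd.first && key < rd.second) return true;
    }
  }
  return false;
}
```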
    • update scores after picking universal compaction · 1c8dbe2a
      Committed by Andrew Kryczka
      Summary:
      We forgot to recompute compaction scores after picking a universal compaction like we do in level compaction (https://github.com/facebook/rocksdb/blob/a34b2e388ee51173e44f6aa290f1301c33af9e67/db/compaction_picker.cc#L691-L695). This leads to a fairness issue where we waste compactions on CFs/DB instances that don't need it while others can starve.
      
      Previously, ccecf3f4 fixed the issue for the read-amp-based compaction case; this PR avoids the issue earlier and also for size-ratio-based compactions.
      Closes https://github.com/facebook/rocksdb/pull/2688
      
      Differential Revision: D5566191
      
      Pulled By: ajkr
      
      fbshipit-source-id: 010bccb2a107f6a76f3d3022b90aadce5cc48feb
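A minimal sketch of the fairness fix (illustrative structs, not the actual `CompactionPicker` code): after a compaction is picked for one column family, that CF's score must be recomputed so later picks see fresh pressure numbers.

```cpp
#include <algorithm>
#include <vector>

struct CfState {
  double score;       // compaction pressure used for picking
  int pending_files;  // toy stand-in for whatever drives the score
};

double ComputeScore(const CfState& cf) { return cf.pending_files * 0.25; }

// Pick the neediest CF, "run" its compaction, then recompute its score --
// the step this patch adds for universal compaction.
int PickAndUpdate(std::vector<CfState>& cfs) {
  auto it = std::max_element(
      cfs.begin(), cfs.end(),
      [](const CfState& a, const CfState& b) { return a.score < b.score; });
  int picked = static_cast<int>(it - cfs.begin());
  it->pending_files = 0;          // compaction consumed the pending work
  it->score = ComputeScore(*it);  // without this, the same CF keeps winning
                                  // and the others starve
  return picked;
}
```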
    • Update WritePrepared with the pseudo code · eb642530
      Committed by Maysam Yabandeh
      Summary:
      Implement the main body of the WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted, which updates the commit map. It also provides an IsInSnapshot method that could later be called from the read path to decide whether a version is in the read snapshot or should otherwise be skipped.
      
      This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is to have the API specified so that we can work on related tasks in parallel.
      Closes https://github.com/facebook/rocksdb/pull/2713
      
      Differential Revision: D5640021
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
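The commit-map idea can be sketched as follows (hypothetical names; the real WritePrepared implementation uses a bounded in-memory structure rather than a plain map):

```cpp
#include <cstdint>
#include <unordered_map>

// Maps prepare sequence numbers to the commit sequence at which the
// corresponding write became visible.
class CommitMapSketch {
 public:
  // AddCommitted: record that the write prepared at prepare_seq
  // committed at commit_seq.
  void AddCommitted(uint64_t prepare_seq, uint64_t commit_seq) {
    map_[prepare_seq] = commit_seq;
  }

  // IsInSnapshot: a version tagged with prepare_seq is visible to a
  // snapshot iff it committed at or before the snapshot's sequence;
  // an entry still in the prepared state is never visible.
  bool IsInSnapshot(uint64_t prepare_seq, uint64_t snapshot_seq) const {
    auto it = map_.find(prepare_seq);
    if (it == map_.end()) return false;  // prepared but not yet committed
    return it->second <= snapshot_seq;
  }

 private:
  std::unordered_map<uint64_t, uint64_t> map_;
};
```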
  4. Aug 16, 2017 (4 commits)
  5. Aug 15, 2017 (3 commits)
  6. Aug 14, 2017 (4 commits)
  7. Aug 13, 2017 (1 commit)
  8. Aug 12, 2017 (9 commits)
    • fix deletion dropping in intra-L0 · acf935e4
      Committed by Andrew Kryczka
      Summary:
      `KeyNotExistsBeyondOutputLevel` didn't consider L0 files' key-ranges. So if a key was covered only by older L0 files' key-ranges, we would incorrectly drop deletions of that key. This PR simply skips the deletion-dropping optimization when the output level is L0.
      Closes https://github.com/facebook/rocksdb/pull/2726
      
      Differential Revision: D5617286
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4bff1396b06d49a828ba4542f249191052915bce
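The shape of the fix, reduced to a sketch (a hypothetical helper, not the actual compaction-iterator code):

```cpp
// A deletion may be dropped only when the key provably cannot exist beyond
// the output level. For intra-L0 compactions the answer must be pessimistic,
// because older L0 files' key ranges are not consulted.
bool CanDropDeletion(int output_level, bool key_absent_beyond_output_level) {
  if (output_level == 0) return false;  // intra-L0: never drop deletions
  return key_absent_beyond_output_level;
}
```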
    • make sst_dump compression size command consistent · 8254e9b5
      Committed by Andrew Kryczka
      Summary:
      - Like other subcommands, reporting compression sizes should be specified with the `--command` CLI arg.
      - Also added a `--compression_types` arg, as it's useful to restrict the types of compression used, at least in my dictionary compression experiments.
      Closes https://github.com/facebook/rocksdb/pull/2706
      
      Differential Revision: D5589520
      
      Pulled By: ajkr
      
      fbshipit-source-id: 305bb4ebcc95eecc8a85523cd3b1050619c9ddc5
    • db_bench support for non-uniform column family ops · 74f18c13
      Committed by Andrew Kryczka
      Summary:
      Previously we could only select the CF on which to operate uniformly at random. This is a limitation, e.g., when testing universal compaction as all CFs would need to run full compaction at roughly the same time, which isn't realistic.
      
      This PR allows the user to specify the probability distribution for selecting CFs via the `--column_family_distribution` argument.
      Closes https://github.com/facebook/rocksdb/pull/2677
      
      Differential Revision: D5544436
      
      Pulled By: ajkr
      
      fbshipit-source-id: 478d56260995236ae90895ce5bd51f38882e185a
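One way such a weighted selection can be implemented (a sketch, not db_bench's actual code; `std::discrete_distribution` does the weighted draw):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Picks a column family index according to a caller-supplied weight
// vector, e.g. the percentages parsed from --column_family_distribution.
class CfPicker {
 public:
  explicit CfPicker(const std::vector<double>& weights)
      : dist_(weights.begin(), weights.end()) {}

  // Draw one CF index; indices with larger weights are drawn more often.
  size_t Pick(std::mt19937& rng) { return dist_(rng); }

 private:
  std::discrete_distribution<size_t> dist_;
};
```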
    • approximate histogram stats to save cpu · 5de98f2d
      Committed by Andrew Kryczka
      Summary:
      Sounds like we're willing to trade off minor inaccuracy in stats for speed. Start with histogram stats. Ticker stats will be harder (and, IMO, we shouldn't change them in this manner), as many test cases rely on them being exactly correct.
      Closes https://github.com/facebook/rocksdb/pull/2720
      
      Differential Revision: D5607884
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1b754cda35ea6b252d1fdd5aa3cfb58866506372
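One common way to make histogram updates cheap at the cost of exactness, shown as an illustration only (not RocksDB's actual scheme): map each value to a power-of-two bucket, so the update is O(log value) with no floating point.

```cpp
#include <cstdint>

// Returns floor(log2(value)) as the bucket index; values that share a
// power-of-two range land in the same bucket, which is the approximation.
int BucketFor(uint64_t value) {
  int b = 0;
  while (value > 1) {
    value >>= 1;
    ++b;
  }
  return b;
}
```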
    • Fix c_test ASAN failure · 3f588843
      Committed by yiwu-arbug
      Summary:
      Fix c_test missing deletion of write batch pointer.
      Closes https://github.com/facebook/rocksdb/pull/2725
      
      Differential Revision: D5613866
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: bf3f59a6812178577c9c25bae558ef36414a1f51
    • Fix blob DB transaction usage while GC · e5a1b727
      Committed by yiwu-arbug
      Summary:
      While doing GC, blob DB uses an optimistic transaction to delete or replace the index entry in the LSM, to guarantee correctness if there's a normal write writing to the same key. However, the previous implementation neither calls SetSnapshot() nor uses GetForUpdate() of the transaction API; instead it does its own sequence number check before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and GC will then delete or relocate the old version of the key by mistake. Update the code to properly use GetForUpdate() to check the existing index entry.
      
      After this patch the sequence number stored with each blob record is useless, so I'm considering removing the sequence number from the blob record in another patch.
      Closes https://github.com/facebook/rocksdb/pull/2703
      
      Differential Revision: D5589178
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24
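The optimistic-concurrency idea behind GetForUpdate() can be reduced to a toy sketch (hypothetical store, not the RocksDB transaction API): record what was observed at read time and abort the commit if anything changed since.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct KeyedStore {
  std::map<std::string, uint64_t> last_write_seq;
  uint64_t next_seq = 1;

  // GetForUpdate analogue: observe the latest sequence for this key.
  uint64_t GetForUpdate(const std::string& key) const {
    auto it = last_write_seq.find(key);
    return it == last_write_seq.end() ? 0 : it->second;
  }

  void Write(const std::string& key) { last_write_seq[key] = next_seq++; }

  // Commit the GC rewrite only if the key is unchanged since GetForUpdate;
  // a concurrent normal write forces the GC transaction to abort.
  bool CommitIfUnchanged(const std::string& key, uint64_t observed_seq) {
    if (GetForUpdate(key) != observed_seq) return false;  // conflict
    Write(key);
    return true;
  }
};
```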
    • fix corruption_test valgrind · 6f051e0c
      Committed by Andrew Kryczka
      Summary: Closes https://github.com/facebook/rocksdb/pull/2724
      
      Differential Revision: D5613416
      
      Pulled By: ajkr
      
      fbshipit-source-id: ed55fb66ab1b41dfdfe765fe3264a1c87a8acb00
    • expose set_skip_stats_update_on_db_open to C bindings · ac098a46
      Committed by Kent767
      Summary:
      It would be super helpful to not have to recompile rocksdb to get this performance tweak for mechanical disks.
      
      I have signed the CLA.
      Closes https://github.com/facebook/rocksdb/pull/2718
      
      Differential Revision: D5606994
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: c05e92bad0d03bd38211af1e1ced0d0d1e02f634
    • Support prefetch last 512KB with direct I/O in block based file reader · 666a005f
      Committed by Siying Dong
      Summary:
      Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except for compaction inputs or when readahead is enabled for iterators. This can create a lot of I/O in HDD cases. To solve the problem, the last 512KB is prefetched in block based table if direct I/O is enabled. The prefetched buffer is passed in together with the random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
      Closes https://github.com/facebook/rocksdb/pull/2708
      
      Differential Revision: D5593091
      
      Pulled By: siying
      
      fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
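The buffer-first read path can be sketched like this (a toy reader with a string standing in for the file; not the actual `RandomAccessFileReader`):

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Serves reads from a prefetched tail buffer when possible; only reads
// outside the tail would go to the underlying file.
class TailPrefetchReader {
 public:
  TailPrefetchReader(std::string file, size_t tail_bytes)
      : file_(std::move(file)) {
    size_t n = tail_bytes < file_.size() ? tail_bytes : file_.size();
    tail_offset_ = file_.size() - n;  // prefetched region: [tail_offset_, EOF)
  }

  // Returns true if the read was served from the prefetched buffer.
  bool Read(size_t offset, size_t len, std::string* out) {
    *out = file_.substr(offset, len);
    return offset >= tail_offset_;  // entirely inside the prefetched tail
  }

 private:
  std::string file_;
  size_t tail_offset_;
};
```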
  9. Aug 11, 2017 (6 commits)
  10. Aug 10, 2017 (2 commits)