1. 30 Oct 2018 (1 commit)
    • Avoid memtable cut when active memtable is empty (#4595) · 92b44015
      Yanqin Jin committed
      Summary:
      For a flush triggered by RocksDB because memory usage is approaching a certain
      threshold (WriteBufferManager or memtable full), we should cut the memtable
      only when the current active memtable is not empty, i.e. when it contains data. This
      matches what we do for non-atomic flush. If we always cut the memtable, even when
      the active memtable is empty, we generate an extra, empty immutable memtable.
      This is not ideal since it may cause a write stall. It also causes some
      DBAtomicFlushTest cases to fail because cfd->imm()->NumNotFlushed() differs from
      the expectation.
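      The guard described above can be sketched as follows. This is a minimal, self-contained model, not RocksDB's actual code; `Memtable` and `ShouldCutMemtable` are hypothetical stand-ins:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical stand-in for a memtable: only what the guard needs.
      struct Memtable {
        uint64_t num_entries = 0;
        bool IsEmpty() const { return num_entries == 0; }
      };

      // Cut (switch) the active memtable for a memory-pressure flush only
      // when it actually contains data, so we never enqueue an extra empty
      // immutable memtable that could contribute to a write stall.
      bool ShouldCutMemtable(const Memtable& active) { return !active.IsEmpty(); }
      ```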
      
      Test plan
      ```
      $make clean && make J=1 -j32 all check
      $make clean && OPT="-DROCKSDB_LITE -g" make J=1 -j32 all check
      $make clean && TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make J=1 -j32 valgrind_test
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4595
      
      Differential Revision: D12818520
      
      Pulled By: riversand963
      
      fbshipit-source-id: d867bdbeacf4199fdd642debb085f94703c41a18
  2. 27 Oct 2018 (3 commits)
    • port folly::JemallocNodumpAllocator (#4534) · 5f5fddab
      Yi Wu committed
      Summary:
      Introduce `JemallocNodumpAllocator`, which allows excluding block cache memory from core dumps. It uses jemalloc's custom arena hooks: when the jemalloc arena requests memory from the system, the allocator uses the hook to set `MADV_DONTDUMP` on that memory. The implementation is essentially the same as `folly::JemallocNodumpAllocator`, with some minor differences:
      1. It only supports jemalloc >= 5.0.
      2. When the allocator is destructed, it explicitly destroys the corresponding arena via `arena.<i>.destroy` through `mallctl`.
      
      Depends on #4502.
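      The core trick can be demonstrated in isolation. This Linux-only sketch (not the actual jemalloc extent hook) marks an anonymous mapping with `MADV_DONTDUMP` so the kernel excludes it from core dumps; `AllocNodump` is a hypothetical name:

      ```cpp
      #include <cstddef>
      #include <sys/mman.h>

      // Allocate pages and mark them MADV_DONTDUMP. The real allocator does
      // this inside a jemalloc arena hook when the arena requests memory from
      // the system; here we just demonstrate the madvise call itself.
      void* AllocNodump(size_t bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return nullptr;
        if (madvise(p, bytes, MADV_DONTDUMP) != 0) {  // exclude from core dumps
          munmap(p, bytes);
          return nullptr;
        }
        return p;
      }
      ```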
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4534
      
      Differential Revision: D10435474
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e80edea755d3853182485d2be710376384ce0bb4
    • Enable atomic flush (#4023) · 5b4c709f
      Yanqin Jin committed
      Summary:
      Adds a DB option `atomic_flush` to control whether to enable this feature. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4023
      
      Differential Revision: D8518381
      
      Pulled By: riversand963
      
      fbshipit-source-id: 1e3bb33e99bb102876a31b378d93b0138ff6634f
    • s/CacheAllocator/MemoryAllocator/g (#4590) · f560c8f5
      Yi Wu committed
      Summary:
      Rename the interface, as it is meant to be a generic interface for memory allocation.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4590
      
      Differential Revision: D10866340
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 85cb753351a40cb856c046aeaa3f3b369eef3d16
  3. 26 Oct 2018 (1 commit)
    • Cache fragmented range tombstones in BlockBasedTableReader (#4493) · 7528130e
      Abhishek Madan committed
      Summary:
      This allows tombstone fragmenting to be performed only when the table is opened, with the result cached for subsequent accesses.
      
      On the same DB used in #4449, running `readrandom` results in the following:
      ```
      readrandom   :       0.983 micros/op 1017076 ops/sec;   78.3 MB/s (63103 of 100000 found)
      ```
      
      Now that Get performance in the presence of range tombstones is reasonable, I also compared the performance between a DB with range tombstones, "expanded" range tombstones (several point tombstones that cover the same keys the equivalent range tombstone would cover, a common workaround for DeleteRange), and no range tombstones. The created DBs had 5 million keys each, and DeleteRange was called at regular intervals (depending on the total number of range tombstones being written) after 4.5 million Puts. The table below summarizes the results of a `readwhilewriting` benchmark (in order to provide somewhat more realistic results):
      ```
         Tombstones?    | avg micros/op | stddev micros/op |  avg ops/s   | stddev ops/s
      ----------------- | ------------- | ---------------- | ------------ | ------------
      None              |        0.6186 |          0.04637 | 1,625,252.90 | 124,679.41
      500 Expanded      |        0.6019 |          0.03628 | 1,666,670.40 | 101,142.65
      500 Unexpanded    |        0.6435 |          0.03994 | 1,559,979.40 | 104,090.52
      1k Expanded       |        0.6034 |          0.04349 | 1,665,128.10 | 125,144.57
      1k Unexpanded     |        0.6261 |          0.03093 | 1,600,457.50 |  79,024.94
      5k Expanded       |        0.6163 |          0.05926 | 1,636,668.80 | 154,888.85
      5k Unexpanded     |        0.6402 |          0.04002 | 1,567,804.70 | 100,965.55
      10k Expanded      |        0.6036 |          0.05105 | 1,667,237.70 | 142,830.36
      10k Unexpanded    |        0.6128 |          0.02598 | 1,634,633.40 |  72,161.82
      25k Expanded      |        0.6198 |          0.04542 | 1,620,980.50 | 116,662.93
      25k Unexpanded    |        0.5478 |          0.0362  | 1,833,059.10 | 121,233.81
      50k Expanded      |        0.5104 |          0.04347 | 1,973,107.90 | 184,073.49
      50k Unexpanded    |        0.4528 |          0.03387 | 2,219,034.50 | 170,984.32
      ```
      
      After a large enough quantity of range tombstones are written, range tombstone Gets can become faster than reading from an equivalent DB with several point tombstones.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4493
      
      Differential Revision: D10842844
      
      Pulled By: abhimadan
      
      fbshipit-source-id: a7d44534f8120e6aabb65779d26c6b9df954c509
  4. 25 Oct 2018 (8 commits)
    • Fix two contrun job failures (#4587) · fe0d2305
      Zhongyi Xie committed
      Summary:
      Currently there are two contrun test failures:
      * rocksdb-contrun-lite:
      > tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char**)’:
      tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’
           rocksdb::DumpMallocStats(&stats_string);
           ^
      make: *** [tools/db_bench_tool.o] Error 1
      * rocksdb-contrun-unity:
      > In file included from unity.cc:44:0:
      db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’:
      db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous
         auto cmp = ParsedInternalKeyComparator(icmp_);
      
      This PR fixes both.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587
      
      Differential Revision: D10846554
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93
    • Remove unused variable · eb8c9918
      Yanqin Jin committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4585
      
      Differential Revision: D10841983
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6a7e0b40065bcfbb10a2cac0cec1e8da0750a617
    • WriteBufferManager JNI fixes (#4579) · 6ecd26af
      Jigar Bhati committed
      Summary:
      1. `WriteBufferManager` should have a live reference on the Java side through `Options`/`DBOptions`; otherwise, if it is GC'ed on the Java side, the native side can segfault.
      2. The native method `setWriteBufferManager()` in `DBOptions.java` was missing its JNI method invocation in rocksdbjni; this PR adds it.
      3. `DBOptionsTest.java` was referencing an `Options` object where it should be testing against `DBOptions`. This looks like a copy-paste error.
      4. Add a getter for WriteBufferManager.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4579
      
      Differential Revision: D10561150
      
      Pulled By: sagar0
      
      fbshipit-source-id: 139a15c7f051a9f77b4200215b88267b48fbc487
    • Use only "local" range tombstones during Get (#4449) · 8c78348c
      Abhishek Madan committed
      Summary:
      Previously, range tombstones were accumulated from every level, which
      was necessary if a range tombstone in a higher level covered a key in a lower
      level. However, RangeDelAggregator::AddTombstones's complexity is based on
      the number of tombstones that are currently stored in it, which is wasteful in
      the Get case, where we only need to know the highest sequence number of range
      tombstones that cover the key from higher levels, and compute the highest covering
      sequence number at the current level. This change introduces this optimization, and
      removes the use of RangeDelAggregator from the Get path.
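      The idea can be sketched with a simplified model (hypothetical types and function names, not RocksDB's actual structures): walk the levels top-down and track only the highest sequence number of any range tombstone covering the lookup key, instead of accumulating every tombstone in an aggregator:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <string>
      #include <vector>

      // A range tombstone covering [start, end) written at sequence number seq.
      struct RangeTombstone {
        std::string start, end;
        uint64_t seq;
      };

      // Scan each level's "local" tombstones and keep only the maximum
      // sequence number covering `key`; a point entry with a lower seqnum
      // than the returned value is shadowed by a range deletion.
      uint64_t MaxCoveringSeq(
          const std::vector<std::vector<RangeTombstone>>& levels,
          const std::string& key) {
        uint64_t max_seq = 0;
        for (const auto& level : levels) {  // higher levels first
          for (const auto& t : level) {
            if (key >= t.start && key < t.end) {
              max_seq = std::max(max_seq, t.seq);
            }
          }
        }
        return max_seq;
      }
      ```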
      
      In the benchmark results, the following command was used to initialize the database:
      ```
      ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
      ```
      
      ...and the following command was used to measure read throughput:
      ```
      ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
      ```
      
      The filluniquerandom command was only run once, and the resulting database was used
      to measure read performance before and after the PR. Both binaries were compiled with
      `DEBUG_LEVEL=0`.
      
      Readrandom results before PR:
      ```
      readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
      ```
      
      Readrandom results after PR:
      ```
      readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
      ```
      
      So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).
      
      ----
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449
      
      Differential Revision: D10370575
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
    • use per-level perf context for bloom filter related counters (#4581) · 21bf7421
      Zhongyi Xie committed
      Summary:
      PR https://github.com/facebook/rocksdb/pull/4226 introduced per-level perf context which allows breaking down perf context by levels.
      This PR takes advantage of the feature to populate a few counters related to bloom filters
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4581
      
      Differential Revision: D10518010
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 011244561783ec860d32d5b0fa6bce6e78d70ef8
    • Set WriteCommitted txn id to commit sequence number (#4565) · ad21b1af
      Simon Grätzer committed
      Summary:
      SetId and GetId are experimental APIs that have so far been used in WritePrepared and WriteUnPrepared transactions, where the id is assigned at prepare time. This patch extends the API to WriteCommitted transactions by setting the id at commit time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4565
      
      Differential Revision: D10557862
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2b27a140682b6185a4988fa88f8152628e0d67af
    • Add missing methods to WritableFileWrapper (#4584) · abb8ecb4
      Sagar Vemuri committed
      Summary:
      `WritableFileWrapper` was missing some newer methods that were added to `WritableFile`. Without them, calls to the missing wrapper methods would fall back to the default implementations in `WritableFile` instead of using the corresponding implementations in, say, `PosixWritableFile` or `WinWritableFile`.
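      The bug class is easy to model in isolation. In this self-contained sketch (names are hypothetical, not the RocksDB API), a wrapper that forgets to forward a newer virtual method silently answers with the base-class default rather than the wrapped implementation:

      ```cpp
      #include <string>

      struct File {
        virtual ~File() = default;
        virtual std::string Name() const { return "base-default"; }
        virtual bool SupportsPrefetch() const { return false; }  // newer method
      };

      struct PosixLikeFile : File {
        std::string Name() const override { return "posix"; }
        bool SupportsPrefetch() const override { return true; }
      };

      struct FileWrapper : File {
        explicit FileWrapper(File* t) : target_(t) {}
        std::string Name() const override { return target_->Name(); }
        // The fix: forward the newer method too. Without this override, the
        // wrapper would report false even when wrapping PosixLikeFile.
        bool SupportsPrefetch() const override {
          return target_->SupportsPrefetch();
        }
        File* target_;
      };
      ```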
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4584
      
      Differential Revision: D10559199
      
      Pulled By: sagar0
      
      fbshipit-source-id: 0d0f18a486aee727d5b8eebd3110a41988e27391
    • option to print malloc stats at the end of db_bench (#4582) · 0415244b
      Yi Wu committed
      Summary:
      Adds an option to print malloc stats to stdout at the end of db_bench. This is different from `--dump_malloc_stats`, which periodically prints the same information to the LOG file.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4582
      
      Differential Revision: D10520814
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: beff5e514e414079d31092b630813f82939ffe5c
  5. 24 Oct 2018 (5 commits)
    • Adapt three unit tests with newer compiler/libraries (#4562) · 43dbd441
      Neil Mayhew committed
      Summary:
      This fixes three tests that fail with relatively recent tools and libraries:
      
      The tests are:
      
      * `spatial_db_test`
      * `table_test`
      * `db_universal_compaction_test`
      
      I'm using:
      
      * `gcc` 7.3.0
      * `glibc` 2.27
      * `snappy` 1.1.7
      * `gflags` 2.2.1
      * `zlib` 1.2.11
      * `bzip2` 1.0.6.0.1
      * `lz4` 1.8.2
      * `jemalloc` 5.0.1
      
      The versions used in the Travis environment (which is two Ubuntu LTS versions behind the current one and doesn't use `lz4` or `jemalloc`) don't seem to have a problem. However, to be safe, I verified that these tests pass with and without my changes in a trusty Docker container without `lz4` and `jemalloc`.
      
      However, I do get an unrelated set of other failures when using a trusty Docker container that uses `lz4` and `jemalloc`:
      
      ```
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/0, where GetParam() = (1, false) (1189 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/1
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/1, where GetParam() = (1, true) (1246 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/2
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/2, where GetParam() = (3, false) (1237 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/3
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/3, where GetParam() = (3, true) (1195 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/4
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/4, where GetParam() = (5, false) (1161 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/5
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/5, where GetParam() = (5, true) (1229 ms)
      ```
      
      I haven't attempted to fix these since I'm not using trusty and Travis doesn't use `lz4` and `jemalloc`. However, the final commit in this PR does at least fix the compilation errors that occur when using trusty's version of `lz4`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4562
      
      Differential Revision: D10510917
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 59534042015ec339270e5fc2f6ac4d859370d189
    • fix clang analyzer error (#4583) · f6b151f1
      Zhongyi Xie committed
      Summary:
      clang analyzer currently fails with the following warnings:
      > db/log_reader.cc:323:9: warning: Undefined or garbage value returned to caller
              return r;
              ^~~~~~~~
      db/log_reader.cc:344:11: warning: Undefined or garbage value returned to caller
                return r;
                ^~~~~~~~
      db/log_reader.cc:369:11: warning: Undefined or garbage value returned to caller
                return r;
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4583
      
      Differential Revision: D10523517
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 0cc8b8f27657b202bead148bbe7c4aa84fed095b
    • BlobDB: handle IO error on write (#4580) · c7a45ca9
      Yi Wu committed
      Summary:
      A fix similar to #4410, but on the write path. On IO error in `SelectBlobFile()` we didn't return the error code properly, only a nullptr `BlobFile`. The `AppendBlob()` method had no null check for the pointer, which caused a crash. The fix makes sure we properly return the error code in this case.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4580
      
      Differential Revision: D10513849
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 80bca920d1d7a3541149de981015ad83e0aa14b5
    • Fix compile error with aligned-new (#4576) · 742302a1
      Yi Wu committed
      Summary:
      In fbcode, when we build with clang7++, -faligned-new is available in the compile phase, but we link against an older version of libstdc++.a that doesn't come with aligned-new support (e.g. `nm libstdc++.a | grep align_val_t` returns empty). In this case the previous -faligned-new detection passes but the build ends up with a link error. Fix it by running the detection only for non-fbcode builds.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4576
      
      Differential Revision: D10500008
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: b375de4fbb61d2a08e54ab709441aa8e7b4b08cf
    • Small issues (#4564) · d1c0d3f3
      jsteemann committed
      Summary:
      A couple of very minor improvements (typos in comments, full qualification of a class name, reordering members of a struct to make it smaller).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4564
      
      Differential Revision: D10510183
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: c7ddf9bfbf2db08cd31896c3fd93789d3fa68c8b
  6. 23 Oct 2018 (2 commits)
    • Fix user comparator receiving internal key (#4575) · c34cc404
      Maysam Yabandeh committed
      Summary:
      There was a bug where the user comparator would receive the internal key instead of the user key. The bug was due to RangeMightExistAfterSortedRun expecting a user key but receiving an internal key when called in GenerateBottommostFiles. The patch augments an existing unit test to reproduce the bug and fixes it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4575
      
      Differential Revision: D10500434
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 858346d2fd102cce9e20516d77338c112bdfe366
    • Dynamic level to adjust level multiplier when write is too heavy (#4338) · 70242636
      Siying Dong committed
      Summary:
      Level compaction usually performs poorly when writes are so heavy that the level targets can't be guaranteed. With this improvement, level_compaction_dynamic_level_bytes = true is enhanced so that in write-heavy cases the level multiplier can be slightly adjusted based on the size of L0.
      
      We keep the behavior the same if the number of L0 files is under 2X the compaction trigger and the total size is less than options.max_bytes_for_level_base, so that unless writes are so heavy that compaction cannot keep up, the behavior doesn't change.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4338
      
      Differential Revision: D9636782
      
      Pulled By: siying
      
      fbshipit-source-id: e27fc17a7c29c84b00064cc17536a01dacef7595
  7. 22 Oct 2018 (1 commit)
    • Fix RepeatableThreadTest::MockEnvTest hang (#4560) · 933250e3
      Yi Wu committed
      Summary:
      When `MockTimeEnv` is used in a test to mock time methods, we cannot use `CondVar::TimedWait` because it uses real time, not the mocked time, for the wait timeout. On Mac the method can return immediately, without waking other waiting threads, if the real time is larger than `wait_until` (which is a mocked time). When that happens, the `wait()` method falls into an infinite loop.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4560
      
      Differential Revision: D10472851
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 898902546ace7db7ac509337dd8677a527209d19
  8. 20 Oct 2018 (3 commits)
    • Fix printf formatting on MacOS (#4533) · f959e880
      Simon Grätzer committed
      Summary:
      On MacOS with clang, compilation of _tools/db_bench_tool.cc_ always fails because the format specifier used in a `fprintf` call doesn't match the argument type. This PR fixes the issue.
      ```
      tools/db_bench_tool.cc:4233:61: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long')
      ```
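      The portable fix for this class of error is the dedicated `%zu` length modifier for `size_t`, which works regardless of whether `size_t` is `unsigned long` (MacOS) or `unsigned long long`. A minimal sketch, with a hypothetical helper name:

      ```cpp
      #include <cstdio>

      // Format a size_t portably: %zu matches size_t on every platform,
      // unlike %llu, which assumes unsigned long long.
      int FormatCount(char* buf, size_t buf_len, size_t count) {
        return snprintf(buf, buf_len, "count: %zu", count);
      }
      ```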
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4533
      
      Differential Revision: D10471657
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f20f5f3756d3571b586c895c845d0d4d1e34a398
    • Fix WriteBatchWithIndex's SeekForPrev() (#4559) · c17383f9
      Siying Dong committed
      Summary:
      WriteBatchWithIndex's SeekForPrev() had a bug: internally we placed the position just before the seek key rather than at it, so the iterator missed a result exactly equal to the seek key. Fix it by positioning the iterator at the largest key equal to or smaller than the seek key.
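      The corrected semantics can be modeled on any ordered container. This sketch (a `std::map` stand-in, not the skip-list-backed index RocksDB actually uses) positions at the largest key that is less than or equal to the seek key:

      ```cpp
      #include <map>
      #include <string>

      using Index = std::map<std::string, int>;

      // SeekForPrev semantics: find the largest key <= `key`, including an
      // exact match (the buggy version stopped strictly before `key`).
      Index::const_iterator SeekForPrev(const Index& idx,
                                        const std::string& key) {
        auto it = idx.upper_bound(key);            // first key strictly > key
        if (it == idx.begin()) return idx.end();   // nothing <= key
        return --it;                               // largest key <= key
      }
      ```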
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4559
      
      Differential Revision: D10468534
      
      Pulled By: siying
      
      fbshipit-source-id: 2fb371ae809c561b60a1c11cef71e1c66fea1f19
    • Add read retry support to log reader (#4394) · da4aa59b
      Yanqin Jin committed
      Summary:
      The current `log::Reader` does not retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.
      
      The current implementation is simple and does not provide more advanced retry policies; these will be addressed in the future.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394
      
      Differential Revision: D9926508
      
      Pulled By: riversand963
      
      fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
  9. 19 Oct 2018 (4 commits)
  10. 18 Oct 2018 (3 commits)
    • Plumb WriteBufferManager through JNI (#4492) · a4d9aa6b
      Jigar Bhati committed
      Summary:
      Allow RocksJava to explicitly create a WriteBufferManager by plumbing it to the native code through JNI.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4492
      
      Differential Revision: D10428506
      
      Pulled By: sagar0
      
      fbshipit-source-id: cd9dd8c2ef745a0303416b44e2080547bdcca1fd
    • Lazily initialize RangeDelAggregator stripe map entries (#4497) · 45f213b5
      Abhishek Madan committed
      Summary:
      When there are no range deletions, flush and compaction perform a binary search
      on an effectively empty map every time they call ShouldDelete. This PR lazily
      initializes each stripe map entry so that the binary search can be elided in
      these cases.
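      The lazy-initialization pattern can be sketched in isolation (hypothetical, heavily simplified types; the real stripe map holds fragmented tombstones, not a flat key map): the map lives behind a pointer that is only allocated on first insertion, so the common no-tombstone case is a single null check instead of a search:

      ```cpp
      #include <cstdint>
      #include <map>
      #include <memory>
      #include <string>

      struct StripeRep {
        // Allocated only when the first tombstone arrives.
        std::unique_ptr<std::map<std::string, uint64_t>> tombstones;

        void AddTombstone(const std::string& key, uint64_t seq) {
          if (!tombstones) {
            tombstones.reset(new std::map<std::string, uint64_t>());
          }
          (*tombstones)[key] = seq;
        }

        bool ShouldDelete(const std::string& key, uint64_t seq) const {
          if (!tombstones) return false;  // fast path: no search on empty rep
          auto it = tombstones->find(key);
          return it != tombstones->end() && it->second > seq;
        }
      };
      ```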
      
      After this PR, the total amount of time spent in compactions is 52.541331s, and the total amount of time spent in flush is 5.532608s, the former of which is a significant improvement from the results after #4495.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4497
      
      Differential Revision: D10428610
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 6f7e1ce3698fac3ef86d1197955e6b72e0931a0f
    • Add PerfContextByLevel to provide per level perf context information (#4226) · d6ec2887
      Zhongyi Xie committed
      Summary:
      The current implementation of perf context is level-agnostic, making it hard to evaluate performance across the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
      This will be helpful when analyzing point and range query performance, as well as when tuning bloom filters.
      Also replaces __thread with the thread_local keyword for perf_context.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226
      
      Differential Revision: D10369509
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
  11. 16 Oct 2018 (6 commits)
    • Properly determine a truncated CompactRange stop key (#4496) · 1e384580
      anand1976 committed
      Summary:
      When a CompactRange() call for a level is truncated before the end key
      is reached, because it exceeds max_compaction_bytes, we need to properly
      set the compaction_end parameter to indicate the stop key. The next
      CompactRange will use that as the begin key. We set it to the smallest
      key of the next file in the level after expanding inputs to get a clean
      cut.
      
      Previously, we were setting it before expanding inputs. So we could end
      up recompacting some files. In a pathological case, where a single key
      has many entries spanning all the files in the level (possibly due to
      merge operands without a partial merge operator, thus resulting in
      compaction output identical to the input), this would result in
      an endless loop over the same set of files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4496
      
      Differential Revision: D10395026
      
      Pulled By: anand1976
      
      fbshipit-source-id: f0c2f89fee29b4b3be53b6467b53abba8e9146a9
    • Add support to flush multiple CFs atomically (#4262) · e633983c
      Yanqin Jin committed
      Summary:
      Leverage existing `FlushJob` to implement atomic flush of multiple column families.
      
      This PR depends on other PRs and is a subset of #3752 . This PR itself is not sufficient in fulfilling atomic flush.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4262
      
      Differential Revision: D9283109
      
      Pulled By: riversand963
      
      fbshipit-source-id: 65401f913e4160b0a61c0be6cd02adc15dad28ed
    • Avoid per-key linear scan over snapshots in compaction (#4495) · 32b4d4ad
      Andrew Kryczka committed
      Summary:
      `CompactionIterator::snapshots_` is ordered by ascending seqnum, just like `DBImpl`'s linked list of snapshots from which it was copied. This PR exploits this ordering to make `findEarliestVisibleSnapshot` do binary search rather than linear scan. This can make flush/compaction significantly faster when many snapshots exist since that function is called on every single key.
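      The core of the change can be sketched as follows. This is a self-contained model under the assumption stated in the summary (snapshot seqnums sorted ascending), not RocksDB's actual member function:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      // Binary search for the earliest (smallest) snapshot whose seqnum is
      // >= seq; snapshots are sorted ascending, so std::lower_bound replaces
      // the old per-key linear scan. Returns max_seq when no snapshot sees it.
      uint64_t FindEarliestVisibleSnapshot(
          const std::vector<uint64_t>& snapshots, uint64_t seq,
          uint64_t max_seq) {
        auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
        return it == snapshots.end() ? max_seq : *it;
      }
      ```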
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4495
      
      Differential Revision: D10386470
      
      Pulled By: ajkr
      
      fbshipit-source-id: 29734991631227b6b7b677e156ac567690118a8b
    • Update WritePrepared blog post with latest results (#4494) · 0f955f2a
      Maysam Yabandeh committed
      Summary:
      WritePrepared is declared production ready (overdue update) and the benchmark results are also reported.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4494
      
      Differential Revision: D10385336
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 662672ddfa286aa46af544f505b4d4b7a882d408
    • Replace 'string' with 'const string&' in FileOperationInfo (#4491) · ce522746
      Yanqin Jin committed
      Summary:
      Using const string& can avoid one extra string copy. This PR addresses a recent comment made by siying  on #3933.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4491
      
      Differential Revision: D10381211
      
      Pulled By: riversand963
      
      fbshipit-source-id: 27fc2d65d84bc7cd07833c77cdc47f06dcfaeb31
    • Set -DROCKSDB_JEMALLOC for buck build if jemalloc presents (#4489) · f60c4e5a
      Yi Wu committed
      Summary:
      Set the macro if the default allocator is jemalloc. It doesn't handle the case where an allocator is specified explicitly, e.g.
      ```
      cpp_binary(
          name="xxx"
          allocator="jemalloc", # or "malloc" or something else
          deps=["//rocksdb:rocksdb"],
      )
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4489
      
      Differential Revision: D10363683
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 5da490336a8e78e0feb0900c29e8036e7ec6f12b
  12. 13 Oct 2018 (3 commits)