1. 28 Sep, 2018 2 commits
  2. 27 Sep, 2018 3 commits
    • S
      assert in PosixEnv::FileExists should be based on errno (#4427) · b1dad4cf
      Sagar Vemuri committed
      Summary:
      The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno.
      
      Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more, it feels like the assert is doing the right thing: it's good to crash on EROFS, EFAULT, EINVAL, and other major filesystem-related problems so that the user is immediately aware of the problems while testing.
      (I think it might be OK to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So I am letting the assert checks remain as is for now.)
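The errno-based check described in this summary can be sketched as follows. This is a minimal Python model, not the C++ code of `PosixEnv::FileExists`: it probes with `os.stat` rather than `access`, and the set of errno values mapped to "not found" is an assumption made for illustration. Only the EIO exemption and the idea of crashing on values such as EROFS come from the summary.

```python
import errno
import os

# Assumption for illustration: errno values treated as "file does not exist".
NOT_FOUND_ERRNOS = {errno.ENOENT, errno.ENOTDIR, errno.ENAMETOOLONG,
                    errno.ELOOP, errno.EACCES}
# errno values reported as I/O errors without tripping the assert.
TOLERATED_ERRNOS = {errno.EIO}


def classify_errno(err):
    """Map an errno value to a status string, asserting on unexpected values."""
    if err in NOT_FOUND_ERRNOS:
        return "NotFound"
    # The check is made on the errno value, not on the syscall's return code.
    assert err in TOLERATED_ERRNOS, "unexpected errno: " + errno.errorcode.get(err, str(err))
    return "IOError"


def file_exists(path):
    """Return "OK", "NotFound" or "IOError" for the given path."""
    try:
        os.stat(path)
    except OSError as exc:
        return classify_errno(exc.errno)
    return "OK"
```

With this shape, an unexpected value such as EROFS fails the assertion during testing, while EIO is still returned to the caller as an error.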
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427
      
      Differential Revision: D10037200
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51
      b1dad4cf
    • A
      Fix benchmark script with vector memtable (#4428) · d56070d8
      Andrew Kryczka committed
      Summary:
      I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.
      
      Fixes #4413.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428
      
      Differential Revision: D10036452
      
      Pulled By: ajkr
      
      fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
      d56070d8
    • Y
      Improve log handling when recover without flush (#4405) · dc813e4b
      Yi Wu committed
      Summary:
      Improve log handling when avoid_flush_during_recovery=true.
      1. restore total_log_size_ after recovery, by summing up existing log sizes. Fixes #4253.
      2. truncate the last existing log, since this log can contain preallocated space and it would be a waste to keep that space. This avoids the case where a crash loop of the user application causes many logs of non-trivial size to be created, ultimately taking up all disk space.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4405
      
      Differential Revision: D9953933
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 967780fee8acec7f358b6eb65190fb4684f82e56
      dc813e4b
  3. 26 Sep, 2018 2 commits
    • N
      Handle tombstones at the same seqno in the CollapsedRangeDelMap (#4424) · 17edc82a
      Nikhil Benesch committed
      Summary:
      The CollapsedRangeDelMap was entirely mishandling tombstones at the same
      sequence number when the tombstones did not have identical start and end
      keys. Such tombstones are common since 90fc4069, which causes
      tombstones to be split during compactions.
      
      For example, if the tombstone [a, c) @ 1 lies across a compaction
      boundary at b, it will be split into [a, b) @ 1 and [b, c) @ 1. Without
      this patch, the collapsed range deletion map would look like this:
      
        a -> 1
        b -> 1
        c -> 0
      
      Notice how the b -> 1 entry is redundant. When the tombstones overlap,
      the problem is even worse. Consider tombstones [a, c) @ 1 and [b, d) @
      1, which produces this map without this patch:
      
        a -> 1
        b -> 1
        c -> 0
        d -> 0
      
      This map is corrupt, as a map can never contain adjacent sentinel (zero)
      entries. When the iterator advances from b to c, it will notice that c
      is a sentinel entry and skip to d--but d is also a sentinel entry! Asking
      what tombstone this iterator points to will trigger an assertion, as it
      is not pointing to a valid tombstone.
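The invariant described here (no redundant entries and no adjacent sentinels) can be illustrated with a small Python model. It is a sketch of what a well-formed collapsed map looks like, not RocksDB's incremental C++ algorithm: it recomputes the whole map from the tombstone list, assumes plain string ordering of keys, and the function name `collapse` is hypothetical.

```python
def collapse(tombstones):
    """Build a collapsed map from (start, end, seqno) range tombstones.

    The result is a sorted list of (key, seqno) entries. Each entry covers
    keys from its key up to the next entry's key; seqno 0 is the sentinel
    meaning "no tombstone".
    """
    # Ignore empty ranges so the first boundary always starts a tombstone.
    tombstones = [t for t in tombstones if t[0] < t[1]]
    boundaries = sorted({k for start, end, _ in tombstones for k in (start, end)})
    collapsed = []
    for key in boundaries:
        # Highest seqno among tombstones covering the interval starting at key.
        seqno = max((s for start, end, s in tombstones if start <= key < end),
                    default=0)
        # Skip entries that repeat the previous seqno; this also rules out
        # adjacent sentinel entries.
        if collapsed and collapsed[-1][1] == seqno:
            continue
        collapsed.append((key, seqno))
    return collapsed
```

Under this model, the split tombstones [a, b) @ 1 and [b, c) @ 1 collapse to `a -> 1, c -> 0`, and the overlapping tombstones [a, c) @ 1 and [b, d) @ 1 collapse to `a -> 1, d -> 0`.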
      
      /cc ajkr
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4424
      
      Differential Revision: D10039248
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 6d737c1e88d60e80cf27286726627ba44463e7f4
      17edc82a
    • Y
      Update TARGETS file template (#4426) · 31d46993
      Yi Wu committed
      Summary:
      Update template of TARGETS file according to recent changes in #4371 , #4363 and https://github.com/facebook/rocksdb/commit/dbf44c314b4adf3276afc1ca797b88944ca3162c.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4426
      
      Differential Revision: D10025053
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e6a0a702bfd401fc1af240ee446f5690f0bcd85d
      31d46993
  4. 22 Sep, 2018 1 commit
    • A
      Improve RangeDelAggregator benchmarks (#4395) · 3c350a7c
      Abhishek Madan committed
      Summary:
      Improve time measurements for AddTombstones to only include the
      call and not the VectorIterator setup. Also add a new
      add_tombstones_per_run flag to call AddTombstones multiple times per
      aggregator, which will help simulate more realistic workloads.
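The measurement change can be pictured with a short Python sketch that times only the measured call and leaves the setup outside the timed region. The helper `time_calls` and its parameters are hypothetical and only mirror the idea; the real benchmark is written in C++.

```python
import time


def time_calls(setup, call, runs, calls_per_run=1):
    """Return the average seconds spent in call(), excluding setup()."""
    total = 0.0
    for _ in range(runs):
        for _ in range(calls_per_run):
            arg = setup()                      # input preparation, not timed
            start = time.perf_counter()
            call(arg)                          # only this call is timed
            total += time.perf_counter() - start
    return total / (runs * calls_per_run)
```

Here `calls_per_run` plays the role that the summary assigns to `add_tombstones_per_run`: several measured calls per run.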
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4395
      
      Differential Revision: D9996811
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 5865a95c323fbd9b3606493013664b4890fe5a02
      3c350a7c
  5. 21 Sep, 2018 2 commits
  6. 20 Sep, 2018 3 commits
    • C
      add GetAggregatedLongProperty for Java API (#4379) · 02dc0749
      Chen, You committed
      Summary:
      Add Java API `getAggregatedLongProperty(final String property)`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4379
      
      Differential Revision: D9921463
      
      Pulled By: sagar0
      
      fbshipit-source-id: a02512e1b2aff4765a10b77de9a7bf7b1909d954
      02dc0749
    • A
      Generate appropriate number of keys in db_bench (#4404) · 519f8b14
      Abhishek Madan committed
      Summary:
      If range tombstones are generated every few writes, the
      KeyGenerator's limit is now extended to account for the additional
      Next() calls. This is primarily important for `filluniquerandom`
      benchmarks that enforce the call limit.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404
      
      Differential Revision: D9949326
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
      519f8b14
    • Z
      add missing range in random.choice argument (#4397) · 9b3cf908
      Zhongyi Xie committed
      Summary:
      This will fix the broken asan crash test:
      > Traceback (most recent call last):
        File "tools/db_crashtest.py", line 384, in <module>
          main()
        File "tools/db_crashtest.py", line 368, in main
          parser.add_argument("--" + k, type=type(v() if callable(v) else v))
        File "tools/db_crashtest.py", line 59, in <lambda>
          "index_block_restart_interval": lambda: random.choice(1, 16),
      TypeError: choice() takes exactly 2 arguments (3 given)
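The traceback comes from passing two positional arguments to `random.choice`, which accepts a single sequence. A minimal sketch of a working form is shown below; the dictionary name `default_params` and the helper `resolve` are assumptions for illustration, and the candidate values 1 and 16 are taken from the failing line.

```python
import random

# random.choice takes one sequence; random.choice(1, 16) raises TypeError.
default_params = {
    "index_block_restart_interval": lambda: random.choice([1, 16]),
}


def resolve(value):
    """Evaluate callables so each run draws a concrete parameter value."""
    return value() if callable(value) else value
```

`resolve` mirrors the `v() if callable(v) else v` expression in the traceback, so the argument type can be derived from a drawn value.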
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397
      
      Differential Revision: D9933041
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
      9b3cf908
  7. 19 Sep, 2018 4 commits
    • M
      Extend crash test with index_block_restart_interval (#4383) · a0ebec38
      Maysam Yabandeh committed
      Summary:
      The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383
      
      Differential Revision: D9887304
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
      a0ebec38
    • F
      Fix issue with docs/feed.xml validation (#4392) · 886766c3
      Fosco Marotto committed
      Summary:
      Per #4387 this should address the validation error with the link tag.  This is a quick fix, a future iteration could significantly upgrade the jekyll integration.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4392
      
      Differential Revision: D9923643
      
      Pulled By: gfosco
      
      fbshipit-source-id: e7ed478e55c907add8319290326540e6e44fc0d6
      886766c3
    • A
      Unit test for custom comparator RangeDelAggregator (#4388) · 990b52e9
      Andrew Kryczka committed
      Summary:
      Add a unit test for range collapsing when a non-default comparator is used. This exposes the bug fixed in #4386.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4388
      
      Differential Revision: D9918252
      
      Pulled By: ajkr
      
      fbshipit-source-id: 99501b96b251eab41791a7e33b27055ee36c5c39
      990b52e9
    • J
      use specified comparator in CollapsedRangeDelMap (#4386) · 27221b0c
      jsteemann committed
      Summary:
      The Comparator passed to CollapsedRangeDelMap was not used for
      operator less of the std::map `rep_` object contained in
      CollapsedRangeDelMap. So the map was always sorted using the
      default ByteWiseComparator, which seems wrong.
      
      Passing the specified Comparator through for usage in that map
      object fixes actual problems we were seeing with RangeDelete operations
      that do not delete keys as expected when using a custom Comparator.
      
      I found that the tests in current master crash when I run them locally,
      both with and without my patch, at the very same location. I therefore
      don't know if the patch breaks something else, but it seems to fix
      RangeDeletion issues in our product that uses RocksDB.
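Why the ordering matters can be shown with a small Python model. It uses a reverse bytewise ordering as a stand-in for a custom comparator, which is an assumption chosen for illustration; all function names are hypothetical and the real code uses a C++ `std::map`.

```python
from functools import cmp_to_key


def bytewise(a, b):
    """Default ordering: plain byte comparison."""
    return (a > b) - (a < b)


def reverse_bytewise(a, b):
    """A custom ordering, used here as a stand-in for a user comparator."""
    return bytewise(b, a)


def covers(start, end, key, compare):
    """True if key falls in [start, end) under the given comparator."""
    return compare(start, key) <= 0 and compare(key, end) < 0


def sorted_keys(keys, compare):
    """Order keys the way a map built with this comparator would."""
    return sorted(keys, key=cmp_to_key(compare))
```

Under the reverse ordering, the range [c, a) covers b, but a structure sorted with the default bytewise ordering treats the same range as empty, so the key would not be deleted as expected.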
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4386
      
      Differential Revision: D9916506
      
      Pulled By: ajkr
      
      fbshipit-source-id: 27bff8c775831f089dde8c5289df7343d88b2d66
      27221b0c
  8. 18 Sep, 2018 5 commits
    • M
      Fix bug in partition filters with format_version=4 (#4381) · 65ac72ed
      Maysam Yabandeh committed
      Summary:
      Value delta encoding in format_version 4 requires the differences between the size of two consecutive handles to be sent to BlockBuilder::Add. This applies not only to indexes on blocks but also the indexes on indexes and filters in partitioned indexes and filters respectively. The patch fixes a bug where the partitioned filters would encode the entire size of the handle rather than the difference of the size with the last size.
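The difference between encoding a full size and encoding a size delta can be sketched in Python as below. This models only the size field of consecutive handles; offsets, restart points, and the on-disk integer encoding are left out, and the function names are hypothetical.

```python
def encode_sizes(sizes):
    """Store the first size in full, then only the change from the previous size."""
    encoded, prev = [], None
    for size in sizes:
        encoded.append(size if prev is None else size - prev)
        prev = size
    return encoded


def decode_sizes(encoded):
    """Rebuild the sizes by accumulating deltas onto the first full value."""
    sizes = []
    for i, value in enumerate(encoded):
        sizes.append(value if i == 0 else sizes[-1] + value)
    return sizes
```

If a writer passes full sizes where the reader expects deltas, which is the kind of mismatch the summary describes for partitioned filters, decoding produces inflated sizes: `decode_sizes([4096, 4100, 4090])` yields `[4096, 8196, 12286]` instead of the original values.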
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4381
      
      Differential Revision: D9879505
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 27a22e49b482b927fbd5629dc310c46d63d4b6d1
      65ac72ed
    • A
      Add RangeDelAggregator microbenchmarks (#4363) · 1626f6ab
      Abhishek Madan committed
      Summary:
      To measure the results of upcoming DeleteRange v2 work, this commit adds
      simple benchmarks for RangeDelAggregator. It measures the average time
      for AddTombstones and ShouldDelete calls.
      
      Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:
      
      Before #4014:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          1356.28 us
      ShouldDelete:           0.401732 us
      ```
      
      Latest master:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          740.82 us
      ShouldDelete:           0.383271 us
      ```
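To put the two result blocks side by side, the ratios can be computed directly. The figures are copied from the results above; the variable names are only for illustration.

```python
# Average times in microseconds, copied from the two result blocks above.
before = {"AddTombstones": 1356.28, "ShouldDelete": 0.401732}
latest = {"AddTombstones": 740.82, "ShouldDelete": 0.383271}

# Ratio of the older time to the newer time for each measured call.
speedup = {name: before[name] / latest[name] for name in before}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

This gives roughly 1.83x for AddTombstones and 1.05x for ShouldDelete with the default arguments.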
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363
      
      Differential Revision: D9881676
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
      1626f6ab
    • A
      Fix regression test failures introduced by PR #4164 (#4375) · 30c21df9
      Anand Ananthabhotla committed
      Summary:
      1. Add override keyword to overridden virtual functions in EventListener
      2. Fix a memory corruption that can happen during DB shutdown when in
      read-only mode due to a background write error
      3. Fix uninitialized buffers in error_handler_test.cc that cause
      valgrind to complain
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4375
      
      Differential Revision: D9875779
      
      Pulled By: anand1976
      
      fbshipit-source-id: 022ede1edc01a9f7e21ecf4c61ef7d46545d0640
      30c21df9
    • A
      Support manual flush in stress/crash tests (#4368) · 8c252046
      Andrew Kryczka committed
      Summary:
      - Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
      - Enabled by default in crash test.
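A "one in N" flag of this kind is commonly implemented as a random draw per operation. The Python sketch below shows that pattern; treating 0 as "disabled" and the function names are assumptions, and the real stress test is C++.

```python
import random


def should_flush(rng, flush_one_in):
    """Return True on average once every flush_one_in operations; 0 disables."""
    return flush_one_in > 0 and rng.randrange(flush_one_in) == 0


def run_ops(num_ops, flush_one_in, seed=0):
    """Count how many flushes a run of num_ops operations would trigger."""
    rng = random.Random(seed)
    return sum(should_flush(rng, flush_one_in) for _ in range(num_ops))
```

With `flush_one_in=1` every operation triggers a flush, and larger values make flushes proportionally rarer.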
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368
      
      Differential Revision: D9838593
      
      Pulled By: ajkr
      
      fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
      8c252046
    • S
      Fix sync-point comment in Block destructor (#4380) · ac467903
      Sagar Vemuri committed
      Summary:
      This is a follow up to #4370. The earlier comment is not correct.
      
      Thanks to ajkr for pointing this out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4380
      
      Differential Revision: D9874667
      
      Pulled By: sagar0
      
      fbshipit-source-id: f4e092d86b29c715258210b770643d367e38caae
      ac467903
  9. 16 Sep, 2018 2 commits
    • A
      Remove trace_analyzer_tool.cc from rocksdb_lib buck target (#4371) · dfda9102
      Anand Ananthabhotla committed
      Summary:
      Including tools/trace_analyzer_tool.cc in rocksdb_lib was causing conflicts in dependent binaries due to duplicate gflag (other_prefix).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4371
      
      Differential Revision: D9846953
      
      Pulled By: anand1976
      
      fbshipit-source-id: 80b4aa36ab8428b8f6dceb896c45532684102709
      dfda9102
    • A
      Auto recovery from out of space errors (#4164) · a27fce40
      Anand Ananthabhotla committed
      Summary:
      This commit implements automatic recovery from a Status::NoSpace() error
      during background operations such as write callback, flush and
      compaction. The broad design is as follows -
      1. Compaction errors are treated as soft errors and don't put the
      database in read-only mode. A compaction is delayed until enough free
      disk space is available to accommodate the compaction outputs, which is
      estimated based on the input size. This means that users can continue to
      write, and we rely on the WriteController to delay or stop writes if the
      compaction debt becomes too high due to persistent low disk space
      condition
      2. Errors during write callback and flush are treated as hard errors,
      i.e. the database is put in read-only mode and goes back to read-write
      only after certain recovery actions are taken.
      3. Both types of recovery rely on the SstFileManagerImpl to poll for
      sufficient disk space. We assume that there is a 1-1 mapping between an
      SFM and the underlying OS storage container. For cases where multiple
      DBs are hosted on a single storage container, the user is expected to
      allocate a single SFM instance and use the same one for all the DBs. If
      no SFM is specified by the user, DBImpl::Open() will allocate one, but
      this will be one per DB and each DB will recover independently. The
      recovery implemented by SFM is as follows -
        a) On the first occurrence of an out of space error during compaction,
        subsequent
        compactions will be delayed until the disk free space check indicates
        enough available space. The required space is computed as the sum of
        input sizes.
        b) The free space check requirement will be removed once the amount of
        free space is greater than the size reserved by in progress
        compactions when the first error occurred
        c) If the out of space error is a hard error, a background thread in
        SFM will poll for sufficient headroom before triggering the recovery
        of the database and putting it in read-write mode. The headroom is
        calculated as the sum of the write_buffer_size of all the DB instances
        associated with the SFM
      4. EventListener callbacks will be called at the start and completion of
      automatic recovery. Users can disable the auto recovery in the start
      callback, and later initiate it manually by calling DB::Resume()
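The space checks in the design above can be sketched as plain arithmetic. This Python model follows the two rules stated in the summary (required compaction space is the sum of input sizes; hard-error headroom is the sum of `write_buffer_size` over all DBs). The function names, the `reserved_by_running` parameter, and the use of `>=` in the comparisons are assumptions for illustration, not the SstFileManagerImpl API.

```python
def compaction_needed_space(input_file_sizes):
    """Estimate compaction output size as the sum of the input sizes."""
    return sum(input_file_sizes)


def can_start_compaction(free_space, input_file_sizes, reserved_by_running=0):
    """Delay a compaction unless free space covers it plus running compactions."""
    needed = reserved_by_running + compaction_needed_space(input_file_sizes)
    return free_space >= needed


def hard_error_headroom(write_buffer_sizes):
    """Headroom for hard-error recovery: sum of write_buffer_size over all DBs."""
    return sum(write_buffer_sizes)


def can_resume_after_hard_error(free_space, write_buffer_sizes):
    """True once polling sees enough free space to trigger recovery."""
    return free_space >= hard_error_headroom(write_buffer_sizes)
```

A polling thread would call `can_resume_after_hard_error` periodically and trigger recovery once it returns true, while soft errors only gate new compactions through `can_start_compaction`.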
      
      Todo:
      1. More extensive testing
      2. Add disk full condition to db_stress (follow-on PR)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
      
      Differential Revision: D9846378
      
      Pulled By: anand1976
      
      fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
      a27fce40
  10. 15 Sep, 2018 6 commits
  11. 14 Sep, 2018 3 commits
  12. 13 Sep, 2018 1 commit
    • M
      Reduce IndexBlockIter size (#4358) · 9ea9007b
      Maysam Yabandeh committed
      Summary:
      With #3983 the size of IndexBlockIter was increased. This resulted in a regression in P50 latencies in one of our benchmarks. The patch reduces IndexBlockIter size by eliminating the active_comparator_ field from the class.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4358
      
      Differential Revision: D9781737
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 71e2b28d90ff0813db9e04b737ae73e185583c52
      9ea9007b
  13. 12 Sep, 2018 3 commits
    • D
      Initialize uninitialized std::atomic variables · ca92fc71
      Dan Melnic committed
      Summary: Initialize uninitialized std::atomic variables
      
      Reviewed By: yfeldblum
      
      Differential Revision: D9758050
      
      fbshipit-source-id: 865d89eddafc81f3cab6f11e2ebb669f7ff70d04
      ca92fc71
    • Y
      Fix Makefile target 'jtest' on PowerPC (#4357) · 3ba3b153
      Yanqin Jin committed
      Summary:
      Before the fix:
      On a PowerPC machine, run the following
      ```
      $ make jtest
      ```
      The command will fail due to "undefined symbol: crc32c_ppc". It was caused by
      'rocksdbjava' Makefile target not including crc32c_ppc object files when
      generating the shared lib. The fix is simple.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357
      
      Differential Revision: D9779474
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51
      3ba3b153
    • P
      Lint TARGETS files with buildifier · dbf44c31
      Philip Jameson committed
      Summary: Build file formatting
      
      Reviewed By: mzlee
      
      Differential Revision: D9728238
      
      fbshipit-source-id: 99a266d5d2260eabfd63a200b2994c6850b59cf4
      dbf44c31
  14. 11 Sep, 2018 2 commits
    • A
      Restrict RangeDelAggregator's tombstone end-key truncation (#4356) · c86a22ac
      Abhishek Madan committed
      Summary:
      `RangeDelAggregator::AddTombstones` contained an assertion which stated that, if a range tombstone extended past the largest key in the sstable, then `FileMetaData::largest` must have a sentinel sequence number of `kMaxSequenceNumber`, which implies that the tombstone's end key is safe to truncate. However, `largest` will not be a sentinel key when the next sstable in the level's smallest key is equal to the current sstable's largest key, which caused the assertion to fail.
      
      The assertion must hold for the truncation to be safe, so it has been moved to an additional check on end-key truncation.
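The guarded truncation can be sketched as a single condition. This Python model assumes bytewise key ordering and takes the sentinel value to be `(1 << 56) - 1`; both are assumptions for illustration, as is the function name.

```python
K_MAX_SEQUENCE_NUMBER = (1 << 56) - 1  # sentinel seqno; value assumed here


def truncated_end_key(tombstone_end, largest_user_key, largest_seqno):
    """Return the tombstone's end key after optional truncation to the file bound."""
    extends_past_file = tombstone_end > largest_user_key
    largest_is_sentinel = largest_seqno == K_MAX_SEQUENCE_NUMBER
    if extends_past_file and largest_is_sentinel:
        return largest_user_key
    # Otherwise truncation is not known to be safe, so keep the original end key.
    return tombstone_end
```

What used to be an assertion becomes part of the condition: when `largest` is not a sentinel key, the end key is simply left as it is.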
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4356
      
      Differential Revision: D9760891
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 7c20c3885cd919dcd14f291f88fd27aa33defebc
      c86a22ac
    • M
      Skip concurrency control during recovery of pessimistic txn (#4346) · 3f528226
      Maysam Yabandeh committed
      Summary:
      TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This can be used as an optimization if the application knows that the transaction will not conflict with concurrent transactions. It is currently used during recovery, assuming (i) the application guarantees no conflict between prepared transactions in the WAL and (ii) the application guarantees that recovered transactions will be rolled back or committed before new transactions start.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346
      
      Differential Revision: D9759149
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
      3f528226
  15. 08 Sep, 2018 1 commit