1. 10 Oct, 2018 1 commit
    • Z
      add locking around calls to RecalculateWriteStallConditions in column_family_test (#4474) · 283a700f
      Zhongyi Xie committed
      Summary:
      This should fix the currently failing TSAN jobs:
      The callstack for TSAN:
      > WARNING: ThreadSanitizer: data race (pid=87440)
        Read of size 8 at 0x7d580000fce0 by thread T22 (mutexes: write M548703):
          #0 rocksdb::InternalStats::DumpCFStatsNoFileHistogram(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) db/internal_stats.cc:1204 (column_family_test+0x00000080eca7)
          #1 rocksdb::InternalStats::DumpCFStats(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) db/internal_stats.cc:1169 (column_family_test+0x0000008106d0)
          #2 rocksdb::InternalStats::HandleCFStats(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, rocksdb::Slice) db/internal_stats.cc:578 (column_family_test+0x000000810720)
          #3 rocksdb::InternalStats::GetStringProperty(rocksdb::DBPropertyInfo const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) db/internal_stats.cc:488 (column_family_test+0x00000080670c)
          #4 rocksdb::DBImpl::DumpStats() db/db_impl.cc:625 (column_family_test+0x00000070ce9a)
      
      >  Previous write of size 8 at 0x7d580000fce0 by main thread:
          #0 rocksdb::InternalStats::AddCFStats(rocksdb::InternalStats::InternalCFStatsType, unsigned long) db/internal_stats.h:324 (column_family_test+0x000000693bbf)
          #1 rocksdb::ColumnFamilyData::RecalculateWriteStallConditions(rocksdb::MutableCFOptions const&) db/column_family.cc:818 (column_family_test+0x000000693bbf)
          #2 rocksdb::ColumnFamilyTest_WriteStallSingleColumnFamily_Test::TestBody() db/column_family_test.cc:2563 (column_family_test+0x0000005e5a49)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4474
      
      Differential Revision: D10262099
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 1247973a3ca32e399b4575d3401dd5439c39efc5
      283a700f
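A minimal sketch of the kind of race reported above and of the locking that removes it. It assumes a plain `std::mutex` standing in for the DB mutex; `CfStats`, `RecalculateStall`, `DumpStats`, and `RunWriterAndDumper` are hypothetical names for illustration, not RocksDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a per-column-family counter.
struct CfStats {
  uint64_t stall_count = 0;  // guarded by the caller-supplied mutex
};

// Writer side: must be called with the mutex held.
void RecalculateStall(CfStats* stats) { stats->stall_count += 1; }

// Reader side: takes the same mutex before reading the counter.
uint64_t DumpStats(std::mutex& mu, const CfStats& stats) {
  std::lock_guard<std::mutex> lock(mu);
  return stats.stall_count;
}

// Runs a background reader while the calling thread updates the counter.
// Without the lock_guard in the loop below, TSAN would flag a data race.
uint64_t RunWriterAndDumper(int iterations) {
  std::mutex db_mutex;
  CfStats stats;
  std::thread dumper([&] {
    for (int i = 0; i < iterations; ++i) DumpStats(db_mutex, stats);
  });
  for (int i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> lock(db_mutex);  // the added locking
    RecalculateStall(&stats);
  }
  dumper.join();
  return stats.stall_count;
}
```

Calling `RunWriterAndDumper(1000)` returns 1000; the point is that every read and write of the counter happens under the same mutex.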
  2. 09 Oct, 2018 7 commits
    • Z
      move dump stats to a separate thread (#4382) · cac87fcf
      Zhongyi Xie committed
      Summary:
      Currently, statistics are supposed to be dumped to the info log at intervals of `options.stats_dump_period_sec`. However, the implementation ties the dump to the compaction thread, so if the database has been serving very light traffic, the stats may not get dumped at all.
      We decided to separate stats dumping into a new timed thread using `TimerQueue`, which is already used in blob_db. This will allow us to schedule new timed tasks with more deterministic behavior.
      
      Tested with db_bench using `--stats_dump_period_sec=20` in command line:
      > LOG:2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
      LOG:2018/09/17-14:08:05.643286 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
      LOG:2018/09/17-14:08:25.691325 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
      LOG:2018/09/17-14:08:45.740989 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
      
      LOG content:
      > 2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
      2018/09/17-14:07:45.575080 7fe99fbfe700 [WARN] [db/db_impl.cc:606]
      ** DB Stats **
      Uptime(secs): 20.0 total, 20.0 interval
      Cumulative writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5.57 GB, 285.01 MB/s
      Cumulative WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 GB, 285.01 MB/s
      Cumulative stall: 00:00:0.012 H:M:S, 0.1 percent
      Interval writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5700.71 MB, 285.01 MB/s
      Interval WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 MB, 285.01 MB/s
      Interval stall: 00:00:0.012 H:M:S, 0.1 percent
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4382
      
      Differential Revision: D9933051
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 6d12bb1e4977674eea4bf2d2ac6d486b814bb2fa
      cac87fcf
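A minimal sketch of running a periodic task on its own thread, independent of other background work. It is not RocksDB's `TimerQueue`; `PeriodicTask` and `CountDumps` are hypothetical names, and the approach (a condition variable with a timed wait) is an assumption chosen for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: calls fn every `period` until stopped.
class PeriodicTask {
 public:
  PeriodicTask(std::chrono::milliseconds period, std::function<void()> fn)
      : period_(period), fn_(std::move(fn)), worker_([this] { Run(); }) {}

  ~PeriodicTask() { Stop(); }

  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mu_);
    // wait_for returns false on timeout (run the task) and true once
    // stop_ has been set (leave the loop).
    while (!cv_.wait_for(lock, period_, [this] { return stop_; })) {
      lock.unlock();
      fn_();
      lock.lock();
    }
  }

  std::chrono::milliseconds period_;
  std::function<void()> fn_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};

// Counts how many times the task fires while the caller sleeps.
int CountDumps(std::chrono::milliseconds period,
               std::chrono::milliseconds run_for) {
  std::atomic<int> dumps{0};
  {
    PeriodicTask task(period, [&dumps] { ++dumps; });
    std::this_thread::sleep_for(run_for);
  }  // destructor stops and joins the worker
  return dumps.load();
}
```

Because the timer thread does nothing else, the task fires whether or not any other work is happening, which is the behavior the commit is after.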
    • F
      Update version macro for 5.17 (#4472) · 35f26bec
      Fosco Marotto committed
      Summary:
      Forgot this in previous commit.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4472
      
      Differential Revision: D10244227
      
      Pulled By: gfosco
      
      fbshipit-source-id: ba0cf7a2f5271f0d9f9443004e2620887cd5fd11
      35f26bec
    • D
      Fix DBImpl::GetColumnFamilyHandleUnlocked race condition (#4391) · 27090ae8
      DorianZheng committed
      Summary:
      - Fix DBImpl API race condition
      
      The timeline of the execution flow is as follows:
      ```
      timeline              user_thread1                      user_thread2
      t1   |     cfh = GetColumnFamilyHandleUnlocked(0)
      t2   |     id1 = cfh->GetID()
      t3   |                                                GetColumnFamilyHandleUnlocked(1)
      t4   |     id2 = cfh->GetID()
           V
      ```
      The original implementation returns a pointer to a stateful variable, so the returned `ColumnFamilyHandle` changes when another thread calls `GetColumnFamilyHandleUnlocked` with a different column family ID.
      
      - Expose ColumnFamily ID to compaction event listener
      
      - Fix the return status of `DBImpl::GetLatestSequenceForKey`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4391
      
      Differential Revision: D10221243
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: dec60ee9ff0c8261a2f2413a8506ec1063991993
      27090ae8
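A minimal sketch of the hazard in the timeline above and one way to avoid it: give each caller its own object instead of a pointer into shared state. `Handle`, `RacyDb`, and `FixedDb` are hypothetical names for illustration, and the interleaving is replayed on a single thread to keep the example deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a column family handle.
struct Handle {
  uint32_t id;
  uint32_t GetID() const { return id; }
};

// Problematic shape: every call rewrites and returns the same member.
class RacyDb {
 public:
  Handle* GetHandleUnlocked(uint32_t id) {
    shared_.id = id;  // a later call changes what earlier callers see
    return &shared_;
  }

 private:
  Handle shared_{0};
};

// Safer shape: each call returns an object owned by the caller.
class FixedDb {
 public:
  std::unique_ptr<Handle> GetHandleUnlocked(uint32_t id) {
    return std::unique_ptr<Handle>(new Handle{id});
  }
};
```

With `RacyDb`, a handle obtained for ID 0 reports ID 1 after another caller asks for ID 1; with `FixedDb`, the first handle keeps reporting 0.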
    • D
      Expose column family id to OnCompactionCompleted (#4466) · e0f05754
      DorianZheng committed
      Summary:
      The controller you requested could not be found. PTAL
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4466
      
      Differential Revision: D10241358
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 99664eb286860a6c8844d50efeb0ef6f0e10dd1e
      e0f05754
    • D
      Fix return status of DBImpl::GetLatestSequenceForKey · 7487a762
      DorianZheng committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4467
      
      Differential Revision: D10241418
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f6adbe7292b2c934e14971c7432b3eb115c35026
      7487a762
    • F
      Update HISTORY.md to current status (#4471) · b787cf9e
      Fosco Marotto committed
      Summary:
      The 5.16.x status wasn't tracked; this also updates the file for the pending 5.17 release.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4471
      
      Differential Revision: D10240925
      
      Pulled By: gfosco
      
      fbshipit-source-id: 95ab368a04a65b201d2518097af69edf2402f544
      b787cf9e
    • B
      RocksJava: memory_util support (#4446) · c9048021
      Ben Clay committed
      Summary:
      JNI passthrough for utilities/memory/memory_util.cc
      
      sagar0 adamretter
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4446
      
      Differential Revision: D10174578
      
      Pulled By: sagar0
      
      fbshipit-source-id: d1d196d771dff22afb7ef7500f308233675696f8
      c9048021
  3. 06 Oct, 2018 2 commits
  4. 05 Oct, 2018 4 commits
  5. 04 Oct, 2018 1 commit
  6. 03 Oct, 2018 3 commits
    • I
      Introduce CacheAllocator, a custom allocator for cache blocks (#4437) · 1cf5deb8
      Igor Canadi committed
      Summary:
      This is a conceptually simple change, but it touches many files to
      pass the allocator through function calls.
      
      We introduce CacheAllocator, which can be used by clients to configure
      a custom allocator for cache blocks. Our motivation is to hook this up
      with folly's `JemallocNodumpAllocator`
      (https://github.com/facebook/folly/blob/f43ce6d6866b7b994b3019df561109afae050ebc/folly/experimental/JemallocNodumpAllocator.h),
      but there are many other possible use cases.
      
      Additionally, this commit cleans up memory allocation in
      `util/compression.h`, making sure that all allocations are wrapped in a
      unique_ptr as soon as possible.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437
      
      Differential Revision: D10132814
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
      1cf5deb8
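A minimal sketch of the two ideas in this summary: a pluggable allocator interface, and wrapping each allocation in a `unique_ptr` right away so it is released through the same allocator. `BlockAllocator`, `BlockDeleter`, `AllocateBlock`, and `CountingAllocator` are hypothetical names, not the interface added by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical allocator interface that clients could implement.
class BlockAllocator {
 public:
  virtual ~BlockAllocator() = default;
  virtual void* Allocate(size_t size) = 0;
  virtual void Deallocate(void* p) = 0;
};

// Deleter that remembers which allocator (if any) produced the block.
struct BlockDeleter {
  BlockAllocator* allocator;
  void operator()(char* p) const {
    if (allocator != nullptr) {
      allocator->Deallocate(p);
    } else {
      delete[] p;
    }
  }
};

using BlockPtr = std::unique_ptr<char[], BlockDeleter>;

// Allocates through the custom allocator when given, else with new[].
// The raw pointer is wrapped immediately so it cannot leak.
BlockPtr AllocateBlock(size_t size, BlockAllocator* allocator) {
  char* p = allocator != nullptr
                ? static_cast<char*>(allocator->Allocate(size))
                : new char[size];
  return BlockPtr(p, BlockDeleter{allocator});
}

// Example allocator that counts live blocks on top of malloc/free.
class CountingAllocator : public BlockAllocator {
 public:
  void* Allocate(size_t size) override {
    ++live_;
    return std::malloc(size);
  }
  void Deallocate(void* p) override {
    --live_;
    std::free(p);
  }
  int live() const { return live_; }

 private:
  int live_ = 0;
};
```

Passing `nullptr` as the allocator falls back to `new[]`/`delete[]`, so callers that do not care about custom allocation keep the default behavior.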
    • Y
      Check for compression lib support before test exec (#4443) · 4e58b2ea
      Yanqin Jin committed
      Summary:
      Before running CompactFilesTest.SentinelCompressionType, we should check
      whether zlib and snappy are supported.
      
      CompactFilesTest.SentinelCompressionType is a newly added test. Compilation and
      linking with different options, e.g. COMPILE_WITH_TSAN, COMPILE_WITH_ASAN, etc.
      lead to generation of different binaries. On the one hand, it's not clear why
      zlib or snappy is present under ASAN, but not under TSAN. On the other hand,
      changing the compilation flags for TSAN or ASAN seems a bigger change worth much
      more attention. To unblock the cont-runs, I suggest that we simply add these
      two checks at the beginning of the test, as we did for
      GeneralTableTest.ApproximateOffsetOfCompressed in table/table_test.cc.
      
      Future actions include investigating the absence of zlib and snappy when
      compiling with TSAN, i.e. COMPILE_WITH_TSAN=1, if necessary.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4443
      
      Differential Revision: D10140935
      
      Pulled By: riversand963
      
      fbshipit-source-id: 62f96d1e685386accd2ef0b98f6f754d3fd67b3e
      4e58b2ea
    • J
      Adding IOTA Foundation to USERS.MD (#4436) · d78b2893
      Jakub Cech committed
      Summary:
      Adding IOTA Foundation to USERS.MD
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4436
      
      Differential Revision: D10108142
      
      Pulled By: sagar0
      
      fbshipit-source-id: 948dc9f7169cec5c113ae347f1af765a41355aae
      d78b2893
  7. 02 Oct, 2018 2 commits
    • G
      Add proper newline markdown (#4434) · 477107d6
      Gihwan Oh committed
      Summary:
      Add newline for readability
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4434
      
      Differential Revision: D10127684
      
      Pulled By: riversand963
      
      fbshipit-source-id: 39f3ed7eaea655b6ff83474bc9f7616c6ad59107
      477107d6
    • Y
      Remove a race condition between lsdir and rm (#4440) · be5cc4c7
      Yanqin Jin committed
      Summary:
      In DBCompactionTestWithParam::ManualLevelCompactionOutputPathId, there is
      a race condition between `DBTestBase::GetSstFileCount` and
      `DBImpl::PurgeObsoleteFiles`. The following graph explains why.
      
      ```
      Timeline  db_compact_test_t              bg_flush_t         bg_compact_t
          |  [initiate bg flush and
          |      start waiting]
          |                                     flush
          |                                     DeleteObsoleteFiles
          |  [waken up by bg_flush_t which
          |   signaled in DeleteObsoleteFiles]
          |
          |  [initiate compaction and
          |   start waiting]
          |
          |                                                         [compact,
          |                                                          set manual.done to true]
          |                                   [signal at the end of
          |                                    BackgroundCallFlush]
          |
          |  [waken up by bg_flush_t
          |   which signaled before
          |   returning from
          |   BackgroundCallFlush]
          |
          |  Check manual.done is true
          |
          |  GetSstFileCount    <-- race condition -->           PurgeObsoleteFiles
          V
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4440
      
      Differential Revision: D10122628
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3ede73c39fee6ad804dc6ac1ed84759c7e63977f
      be5cc4c7
  8. 01 Oct, 2018 1 commit
  9. 28 Sep, 2018 2 commits
  10. 27 Sep, 2018 3 commits
    • S
      assert in PosixEnv::FileExists should be based on errno (#4427) · b1dad4cf
      Sagar Vemuri committed
      Summary:
      The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno.
      
      Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more, it feels like the assert is doing the right thing: it's good to crash on EROFS, EFAULT, EINVAL, and other major filesystem-related problems so that the user is immediately aware of the problems while testing.
      (I think it might be ok to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So I am letting the assert checks remain as is for now.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427
      
      Differential Revision: D10037200
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51
      b1dad4cf
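A minimal sketch of an existence check whose assertion looks at `errno` rather than at the return value of `access`. It assumes a POSIX system; `ExistsResult` and this free function `FileExists` are hypothetical, and the grouping of error codes is an illustration of the idea, not a copy of the RocksDB code.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical result type standing in for a status object.
enum class ExistsResult { kFound, kNotFound, kIOError };

ExistsResult FileExists(const std::string& path) {
  if (access(path.c_str(), F_OK) == 0) {
    return ExistsResult::kFound;
  }
  const int err = errno;  // capture before any other call can overwrite it
  switch (err) {
    case EACCES:
    case ELOOP:
    case ENAMETOOLONG:
    case ENOENT:
    case ENOTDIR:
      return ExistsResult::kNotFound;
    default:
      // Anything else (EROFS, EFAULT, EINVAL, ...) trips the assert in
      // debug builds so the problem is noticed during testing.
      assert(err == EIO || err == ENOMEM);
      return ExistsResult::kIOError;
  }
}
```

On a typical Linux system, `FileExists("/")` yields `kFound` and a path under a missing directory yields `kNotFound`.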
    • A
      Fix benchmark script with vector memtable (#4428) · d56070d8
      Andrew Kryczka committed
      Summary:
      I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.
      
      Fixes #4413.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428
      
      Differential Revision: D10036452
      
      Pulled By: ajkr
      
      fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
      d56070d8
    • Y
      Improve log handling when recover without flush (#4405) · dc813e4b
      Yi Wu committed
      Summary:
      Improve log handling when avoid_flush_during_recovery=true.
      1. restore total_log_size_ after recovery, by summing up existing log sizes. Fixes #4253.
      2. truncate the last existing log, since this log can contain preallocated space and it would be a waste to keep that space. This avoids the case where a crash loop of the user application causes a lot of logs with non-trivial size to be created, ultimately taking up all disk space.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4405
      
      Differential Revision: D9953933
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 967780fee8acec7f358b6eb65190fb4684f82e56
      dc813e4b
  11. 26 Sep, 2018 2 commits
    • N
      Handle tombstones at the same seqno in the CollapsedRangeDelMap (#4424) · 17edc82a
      Nikhil Benesch committed
      Summary:
      The CollapsedRangeDelMap was entirely mishandling tombstones at the same
      sequence number when the tombstones did not have identical start and end
      keys. Such tombstones are common since 90fc4069, which causes
      tombstones to be split during compactions.
      
      For example, if the tombstone [a, c) @ 1 lies across a compaction
      boundary at b, it will be split into [a, b) @ 1 and [b, c) @ 1. Without
      this patch, the collapsed range deletion map would look like this:
      
        a -> 1
        b -> 1
        c -> 0
      
      Notice how the b -> 1 entry is redundant. When the tombstones overlap,
      the problem is even worse. Consider tombstones [a, c) @ 1 and [b, d) @
      1, which produces this map without this patch:
      
        a -> 1
        b -> 1
        c -> 0
        d -> 0
      
      This map is corrupt, as a map can never contain adjacent sentinel (zero)
      entries. When the iterator advances from b to c, it will notice that c
      is a sentinel entry and skip to d--but d is also a sentinel entry! Asking
      what tombstone this iterator points to will trigger an assertion, as it
      is not pointing to a valid tombstone.
      
      /cc ajkr
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4424
      
      Differential Revision: D10039248
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 6d737c1e88d60e80cf27286726627ba44463e7f4
      17edc82a
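A minimal sketch of a collapsed map that stays well-formed for the two examples above. It is a simplified model (string keys, default ordering, full re-coalescing on every insert), not the `CollapsedRangeDelMap` implementation; `CollapsedMap`, `AddTombstone`, and `SeqAt` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Each entry key -> seq means "seq covers keys from here up to the next
// entry"; seq 0 is the sentinel for "no tombstone".
class CollapsedMap {
 public:
  void AddTombstone(const std::string& start, const std::string& end,
                    uint64_t seq) {
    if (!(start < end)) return;  // ignore empty ranges
    // Remember what was in effect at both boundaries before inserting.
    const uint64_t at_start = SeqAt(start);
    const uint64_t at_end = SeqAt(end);
    map_.emplace(start, at_start);  // no-op if the key already exists
    map_.emplace(end, at_end);
    // Raise every interval inside [start, end) to at least seq.
    for (auto it = map_.find(start); it->first != end; ++it) {
      it->second = std::max(it->second, seq);
    }
    // Drop entries that repeat the previous value, so equal neighbors
    // (including adjacent sentinels) never survive.
    uint64_t prev = 0;
    for (auto it = map_.begin(); it != map_.end();) {
      if (it->second == prev) {
        it = map_.erase(it);
      } else {
        prev = it->second;
        ++it;
      }
    }
  }

  // Sequence number covering `key`, or 0 if none.
  uint64_t SeqAt(const std::string& key) const {
    auto it = map_.upper_bound(key);
    return it == map_.begin() ? 0 : std::prev(it)->second;
  }

  const std::map<std::string, uint64_t>& entries() const { return map_; }

 private:
  std::map<std::string, uint64_t> map_;
};
```

In this model, adding [a, b) @ 1 and then [b, c) @ 1 leaves only a -> 1 and c -> 0, and adding [a, c) @ 1 and then [b, d) @ 1 leaves only a -> 1 and d -> 0, so neither the redundant entry nor the adjacent sentinels appear.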
    • Y
      Update TARGETS file template (#4426) · 31d46993
      Yi Wu committed
      Summary:
      Update template of TARGETS file according to recent changes in #4371 , #4363 and https://github.com/facebook/rocksdb/commit/dbf44c314b4adf3276afc1ca797b88944ca3162c.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4426
      
      Differential Revision: D10025053
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e6a0a702bfd401fc1af240ee446f5690f0bcd85d
      31d46993
  12. 22 Sep, 2018 1 commit
    • A
      Improve RangeDelAggregator benchmarks (#4395) · 3c350a7c
      Abhishek Madan committed
      Summary:
      Improve time measurements for AddTombstones to only include the
      call and not the VectorIterator setup. Also add a new
      add_tombstones_per_run flag to call AddTombstones multiple times per
      aggregator, which will help simulate more realistic workloads.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4395
      
      Differential Revision: D9996811
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 5865a95c323fbd9b3606493013664b4890fe5a02
      3c350a7c
  13. 21 Sep, 2018 2 commits
  14. 20 Sep, 2018 3 commits
    • C
      add GetAggregatedLongProperty for Java API (#4379) · 02dc0749
      Chen, You committed
      Summary:
      Add Java API `getAggregatedLongProperty(final String property)`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4379
      
      Differential Revision: D9921463
      
      Pulled By: sagar0
      
      fbshipit-source-id: a02512e1b2aff4765a10b77de9a7bf7b1909d954
      02dc0749
    • A
      Generate appropriate number of keys in db_bench (#4404) · 519f8b14
      Abhishek Madan committed
      Summary:
      If range tombstones are generated every few writes, the
      KeyGenerator's limit is now extended to account for the additional
      Next() calls. This is primarily important for `filluniquerandom`
      benchmarks that enforce the call limit.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404
      
      Differential Revision: D9949326
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
      519f8b14
    • Z
      add missing range in random.choice argument (#4397) · 9b3cf908
      Zhongyi Xie committed
      Summary:
      This will fix the broken asan crash test:
      > Traceback (most recent call last):
        File "tools/db_crashtest.py", line 384, in <module>
          main()
        File "tools/db_crashtest.py", line 368, in main
          parser.add_argument("--" + k, type=type(v() if callable(v) else v))
        File "tools/db_crashtest.py", line 59, in <lambda>
          "index_block_restart_interval": lambda: random.choice(1, 16),
      TypeError: choice() takes exactly 2 arguments (3 given)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397
      
      Differential Revision: D9933041
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
      9b3cf908
  15. 19 Sep, 2018 4 commits
    • M
      Extend crash test with index_block_restart_interval (#4383) · a0ebec38
      Maysam Yabandeh committed
      Summary:
      The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383
      
      Differential Revision: D9887304
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
      a0ebec38
    • F
      Fix issue with docs/feed.xml validation (#4392) · 886766c3
      Fosco Marotto committed
      Summary:
      Per #4387 this should address the validation error with the link tag.  This is a quick fix, a future iteration could significantly upgrade the jekyll integration.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4392
      
      Differential Revision: D9923643
      
      Pulled By: gfosco
      
      fbshipit-source-id: e7ed478e55c907add8319290326540e6e44fc0d6
      886766c3
    • A
      Unit test for custom comparator RangeDelAggregator (#4388) · 990b52e9
      Andrew Kryczka committed
      Summary:
      Add a unit test for range collapsing when a non-default comparator is used. This exposes the bug fixed in #4386.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4388
      
      Differential Revision: D9918252
      
      Pulled By: ajkr
      
      fbshipit-source-id: 99501b96b251eab41791a7e33b27055ee36c5c39
      990b52e9
    • J
      use specified comparator in CollapsedRangeDelMap (#4386) · 27221b0c
      jsteemann committed
      Summary:
      The Comparator passed to CollapsedRangeDelMap was not used for
      operator less of the std::map `rep_` object contained in
      CollapsedRangeDelMap. So the map was always sorted using the
      default ByteWiseComparator, which seems wrong.
      
      Passing the specified Comparator through for usage in that map
      object fixes actual problems we were seeing with RangeDelete operations
      that do not delete keys as expected when using a custom Comparator.
      
      I found that the tests in current master crash when I run them locally,
      both with and without my patch, at the very same location. I therefore
      don't know if the patch breaks something else, but it seems to fix
      RangeDeletion issues in our product that uses RocksDB.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4386
      
      Differential Revision: D9916506
      
      Pulled By: ajkr
      
      fbshipit-source-id: 27bff8c775831f089dde8c5289df7343d88b2d66
      27221b0c
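A minimal sketch of handing a user-supplied comparator to a `std::map` so the map is ordered by it rather than by the default byte-wise `operator<`. `KeyComparator`, `ReverseComparator`, `LessByComparator`, and `MakeMap` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical comparator interface: negative, zero, or positive result.
struct KeyComparator {
  virtual ~KeyComparator() = default;
  virtual int Compare(const std::string& a, const std::string& b) const = 0;
};

// Example custom ordering: reverse of the byte-wise order.
struct ReverseComparator : KeyComparator {
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};

// Adapter that turns the comparator into the strict weak ordering
// std::map expects.
struct LessByComparator {
  const KeyComparator* cmp;
  bool operator()(const std::string& a, const std::string& b) const {
    return cmp->Compare(a, b) < 0;
  }
};

using SeqMap = std::map<std::string, uint64_t, LessByComparator>;

// The comparator has to be passed to the map's constructor; declaring
// std::map<std::string, uint64_t> would silently use operator<.
SeqMap MakeMap(const KeyComparator* cmp) {
  return SeqMap(LessByComparator{cmp});
}
```

With `ReverseComparator`, inserting "a" and "b" puts "b" first in iteration order, which is what range logic built on the map would need in order to agree with the rest of the database.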
  16. 18 Sep, 2018 2 commits
    • M
      Fix bug in partition filters with format_version=4 (#4381) · 65ac72ed
      Maysam Yabandeh committed
      Summary:
      Value delta encoding in format_version 4 requires the difference between the sizes of two consecutive handles to be sent to BlockBuilder::Add. This applies not only to indexes on blocks but also to the indexes on indexes and filters in partitioned indexes and filters respectively. The patch fixes a bug where the partitioned filters would encode the entire size of the handle rather than its difference from the previous handle's size.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4381
      
      Differential Revision: D9879505
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 27a22e49b482b927fbd5629dc310c46d63d4b6d1
      65ac72ed
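A minimal sketch of the delta idea described above: store each handle size as its difference from the previous one instead of the full value. This is a simplified model with hypothetical names (`EncodeSizeDeltas`, `DecodeSizeDeltas`); it does not reproduce the format_version 4 on-disk encoding, and it treats the first entry as a delta from zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes each size as (size - previous size); the first entry is
// relative to zero, so it carries the full size.
std::vector<int64_t> EncodeSizeDeltas(const std::vector<uint64_t>& sizes) {
  std::vector<int64_t> deltas;
  deltas.reserve(sizes.size());
  uint64_t prev = 0;
  for (uint64_t size : sizes) {
    deltas.push_back(static_cast<int64_t>(size) - static_cast<int64_t>(prev));
    prev = size;
  }
  return deltas;
}

// Rebuilds the sizes by accumulating the deltas.
std::vector<uint64_t> DecodeSizeDeltas(const std::vector<int64_t>& deltas) {
  std::vector<uint64_t> sizes;
  sizes.reserve(deltas.size());
  int64_t current = 0;
  for (int64_t delta : deltas) {
    current += delta;
    sizes.push_back(static_cast<uint64_t>(current));
  }
  return sizes;
}
```

A writer that emitted the full size where the reader expects a delta would make the reader accumulate wrong values, which is the class of mismatch the commit describes.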
    • A
      Add RangeDelAggregator microbenchmarks (#4363) · 1626f6ab
      Abhishek Madan committed
      Summary:
      To measure the results of upcoming DeleteRange v2 work, this commit adds
      simple benchmarks for RangeDelAggregator. It measures the average time
      for AddTombstones and ShouldDelete calls.
      
      Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:
      
      Before #4014:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          1356.28 us
      ShouldDelete:           0.401732 us
      ```
      
      Latest master:
      ```
      =======================
      Results:
      =======================
      AddTombstones:          740.82 us
      ShouldDelete:           0.383271 us
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363
      
      Differential Revision: D9881676
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
      1626f6ab