1. 03 1月, 2020 1 次提交
    • M
      Prevent an incompatible combination of options (#6254) · 48a678b7
      Maysam Yabandeh 提交于
      Summary:
      allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254
      
      Differential Revision: D19265819
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179
      48a678b7
  2. 19 12月, 2019 1 次提交
    • S
      Fix potential overflow in CalculateSSTWriteHint() (#6212) · ef918947
      sdong 提交于
      Summary:
      level passed into ColumnFamilyData::CalculateSSTWriteHint() can be smaller than base_level in current version, which would cause overflow.
      We see ubsan complains:
      
      db/compaction/compaction_job.cc:1511:39: runtime error: load of value 4294967295, which is not a valid value for type 'Env::WriteLifeTimeHint'
      
      and I hope this commit fixes it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6212
      
      Test Plan: Run existing tests and see them to pass.
      
      Differential Revision: D19168442
      
      fbshipit-source-id: bf8fd86f85478ecfa7556db46dc3242de8c83dc9
      ef918947
  3. 18 12月, 2019 1 次提交
    • delete superversions in BackgroundCallPurge (#6146) · 39fcaf82
      解轶伦 提交于
      Summary:
      I found that CleanupSuperVersion() may block Get() for 30ms+ (per MemTable is 256MB).
      
      Then I found "delete sv" in ~SuperVersion() takes the time.
      
      The backtrace looks like this
      
      DBImpl::GetImpl() -> DBImpl::ReturnAndCleanupSuperVersion() ->
      DBImpl::CleanupSuperVersion() : delete sv; -> ~SuperVersion()
      
      I think it's better to delete in a background thread,  please review it。
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6146
      
      Differential Revision: D18972066
      
      fbshipit-source-id: 0f7b0b70b9bb1e27ad6fc1c8a408fbbf237ae08c
      39fcaf82
  4. 14 12月, 2019 1 次提交
    • A
      Introduce a new storage specific Env API (#5761) · afa2420c
      anand76 提交于
      Summary:
      The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
      afa2420c
  5. 13 12月, 2019 1 次提交
  6. 27 11月, 2019 1 次提交
    • S
      Make default value of options.ttl to be 30 days when it is supported. (#6073) · 77eab5c8
      sdong 提交于
      Summary:
      By default options.ttl is disabled. We believe a better default will be 30 days, which means deleted data the database will be removed from SST files slightly after 30 days, for most of the cases.
      
      Make the default UINT64_MAX - 1 to indicate that it is not overridden by users.
      
      Change periodic_compaction_seconds to be UINT64_MAX - 1 to UINT64_MAX  too to be consistent. Also fix a small bug in the previous periodic_compaction_seconds default code.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6073
      
      Test Plan: Add unit tests for it.
      
      Differential Revision: D18669626
      
      fbshipit-source-id: 957cd4374cafc1557d45a0ba002010552a378cc8
      77eab5c8
  7. 23 11月, 2019 2 次提交
    • S
      Support ttl in Universal Compaction (#6071) · 669ea77d
      Sagar Vemuri 提交于
      Summary:
      `options.ttl` is now supported in universal compaction, similar to how periodic compactions are implemented in PR https://github.com/facebook/rocksdb/issues/5970 .
      Setting `options.ttl` will simply set `options.periodic_compaction_seconds` to execute the periodic compactions code path.
      Discarded PR https://github.com/facebook/rocksdb/issues/4749 in lieu of this.
      
      This is a short term work-around/hack of falling back to periodic compactions when ttl is set.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6071
      
      Test Plan: Added a unit test.
      
      Differential Revision: D18668336
      
      Pulled By: sagar0
      
      fbshipit-source-id: e75f5b81ba949f77ef9eff05e44bb1c757f58612
      669ea77d
    • S
      Support options.ttl with options.max_open_files = -1 (#6060) · d8c28e69
      sdong 提交于
      Summary:
      Previously, options.ttl cannot be set with options.max_open_files = -1, because it makes use of creation_time field in table properties, which is not available unless max_open_files = -1. With this commit, the information will be stored in manifest and when it is available, will be used instead.
      
      Note that, this change will break forward compatibility for release 5.1 and older.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6060
      
      Test Plan: Extend existing test case to options.max_open_files != -1, and simulate backward compatility in one test case by forcing the value to be 0.
      
      Differential Revision: D18631623
      
      fbshipit-source-id: 30c232a8672de5432ce9608bb2488ecc19138830
      d8c28e69
  8. 20 11月, 2019 1 次提交
    • L
      Fix corruption with intra-L0 on ingested files (#5958) · ec3e3c3e
      Little-Wallace 提交于
      Summary:
      ## Problem Description
      
      Our process was abort when it call `CheckConsistency`. And the information in  `stderr` show that "`L0 files seqno 3001491972 3004797440 vs. 3002875611 3004524421` ".  Here are the causes of the accident I investigated.
      
      * RocksDB will call `CheckConsistency` whenever `MANIFEST` file is update. It will check sequence number interval of every file, except files which were ingested.
      * When one file is ingested into RocksDB, it will be assigned the value of global sequence number, and the minimum and maximum seqno of this file are equal, which are both equal to global sequence number.
      * `CheckConsistency`  determines whether the file is ingested by whether the smallest and largest seqno of an sstable file are equal.
      * If IntraL0Compaction picks one sst which was ingested just now and compacted it into another sst,  the `smallest_seqno` of this new file will be smaller than his `largest_seqno`.
          * If more than one ingested file was ingested before memtable schedule flush,  and they all compact into one new sstable file by `IntraL0Compaction`. The sequence interval of this new file will be included in the interval of the memtable.  So `CheckConsistency` will return a `Corruption`.
          * If a sstable was ingested after the memtable was schedule to flush, which would assign a larger seqno to it than memtable. Then the file was compacted with other files (these files were all flushed before the memtable) in L0 into one file. This compaction start before the flush job of memtable start,  but completed after the flush job finish. So this new file produced by the compaction (we call it s1) would have a larger interval of sequence number than the file produced by flush (we call it s2).  **But there was still some data in s1  written into RocksDB before the s2, so it's possible that some data in s2 was cover by old data in s1.** Of course, it would also make a `Corruption` because of overlap of seqno. There is the relationship of the files:
          > s1.smallest_seqno < s2.smallest_seqno < s2.largest_seqno  < s1.largest_seqno
      
      So I skip pick sst file which was ingested in function `FindIntraL0Compaction `
      
      ## Reason
      
      Here is my bug report: https://github.com/facebook/rocksdb/issues/5913
      
      There are two situations that can cause the check to fail.
      
      ### First situation:
      - First we ingest five external sst into Rocksdb, and they happened to be ingested in L0. and there had been some data in memtable, which make the smallest sequence number of memtable is less than which of sst that we ingest.
      
      - If there had been one compaction job which compacted sst from L0 to L1, `LevelCompactionPicker` would trigger a `IntraL0Compaction` which would compact this five sst from L0 to L0. We call this sst A, which was merged from five ingested sst.
      
      - Then some data was put into memtable, and memtable was flushed to L0. We called this sst B.
      - RocksDB check consistency , and find the `smallest_seqno` of B is  less than that of A and crash. Because A was merged from five sst, the smallest sequence number of it was less than the biggest sequece number of itself, so RocksDB could not tell if A was produce by ingested.
      
      ### Secondary situaion
      
      - First we have flushed many sst in L0,  we call them [s1, s2, s3].
      
      - There is an immutable memtable request to be flushed, but because flush thread is busy, so it has not been picked. we call it m1.  And at the moment, one sst is ingested into L0. We call it s4. Because s4 is ingested after m1 became immutable memtable, so it has a larger log sequence number than m1.
      
      - m1 is flushed in L0. because it is small, this flush job finish quickly. we call it s5.
      
      - [s1, s2, s3, s4] are compacted into one sst to L0, by IntraL0Compaction.  We call it s6.
        - compacted 4@0 files to L0
      - When s6 is added into manifest,  the corruption happened. because the largest sequence number of s6 is equal to s4, and they are both larger than that of s5.  But because s1 is older than m1, so the smallest sequence number of s6 is smaller than that of s5.
         - s6.smallest_seqno < s5.smallest_seqno < s5.largest_seqno < s6.largest_seqno
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5958
      
      Differential Revision: D18601316
      
      fbshipit-source-id: 5fe54b3c9af52a2e1400728f565e895cde1c7267
      ec3e3c3e
  9. 08 11月, 2019 1 次提交
  10. 01 11月, 2019 1 次提交
    • S
      Make FIFO compaction take default 30 days TTL by default (#5987) · 2a9e5caf
      sdong 提交于
      Summary:
      Right now, by default FIFO compaction has no TTL. We believe that a default TTL of 30 days will be better. With this patch, the default will be changed to 30 days. Default of Options.periodic_compaction_seconds will mean the same as options.ttl. If Options.ttl and Options.periodic_compaction_seconds left default, a default 30 days TTL will be used. If both options are set, the stricter value of the two will be used.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5987
      
      Test Plan: Add an option sanitize test to cover the case.
      
      Differential Revision: D18237935
      
      fbshipit-source-id: a6dcea1f36c3849e13c0a69e413d73ad8eab58c9
      2a9e5caf
  11. 30 10月, 2019 1 次提交
    • S
      Auto enable Periodic Compactions if a Compaction Filter is used (#5865) · 4c9aa30a
      Sagar Vemuri 提交于
      Summary:
      - Periodic compactions are auto-enabled if a compaction filter or a compaction filter factory is set, in Level Compaction.
      - The default value of `periodic_compaction_seconds` is changed to UINT64_MAX, which lets RocksDB auto-tune periodic compactions as needed. An explicit value of 0 will still work as before ie. to disable periodic compactions completely. For now, on seeing a compaction filter along with a UINT64_MAX value for `periodic_compaction_seconds`, RocksDB will make SST files older than 30 days to go through periodic copmactions.
      
      Some RocksDB users make use of compaction filters to control when their data can be deleted, usually with a custom TTL logic. But it is occasionally possible that the compactions get delayed by considerable time due to factors like low writes to a key range, data reaching bottom level, etc before the TTL expiry. Periodic Compactions feature was originally built to help such cases. Now periodic compactions are auto enabled by default when compaction filters or compaction filter factories are used, as it is generally helpful to all cases to collect garbage.
      
      `periodic_compaction_seconds` is set to a large value, 30 days, in `SanitizeOptions` when RocksDB sees that a `compaction_filter` or `compaction_filter_factory` is used.
      
      This is done only for Level Compaction style.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5865
      
      Test Plan:
      - Added a new test `DBCompactionTest.LevelPeriodicCompactionWithCompactionFilters` to make sure that `periodic_compaction_seconds` is set if either `compaction_filter` or `compaction_filter_factory` options are set.
      - `COMPILE_WITH_ASAN=1 make check`
      
      Differential Revision: D17659180
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4887b9cf2e53cf2dc93a7b658c6b15e1181217ee
      4c9aa30a
  12. 21 9月, 2019 1 次提交
  13. 24 8月, 2019 1 次提交
    • Z
      Refactor trimming logic for immutable memtables (#5022) · 2f41ecfe
      Zhongyi Xie 提交于
      Summary:
      MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory.
      We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one.
      The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming.
      In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022
      
      Differential Revision: D14394062
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5
      2f41ecfe
  14. 14 6月, 2019 1 次提交
  15. 07 6月, 2019 1 次提交
  16. 04 6月, 2019 1 次提交
  17. 01 6月, 2019 1 次提交
  18. 31 5月, 2019 2 次提交
  19. 30 5月, 2019 1 次提交
  20. 25 4月, 2019 1 次提交
    • M
      Don't call FindObsoleteFiles() in ~ColumnFamilyHandleImpl() if CF is not dropped (#5238) · cd77d3c5
      Mike Kolupaev 提交于
      Summary:
      We have a DB with ~4k column families and ~70k files. On shutdown, destroying the 4k ColumnFamilyHandle-s takes over 2 minutes. Most of this time is spent in VersionSet::AddLiveFiles() called from FindObsoleteFiles() from ~ColumnFamilyHandleImpl(). It's just iterating over the list of files in memory. This seems completely unnecessary as no obsolete files are actually found since the CFs are not even dropped. This PR fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5238
      
      Differential Revision: D15056342
      
      Pulled By: siying
      
      fbshipit-source-id: 2aa342ef3770b4aa384ce81f8768e485480e4f08
      cd77d3c5
  21. 17 4月, 2019 1 次提交
  22. 02 4月, 2019 1 次提交
    • M
      Add DBOptions. avoid_unnecessary_blocking_io to defer file deletions (#5043) · 120bc471
      Mike Kolupaev 提交于
      Summary:
      Just like ReadOptions::background_purge_on_iterator_cleanup but for ColumnFamilyHandle instead of Iterator.
      
      In our use case we sometimes call ColumnFamilyHandle's destructor from low-latency threads, and sometimes it blocks the thread for a few seconds deleting the files. To avoid that, we can either offload ColumnFamilyHandle's destruction to a background thread on our side, or add this option on rocksdb side. This PR does the latter, to be consistent with how we solve exactly the same problem for iterators using background_purge_on_iterator_cleanup option.
      
      (EDIT: It's avoid_unnecessary_blocking_io now, and affects both CF drops and iterator destructors.)
      I'm not quite comfortable with having two separate options (background_purge_on_iterator_cleanup and background_purge_on_cf_cleanup) for such a rarely used thing. Maybe we should merge them? Rename background_purge_on_cf_cleanup to something like delete_files_on_background_threads_only or avoid_blocking_io_in_unexpected_places, and make iterators use it instead of the one in ReadOptions? I can do that here if you guys think it's better.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5043
      
      Differential Revision: D14339233
      
      Pulled By: al13n321
      
      fbshipit-source-id: ccf7efa11c85c9a5b91d969bb55627d0fb01e7b8
      120bc471
  23. 19 1月, 2019 1 次提交
    • A
      Digest ZSTD compression dictionary once when writing SST file (#4849) · 01013ae7
      Andrew Kryczka 提交于
      Summary:
      This is essentially a re-submission of #4251 with a few improvements:
      
      - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
      - Eliminated `Init` functions. Instead do all initialization work in constructors.
      - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849
      
      Differential Revision: D13606039
      
      Pulled By: ajkr
      
      fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
      01013ae7
  24. 18 12月, 2018 2 次提交
  25. 30 11月, 2018 1 次提交
    • S
      Move FIFOCompactionPicker to a separate file (#4724) · 70645355
      Sagar Vemuri 提交于
      Summary:
      **Summary:**
      Simplified the code layout by moving FIFOCompactionPicker to a separate file.
      **Why?:**
      While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724
      
      Differential Revision: D13227914
      
      Pulled By: sagar0
      
      fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
      70645355
  26. 29 11月, 2018 1 次提交
    • A
      Clean up FragmentedRangeTombstoneList (#4692) · 8fe1e06c
      Abhishek Madan 提交于
      Summary:
      Removed `one_time_use` flag, which removed the need for some
      tests, and changed all `NewRangeTombstoneIterator` methods to return
      `FragmentedRangeTombstoneIterators`.
      
      These changes also led to removing `RangeDelAggregatorV2::AddUnfragmentedTombstones`
      and one of the `MemTableListVersion::AddRangeTombstoneIterators` methods.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4692
      
      Differential Revision: D13106570
      
      Pulled By: abhimadan
      
      fbshipit-source-id: cbab5432d7fc2d9cdfd8d9d40361a1bffaa8f845
      8fe1e06c
  27. 22 11月, 2018 1 次提交
    • A
      Introduce RangeDelAggregatorV2 (#4649) · 457f77b9
      Abhishek Madan 提交于
      Summary:
      The old RangeDelAggregator did expensive pre-processing work
      to create a collapsed, binary-searchable representation of range
      tombstones. With FragmentedRangeTombstoneIterator, much of this work is
      now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking
      in each iterator to find a covering tombstone in ShouldDelete, while
      doing minimal work in AddTombstones. The old RangeDelAggregator is still
      used during flush/compaction for now, though RangeDelAggregatorV2 will
      support those uses in a future PR.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649
      
      Differential Revision: D13146964
      
      Pulled By: abhimadan
      
      fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3
      457f77b9
  28. 31 10月, 2018 1 次提交
    • A
      Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594) · eaaf1a6f
      Abhishek Madan 提交于
      Summary:
      Since the number of range deletions are reported in
      TableProperties, it is confusing to not report the number of merge
      operands and point deletions as top-level properties; they are
      accessible through the public API, but since they are not the "main"
      properties, they do not appear in aggregated table properties, or the
      string representation of table properties.
      
      This change promotes those two property keys to
      `rocksdb/table_properties.h`, adds corresponding uint64 members for
      them, deprecates the old access methods `GetDeletedKeys()` and
      `GetMergeOperands()` (though they are still usable for now), and removes
      `InternalKeyPropertiesCollector`. The property key strings are the same
      as before this change, so this should be able to read DBs written from older
      versions (though I haven't tested this yet).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594
      
      Differential Revision: D12826893
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
      eaaf1a6f
  29. 04 5月, 2018 1 次提交
    • S
      Skip deleted WALs during recovery · d5954929
      Siying Dong 提交于
      Summary:
      This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.
      
      Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)
      
      This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
      d5954929
  30. 28 4月, 2018 2 次提交
    • H
      Add max_subcompactions as a compaction option · ed7a95b2
      Huachao Huang 提交于
      Summary:
      Sometimes we want to compact files as fast as possible, but don't want to set a large `max_subcompactions` in the `DBOptions` by default.
      I add a `max_subcompactions` options to `CompactionOptions` so that we can choose a proper concurrency dynamically.
      Closes https://github.com/facebook/rocksdb/pull/3775
      
      Differential Revision: D7792357
      
      Pulled By: ajkr
      
      fbshipit-source-id: 94f54c3784dce69e40a229721a79a97e80cd6a6c
      ed7a95b2
    • Y
      Rename pending_compaction_ to queued_for_compaction_. · 7dfbe335
      Yanqin Jin 提交于
      Summary:
      We use `queued_for_flush_` to indicate a column family has been added to the
      flush queue. Similarly and to be consistent in our naming, we need to use `queued_for_compaction_` to indicate a column family has been added to the compaction queue. In the past we used
      `pending_compaction_` which can also be ambiguous.
      Closes https://github.com/facebook/rocksdb/pull/3781
      
      Differential Revision: D7790063
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6786b11a4fcaea36dc9b4672233dbe042f921804
      7dfbe335
  31. 27 4月, 2018 1 次提交
    • Y
      Rename pending_flush_ to queued_for_flush_. · 513b5ce6
      Yanqin Jin 提交于
      Summary:
      With ColumnFamilyData::pending_flush_, we have the following code snippet in DBImpl::ScheedulePendingFlush
      
      ```
      if (!cfd->pending_flush() && cfd->imm()->IsFlushPending()) {
      ...
      }
      ```
      
      `Pending` is ambiguous, and I feel `queued_for_flush` is a better name,
      especially for the sake of readability.
      Closes https://github.com/facebook/rocksdb/pull/3777
      
      Differential Revision: D7783066
      
      Pulled By: riversand963
      
      fbshipit-source-id: f1bd8c8bfe5eafd2c94da0d8566c9b2b6bb57229
      513b5ce6
  32. 13 4月, 2018 1 次提交
  33. 11 4月, 2018 1 次提交
    • A
      fix calling SetOptions on deprecated options · 019d7894
      Andrew Kryczka 提交于
      Summary:
      In `cf_options_type_info`, the deprecated options are all considered to have offset zero in the `MutableCFOptions` struct. Previously we weren't checking in `GetMutableOptionsFromStrings` whether the provided option was deprecated or not and simply writing the provided value to the offset specified by `cf_options_type_info`. That meant setting any deprecated option would overwrite the first element in the struct, which is `write_buffer_size`. `db_stress` hit this often since it calls `SetOptions` with `soft_rate_limit=0` and `hard_rate_limit=0`, which are both deprecated so cause `write_buffer_size` to be set to zero, which causes it to crash on the following assertion:
      
      ```
      db_stress: db/memtable.cc:106: rocksdb::MemTable::MemTable(const rocksdb::InternalKeyComparator&, const rocksdb::ImmutableCFOptions&, const rocksdb::MutableCFOptions&, rocksdb::WriteBufferManager*, rocksdb::SequenceNumber, uint32_t): Assertion `!ShouldScheduleFlush()' failed.
      ```
      
      We fix it by skipping deprecated options (and logging a warning) when users provide them to `SetOptions`. I didn't want to fail the call for compatibility reasons.
      Closes https://github.com/facebook/rocksdb/pull/3700
      
      Differential Revision: D7572596
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd5d84e14c0c39f30c5d4c6df7c1503d2c28ecf1
      019d7894
  34. 06 4月, 2018 1 次提交
    • P
      Support for Column family specific paths. · 446b32cf
      Phani Shekhar Mantripragada 提交于
      Summary:
      In this change, an option to set different paths for different column families is added.
      This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path.
      To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.
      
      Changes :
      1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions.  This member is used to identify the path information whenever files are accessed.
      2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
      3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
      4) Unit tests are added appropriately.
      Closes https://github.com/facebook/rocksdb/pull/3102
      
      Differential Revision: D6951697
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
      446b32cf
  35. 23 3月, 2018 1 次提交
    • Z
      FlushReason improvement · 1cbc96d2
      Zhongyi Xie 提交于
      Summary:
      Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full"
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304
      $ grep 'flush_reason' /dev/shm/dbbench/LOG
      ...
      2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      
      With the fix:
      > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"}
      Closes https://github.com/facebook/rocksdb/pull/3627
      
      Differential Revision: D7328772
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8
      1cbc96d2
  36. 21 3月, 2018 1 次提交
    • S
      Memory Problem Of Destorying ColumnFamilyHandle after deleting the CF · 93d52696
      Siying Dong 提交于
      Summary:
      When destorying column family handle after the column family has been deleted, the handle may hold share pointers of some objects in ColumnFamilyOptions, but in the destructor, the destructing order may cause some of the objects to be destoryed before being used by the following steps. Fix it by making a copy of the option object and destory it as the last step.
      Closes https://github.com/facebook/rocksdb/pull/3610
      
      Differential Revision: D7281025
      
      Pulled By: siying
      
      fbshipit-source-id: ac18f3b2841788cba4ccfa1abd8d59158c1113bc
      93d52696