1. 26 Oct, 2017 1 commit
    • A
      single-file bottom-level compaction when snapshot released · 9b18cc23
      Andrew Kryczka committed
      Summary:
      When snapshots are held for a long time, files may reach the bottom level containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, as such files would never be naturally compacted due to second-last level files moving down. This PR introduces a mechanism for bottommost files to be recompacted upon releasing all snapshots that prevent them from dropping their deleted/overwritten keys.
      
      - Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys.
      - Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and schedule a compaction if needed. We cache, in `bottommost_files_mark_threshold_`, the value that the oldest existing snapshot must exceed for another file to be marked, which lets us avoid recomputing marked files on most snapshot releases.
      - Changed `VersionStorageInfo` to track the list of bottommost files, which is recomputed every time the version changes by `UpdateBottommostFiles()`. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called.
      - Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction.
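      
      The marking and threshold caching described in the bullets above can be sketched roughly as follows. This is a simplified, self-contained model and not the RocksDB code: `BottomFile`, `MarkResult`, `MarkBottommostFiles` and `ShouldRecomputeMarks` are hypothetical names, and the rule "a file is eligible once its largest sequence number is below the oldest snapshot" is an assumption made for illustration.
      
      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <limits>
      #include <vector>
      
      // Hypothetical, simplified stand-in for a bottommost file's metadata.
      struct BottomFile {
        uint64_t file_number;
        uint64_t largest_seqno;  // newest sequence number stored in the file
      };
      
      struct MarkResult {
        std::vector<uint64_t> marked;  // file numbers eligible for recompaction
        uint64_t mark_threshold;       // smallest largest_seqno among unmarked files
      };
      
      // Assumption: a file can drop its overwritten/deleted keys once every entry
      // in it is older than the oldest live snapshot.
      MarkResult MarkBottommostFiles(const std::vector<BottomFile>& files,
                                     uint64_t oldest_snapshot_seqno) {
        MarkResult result{{}, std::numeric_limits<uint64_t>::max()};
        for (const BottomFile& f : files) {
          if (f.largest_seqno < oldest_snapshot_seqno) {
            result.marked.push_back(f.file_number);
          } else {
            // Remember how far the oldest snapshot must advance before another
            // file could become eligible.
            result.mark_threshold = std::min(result.mark_threshold, f.largest_seqno);
          }
        }
        return result;
      }
      
      // On snapshot release, recomputing is only worthwhile if the new oldest
      // snapshot has moved past the cached threshold.
      bool ShouldRecomputeMarks(uint64_t new_oldest_snapshot_seqno,
                                uint64_t mark_threshold) {
        return new_oldest_snapshot_seqno > mark_threshold;
      }
      ```
      
      With the cached threshold, most snapshot releases reduce to a single integer comparison.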
      Closes https://github.com/facebook/rocksdb/pull/3009
      
      Differential Revision: D6062044
      
      Pulled By: ajkr
      
      fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21
      9b18cc23
  2. 24 Oct, 2017 1 commit
  3. 20 Oct, 2017 1 commit
    • S
      Make FIFO compaction options dynamically configurable · f0804db7
      Sagar Vemuri committed
      Summary:
      ColumnFamilyOptions::compaction_options_fifo and all its sub-fields can be set dynamically now.
      
      Some of the ways in which the fifo compaction options can be set are:
      - `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024}"}})`
      - `SetOptions({{"compaction_options_fifo", "{ttl=600;}"}})`
      - `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024;ttl=600;}"}})`
      - `SetOptions({{"compaction_options_fifo", "{max_table_files_size=51;ttl=49;allow_compaction=true;}"}})`
      
      Most of the code has been made generic enough so that it could be reused later to make universal options (and other such nested defined-types) dynamic with very few lines of parsing/serializing code changes.
      Introduced a few new functions like `ParseStruct`, `SerializeStruct` and `GetStringFromStruct`.
      The duplicate code in `GetStringFromDBOptions` and `GetStringFromColumnFamilyOptions` has been moved into `GetStringFromStruct`. So they become just simple wrappers now.
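      
      To illustrate the nested option-string format used in the `SetOptions` examples above, here is a minimal sketch of splitting such a string into field/value pairs. `ParseStructString` is a hypothetical helper written for this illustration; it is not the `ParseStruct` function from the patch, and it does not trim whitespace or validate field names.
      
      ```cpp
      #include <map>
      #include <sstream>
      #include <string>
      
      // Hypothetical helper: splits "{a=1;b=2;}" into {"a": "1", "b": "2"}.
      // A trailing ';' before '}' is optional, as in the examples above.
      std::map<std::string, std::string> ParseStructString(const std::string& s) {
        std::map<std::string, std::string> fields;
        const std::string::size_type open = s.find('{');
        const std::string::size_type close = s.rfind('}');
        if (open == std::string::npos || close == std::string::npos || close < open) {
          return fields;  // not a brace-delimited struct string
        }
        std::istringstream body(s.substr(open + 1, close - open - 1));
        std::string field;
        while (std::getline(body, field, ';')) {
          const std::string::size_type eq = field.find('=');
          if (eq == std::string::npos) continue;  // skip malformed fields
          fields[field.substr(0, eq)] = field.substr(eq + 1);
        }
        return fields;
      }
      ```
      
      Each value stays a string here; converting it to the sub-field's real type would be a separate step.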
      Closes https://github.com/facebook/rocksdb/pull/3006
      
      Differential Revision: D6058619
      
      Pulled By: sagar0
      
      fbshipit-source-id: 1e8f78b3374ca5249bb4f3be8a6d3bb4cbc52f92
      f0804db7
  4. 11 Oct, 2017 1 commit
    • A
      fix file numbers after repair · 70aa9421
      Andrew Kryczka committed
      Summary:
      The file numbers assigned post-repair were sometimes smaller than older files' numbers due to `LogAndApply` saving the wrong next file number in the manifest.
      
      - Mark the highest file seen during repair as used before `LogAndApply` so the correct next file number will be stored.
      - Renamed `MarkFileNumberUsedDuringRecovery` to `MarkFileNumberUsed` since now it's used during repair in addition to during recovery
      - Added `TEST_Current_Next_FileNo` to expose the next file number for the unit test.
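      
      The idea behind marking a file number as used can be sketched as below. `FileNumberAllocator` is a hypothetical stand-in for the relevant state, not the actual class; only the name `MarkFileNumberUsed` comes from the commit message.
      
      ```cpp
      #include <cstdint>
      
      // Hypothetical stand-in for the state that hands out new file numbers.
      struct FileNumberAllocator {
        uint64_t next_file_number = 1;
      
        // Ensure that numbers handed out later are strictly greater than `number`.
        void MarkFileNumberUsed(uint64_t number) {
          if (next_file_number <= number) {
            next_file_number = number + 1;
          }
        }
      
        uint64_t NewFileNumber() { return next_file_number++; }
      };
      ```
      
      Calling `MarkFileNumberUsed` with the highest file number seen before persisting the next file number is what keeps post-repair numbers from going backwards.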
      Closes https://github.com/facebook/rocksdb/pull/2988
      
      Differential Revision: D6018083
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3f25cbf74439cb8f16dd12af90b67f9f9f75e718
      70aa9421
  5. 04 Oct, 2017 1 commit
    • Y
      Add ValueType::kTypeBlobIndex · d1cab2b6
      Yi Wu committed
      Summary:
      Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
      1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex.
      2. Make rocksdb able to detect if the db contains value written by blob db, if so return error.
      3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).
      
      The root db (DBImpl) basically treats kTypeBlobIndex entries as normal values on write. On Get, if is_blob is provided, it returns whether the value read is of kTypeBlobIndex type; if is_blob is not provided, it returns a Status::NotSupported() status. On scan, an allow_blob flag is passed; if the flag is true, whether the value is of kTypeBlobIndex type is returned via iter->IsBlob().
      
      Changes on blob db side will be in a separate patch.
      Closes https://github.com/facebook/rocksdb/pull/2886
      
      Differential Revision: D5838431
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
      d1cab2b6
  6. 29 Sep, 2017 1 commit
  7. 13 Sep, 2017 1 commit
    • A
      Fix naming in InternalKey · 5785b1fc
      Amy Xu committed
      Summary:
      - Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance with InternalKeyComparator's comparison logic
      Closes https://github.com/facebook/rocksdb/pull/2868
      
      Differential Revision: D5804152
      
      Pulled By: axxufb
      
      fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183
      5785b1fc
  8. 12 Sep, 2017 1 commit
    • M
      write-prepared txn: call IsInSnapshot · f46464d3
      Maysam Yabandeh committed
      Summary:
      This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
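      
      A rough sketch of a read path that consults an optional callback and moves on to the next value when one is rejected. All names here (`VersionedValue`, `ReadCallbackSketch`, `SnapshotCallback`, `ReadVisible`) are hypothetical, and the visibility rule `seq <= snapshot_seq` is a simplifying assumption; the real write-prepared check is more involved.
      
      ```cpp
      #include <cstdint>
      #include <string>
      #include <vector>
      
      struct VersionedValue {
        uint64_t seq;
        std::string value;
      };
      
      // Hypothetical callback interface: decides whether a version may be returned.
      class ReadCallbackSketch {
       public:
        virtual ~ReadCallbackSketch() = default;
        virtual bool IsVisible(uint64_t seq) const = 0;
      };
      
      // Assumption for illustration: a version is visible if it is not newer than
      // the snapshot's sequence number.
      class SnapshotCallback : public ReadCallbackSketch {
       public:
        explicit SnapshotCallback(uint64_t snapshot_seq) : snapshot_seq_(snapshot_seq) {}
        bool IsVisible(uint64_t seq) const override { return seq <= snapshot_seq_; }
      
       private:
        uint64_t snapshot_seq_;
      };
      
      // `versions` holds one key's versions, newest first. Returns true and fills
      // *out with the first version the callback accepts. A null callback accepts
      // every version.
      bool ReadVisible(const std::vector<VersionedValue>& versions,
                       const ReadCallbackSketch* callback, std::string* out) {
        for (const VersionedValue& v : versions) {
          if (callback != nullptr && !callback->IsVisible(v.seq)) {
            continue;  // rejected: move on to the next (older) version
          }
          *out = v.value;
          return true;
        }
        return false;
      }
      ```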
      Closes https://github.com/facebook/rocksdb/pull/2850
      
      Differential Revision: D5787375
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
      f46464d3
  9. 25 Aug, 2017 1 commit
    • Y
      Allow DB reopen with reduced options.num_levels · 3c840d1a
      Yi Wu committed
      Summary:
      Allow users to reduce the number of levels in the LSM by issuing a full CompactRange() that puts the result in a lower level, and then reopening the DB with reduced options.num_levels. Previously this would fail on reopen, when recovery replayed the previous MANIFEST and found that a historical file was on a higher level than the new options.num_levels. The workaround was, after CompactRange(), to reopen the DB with the old num_levels, which creates a new MANIFEST, and then reopen the DB again with the new num_levels.
      
      This patch relaxes the level check during recovery. It allows the DB to open if there was a historical file on level > options.num_levels, as long as that file was also deleted.
      Closes https://github.com/facebook/rocksdb/pull/2740
      
      Differential Revision: D5629354
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 545903f6b36b6083e8cbaf777176aef2f488021d
      3c840d1a
  10. 04 Aug, 2017 1 commit
    • A
      Introduce bottom-pri thread pool for large universal compactions · cc01985d
      Andrew Kryczka committed
      Summary:
      When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs.
      
      This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.
      
      - Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads.
      - Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
      - Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
      - Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
      - Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free.
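      
      The routing rule from the bullets above can be condensed into a small sketch. `PoolSketch` and `ChoosePool` are hypothetical names used only for illustration; the real scheduling code takes more state into account.
      
      ```cpp
      // Hypothetical names; only the routing rule is taken from the description.
      enum class PoolSketch { kLow, kBottom };
      
      // A universal compaction that includes the last level is forwarded to the
      // bottom-priority pool, but only if the user gave that pool at least one
      // thread. Otherwise the compaction stays in the low-priority pool.
      PoolSketch ChoosePool(bool is_universal, bool includes_last_level,
                            int bottom_pool_threads) {
        if (is_universal && includes_last_level && bottom_pool_threads > 0) {
          return PoolSketch::kBottom;
        }
        return PoolSketch::kLow;
      }
      ```
      
      Because the bottom pool starts with zero threads, the rule leaves default configurations unchanged.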
      Closes https://github.com/facebook/rocksdb/pull/2580
      
      Differential Revision: D5422916
      
      Pulled By: ajkr
      
      fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
      cc01985d
  11. 28 Jul, 2017 2 commits
    • A
      fix asan/valgrind for TableCache cleanup · 710411ae
      Andrew Kryczka committed
      Summary:
      Breaking commit: d12691b8
      
      In the above commit, I moved the `TableCache` cleanup logic from `Version` destructor into `PurgeObsoleteFiles`. I missed cleaning up `TableCache` entries for the current `Version` during DB destruction.
      
      This PR adds that logic to `VersionSet` destructor. One unfortunate side effect is now we're potentially deleting `TableReader`s after `column_family_set_.reset()`, which means we can't call `BlockBasedTableReader::Close` a second time as the block cache might already be destroyed.
      Closes https://github.com/facebook/rocksdb/pull/2662
      
      Differential Revision: D5515108
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2cb820e19aa813e0d258d17f76b2d7b6b7ee0b18
      710411ae
    • A
      move TableCache::EraseHandle outside of db mutex · d12691b8
      Andrew Kryczka committed
      Summary:
      Post-compaction work holds onto db mutex for the longest time (found by tracing lock acquires/releases with LTTng and correlating timestamps with our info log). Further experimentation showed `TableCache::EraseHandle` is responsible for ~86% of time mutex is held. We can just release the handle outside the db mutex.
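      
      The general pattern, collecting work under the mutex and doing the expensive part after releasing it, might look like this. `CacheSketch` and `PostCompactionSketch` are hypothetical stand-ins that use `std::mutex`; they are not the RocksDB types.
      
      ```cpp
      #include <cstddef>
      #include <mutex>
      #include <vector>
      
      // Hypothetical cache whose EraseHandle stands in for the expensive call.
      struct CacheSketch {
        int erased = 0;
        void EraseHandle(int /*handle*/) { ++erased; }
      };
      
      class PostCompactionSketch {
       public:
        void AddObsoleteHandle(int handle) {
          std::lock_guard<std::mutex> lock(mu_);
          obsolete_.push_back(handle);
        }
      
        void PurgeObsolete(CacheSketch* cache) {
          std::vector<int> to_erase;
          {
            // Hold the mutex only long enough to take ownership of the list.
            std::lock_guard<std::mutex> lock(mu_);
            to_erase.swap(obsolete_);
          }
          // The expensive work runs without the mutex held.
          for (int handle : to_erase) {
            cache->EraseHandle(handle);
          }
        }
      
        std::size_t PendingCount() {
          std::lock_guard<std::mutex> lock(mu_);
          return obsolete_.size();
        }
      
       private:
        std::mutex mu_;
        std::vector<int> obsolete_;
      };
      ```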
      Closes https://github.com/facebook/rocksdb/pull/2654
      
      Differential Revision: D5507126
      
      Pulled By: ajkr
      
      fbshipit-source-id: 703c01ddf2aea16bc0f9e33c08935d78aa6b781d
      d12691b8
  12. 22 Jul, 2017 2 commits
  13. 16 Jul, 2017 1 commit
  14. 29 Jun, 2017 1 commit
    • M
      Improve Status message for block checksum mismatches · 397ab111
      Mike Kolupaev committed
      Summary:
      We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.
      
      It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
      Closes https://github.com/facebook/rocksdb/pull/2507
      
      Differential Revision: D5345702
      
      Pulled By: al13n321
      
      fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
      397ab111
  15. 28 Jun, 2017 1 commit
    • S
      FIFO Compaction with TTL · 1cd45cd1
      Sagar Vemuri committed
      Summary:
      Introducing FIFO compactions with TTL.
      
      FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size.
      
      To address that request:
      - Added a new TTL option to FIFO compaction options.
      - Updated FIFO compaction score to take TTL into consideration.
      - Added a new table property, creation_time, to keep track of when the SST file is created.
      - Creation_time is set as below:
        - On Flush: Set to the time of flush.
        - On Compaction: Set to the max creation_time of all the files involved in the compaction.
        - On Repair and Recovery: Set to the time of repair/recovery.
        - Old files created prior to this code change will have a creation_time of 0.
      - FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day.
      - FIFO compaction will fall back to the prior way of deleting files based on size if:
        - the creation_time of all files involved in compaction is 0.
        - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted.
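      
      The TTL rule and the size-based fallback listed above can be modeled as a short sketch. `SstFileSketch` and `PickFifoDeletions` are hypothetical names, files are assumed to be ordered oldest first, and treating a `creation_time` of 0 as "unknown, never expired by TTL" is an assumption of this sketch.
      
      ```cpp
      #include <cstdint>
      #include <vector>
      
      struct SstFileSketch {
        uint64_t number;
        uint64_t size;
        uint64_t creation_time;  // 0 means unknown (file predates the property)
      };
      
      // Returns the file numbers to delete. `files` is ordered oldest first.
      std::vector<uint64_t> PickFifoDeletions(const std::vector<SstFileSketch>& files,
                                              uint64_t now, uint64_t ttl,
                                              uint64_t max_table_files_size) {
        uint64_t total = 0;
        bool any_creation_time = false;
        for (const SstFileSketch& f : files) {
          total += f.size;
          any_creation_time = any_creation_time || f.creation_time != 0;
        }
      
        std::vector<uint64_t> to_delete;
        if (ttl > 0 && now > ttl && any_creation_time) {
          // TTL pass: drop every file created before (now - ttl).
          uint64_t remaining = total;
          for (const SstFileSketch& f : files) {
            if (f.creation_time != 0 && f.creation_time < now - ttl) {
              to_delete.push_back(f.number);
              remaining -= f.size;
            }
          }
          if (remaining <= max_table_files_size) return to_delete;
          to_delete.clear();  // still too large: fall back to size-based deletion
        }
      
        // Size-based pass: delete the oldest files until the total fits.
        uint64_t remaining = total;
        for (const SstFileSketch& f : files) {
          if (remaining <= max_table_files_size) break;
          to_delete.push_back(f.number);
          remaining -= f.size;
        }
        return to_delete;
      }
      ```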
      
      This feature is not supported if max_open_files != -1 or with table formats other than Block-based.
      
      **Test Plan:**
      Added tests.
      
      **Benchmark results:**
      Base: FIFO with max size: 100MB ::
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100
      
      readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)
      ```
      
      With TTL (a low one for testing) ::
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20
      
      readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)
      ```
      Example Log lines:
      ```
      2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
      2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
      ...
      2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
      2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}
      ```
      
      SST Files remaining in the dbbench dir, after db_bench execution completed:
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
      -rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
      -rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
      -rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst
      ```
      Closes https://github.com/facebook/rocksdb/pull/2480
      
      Differential Revision: D5305116
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174
      1cd45cd1
  16. 25 Jun, 2017 1 commit
    • M
      Optimize for serial commits in 2PC · 499ebb3a
      Maysam Yabandeh committed
      Summary:
      Throughput: 46k tps in our sysbench settings (filling the details later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file internal buffer is not flushed after each write. Instead
      it is flushed before critical operations (WAL copy via fs) or when
      FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
      via mutex_.
      2. Use two sequence numbers: last seq, and last seq for write. Last seq
      is the last visible sequence number for reads. Last seq for write is the
      next sequence number that should be used to write to WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes.
      3. BatchGroup is not used for writes. This means that we can have
      parallel writers which changes a major assumption in the code base. To
      accommodate for that i) allow only 1 WriteImpl that intends to write to
      memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
      come via group commit phase which is serial anyway, ii) make all the
      parts in the code base that assumed to be the only writer (via
      EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
      protected via a stat_mutex_.
      
      Note: the first commit has the approach figured out but is not clean.
      Submitting the PR anyway to get the early feedback on the approach. If
      we are ok with the approach I will go ahead with these updates:
      0) Rebase with Yi's pipelining changes
      1) Currently batching is disabled by default to make sure that it will be
      consistent with all unit tests. Will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated with the
      serial commit of 2PC taken into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
      499ebb3a
  17. 23 Jun, 2017 1 commit
    • S
      Fix Data Race Between CreateColumnFamily() and GetAggregatedIntProperty() · 6837a176
      Siying Dong committed
      Summary:
      CreateColumnFamily() releases the DB mutex (to write the options file) after adding the column family to the set but before installing its super version, so if users call GetAggregatedIntProperty() in the middle, the super version will be null and the process will crash. Fix it by skipping those column families without a super version installed.
      
      Maybe we should also fix the problem of releasing the lock when reading option file, but that is riskier, so I'm doing a quick and safer fix and we can investigate it later.
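      
      The quick fix amounts to a null check while aggregating. A minimal sketch, using hypothetical `SuperVersionSketch` and `ColumnFamilySketch` types and a made-up `num_keys` property:
      
      ```cpp
      #include <cstdint>
      #include <memory>
      #include <vector>
      
      struct SuperVersionSketch {
        uint64_t num_keys;  // made-up integer property for illustration
      };
      
      struct ColumnFamilySketch {
        // Null until the super version has been installed.
        std::unique_ptr<SuperVersionSketch> super_version;
      };
      
      uint64_t AggregateNumKeys(const std::vector<ColumnFamilySketch>& cfs) {
        uint64_t sum = 0;
        for (const ColumnFamilySketch& cf : cfs) {
          if (!cf.super_version) {
            continue;  // column family is still being created: skip it
          }
          sum += cf.super_version->num_keys;
        }
        return sum;
      }
      ```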
      Closes https://github.com/facebook/rocksdb/pull/2475
      
      Differential Revision: D5298053
      
      Pulled By: siying
      
      fbshipit-source-id: 4b3c8f91c60400b163fcc6cda8a0c77723be0ef6
      6837a176
  18. 13 Jun, 2017 1 commit
  19. 12 Jun, 2017 1 commit
    • S
      Sample number of reads per SST file · 5582123d
      Siying Dong committed
      Summary:
      We estimate the number of reads per SST file by updating a per-file counter in sampled read requests. This information can later be used to trigger compactions to improve read performance.
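      
      One way to picture sampled read counting is the sketch below. It is an illustration only: `FileReadStats`, `ReadSampler`, `RecordRead` and `EstimatedReads` are hypothetical names, and sampling deterministically every `period` reads is an assumption. The commit message does not say how samples are chosen.
      
      ```cpp
      #include <atomic>
      #include <cstdint>
      
      struct FileReadStats {
        std::atomic<uint64_t> sampled_reads{0};
      };
      
      class ReadSampler {
       public:
        // `period` must be greater than zero.
        explicit ReadSampler(uint64_t period) : period_(period) {}
      
        // Returns true for one out of every `period_` calls.
        bool ShouldSample() {
          return counter_.fetch_add(1, std::memory_order_relaxed) % period_ ==
                 period_ - 1;
        }
      
       private:
        uint64_t period_;
        std::atomic<uint64_t> counter_{0};
      };
      
      // Only sampled reads touch the per-file counter, keeping the hot path cheap.
      void RecordRead(ReadSampler* sampler, FileReadStats* stats) {
        if (sampler->ShouldSample()) {
          stats->sampled_reads.fetch_add(1, std::memory_order_relaxed);
        }
      }
      
      // Scale the sampled count back up to estimate the total number of reads.
      uint64_t EstimatedReads(const FileReadStats& stats, uint64_t period) {
        return stats.sampled_reads.load(std::memory_order_relaxed) * period;
      }
      ```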
      Closes https://github.com/facebook/rocksdb/pull/2417
      
      Differential Revision: D5193528
      
      Pulled By: siying
      
      fbshipit-source-id: b4241c5ad0eaf444b61afb53f8e6290d9f5da2df
      5582123d
  20. 03 Jun, 2017 1 commit
    • M
      Fix interaction between CompactionFilter::Decision::kRemoveAndSkipUnt… · 138b87ea
      Mike Kolupaev committed
      Summary:
      Fixes the following scenario:
       1. Set prefix extractor. Enable bloom filters, with `whole_key_filtering = false`. Use compaction filter that sometimes returns `kRemoveAndSkipUntil`.
       2. Do a compaction.
       3. Compaction creates an iterator with `total_order_seek = false`, calls `SeekToFirst()` on it, then repeatedly calls `Next()`.
       4. At some point compaction filter returns `kRemoveAndSkipUntil`.
       5. Compaction calls `Seek(skip_until)` on the iterator. The key that it seeks to happens to have prefix that doesn't match the bloom filter. Since `total_order_seek = false`, iterator becomes invalid, and compaction thinks that it has reached the end. The rest of the compaction input is silently discarded.
      
      The fix is to make compaction iterator use `total_order_seek = true`.
      
      The implementation for PlainTable is quite awkward. I've made `kRemoveAndSkipUntil` officially incompatible with PlainTable. If you try to use them together, compaction will fail, and DB will enter read-only mode (`bg_error_`). That's not a very graceful way to communicate a misconfiguration, but the alternatives don't seem worth the implementation time and complexity. To be able to check in advance that `kRemoveAndSkipUntil` is not going to be used with PlainTable, we'd need to extend the interface of either `CompactionFilter` or `InternalIterator`. It seems unlikely that anyone will ever want to use `kRemoveAndSkipUntil` with PlainTable: PlainTable probably has very few users, and `kRemoveAndSkipUntil` has only one user so far: us (logdevice).
      Closes https://github.com/facebook/rocksdb/pull/2349
      
      Differential Revision: D5110388
      
      Pulled By: lightmark
      
      fbshipit-source-id: ec29101a99d9dcd97db33923b87f72bce56cc17a
      138b87ea
  21. 02 Jun, 2017 2 commits
    • A
      Fix TSAN: avoid arena mode with range deletions · 215076ef
      Andrew Kryczka committed
      Summary:
      The range deletion meta-block iterators weren't getting cleaned up properly since they don't support arena allocation. I didn't implement arena support since, in the general case, each iterator is used only once and separately from all other iterators, so there should be no benefit to data locality.
      
      Anyways, this diff fixes up #2370 by treating range deletion iterators as non-arena-allocated.
      Closes https://github.com/facebook/rocksdb/pull/2399
      
      Differential Revision: D5171119
      
      Pulled By: ajkr
      
      fbshipit-source-id: bef6f5c4c5905a124f4993945aed4bd86e2807d8
      215076ef
    • A
      account for L0 size in estimated compaction bytes · 3a8a848a
      Andrew Kryczka committed
      Summary:
      also changed the `>` in the comparison against `level0_file_num_compaction_trigger` into a `>=` since exactly `level0_file_num_compaction_trigger` can trigger a compaction from L0.
      Closes https://github.com/facebook/rocksdb/pull/2179
      
      Differential Revision: D4915772
      
      Pulled By: ajkr
      
      fbshipit-source-id: e38fec6253de6f9a40e61734615c6670d84038aa
      3a8a848a
  22. 01 Jun, 2017 1 commit
    • A
      Support ingest file when range deletions exist · 9c9909bf
      Andrew Kryczka committed
      Summary:
      Previously we returned NotSupported when ingesting files into a database containing any range deletions. This diff adds the support.
      
      - Flush if any memtable contains range deletions overlapping the to-be-ingested file
      - Place to-be-ingested file before any level that contains range deletions overlapping it.
      - Added support for `Version` to return iterators over range deletions in a given level. Previously, we piggybacked getting range deletions onto `Version`'s `Get()` / `AddIterator()` functions by passing them a `RangeDelAggregator*`. But file ingestion needs to get iterators over range deletions, not populate an aggregator (since the aggregator does collapsing and doesn't expose the actual ranges).
      Closes https://github.com/facebook/rocksdb/pull/2370
      
      Differential Revision: D5127648
      
      Pulled By: ajkr
      
      fbshipit-source-id: 816faeb9708adfa5287962bafdde717db56e3f1a
      9c9909bf
  23. 31 May, 2017 1 commit
  24. 27 May, 2017 1 commit
  25. 23 May, 2017 1 commit
  26. 06 May, 2017 1 commit
    • A
      do not read next datablock if upperbound is reached · a30a6960
      Aaron Gao committed
      Summary:
      Currently, if iterate_upper_bound is set, we continue reading until we get a key >= upper_bound. In many cases neighboring data blocks have a user key gap between them, and our index key will be a user key in the middle, chosen to be shorter. For example, if we have blocks:
      [a b c d][f g h]
      Then the index key for the first block will be 'e'.
      Then if the upper bound is any key between 'd' and 'e', for example, d1, d2, ..., d99999999999, we don't have to read the second block, because we already know the iteration is done once we have reached the last key that is smaller than the upper bound.
      
      This diff can reduce RA in most cases.
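      
      The `[a b c d][f g h]` example can be replayed with a small model. `BlockSketch` and `BlocksReadForScan` are hypothetical; the sketch assumes each block's index key is >= every key in that block and smaller than every key in later blocks, and the last block's index key `"i"` is made up.
      
      ```cpp
      #include <string>
      #include <vector>
      
      struct BlockSketch {
        std::vector<std::string> keys;  // sorted keys stored in the data block
        std::string index_key;          // separator: >= last key, < next block's keys
      };
      
      // Scans keys < upper_bound into *out and returns how many data blocks were
      // read. With check_index_key set, the scan stops as soon as a block's index
      // key shows that all later blocks lie at or beyond the upper bound.
      int BlocksReadForScan(const std::vector<BlockSketch>& blocks,
                            const std::string& upper_bound, bool check_index_key,
                            std::vector<std::string>* out) {
        int blocks_read = 0;
        for (const BlockSketch& block : blocks) {
          ++blocks_read;
          for (const std::string& key : block.keys) {
            if (key >= upper_bound) return blocks_read;
            out->push_back(key);
          }
          if (check_index_key && block.index_key >= upper_bound) {
            return blocks_read;  // the next block cannot hold a key below the bound
          }
        }
        return blocks_read;
      }
      ```
      
      With upper bound `d1`, the check returns the same four keys while reading one block instead of two.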
      Closes https://github.com/facebook/rocksdb/pull/2239
      
      Differential Revision: D4990693
      
      Pulled By: lightmark
      
      fbshipit-source-id: ab30ea2e3c6edf3fddd5efed3c34fcf7739827ff
      a30a6960
  27. 05 May, 2017 3 commits
    • S
      Allow IntraL0 compaction in FIFO Compaction · 264d3f54
      Siying Dong committed
      Summary:
      Allow an option for users to do some compaction in FIFO compaction, paying some write amplification for a smaller number of files.
      Closes https://github.com/facebook/rocksdb/pull/2163
      
      Differential Revision: D4895953
      
      Pulled By: siying
      
      fbshipit-source-id: a1ab608dd0627211f3e1f588a2e97159646e1231
      264d3f54
    • A
      Set lower-bound on dynamic level sizes · 8c3a180e
      Andrew Kryczka committed
      Summary:
      Changed dynamic leveling to stop setting the base level's size bound below `max_bytes_for_level_base`.
      
      Behavior for config where `max_bytes_for_level_base == level0_file_num_compaction_trigger * write_buffer_size` and same amount of data in L0 and base-level:
      
      - Before #2027, compaction scoring would favor base-level due to dividing by size smaller than `max_bytes_for_level_base`.
      - After #2027, L0 and Lbase get equal scores. The disadvantage is L0 is often compacted before reaching the num files trigger since `write_buffer_size` can be bigger than the dynamically chosen base-level size. This increases write-amp.
      - After this diff, L0 and Lbase still get equal scores. Now it takes `level0_file_num_compaction_trigger` files of size `write_buffer_size` to trigger L0 compaction by size, fixing the write-amp problem above.
      Closes https://github.com/facebook/rocksdb/pull/2123
      
      Differential Revision: D4861570
      
      Pulled By: ajkr
      
      fbshipit-source-id: 467ddef56ed1f647c14d86bb018bcb044c39b964
      8c3a180e
    • L
      max_open_files dynamic set, follow up · a45e98a5
      Leonidas Galanis committed
      Summary:
      Followup to make 0x40000 a TableCache constant that indicates infinite capacity
      Closes https://github.com/facebook/rocksdb/pull/2247
      
      Differential Revision: D5001349
      
      Pulled By: lgalanis
      
      fbshipit-source-id: ce7bd2e54b0975bb9f8680fdaa0f8bb0e7ae81a2
      a45e98a5
  28. 04 May, 2017 1 commit
    • L
      Max open files mutable · e7ae4a3a
      Leonidas Galanis committed
      Summary:
      Makes max_open_files db option dynamically set-able by SetDBOptions. During the call of SetDBOptions we call SetCapacity on the table cache, which is a LRUCache.
      Closes https://github.com/facebook/rocksdb/pull/2185
      
      Differential Revision: D4979189
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ca7e8dc5e3619c79434f579be4847c0f7e56afda
      e7ae4a3a
  29. 28 Apr, 2017 1 commit
  30. 20 Apr, 2017 1 commit
  31. 14 Apr, 2017 1 commit
    • A
      change use_direct_writes to use_direct_io_for_flush_and_compaction · 44fa8ece
      Aaron Gao committed
      Summary:
      Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
      Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
      Closes https://github.com/facebook/rocksdb/pull/2117
      
      Differential Revision: D4860912
      
      Pulled By: lightmark
      
      fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
      44fa8ece
  32. 07 Apr, 2017 1 commit
    • S
      Move various string utility functions into string_util · 343b59d6
      Sagar Vemuri committed
      Summary:
      This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.
      
      Check the sub-commits for easier reviewing.
      Closes https://github.com/facebook/rocksdb/pull/2094
      
      Differential Revision: D4837730
      
      Pulled By: sagar0
      
      fbshipit-source-id: 344278a
      343b59d6
  33. 06 Apr, 2017 1 commit
  34. 05 Apr, 2017 1 commit
    • A
      Level-based L0->L0 compaction · d659faad
      Andrew Kryczka committed
      Summary:
      Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.
      
      - L0->L0 is triggered when base level is unavailable due to pending compactions
      - L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
      - Subcompactions are disabled for L0->L0 since we want to output one file.
      - Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
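      
      The input selection in the last bullet can be sketched as a sliding window over L0. `L0FileSketch` and `PickLongestSpan` are hypothetical names, and this only illustrates "longest span of available files that fits within the size limit"; the actual picker may order and filter files differently.
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <utility>
      #include <vector>
      
      struct L0FileSketch {
        uint64_t size;
        bool being_compacted;
      };
      
      // Returns the [begin, end) index range of the longest contiguous span of
      // files that are not being compacted and whose total size is <= max_bytes.
      // Returns {0, 0} if no file qualifies.
      std::pair<std::size_t, std::size_t> PickLongestSpan(
          const std::vector<L0FileSketch>& files, uint64_t max_bytes) {
        std::pair<std::size_t, std::size_t> best{0, 0};
        std::size_t begin = 0;
        uint64_t window_bytes = 0;
        for (std::size_t end = 0; end < files.size(); ++end) {
          if (files[end].being_compacted) {
            // A busy file breaks the span: restart the window after it.
            begin = end + 1;
            window_bytes = 0;
            continue;
          }
          window_bytes += files[end].size;
          while (window_bytes > max_bytes) {
            // Shrink from the left until the span fits the size limit again.
            window_bytes -= files[begin].size;
            ++begin;
          }
          if (end + 1 - begin > best.second - best.first) {
            best = {begin, end + 1};
          }
        }
        return best;
      }
      ```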
      Closes https://github.com/facebook/rocksdb/pull/2027
      
      Differential Revision: D4760318
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9d07183
      d659faad
  35. 04 Apr, 2017 1 commit