1. 05 Feb 2019, 1 commit
    • WritePrepared: release snapshot equal to max (#4944) · dcb73e77
      Committed by Maysam Yabandeh
      Summary:
      WritePrepared maintains a list of snapshots that are <= max_evicted_seq_. Based on this list, old_commit_map_ is updated if an evicted commit entry overlaps with such a snapshot. These lists are garbage collected when the release of a snapshot is reported to WritePreparedTxnDB, i.e., the next time max_evicted_seq_ is updated and the snapshot is no longer found in the list returned from the DB. This logic was broken because ReleaseSnapshotInternal used "< max_evicted_seq_" to clean up old_commit_map_, which left a snapshot uncleaned if it equaled max_evicted_seq_. The patch fixes that and adds a unit test to check for the bug.
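      The boundary fix can be illustrated with a small self-contained sketch (the container and function names here are ours, not the actual WritePreparedTxnDB internals):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <set>

      using SequenceNumber = uint64_t;

      // old_commit_map_-style structure: released-snapshot seq -> overlapping commits.
      using OldCommitMap = std::map<SequenceNumber, std::set<SequenceNumber>>;

      // Cleanup on snapshot release. The bug was using "<" here, which skipped a
      // snapshot whose seq equals max_evicted_seq_; "<=" covers the boundary case.
      void CleanupOnRelease(OldCommitMap& old_commit_map, SequenceNumber snap_seq,
                            SequenceNumber max_evicted_seq) {
        if (snap_seq <= max_evicted_seq) {
          old_commit_map.erase(snap_seq);
        }
      }
      ```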
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4944
      
      Differential Revision: D13945000
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0c904294f735911f52348a148bf1f945282fc17c
  2. 02 Feb 2019, 2 commits
  3. 01 Feb 2019, 4 commits
    • fix for nvme device path (#4866) · 4091597c
      Committed by Young Tack Jin
      Summary:
      An NVMe device path does not have a "block" component; it looks like
      "nvme/nvme0/nvme0n1" or "nvme/nvme0/nvme0n1/nvme0n1p1". The last
      directory, such as "nvme0n1p1", should be removed if the NVMe drive is partitioned.
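      A sketch of the path normalization described above (the helper name and the partition-detection heuristic are ours, not the actual RocksDB code):

      ```cpp
      #include <cassert>
      #include <cctype>
      #include <string>

      // Given a sysfs-style device path such as "nvme/nvme0/nvme0n1/nvme0n1p1",
      // drop the trailing partition component ("...p<N>") so the path ends at
      // the drive directory. Unpartitioned paths are returned unchanged.
      std::string StripNvmePartition(const std::string& path) {
        size_t slash = path.find_last_of('/');
        if (slash == std::string::npos) return path;
        std::string last = path.substr(slash + 1);
        // A partition name like "nvme0n1p1" has a 'p' followed by digits.
        size_t p = last.find_last_of('p');
        if (last.compare(0, 4, "nvme") == 0 && p != std::string::npos &&
            p + 1 < last.size() &&
            std::isdigit(static_cast<unsigned char>(last[p + 1]))) {
          return path.substr(0, slash);
        }
        return path;
      }
      ```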
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866
      
      Differential Revision: D13627824
      
      Pulled By: riversand963
      
      fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317
    • Use correct FileMeta for atomic flush result install (#4932) · 842cdc11
      Committed by Yanqin Jin
      Summary:
      1. This commit fixes our handling of a combination of two separate edge
      cases. If a flush job does not pick any memtable to flush (because another
      flush job has already picked the same memtables), and the column family
      assigned to the flush job is dropped right before RocksDB calls
      rocksdb::InstallMemtableAtomicFlushResults, our original code passed
      a FileMetaData object whose file number is 0, failing the assertion in
      rocksdb::InstallMemtableAtomicFlushResults (assert(m->GetFileNumber() > 0)).
      2. Also piggyback a small change: since we already create a local copy of the column family's mutable CF options to eliminate a potential race condition with `SetOptions`, we might as well use the local copy in other function calls in the same scope.
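      The guard implied by edge case 1 can be sketched minimally (hypothetical structures; these are not the real FlushJob or FileMetaData types):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Skip flush results that picked no memtables or whose column family was
      // dropped, so no zero-numbered FileMetaData reaches the install step
      // (which asserts that the file number is > 0).
      struct FlushJobResult {
        uint64_t file_number = 0;  // 0 means "no output file was written"
        bool cf_dropped = false;
      };

      bool ShouldInstall(const FlushJobResult& r) {
        return !r.cf_dropped && r.file_number > 0;
      }
      ```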
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4932
      
      Differential Revision: D13901322
      
      Pulled By: riversand963
      
      fbshipit-source-id: b936580af7c127ea0c6c19ea10cd5fcede9fb0f9
    • Fix `WriteBatchBase::DeleteRange` API comment (#4935) · 0ea57115
      Committed by Andrew Kryczka
      Summary:
      The `DeleteRange` end key is exclusive, not inclusive. Updated API comment accordingly.
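      The documented semantics can be stated as a tiny predicate (the helper name is ours, not a RocksDB API):

      ```cpp
      #include <cassert>
      #include <string>

      // DeleteRange(begin, end) removes keys in [begin, end): the end key
      // itself is exclusive and survives the deletion.
      bool InDeletedRange(const std::string& key, const std::string& begin,
                          const std::string& end) {
        return begin <= key && key < end;  // end is exclusive
      }
      ```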
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4935
      
      Differential Revision: D13905406
      
      Pulled By: ajkr
      
      fbshipit-source-id: f577db841a279427991ecf9005cd56b30c8eb3c7
    • Take snapshots once for all cf flushes (#4934) · 35e5689e
      Committed by Maysam Yabandeh
      Summary:
      FlushMemTablesToOutputFiles calls FlushMemTableToOutputFile for each column family. The patch moves the take-snapshot logic outside FlushMemTableToOutputFile so that it is done once for all the flushes. This also addresses a deadlock issue when resetting the managed snapshot of job_snapshot in the 2nd call to FlushMemTableToOutputFile.
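      The shape of the refactor, in a minimal sketch (names are ours, not the actual DBImpl code):

      ```cpp
      #include <cassert>

      // One snapshot is taken outside the per-column-family loop instead of
      // once per FlushMemTableToOutputFile call.
      struct Snapshot { int seq; };
      static int g_next_seq = 0;
      Snapshot TakeSnapshot() { return Snapshot{g_next_seq++}; }

      int FlushAll(int num_cfs) {
        Snapshot snap = TakeSnapshot();  // once for all flushes
        for (int i = 0; i < num_cfs; ++i) {
          // FlushMemTableToOutputFile(cf[i], snap);  // reuses the same snapshot
        }
        return snap.seq;
      }
      ```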
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4934
      
      Differential Revision: D13900747
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f3cd650c5fff24cf95c1aaf8a10c149d42bf042c
  4. 30 Jan 2019, 3 commits
  5. 29 Jan 2019, 5 commits
  6. 26 Jan 2019, 2 commits
  7. 25 Jan 2019, 2 commits
  8. 24 Jan 2019, 8 commits
  9. 23 Jan 2019, 5 commits
    • add cast to avoid loss of precision error (#4906) · cbe02392
      Committed by Zhongyi Xie
      Summary:
      This PR addresses the following error:
      > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
              s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size));
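      The fix is an explicit narrowing cast; a minimal reproduction of the pattern (the Generate helper here is a stand-in, not db_bench's actual class):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // A callee taking unsigned int, like RandomGenerator::Generate in db_bench.
      std::string Generate(unsigned int len) { return std::string(len, 'x'); }

      std::string MakeValue(int64_t value_size) {
        // Without the cast, Generate(value_size) implicitly narrows int64_t to
        // unsigned int and warns under -Werror,-Wshorten-64-to-32. The explicit
        // static_cast documents the intended narrowing and silences the warning.
        return Generate(static_cast<unsigned int>(value_size));
      }
      ```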
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906
      
      Differential Revision: D13780185
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3
    • Deleting Blob files also goes through SstFileManager (#4904) · 08b8cea6
      Committed by Siying Dong
      Summary:
      Right now, deleting blob files is not rate limited even if an SstFileManager is
      specified; rate limiting blob deletion is simply not supported. With this change,
      blob file deletion goes through the SstFileManager too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4904
      
      Differential Revision: D13772545
      
      Pulled By: siying
      
      fbshipit-source-id: bd1b1d0beb26d5167385e00b7ecb8b94b879de84
    • Add load() statements to TARGETS files · b2ba0685
      Committed by Philip Jameson
      Reviewed By: luciang
      
      Differential Revision: D13733578
      
      fbshipit-source-id: 556c115935aa42c1da85ec0e91199b9f198fc467
    • Remove unused Blob WAL filter (#4896) · 8189c184
      Committed by Sagar Vemuri
      Summary:
      Remove unused blob WAL filter so that users are not confused.
      I was initially under the impression that we have WAL Filter support in BlobDB.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4896
      
      Differential Revision: D13725709
      
      Pulled By: sagar0
      
      fbshipit-source-id: f997d7546e138a474036e88b957907cc714327f1
    • Generate mixed workload with Get, Put, Seek in db_bench (#4788) · ce8e88d2
      Committed by Zhichao Cao
      Summary:
      Based on the specific workload models (key access distribution, value size distribution, iterator scan length distribution, and QPS variation), the MixGraph benchmark generates a synthetic workload according to these distributions that reflects real-world workload characteristics.
      
      After the user enables the tracing function, they will get a trace file. By analyzing the trace file with the trace_analyzer tool, the user can generate a set of statistics files. The *_accessed_key_stats.txt, *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt files are mainly used for the Matlab model fitting. After that, the user can get the parameters of the workload distributions (the modeling details are described [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer))
      
      The key access distribution follows the two-term power model. The probability density function is `f(x) = a*x^b + c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench
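      For reference, the two-term power model evaluates as (a direct transcription of the formula, not RocksDB code):

      ```cpp
      #include <cassert>
      #include <cmath>

      // Two-term power model for key access probability:
      // f(x) = a * x^b + c, with a = key_dist_a, b = key_dist_b, c = key_dist_c.
      double TwoTermPower(double x, double a, double b, double c) {
        return a * std::pow(x, b) + c;
      }
      ```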
      
      The value size distribution and iterator scan length distribution both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)*(1+k*(x-theta)/sigma)^{-1-1/k}`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, see the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and the [Matlab page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html)
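      The density can be transcribed directly (for the k != 0 case; this is the formula above, not RocksDB code):

      ```cpp
      #include <cassert>
      #include <cmath>

      // Generalized Pareto density:
      // f(x) = (1/sigma) * (1 + k*(x - theta)/sigma)^(-1 - 1/k), for k != 0.
      double GPDPdf(double x, double k, double sigma, double theta) {
        double z = 1.0 + k * (x - theta) / sigma;
        return (1.0 / sigma) * std::pow(z, -1.0 - 1.0 / k);
      }
      ```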
      
      As for the QPS, it follows a diurnal pattern, so a sine function is a good model to fit it: `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. trace_analyzer will tell you the average QPS in its printed results, which is sine_d. After the user fits "*-qps_stats.txt" to the Matlab model, they can get sine_a, sine_b, and sine_c. Using these 4 parameters, the user can control the QPS variation, including its period, average, and amplitude of change.
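      The QPS model is likewise a one-liner (a transcription of the formula above):

      ```cpp
      #include <cassert>
      #include <cmath>

      // QPS model: F(x) = sine_a * sin(sine_b * x + sine_c) + sine_d,
      // where sine_d is the average QPS.
      double QpsAt(double x, double sine_a, double sine_b, double sine_c,
                   double sine_d) {
        return sine_a * std::sin(sine_b * x + sine_c) + sine_d;
      }
      ```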
      
      To use the benchmark, the user can specify the following parameters, for example:
      ```
      -benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788
      
      Differential Revision: D13573940
      
      Pulled By: sagar0
      
      fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222
  10. 20 Jan 2019, 1 commit
  11. 19 Jan 2019, 3 commits
    • Digest ZSTD compression dictionary once when writing SST file (#4849) · 01013ae7
      Committed by Andrew Kryczka
      Summary:
      This is essentially a re-submission of #4251 with a few improvements:
      
      - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
      - Eliminated `Init` functions. Instead do all initialization work in constructors.
      - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849
      
      Differential Revision: D13606039
      
      Pulled By: ajkr
      
      fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
    • WritePrepared: fix two versions in compaction see different status for released snapshots (#4890) · b1ad6ebb
      Committed by Yi Wu
      Summary:
      Fix how CompactionIterator::findEarliestVisibleSnapshots handles released snapshots. It fixes two scenarios:
      
      Scenario 1:
      key1 has two values, v1 and v2. There are two snapshots, s1 and s2, taken after v1 and v2 are committed respectively. Right after compaction outputs v2, s1 is released. Now findEarliestVisibleSnapshot may see that s1 is released and return the next snapshot, s2. That's larger than v2's earliest visible snapshot, which was s1.
      The fix: the only place we check against the last snapshot and the current key's snapshot is when deciding whether to compact out a value because it is hidden by a later value. If in that check the current snapshot is even larger than the last snapshot, we know the last snapshot was released, and it is safe to compact out the current key.
      
      Scenario 2:
      key1 has two values, v1 and v2. There are two snapshots, s1 and s2, taken after v1 and v2 are committed respectively. Before compaction processes the key, s1 is released. When compaction processes v2, the snapshot checker may return kSnapshotReleased, and the earliest visible snapshot for v2 becomes s2. When compaction processes v1, the snapshot checker may return kIsInSnapshot (for a WritePrepared transaction, this could be because v1 is still in the commit cache). The result becomes inconsistent.
      The fix: remember the set of released snapshots ever reported by the snapshot checker, and ignore them when computing the result of findEarliestVisibleSnapshot.
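      The second fix can be sketched as follows (heavily simplified; the real CompactionIterator tracks much more state, and these names are ours):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>
      #include <vector>

      using SequenceNumber = uint64_t;

      // Snapshots ever reported as released by the snapshot checker are
      // remembered and skipped when searching for the earliest snapshot that
      // can see a value, so both versions of a key get a consistent answer.
      SequenceNumber EarliestVisibleSnapshot(
          SequenceNumber value_seq,
          const std::vector<SequenceNumber>& snapshots,  // sorted ascending
          const std::set<SequenceNumber>& released) {
        for (SequenceNumber s : snapshots) {
          if (s >= value_seq && released.count(s) == 0) return s;
        }
        return UINT64_MAX;  // visible only to future snapshots
      }
      ```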
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4890
      
      Differential Revision: D13705538
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e577f0d9ee1ff5a6035f26859e56902ecc85a5a4
    • WritePrepared: commit of delayed prepared entries (#4894) · 7fd9813b
      Committed by Maysam Yabandeh
      Summary:
      Here is the order of ops in a commit: 1) update the commit cache, 2) publish the seq, 3) RemovePrepared. In the case of a delayed prepared entry, there is a gap between when the commit becomes visible to snapshots and when delayed_prepared_ is cleaned up. To tell this case apart from a delayed uncommitted txn, the commit entry of a delayed prepared is also stored in delayed_prepared_commits_, which is updated before publishing the commit.
      Also, the logic in GetSnapshotInternal that ensures each new snapshot is always larger than max_evicted_seq_ is updated to check against the upcoming value of max_evicted_seq_ rather than its current one. This is because AdvanceMaxEvictedSeq gets the list of snapshots lower than the new max before updating max_evicted_seq_.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4894
      
      Differential Revision: D13726988
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1e70d78061b50c944c9816bf4b6dac405ab4ccd3
  12. 18 Jan 2019, 2 commits
  13. 17 Jan 2019, 2 commits
    • Remove an unused option (#4888) · 3cfc7515
      Committed by Sagar Vemuri
      Summary:
      Remove `garbage_collection_deletion_size_threshold` as it is not used anywhere.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4888
      
      Differential Revision: D13685982
      
      Pulled By: sagar0
      
      fbshipit-source-id: e08d3017b9a0c8fa99bc332b595ee4ed9db70c87
    • WritePrepared: fix issue with snapshot released during compaction (#4858) · 128f5328
      Committed by Yi Wu
      Summary:
      The compaction iterator keeps a copy of the list of live snapshots at the beginning of compaction, and then queries the snapshot checker to verify whether the value of a sequence number is visible to these snapshots. However, when a snapshot is released in the middle of compaction, the snapshot checker implementation (i.e. WritePreparedSnapshotChecker) may remove info about the snapshot and report an incorrect result, which leads to values being compacted out when they shouldn't be. This patch conservatively keeps the values if the snapshot checker determines that the snapshot was released.
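      The conservative rule reduces to a one-line predicate (the enum loosely mirrors the snapshot checker's result; the function name is ours):

      ```cpp
      #include <cassert>

      // Possible answers from the snapshot checker for "is this seq visible
      // to that snapshot?". kSnapshotReleased means it can no longer tell.
      enum class SnapshotCheckerResult {
        kInSnapshot,
        kNotInSnapshot,
        kSnapshotReleased
      };

      // Only a definite "not visible" permits dropping a value that is hidden
      // by a newer one; a released snapshot conservatively keeps the value.
      bool MayCompactOut(SnapshotCheckerResult r) {
        return r == SnapshotCheckerResult::kNotInSnapshot;
      }
      ```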
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4858
      
      Differential Revision: D13617146
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: cf18a94f6f61a94bcff73c280f117b224af5fbc3