1. 24 Jan, 2019: 5 commits
  2. 23 Jan, 2019: 5 commits
    • Z
      add cast to avoid loss of precision error (#4906) · cbe02392
      Zhongyi Xie committed
      Summary:
      This PR addresses the following error:
      > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
              s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size));
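      The fix amounts to making the 64-to-32-bit narrowing explicit. A minimal sketch, assuming the value size is non-negative and fits in an `unsigned int`; `NarrowToUnsigned` is a hypothetical helper, not code from db_bench_tool.cc:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <limits>

      // Hypothetical helper (not from db_bench): converts an int64_t size to the
      // unsigned int a callee expects. The explicit static_cast tells the compiler
      // the narrowing is intentional, which silences -Wshorten-64-to-32; the
      // asserts document the assumption that the value actually fits.
      inline unsigned int NarrowToUnsigned(int64_t value_size) {
        assert(value_size >= 0);
        assert(value_size <=
               static_cast<int64_t>(std::numeric_limits<unsigned int>::max()));
        return static_cast<unsigned int>(value_size);
      }
      ```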
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906
      
      Differential Revision: D13780185
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3
    • S
      Deleting Blob files also goes through SstFileManager (#4904) · 08b8cea6
      Siying Dong committed
      Summary:
      Right now, deleting blob files is not rate limited, even if an SstFileManager is specified;
      rate-limited deletion is supported for SST files but not for blob files. With this change, blob file
      deletion goes through SstFileManager too.
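      The idea of routing every deletion through one manager can be sketched as a shared deletion budget. This is a hypothetical model (`RateLimitedDeleter` and `ScheduleDeletion` are illustrative names, not SstFileManager's API) that only computes when each file could be removed:

      ```cpp
      #include <cstdint>
      #include <string>
      #include <vector>

      // Hypothetical sketch, not the SstFileManager/DeleteScheduler API: all file
      // deletions, SST and blob alike, are funneled through one object so that
      // they share a single bytes-per-second deletion budget.
      class RateLimitedDeleter {
       public:
        // bytes_per_sec == 0 means "no rate limit".
        explicit RateLimitedDeleter(uint64_t bytes_per_sec)
            : bytes_per_sec_(bytes_per_sec) {}

        // Queues `path` for deletion and returns the time, in milliseconds
        // relative to the first scheduled deletion, at which it may be removed.
        uint64_t ScheduleDeletion(const std::string& path, uint64_t file_size) {
          queued_.push_back(path);
          const uint64_t start_ms = next_free_ms_;
          if (bytes_per_sec_ > 0) {
            next_free_ms_ += file_size * 1000 / bytes_per_sec_;
          }
          return start_ms;
        }

        const std::vector<std::string>& queued() const { return queued_; }

       private:
        uint64_t bytes_per_sec_;
        uint64_t next_free_ms_ = 0;
        std::vector<std::string> queued_;
      };
      ```

      In this sketch, SST and blob files share one `next_free_ms_` cursor, so a large blob deletion also delays the SST deletions queued after it.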
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4904
      
      Differential Revision: D13772545
      
      Pulled By: siying
      
      fbshipit-source-id: bd1b1d0beb26d5167385e00b7ecb8b94b879de84
    • P
      Add load() statements to TARGETS files · b2ba0685
      Philip Jameson committed
      Reviewed By: luciang
      
      Differential Revision: D13733578
      
      fbshipit-source-id: 556c115935aa42c1da85ec0e91199b9f198fc467
    • S
      Remove unused Blob WAL filter (#4896) · 8189c184
      Sagar Vemuri committed
      Summary:
      Remove the unused blob WAL filter so that users are not confused.
      I was initially under the impression that BlobDB had WAL filter support.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4896
      
      Differential Revision: D13725709
      
      Pulled By: sagar0
      
      fbshipit-source-id: f997d7546e138a474036e88b957907cc714327f1
    • Z
      Generate mixed workload with Get, Put, Seek in db_bench (#4788) · ce8e88d2
      Zhichao Cao committed
      Summary:
      Based on specific workload models (key access distribution, value size distribution, iterator scan length distribution, and QPS variation), the MixGraph benchmark generates a synthetic workload that follows these distributions and can therefore reflect real-world workload characteristics.
      
      After enabling tracing, users get a trace file. By analyzing the trace file with the trace_analyzer tool, users can generate a set of statistics data files. The `*_accessed_key_stats.txt`, `*-accessed_value_size_distribution.txt`, `*-iterator_length_distribution.txt`, and `*-qps_stats.txt` files are mainly used for model fitting in Matlab. After that, users get the parameters of the workload distributions (the modeling details are described [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer)).
      
      The key access distribution follows the two-term power model. The probability density function is `f(x) = ax^{b}+c`. The corresponding db_bench parameters are key_dist_a, key_dist_b, and key_dist_c.
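      A minimal sketch of evaluating the fitted key access model; the function and argument names are hypothetical and merely mirror the db_bench flags. How db_bench samples keys from this model is not shown:

      ```cpp
      #include <cmath>

      // Two-term power model for key access: f(x) = a * x^b + c, where
      // key_dist_a, key_dist_b and key_dist_c play the roles of a, b and c.
      inline double KeyAccessPowerModel(double x, double key_dist_a,
                                        double key_dist_b, double key_dist_c) {
        return key_dist_a * std::pow(x, key_dist_b) + key_dist_c;
      }
      ```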
      
      The value size distribution and the iterator scan length distribution both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)*(1+k*(x-theta)/sigma)^{-1-1/k}`. The parameters are value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, see the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and the [Matlab page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html).
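      The density above, plus an inverse-CDF sampler that turns a uniform draw into a value size or scan length, can be sketched as follows. The function names are hypothetical, this is not the db_bench implementation, and domain checks (x >= theta, u in [0, 1)) are omitted:

      ```cpp
      #include <cmath>

      // Generalized Pareto density for x >= theta (k >= 0 case):
      //   f(x) = (1/sigma) * (1 + k*(x - theta)/sigma)^(-1 - 1/k)
      // For k == 0 it reduces to the exponential density.
      inline double GeneralizedParetoPdf(double x, double k, double theta,
                                         double sigma) {
        const double z = (x - theta) / sigma;
        if (k == 0.0) return std::exp(-z) / sigma;
        return std::pow(1.0 + k * z, -1.0 - 1.0 / k) / sigma;
      }

      // Inverse CDF: maps a uniform u in [0, 1) to a Generalized Pareto sample.
      // Solving F(x) = 1 - (1 + k*(x - theta)/sigma)^(-1/k) = u gives
      //   x = theta + sigma * ((1 - u)^(-k) - 1) / k.
      inline double GeneralizedParetoInverseCdf(double u, double k, double theta,
                                                double sigma) {
        if (k == 0.0) return theta - sigma * std::log(1.0 - u);
        return theta + sigma * (std::pow(1.0 - u, -k) - 1.0) / k;
      }
      ```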
      
      The QPS follows a diurnal pattern, so a sine function is a good model for it: `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. The trace_analyzer reports the average QPS in its printed results; that value is sine_d. After fitting `*-qps_stats.txt` to the Matlab model, users get sine_a, sine_b, and sine_c. With these 4 parameters, users can control the QPS variation, including its period, average, and amplitude.
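      A minimal sketch of the sine model. `TargetQps` and `OpsForInterval` are hypothetical names; the time unit of x is assumed to be whatever unit the model was fitted in, and how db_bench actually paces requests per interval is not shown:

      ```cpp
      #include <cmath>
      #include <cstdint>

      // Target QPS at time x under the diurnal model
      //   F(x) = sine_a * sin(sine_b * x + sine_c) + sine_d,
      // where sine_d is the average QPS.
      inline double TargetQps(double x, double sine_a, double sine_b,
                              double sine_c, double sine_d) {
        return sine_a * std::sin(sine_b * x + sine_c) + sine_d;
      }

      // Hypothetical helper: number of operations to issue during one rate
      // interval of `interval_ms` milliseconds at the given QPS.
      inline uint64_t OpsForInterval(double qps, uint64_t interval_ms) {
        return static_cast<uint64_t>(qps * static_cast<double>(interval_ms) /
                                     1000.0);
      }
      ```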
      
      To use the benchmark, specify parameters such as the following:
      ```
      -benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
      ```
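      One plausible way to turn the `-mix_get_ratio`, `-mix_put_ratio`, and `-mix_seek_ratio` values into a per-operation choice is a cumulative-ratio lookup on a uniform draw. `PickOp` and `MixOp` are hypothetical and assume the three ratios sum to 1; this is not how db_bench is confirmed to do it:

      ```cpp
      enum class MixOp { kGet, kPut, kSeek };

      // Picks the next operation from a uniform draw u in [0, 1) using
      // cumulative ratios: [0, get) -> Get, [get, get + put) -> Put,
      // everything above -> Seek (whose share is 1 - get - put).
      inline MixOp PickOp(double u, double get_ratio, double put_ratio) {
        if (u < get_ratio) return MixOp::kGet;
        if (u < get_ratio + put_ratio) return MixOp::kPut;
        return MixOp::kSeek;
      }
      ```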
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788
      
      Differential Revision: D13573940
      
      Pulled By: sagar0
      
      fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222
  3. 20 Jan, 2019: 1 commit
  4. 19 Jan, 2019: 3 commits
    • A
      Digest ZSTD compression dictionary once when writing SST file (#4849) · 01013ae7
      Andrew Kryczka committed
      Summary:
      This is essentially a re-submission of #4251 with a few improvements:
      
      - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
      - Eliminated `Init` functions. Instead do all initialization work in constructors.
      - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
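      The split into two classes with constructor-only initialization can be sketched as below. The string "digest" is a stand-in for building a real ZSTD dictionary object, and every name other than the two class names is hypothetical:

      ```cpp
      #include <string>
      #include <utility>

      // Hypothetical sketch: the real classes wrap ZSTD dictionary objects.
      class CompressionDict {
       public:
        // The dictionary is digested exactly once, in the constructor, so
        // there is no separate Init() that callers could forget or repeat.
        explicit CompressionDict(std::string raw)
            : raw_(std::move(raw)), digested_(Digest(raw_)) {}

        const std::string& digested() const { return digested_; }
        static int digest_calls() { return digest_calls_; }

       private:
        static std::string Digest(const std::string& raw) {
          ++digest_calls_;
          return "digested:" + raw;  // stand-in for an expensive digest step
        }

        static int digest_calls_;
        std::string raw_;       // declared before digested_, so built first
        std::string digested_;
      };
      int CompressionDict::digest_calls_ = 0;

      // Decompression side: owns its raw dictionary bytes independently.
      class UncompressionDict {
       public:
        explicit UncompressionDict(std::string raw) : raw_(std::move(raw)) {}
        const std::string& raw() const { return raw_; }

       private:
        std::string raw_;
      };
      ```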
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849
      
      Differential Revision: D13606039
      
      Pulled By: ajkr
      
      fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
    • Y
      WritePrepared: fix two versions in compaction see different status for released snapshots (#4890) · b1ad6ebb
      Yi Wu committed
      Summary:
      Fix how CompactionIterator::findEarliestVisibleSnapshot handles released snapshots. This fixes two scenarios:
      
      Scenario 1:
      key1 has two values v1 and v2. There are two snapshots s1 and s2 taken after v1 and v2 are committed. Right after compaction outputs v2, s1 is released. Now findEarliestVisibleSnapshot may see that s1 is released and return the next snapshot, s2. That is larger than v2's earliest visible snapshot, which was s1.
      The fix: the only place where we compare the last snapshot with the current key's snapshot is when deciding whether to compact out a value that is hidden by a later value. If that check finds the current snapshot larger than the last snapshot, the last snapshot must have been released, and it is safe to compact out the current key.

      Scenario 2:
      key1 has two values v1 and v2. There are two snapshots s1 and s2 taken after v1 and v2 are committed. During compaction, before the key is processed, s1 is released. When compaction processes v2, the snapshot checker may return kSnapshotReleased, and the earliest visible snapshot for v2 becomes s2. When compaction processes v1, the snapshot checker may return kIsInSnapshot (for WritePrepared transactions, this can happen because v1 is still in the commit cache). The results then become inconsistent.
      The fix: remember the set of released snapshots ever reported by the snapshot checker, and ignore them when computing the result of findEarliestVisibleSnapshot.
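      The Scenario 2 fix can be modeled as a lookup that skips every snapshot the checker has ever reported as released. A simplified, hypothetical sketch: `EarliestVisibleSnapshot` is not the real member function, and it ignores the snapshot checker itself and the Scenario 1 check:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <limits>
      #include <set>
      #include <vector>

      using SequenceNumber = uint64_t;
      constexpr SequenceNumber kMaxSeq =
          std::numeric_limits<SequenceNumber>::max();

      // `snapshots` is sorted ascending; `released` holds every snapshot ever
      // reported as released. Returns the earliest remaining snapshot that can
      // see a key with sequence number `seq`, or kMaxSeq if there is none.
      SequenceNumber EarliestVisibleSnapshot(
          const std::vector<SequenceNumber>& snapshots,
          const std::set<SequenceNumber>& released, SequenceNumber seq) {
        for (auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
             it != snapshots.end(); ++it) {
          if (released.count(*it) == 0) {
            return *it;  // first snapshot >= seq still considered live
          }
        }
        return kMaxSeq;
      }
      ```

      Because the released set only grows, keys processed after s1 is first reported as released all skip s1 consistently in this model.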
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4890
      
      Differential Revision: D13705538
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e577f0d9ee1ff5a6035f26859e56902ecc85a5a4
    • M
      WritePrepared: commit of delayed prepared entries (#4894) · 7fd9813b
      Maysam Yabandeh committed
      Summary:
      Here is the order of ops in a commit: 1) update commit cache, 2) publish seq, 3) RemovePrepared. In the case of a delayed prepared entry, there is a gap between when the commit becomes visible to snapshots and when delayed_prepared_ is cleaned up. To tell this case apart from a delayed uncommitted txn, the commit entry of a delayed prepared entry is also stored in delayed_prepared_commits_, which is updated before the commit is published.
      Also, the logic in GetSnapshotInternal that ensures each new snapshot is always larger than max_evicted_seq_ is updated to check against the upcoming value of max_evicted_seq_ rather than its current one. This is because AdvanceMaxEvictedSeq gets the list of snapshots lower than the new max before updating max_evicted_seq_.
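      The commit ordering for a delayed prepared entry can be sketched as single-threaded bookkeeping. `DelayedPreparedSketch` and its methods are hypothetical; only the member names echo the `delayed_prepared_` and `delayed_prepared_commits_` fields named above, and the real code is concurrent and also consults the commit cache:

      ```cpp
      #include <cstdint>
      #include <map>
      #include <set>

      using SequenceNumber = uint64_t;

      struct DelayedPreparedSketch {
        std::set<SequenceNumber> delayed_prepared;
        // prepare seq -> commit seq, filled in before the commit is published
        std::map<SequenceNumber, SequenceNumber> delayed_prepared_commits;

        // Steps 1-2 of a commit: record the commit entry, then publish it.
        void RecordAndPublishCommit(SequenceNumber prep, SequenceNumber commit) {
          if (delayed_prepared.count(prep) != 0) {
            delayed_prepared_commits[prep] = commit;  // before publishing
          }
          // (publishing `commit` is omitted in this sketch)
        }

        // Step 3: RemovePrepared cleans up both structures.
        void RemovePrepared(SequenceNumber prep) {
          delayed_prepared.erase(prep);
          delayed_prepared_commits.erase(prep);
        }

        // In the gap between steps 2 and 3, a committed delayed prepared entry
        // is told apart from an uncommitted one via delayed_prepared_commits.
        bool IsCommittedDelayedPrepared(SequenceNumber prep,
                                        SequenceNumber* commit) const {
          auto it = delayed_prepared_commits.find(prep);
          if (it == delayed_prepared_commits.end()) return false;
          *commit = it->second;
          return true;
        }
      };
      ```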
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4894
      
      Differential Revision: D13726988
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1e70d78061b50c944c9816bf4b6dac405ab4ccd3
  5. 18 Jan, 2019: 2 commits
  6. 17 Jan, 2019: 3 commits
  7. 16 Jan, 2019: 5 commits
    • Y
      WritePrepared: Fix visible key compacted out by compaction (#4883) · 5d4fddfa
      Yi Wu committed
      Summary:
      With WritePrepared transactions, flush/compaction can contain uncommitted keys, and those keys can get committed during compaction. If a snapshot is taken before the key is committed, it should not see the key. On the other hand, compaction grabs the list of snapshots at its beginning and only considers those snapshots when deduplicating keys. Consider the case:
      ```
      seq = 1: put "foo" = "bar"
      seq = 2: transaction T: delete "foo", prepare
      seq = 3: compaction start
      seq = 4: take snapshot S
      seq = 5: transaction T: commit.
      ...
      seq = N: compaction iterator reached key "foo".
      ```
      When compaction starts, the list of snapshots is empty, so compaction doesn't take snapshot S into account. By the time it reaches "foo", transaction T is committed. Compaction may then conclude that the value "foo=bar" is not visible to any snapshot (which is wrong) and compact it out.

      The fix is to explicitly take a snapshot before compaction grabs the list of snapshots. Compaction then has to keep keys visible to this snapshot.
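      Why the extra snapshot matters can be shown with a simplified visibility check. This is a hypothetical sketch, assuming a snapshot sees the newer version of a key only if that version committed at or before the snapshot; `CanDropOlderVersion` is not a RocksDB function:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      using SequenceNumber = uint64_t;

      // The older version must be kept if any snapshot s satisfies
      // older_seq <= s < newer_commit_seq: such a snapshot sees the older
      // version but not the newer one.
      bool CanDropOlderVersion(SequenceNumber older_seq,
                               SequenceNumber newer_commit_seq,
                               const std::vector<SequenceNumber>& snapshots) {
        return std::none_of(snapshots.begin(), snapshots.end(),
                            [&](SequenceNumber s) {
                              return s >= older_seq && s < newer_commit_seq;
                            });
      }
      ```

      With the example above ("foo"="bar" at seq 1, T's delete committing at seq 5), an empty snapshot list lets the value be dropped even though S can still see it, while a snapshot taken when compaction starts (seq 3) forces it to be kept.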
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4883
      
      Differential Revision: D13668775
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1cab9615f94b7d3e8522cc3d44c3a14c7d4720e4
    • M
      WritePrepared: snapshot should be larger than max_evicted_seq_ (#4886) · cad99a60
      Maysam Yabandeh committed
      Summary:
      The AdvanceMaxEvictedSeq algorithm assumes that new snapshots always have a sequence number larger than the last max_evicted_seq_. To enforce this assumption, we make two changes:
      i) max is not advanced beyond the last published seq, except when the evicted commit entry itself is not published yet, which is quite rare.
      ii) When obtaining a snapshot, if max_evicted_seq_ is not published yet, commit a dummy entry and wait for it to be published, which also increases the latest published seq to one above the max.
      To test these unrealistic corner cases, we create a commit cache of size 1 so that every single commit results in an eviction.
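      Change ii) can be sketched as follows. `SeqState` and `TakeSnapshot` are hypothetical names, and the dummy commit plus the wait for its publication are reduced to a single assignment:

      ```cpp
      #include <cstdint>

      using SequenceNumber = uint64_t;

      struct SeqState {
        SequenceNumber last_published_seq;
        SequenceNumber max_evicted_seq;
      };

      // A new snapshot must be larger than max_evicted_seq. If the published
      // sequence has not caught up, a dummy commit advances it to max + 1.
      SequenceNumber TakeSnapshot(SeqState* st) {
        if (st->last_published_seq <= st->max_evicted_seq) {
          // Stand-in for committing a dummy entry and waiting for it.
          st->last_published_seq = st->max_evicted_seq + 1;
        }
        return st->last_published_seq;
      }
      ```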
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4886
      
      Differential Revision: D13685270
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5461bc09c2a9b75798bfcb9853a256c81cdac0b0
    • S
      Improve Error Message When wal_dir doesn't exist (#4874) · 7d13f307
      Siying Dong committed
      Summary:
      Right now the error message when options.wal_dir doesn't exist is not helpful to users. This change makes it more specific.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4874
      
      Differential Revision: D13642425
      
      Pulled By: siying
      
      fbshipit-source-id: 9a3172ed0f799af233b0f3b2e5e35bc7ce04c7b5
    • S
      Correct the comment about inlined blob option (#4887) · 55e03b67
      Sagar Vemuri committed
      Summary:
      - Corrected a comment asserting that the values "smaller" than a min_blob_size will be inlined in the base db.
      - Also fixed the type of ttl_range_secs while dumping blobdb options.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4887
      
      Differential Revision: D13680163
      
      Pulled By: sagar0
      
      fbshipit-source-id: 306c8cf2daa52210ffc334a6924ef44ffdedf887
    • Y
      WritePrepared: Fix SmallestUnCommittedSeq() doesn't check delayed_prepared (#4867) · d50c10ed
      Yi Wu committed
      Summary:
      When prepared_txns_ heap is empty, SmallestUnCommittedSeq() should check delayed_prepared_ set as well.
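      The fix can be sketched as taking the minimum over both structures. `SmallestUncommittedSeq`, `MinHeap`, and the `next_seq` fallback used when nothing is pending are hypothetical:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <functional>
      #include <queue>
      #include <set>
      #include <vector>

      using SequenceNumber = uint64_t;
      using MinHeap = std::priority_queue<SequenceNumber,
                                          std::vector<SequenceNumber>,
                                          std::greater<SequenceNumber>>;

      // Considers delayed_prepared even when the prepared heap is empty.
      SequenceNumber SmallestUncommittedSeq(
          const MinHeap& prepared_heap,
          const std::set<SequenceNumber>& delayed_prepared,
          SequenceNumber next_seq) {
        SequenceNumber result = next_seq;
        if (!prepared_heap.empty()) {
          result = std::min(result, prepared_heap.top());
        }
        if (!delayed_prepared.empty()) {
          result = std::min(result, *delayed_prepared.begin());
        }
        return result;
      }
      ```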
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4867
      
      Differential Revision: D13632134
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b0423bb0a58dc95f1e636d5ed3f6e619df801fb7
  8. 12 Jan, 2019: 5 commits
  9. 11 Jan, 2019: 1 commit
  10. 10 Jan, 2019: 2 commits
    • M
      Remove duplicates from SnapshotList::GetAll (#4860) · d56ac22b
      Maysam Yabandeh committed
      Summary:
      The vector returned by SnapshotList::GetAll could have duplicate entries if two separate snapshots have the same sequence number. However, when this vector is used in compaction, the duplicate entries are of no use and can be safely ignored. Moreover, not having duplicate entries simplifies reasoning in the compaction_iterator.cc code. For example, when searching for previous_snap, we currently use the snapshot before the current one, but the code that uses it expects it to also be less than the current snapshot; this is simpler to read if there are no duplicate entries in the snapshot list.
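      A minimal sketch of the deduplication, assuming the snapshot sequence numbers are available as a vector; `UniqueSnapshots` is a hypothetical helper, not SnapshotList::GetAll itself:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      using SequenceNumber = uint64_t;

      // Once sorted, equal sequence numbers are adjacent, so std::unique plus
      // erase drops the duplicates and each entry is strictly less than the next.
      std::vector<SequenceNumber> UniqueSnapshots(
          std::vector<SequenceNumber> snaps) {
        std::sort(snaps.begin(), snaps.end());
        snaps.erase(std::unique(snaps.begin(), snaps.end()), snaps.end());
        return snaps;
      }
      ```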
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4860
      
      Differential Revision: D13615502
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d45bf01213ead5f39db811f951802da6fcc3332b
    • Y
      Initialize two members in PerfContext (#4859) · 75714b4c
      Yanqin Jin committed
      Summary:
      As titled.
      Since PerfContext is part of the public API, it is currently possible to create a local object
      of type PerfContext, so it is safer to initialize the two members to 0.
      If PerfContext is created as a thread-local object, all members are
      zero-initialized according to the C++ standard.
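      The difference between the two cases can be sketched with hypothetical structs; the member names are illustrative, not PerfContext's:

      ```cpp
      #include <cstdint>

      // Default member initializers make a plain local object start at zero;
      // without them, a local object's members would be indeterminate.
      struct CountersSketch {
        uint64_t counter_a = 0;
        uint64_t counter_b = 0;
      };

      // No initializers here, but objects with thread (or static) storage
      // duration are zero-initialized before any other initialization.
      struct RawCounters {
        uint64_t a;
        uint64_t b;
      };
      thread_local RawCounters tls_raw;
      ```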
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4859
      
      Differential Revision: D13614504
      
      Pulled By: riversand963
      
      fbshipit-source-id: 406ff548e105a074f379ad1054d56fece5f524a0
  11. 09 Jan, 2019: 4 commits
    • Y
      Free memory after use · ffc9f846
      Yanqin Jin committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4857
      
      Differential Revision: D13602688
      
      Pulled By: riversand963
      
      fbshipit-source-id: 993419a6afb982a7a701ff71daebebb4b4a6b265
    • M
      WritePrepared: Report released snapshots in IsInSnapshot (#4856) · f3a99e8a
      Maysam Yabandeh committed
      Summary:
      Previously, IsInSnapshot assumed that the snapshot is valid at the time the function is called. However, there are cases where it might not be. An example is background compaction, where the compaction algorithm operates on a list of snapshots, some of which might be released by the time they are passed to IsInSnapshot. The patch makes two changes to enable the caller to tell the difference: i) any live snapshot below max is added to max_committed_seq_, which allows IsInSnapshot to confidently tell whether the passed snapshot is invalid if it is below max; ii) the IsInSnapshot API is extended with a "released" variable that is set to true when IsInSnapshot finds no such snapshot below max and has no other way to give a certain return value. In such cases the return value is true, but the caller should also check the "released" boolean after the call.
      In short, here are the changes in the API:
      i) If the snapshot is valid, no change is required.
      ii) If the snapshot might be invalid, a reference to "released" boolean must be passed to IsInSnapshot.
      ii-a) If the snapshot is above max, IsInSnapshot can figure out the return value using the commit cache.
      ii-b) Otherwise, if the snapshot is in old_commit_map_, IsInSnapshot can use it to tell whether the value was visible to the snapshot.
      ii-c) Otherwise, it sets "released" to true and also returns true.
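      The decision order ii-a) through ii-c) can be sketched as below. This is a hypothetical, simplified model: `IsInSnapshotSketch` is not the real WritePreparedTxnDB class, it assumes old_commit_map_ maps each old snapshot to the prepare sequence numbers of evicted commits that are not visible to it, and it ignores delayed prepared entries:

      ```cpp
      #include <cstdint>
      #include <map>
      #include <set>

      using SequenceNumber = uint64_t;

      struct IsInSnapshotSketch {
        SequenceNumber max_evicted_seq = 0;
        // prepare seq -> commit seq for commits still in the commit cache
        std::map<SequenceNumber, SequenceNumber> commit_cache;
        // old live snapshot -> prepare seqs of evicted commits it cannot see
        std::map<SequenceNumber, std::set<SequenceNumber>> old_commit_map;

        bool IsInSnapshot(SequenceNumber prep_seq, SequenceNumber snapshot_seq,
                          bool* released) const {
          *released = false;
          if (prep_seq > snapshot_seq) return false;
          auto it = commit_cache.find(prep_seq);
          if (it != commit_cache.end()) return it->second <= snapshot_seq;
          if (snapshot_seq > max_evicted_seq) {
            // ii-a) not in the cache: committed and evicted if at or below max,
            // otherwise still uncommitted.
            return prep_seq <= max_evicted_seq;
          }
          // ii-b) a tracked old snapshot answers directly.
          auto old = old_commit_map.find(snapshot_seq);
          if (old != old_commit_map.end()) {
            return old->second.count(prep_seq) == 0;
          }
          // ii-c) below max and untracked: the snapshot was released.
          *released = true;
          return true;
        }
      };
      ```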
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4856
      
      Differential Revision: D13599847
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1752be28667f886a1efec8cae5714b9b7a8f1e0f
    • S
      Non-initial file preloading should always prefetch index and filter (#4852) · 8641e9ad
      Siying Dong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/3340 introduced preloading when max_open_files != -1.
      It doesn't preload the index and filter in the non-initial file loading case, which is a bit too
      complicated to understand. We observed a MyRocks use case where the filter was expected to be
      preloaded but was not. To simplify things, we now always prefetch the index and filter.
      They are expected to be loaded in the file verification phase anyway.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4852
      
      Differential Revision: D13595402
      
      Pulled By: siying
      
      fbshipit-source-id: d4d8624eb3e849e20aeb990df2100502d85aff31
    • M
      WritePrepared: improve IsInSnapshotEmptyMapTest (#4853) · cd227d74
      Maysam Yabandeh committed
      Summary:
      IsInSnapshotEmptyMapTest tests that IsInSnapshot returns the correct value for existing data after a recovery, where max is not zero and yet the commit cache is empty. The existing test was preliminary and is improved in this patch. The patch also increases the DB sequence after recovery so that a snapshot taken immediately after recovery has a sequence number different from max_evicted_seq. This simplifies the logic in IsInSnapshot, which no longer has to consider the special case where an old snapshot equals max_evicted_seq and yet is not present in old_commit_map.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4853
      
      Differential Revision: D13595223
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 77c12ca8a3f61a47479a93bef2038ff502dc3322
  12. 08 Jan, 2019: 4 commits