1. 13 Feb 2019, 1 commit
  2. 12 Feb 2019, 8 commits
    • Reduce scope of compression dictionary to single SST (#4952) · 62f70f6d
      Committed by Andrew Kryczka
      Summary:
      Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary and then applying it to subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference in compression ratio.
      
      So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
      
      - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
      - After accumulating the target file size worth of data, or when `BlockBasedTableBuilder::Finish` is called, the `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
      - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
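      As a sketch of the feature's user-facing side (not part of the original commit message), dictionary compression is enabled through `CompressionOptions`; the `max_dict_bytes`/`zstd_max_train_bytes` fields below are the existing knobs the summary refers to, while the sizes and path are illustrative and `kZSTD` assumes a ZSTD-enabled build.
      ```cpp
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Dictionary compression settings; with this PR the dictionary is
        // trained and applied per SST file rather than per compaction.
        options.compression = rocksdb::kZSTD;  // requires a ZSTD-enabled build
        options.compression_opts.max_dict_bytes = 16 * 1024;             // dictionary size cap
        options.compression_opts.zstd_max_train_bytes = 100 * 16 * 1024; // training sample budget

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/dict_example", &db);
        assert(s.ok());
        delete db;
        return 0;
      }
      ```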
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
      
      Differential Revision: D13967980
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
    • Increment NUMBER_BLOCK_NOT_COMPRESSED when !GoodCompressionRatio (#4929) · 79496d71
      Committed by Peter (Stig) Edwards
      Summary:
      See https://github.com/facebook/rocksdb/issues/4884
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4929
      
      Differential Revision: D14028333
      
      Pulled By: sagar0
      
      fbshipit-source-id: eed12bceae85385a34aaa6dd303bf0f53c4c7b06
    • Enhance transaction_test_util with delays (#4970) · d6b9b3b8
      Committed by Maysam Yabandeh
      Summary:
      Enhance the ::Insert and ::Verify test functions to add an artificial delay between prepare and commit, and between taking the snapshot and reading, respectively. A future PR will make use of these to improve the stress tests to cover long-running transactions as well as long-running backup jobs. Also randomly sets set_snapshot to false for inserters, so they skip setting the snapshot in the initialization phase and take the snapshot later explicitly.
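      Not part of the commit, but a minimal sketch of the two snapshot-timing styles the randomized set_snapshot flag toggles between, using the public TransactionDB API; error handling and the surrounding test harness are omitted.
      ```cpp
      #include "rocksdb/utilities/transaction.h"
      #include "rocksdb/utilities/transaction_db.h"

      void SketchSnapshotTiming(rocksdb::TransactionDB* txn_db) {
        rocksdb::WriteOptions write_options;
        rocksdb::TransactionOptions txn_options;

        // Style 1: snapshot taken implicitly when the transaction begins.
        txn_options.set_snapshot = true;
        rocksdb::Transaction* early = txn_db->BeginTransaction(write_options, txn_options);

        // Style 2: skip the snapshot during initialization and take it later,
        // which is what inserters do when set_snapshot is randomized to false.
        txn_options.set_snapshot = false;
        rocksdb::Transaction* late = txn_db->BeginTransaction(write_options, txn_options);
        // ... artificial delay would go here ...
        late->SetSnapshot();  // snapshot taken explicitly

        delete early;
        delete late;
      }
      ```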
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4970
      
      Differential Revision: D14031342
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b52b453751f0b25b81b23c48892bc1d152464cab
    • WritePrepared: relax assert in compaction iterator (#4969) · 576d2d6c
      Committed by Maysam Yabandeh
      Summary:
      If IsInSnapshot(seq2, snapshot) determines that the snapshot is released, future queries IsInSnapshot(seq1, snapshot) could still return a definitive answer of true if, for example, seq1 is old enough to be visible in all snapshots. This violates an assert statement recently added to the compaction iterator. The patch relaxes the assert.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4969
      
      Differential Revision: D14030998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6db53db0e37d0a20e8997ef2c1004b8627614ab9
    • Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957) · 1218704b
      Committed by Andrew Kryczka
      Summary:
      Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957
      
      Differential Revision: D13994302
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
    • WritePrepared: add private options to TransactionDBOptions (#4966) · 9144d1f1
      Committed by Maysam Yabandeh
      Summary:
      WritePreparedTransactionDB operates with additional options that should not be user-configurable, to avoid complicating things for users. For testing purposes, however, we need to change their default values. This patch makes these parameters private fields in TransactionDBOptions so that the existing ::Open API can use them seamlessly without exposing them to users.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4966
      
      Differential Revision: D14015986
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 13037efa7dfdd6f73ec7a19414b66571e044c633
    • Checksum properties block for block-based table (#4956) · 2d049ab7
      Committed by Yanqin Jin
      Summary:
      Always enable properties block checksum verification for block-based tables. For external SST files ingested with `write_global_seqno==true`, we use `DecodeEntrySlow` to parse their blocks' contents so that the process will not die from a failed assertion possibly caused by corruption.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4956
      
      Differential Revision: D14012741
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8b766e6f54b36f8f9e074c0e19e0926ec3cce186
    • Add a unit test to Ignorable manifest record (#4964) · 5d9a623e
      Committed by Siying Dong
      Summary:
      https://github.com/facebook/rocksdb/pull/4960 introduced the ignorable manifest
      record. This adds a test for it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4964
      
      Differential Revision: D14012667
      
      Pulled By: siying
      
      fbshipit-source-id: e5f10ecc68dec2716e178d44f0fe2b76c3d857ef
  3. 09 Feb 2019, 4 commits
    • Implement trace sampling (#4963) · 08809f5e
      Committed by tang-jianfeng
      Summary:
      Implement trace sampling to allow the user to specify a sampling frequency, i.e., save one out of every N requests, so that a user who is interested in only a sampled set does not need to log everything.
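      A sketch (not from the commit) of what sampled tracing looks like from the caller's side; `TraceOptions::sampling_frequency`, `NewFileTraceWriter`, and `StartTrace`/`EndTrace` are named as I recall the tracing API, so treat the exact signatures as assumptions.
      ```cpp
      #include <memory>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/trace_reader_writer.h"

      // Record roughly one out of every eight requests instead of all of them.
      rocksdb::Status SketchSampledTracing(rocksdb::DB* db, const std::string& path) {
        std::unique_ptr<rocksdb::TraceWriter> trace_writer;
        rocksdb::Status s = rocksdb::NewFileTraceWriter(
            rocksdb::Env::Default(), rocksdb::EnvOptions(), path, &trace_writer);
        if (!s.ok()) {
          return s;
        }
        rocksdb::TraceOptions trace_opts;
        trace_opts.sampling_frequency = 8;  // keep 1 of every 8 requests
        s = db->StartTrace(trace_opts, std::move(trace_writer));
        // ... run the workload to be traced ...
        if (s.ok()) {
          s = db->EndTrace();
        }
        return s;
      }
      ```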
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4963
      
      Differential Revision: D14011190
      
      Pulled By: tang-jianfeng
      
      fbshipit-source-id: 078b631d9319b67cb089dd2c30e21d0df8dc406a
    • WritePrepared: fix ValidateSnapshot with long-running txn (#4961) · 10d14693
      Committed by Maysam Yabandeh
      Summary:
      ValidateSnapshot checks whether another txn has committed a value to an about-to-be-locked key since a particular snapshot. It applies an optimization of looking only into the memtables if the snapshot seq is larger than the earliest seq in the memtables. With a long-running txn in WritePrepared, the prepared value might be flushed out to disk and yet commit after the snapshot, which breaks this optimization. The patch fixes that by disabling the optimization when the min_uncommitted seq at the time the snapshot was taken is lower than the earliest seq in the memtables.
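      For context, a hedged sketch of the user-visible path that exercises ValidateSnapshot: a pessimistic transaction locking a key against its snapshot via GetForUpdate. The internal min_uncommitted bookkeeping that the fix adjusts is not shown.
      ```cpp
      #include <string>
      #include "rocksdb/utilities/transaction.h"
      #include "rocksdb/utilities/transaction_db.h"

      // GetForUpdate must validate that no other transaction committed a newer
      // value for `key` since this transaction's snapshot; that validation is
      // where the memtable-only optimization described above applies.
      rocksdb::Status SketchConflictCheck(rocksdb::TransactionDB* txn_db,
                                          const std::string& key) {
        rocksdb::WriteOptions write_options;
        rocksdb::ReadOptions read_options;
        rocksdb::TransactionOptions txn_options;
        txn_options.set_snapshot = true;  // snapshot taken at BeginTransaction

        rocksdb::Transaction* txn = txn_db->BeginTransaction(write_options, txn_options);
        read_options.snapshot = txn->GetSnapshot();

        std::string value;
        rocksdb::Status s = txn->GetForUpdate(read_options, key, &value);
        if (s.ok()) {
          s = txn->Commit();
        } else {
          txn->Rollback();
        }
        delete txn;
        return s;
      }
      ```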
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4961
      
      Differential Revision: D14009947
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1d11679950326f7c4094b433e6b821b729f08850
    • Reset size_ to 0 in PinnableSlice::Reset (#4962) · 39fb88f1
      Committed by Maysam Yabandeh
      Summary:
      This avoids bugs where a reused PinnableSlice is not actually reassigned and yet the programmer draws conclusions based on the size of the Slice.
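      A small sketch (not from the commit) of the reuse pattern this protects: once `Reset()` also zeroes the size, a failed second lookup can no longer be confused with the previous result.
      ```cpp
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/slice.h"

      void SketchPinnableSliceReuse(rocksdb::DB* db) {
        rocksdb::PinnableSlice value;
        rocksdb::ReadOptions read_options;

        rocksdb::Status s =
            db->Get(read_options, db->DefaultColumnFamily(), "key1", &value);
        // ... use value ...

        value.Reset();  // now also sets size_ to 0, not just unpinning the data
        assert(value.size() == 0);

        s = db->Get(read_options, db->DefaultColumnFamily(), "missing-key", &value);
        if (!s.ok()) {
          // Without the fix, value.size() could still report the stale length
          // from the previous lookup at this point.
        }
      }
      ```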
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4962
      
      Differential Revision: D14012710
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 23f4e173386b5461fd5650f44cde470805f4e816
    • Add a placeholder in manifest indicating ignorable record (#4960) · 1a761e6a
      Committed by Siying Dong
      Summary:
      We want to reserve the option for extra information added to the manifest in the
      future to remain forward compatible with previous versions. This change creates a
      placeholder for that: a bit in the tag is added to indicate that a field can be
      safely ignored.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4960
      
      Differential Revision: D14000484
      
      Pulled By: siying
      
      fbshipit-source-id: cbf5bad3f9d5ec798f789806f244d1c20d3b66d6
  4. 08 Feb 2019, 2 commits
    • Deprecate CompactionFilter::IgnoreSnapshots() = false (#4954) · f48758e9
      Committed by Siying Dong
      Summary:
      We found that the behavior of CompactionFilter::IgnoreSnapshots() = false isn't
      what we expected. We thought that snapshots would always be preserved.
      However, we just realized that if no snapshot exists when a compaction
      starts, and a snapshot is created after that, the data visible to that snapshot
      can still be dropped by the compaction. This gives the feature a strange behavior
      that is hard to explain. As documented in the code comment, this feature is not
      very useful with snapshots anyway. The decision is to deprecate the feature.
      
      We keep the function to avoid breaking users' code. However, we will now fail
      compactions if false is returned.
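      A minimal sketch (not from the commit) of a compaction filter written for the new rule: `IgnoreSnapshots()` overridden to return true, since returning false now fails compactions. The filtering logic itself is an arbitrary example.
      ```cpp
      #include <string>
      #include "rocksdb/compaction_filter.h"
      #include "rocksdb/options.h"

      // Example filter that drops entries whose value is empty.
      class DropEmptyValuesFilter : public rocksdb::CompactionFilter {
       public:
        bool Filter(int /*level*/, const rocksdb::Slice& /*key*/,
                    const rocksdb::Slice& existing_value, std::string* /*new_value*/,
                    bool* /*value_changed*/) const override {
          return existing_value.empty();  // true means drop the entry
        }

        // Returning false here now causes compactions to fail.
        bool IgnoreSnapshots() const override { return true; }

        const char* Name() const override { return "DropEmptyValuesFilter"; }
      };

      // Usage sketch: the filter must outlive the DB it is attached to.
      //   rocksdb::Options options;
      //   static DropEmptyValuesFilter filter;
      //   options.compaction_filter = &filter;
      ```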
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4954
      
      Differential Revision: D13981900
      
      Pulled By: siying
      
      fbshipit-source-id: 2db8c2c3865acd86a28dca625945d1481b1d1e36
    • Remove cuckoo hash memtable (#4953) · cf3a6717
      Committed by Siying Dong
      Summary:
      Cuckoo Hash is less useful than we initially expected. Remove it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953
      
      Differential Revision: D13979264
      
      Pulled By: siying
      
      fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed
  5. 07 Feb 2019, 1 commit
    • WritePrepared: non-atomic commit of delayed prepared (#4947) · 199fabc1
      Committed by Maysam Yabandeh
      Summary:
      Committing a delayed prepared transaction has two non-atomic steps: add it to the commit cache, then remove it from delayed_prepared_. Similarly, in ::IsInSnapshot we read from the commit cache first and then look into delayed_prepared_. Due to this non-atomicity, a reader might find the just-committed prep_seq neither in the commit cache nor in delayed_prepared_. To fix that, i) we check whether there were any delayed prepared entries BEFORE looking into the commit cache, and ii) if there were, the search steps become: commit cache, delayed prepared, commit cache again. This way, if the first query to the commit cache missed the commit, the second will catch it. The cost of the redundant read from the commit cache is paid only when delayed_prepared_ is non-empty, which should be a very rare scenario.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4947
      
      Differential Revision: D13952754
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 8f47826b13f8ce154398d842028342423f4ca2b2
  6. 06 Feb 2019, 6 commits
  7. 05 Feb 2019, 1 commit
    • WritePrepared: release snapshot equal to max (#4944) · dcb73e77
      Committed by Maysam Yabandeh
      Summary:
      WritePrepared maintains a list of snapshots that are <= max_evicted_seq_. Based on this list, old_commit_map_ is updated if an evicted commit entry overlaps with such a snapshot. These lists are garbage collected when the release of a snapshot is reported to WritePreparedTxnDB, which is the next time max_evicted_seq_ is updated and the snapshot is not found in the list returned from the DB. This logic was broken because ReleaseSnapshotInternal was using "< max_evicted_seq_" to clean up old_commit_map_, which would leave a snapshot uncleaned if it equals max_evicted_seq_. The patch fixes that and adds a unit test to check for the bug.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4944
      
      Differential Revision: D13945000
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0c904294f735911f52348a148bf1f945282fc17c
  8. 02 Feb 2019, 2 commits
  9. 01 Feb 2019, 4 commits
    • fix for nvme device path (#4866) · 4091597c
      Committed by Young Tack Jin
      Summary:
      An NVMe device path does not contain "block", e.g. "nvme/nvme0/nvme0n1"
      or "nvme/nvme0/nvme0n1/nvme0n1p1". The last directory, such as
      "nvme0n1p1", should be removed if the NVMe drive is partitioned.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866
      
      Differential Revision: D13627824
      
      Pulled By: riversand963
      
      fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317
    • Use correct FileMeta for atomic flush result install (#4932) · 842cdc11
      Committed by Yanqin Jin
      Summary:
      1. This commit fixes our handling of a combination of two separate edge
      cases. If a flush job does not pick any memtable to flush (because another
      flush job has already picked the same memtables), and the column family
      assigned to the flush job is dropped right before RocksDB calls
      rocksdb::InstallMemtableAtomicFlushResults, our original code passes
      a FileMetaData object whose file number is 0, failing the assertion in
      rocksdb::InstallMemtableAtomicFlushResults (assert(m->GetFileNumber() > 0)).
      2. Also piggyback a small change: since we already create a local copy of column family's mutable CF options to eliminate potential race condition with `SetOptions` call, we might as well use the local copy in other function calls in the same scope.
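      Not part of the fix itself, but a sketch of how atomic flush is exercised from the public API, which is the path that ends in `InstallMemtableAtomicFlushResults`. The `atomic_flush` option and the multi-column-family `Flush` overload are used as I understand them; the path and keys are illustrative.
      ```cpp
      #include <cassert>
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.atomic_flush = true;  // flush results are installed as one unit

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/atomic_flush_example", &db);
        assert(s.ok());

        rocksdb::ColumnFamilyHandle* cf = nullptr;
        s = db->CreateColumnFamily(rocksdb::ColumnFamilyOptions(), "cf1", &cf);
        assert(s.ok());

        db->Put(rocksdb::WriteOptions(), db->DefaultColumnFamily(), "k", "v");
        db->Put(rocksdb::WriteOptions(), cf, "k", "v");

        // Flush both column families together; with atomic_flush their memtables
        // are picked and the results installed atomically.
        std::vector<rocksdb::ColumnFamilyHandle*> cfs = {db->DefaultColumnFamily(), cf};
        s = db->Flush(rocksdb::FlushOptions(), cfs);
        assert(s.ok());

        db->DestroyColumnFamilyHandle(cf);
        delete db;
        return 0;
      }
      ```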
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4932
      
      Differential Revision: D13901322
      
      Pulled By: riversand963
      
      fbshipit-source-id: b936580af7c127ea0c6c19ea10cd5fcede9fb0f9
    • Fix `WriteBatchBase::DeleteRange` API comment (#4935) · 0ea57115
      Committed by Andrew Kryczka
      Summary:
      The `DeleteRange` end key is exclusive, not inclusive. Updated API comment accordingly.
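      A small sketch (not from the commit) showing the exclusive end key in practice: keys "b" and "c" fall inside ["b", "d") and are deleted, while "d" itself survives.
      ```cpp
      #include <cassert>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/write_batch.h"

      void SketchDeleteRange(rocksdb::DB* db) {
        rocksdb::WriteOptions wo;
        db->Put(wo, "b", "1");
        db->Put(wo, "c", "2");
        db->Put(wo, "d", "3");

        // End key is exclusive: deletes "b" and "c", keeps "d".
        rocksdb::WriteBatch batch;
        batch.DeleteRange(db->DefaultColumnFamily(), "b", "d");
        db->Write(wo, &batch);

        std::string value;
        assert(db->Get(rocksdb::ReadOptions(), "b", &value).IsNotFound());
        assert(db->Get(rocksdb::ReadOptions(), "d", &value).ok());
      }
      ```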
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4935
      
      Differential Revision: D13905406
      
      Pulled By: ajkr
      
      fbshipit-source-id: f577db841a279427991ecf9005cd56b30c8eb3c7
    • Take snapshots once for all cf flushes (#4934) · 35e5689e
      Committed by Maysam Yabandeh
      Summary:
      FlushMemTablesToOutputFiles calls FlushMemTableToOutputFile for each column family. The patch moves the take-snapshot logic outside of FlushMemTableToOutputFile so that it is done once for all the flushes. This also addresses a deadlock issue with resetting the managed snapshot of job_snapshot in the 2nd call to FlushMemTableToOutputFile.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4934
      
      Differential Revision: D13900747
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f3cd650c5fff24cf95c1aaf8a10c149d42bf042c
  10. 30 Jan 2019, 3 commits
  11. 29 Jan 2019, 5 commits
  12. 26 Jan 2019, 2 commits
  13. 25 Jan 2019, 1 commit