1. 20 Mar, 2019 2 commits
  2. 19 Mar, 2019 1 commit
    • S
      Feature for sampling and reporting compressibility (#4842) · b45b1cde
      Shobhit Dayal committed
      Summary:
      This is a feature to sample data-block compressibility and report the results as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
      1. lz4 or snappy for fast compression
      2. zstd or zlib for slow but higher compression.
      
      The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.
      
      The db_bench_tool now has a command line option for specifying the sampling rate. Its default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool while varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
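
      The 1-in-N sampling described above can be sketched as follows. This is a minimal illustration, not RocksDB's implementation: `CompressibilitySampler`, `SampledCompressionStats`, and the size callbacks are hypothetical names, and the callbacks stand in for a fast (lz4/snappy) and a slow (zstd/zlib) compressor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stats holder; field names are illustrative only.
struct SampledCompressionStats {
  uint64_t raw_bytes = 0;
  uint64_t fast_compressed_bytes = 0;
  uint64_t slow_compressed_bytes = 0;
};

// Returns the compressed size of a block. Stands in for a real compressor.
using CompressedSizeFn = std::function<size_t(const std::string&)>;

class CompressibilitySampler {
 public:
  // sample_every == 0 disables sampling, mirroring the default of 0.
  CompressibilitySampler(uint64_t sample_every, CompressedSizeFn fast,
                         CompressedSizeFn slow)
      : sample_every_(sample_every),
        fast_(std::move(fast)),
        slow_(std::move(slow)) {
    assert(fast_ != nullptr && slow_ != nullptr);
  }

  // Called for every data block; only every N-th block is measured.
  void OnBlock(const std::string& block) {
    if (sample_every_ == 0) return;
    if (++blocks_seen_ % sample_every_ != 0) return;
    stats_.raw_bytes += block.size();
    stats_.fast_compressed_bytes += fast_(block);
    stats_.slow_compressed_bytes += slow_(block);
  }

  const SampledCompressionStats& stats() const { return stats_; }

 private:
  uint64_t sample_every_;
  CompressedSizeFn fast_;
  CompressedSizeFn slow_;
  uint64_t blocks_seen_ = 0;
  SampledCompressionStats stats_;
};
```

      The sampler only measures; in this sketch the block itself would still be compressed for storage by whatever path the caller already uses.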
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842
      
      Differential Revision: D13629011
      
      Pulled By: shobhitdayal
      
      fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
  3. 16 Mar, 2019 1 commit
    • A
      Update bg_error when log flush fails in SwitchMemtable() (#5072) · b4fa51df
      anand76 committed
      Summary:
      There is a potential failure case in DBImpl::SwitchMemtable() that is not handled properly. The call to cur_log_writer->WriteBuffer() can fail due to an IO error. In that case, we need to call SetBGError() in order to set the background error, since the WriteBuffer() failure may result in data loss.
      
      Also, the asserts for !new_mem and !new_log are incorrect, as those would have been allocated by the time this failure is detected.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5072
      
      Differential Revision: D14461384
      
      Pulled By: anand1976
      
      fbshipit-source-id: fb59bce9d61378f37d2dfcd28c0b704b0f43c3cf
  4. 02 Mar, 2019 2 commits
  5. 01 Mar, 2019 2 commits
    • M
      Call PreReleaseCallback between WAL and memtable write (#5015) · 77ebc82b
      Maysam Yabandeh committed
      Summary:
      PreReleaseCallback is meant to be called before the writes are visible to the readers. Since the sequence number is known after the WAL write, there is no reason to delay calling PreReleaseCallback until after the memtable write, which would complicate the reader's logic in the presence of our memtable writes that are made visible by the other write thread.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5015
      
      Differential Revision: D14221670
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a504dd665cf923226d7af09cc8e9c7739a25edc6
    • S
      Add two more StatsLevel (#5027) · 5e298f86
      Siying Dong committed
      Summary:
      Statistics cost too much CPU for some use cases. Add two stats levels
      so that people can choose to skip two types of expensive stats, timers and
      histograms.
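
      A rough sketch of how a stats level can gate the expensive stats types. The enum and struct names here are hypothetical and chosen for illustration; they are not the identifiers introduced by the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical level names; higher values collect more.
enum class StatsLevelSketch : uint8_t {
  kCountersOnly = 0,  // skip both histograms and timers
  kSkipTimers = 1,    // collect histograms, skip timers
  kAll = 2,           // collect everything
};

struct StatsSketch {
  StatsLevelSketch level = StatsLevelSketch::kAll;
  uint64_t ticks = 0;
  uint64_t histogram_samples = 0;
  uint64_t timer_samples = 0;

  // Plain counters are cheap and always recorded.
  void RecordTick() { ++ticks; }
  // Histogram updates are skipped at the lowest level.
  void RecordInHistogram() {
    if (level >= StatsLevelSketch::kSkipTimers) ++histogram_samples;
  }
  // Timers are recorded only at the highest level.
  void RecordTime() {
    if (level >= StatsLevelSketch::kAll) ++timer_samples;
  }
};

// Usage: with counters only, the expensive stats are skipped.
inline void StatsLevelExample() {
  StatsSketch stats;
  stats.level = StatsLevelSketch::kCountersOnly;
  stats.RecordTick();
  stats.RecordInHistogram();
  stats.RecordTime();
  assert(stats.ticks == 1);
  assert(stats.histogram_samples == 0 && stats.timer_samples == 0);
}
```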
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027
      
      Differential Revision: D14252765
      
      Pulled By: siying
      
      fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
  6. 27 Feb, 2019 1 commit
    • M
      WritePrepared: optimize read path by avoiding virtual (#5018) · a661c0d2
      Maysam Yabandeh committed
      Summary:
      The read path includes a callback function, ReadCallback, which eventually calls IsInSnapshot to determine whether a particular seq is in the reading snapshot. This callback is virtual, which adds the cost of multiple virtual function calls to each read. The first few checks in IsInSnapshot, however, are quite trivial and take care of the majority of cases. The patch moves those to a non-virtual function in the parent class, ReadCallback, to lower the virtual callback cost.
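
      The pattern of a non-virtual fast path in the parent class with a virtual fallback can be sketched like this. The class, member, and method names are assumptions made for the example, and the two cheap checks are only plausible stand-ins for the trivial checks the commit moves.

```cpp
#include <cassert>
#include <cstdint>

class ReadCallbackSketch {
 public:
  ReadCallbackSketch(uint64_t min_uncommitted, uint64_t snapshot_seq)
      : min_uncommitted_(min_uncommitted), snapshot_seq_(snapshot_seq) {}
  virtual ~ReadCallbackSketch() = default;

  // Non-virtual: the cheap checks resolve most calls without dispatch.
  bool IsVisible(uint64_t seq) {
    if (seq < min_uncommitted_) return true;  // older than any uncommitted
    if (seq > snapshot_seq_) return false;    // newer than the snapshot
    return IsVisibleFullCheck(seq);           // rare case: virtual call
  }

 protected:
  // Virtual slow path, implemented by the concrete callback.
  virtual bool IsVisibleFullCheck(uint64_t seq) = 0;

 private:
  uint64_t min_uncommitted_;
  uint64_t snapshot_seq_;
};

// Hypothetical subclass that counts how often the slow path runs.
class CountingCallback : public ReadCallbackSketch {
 public:
  using ReadCallbackSketch::ReadCallbackSketch;
  int full_checks = 0;

 protected:
  bool IsVisibleFullCheck(uint64_t seq) override {
    ++full_checks;
    return seq % 2 == 0;  // placeholder for the real snapshot lookup
  }
};

// Usage: both calls are answered by the fast path alone.
inline void ReadCallbackExample() {
  CountingCallback cb(/*min_uncommitted=*/10, /*snapshot_seq=*/20);
  assert(cb.IsVisible(5));
  assert(!cb.IsVisible(25));
  assert(cb.full_checks == 0);
}
```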
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5018
      
      Differential Revision: D14226562
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6feed5b34f3b082e52092c5ef143e29b49c46b44
  7. 21 Feb, 2019 1 commit
    • Z
      add GetStatsHistory to retrieve stats snapshots (#4748) · c4f5d0aa
      Zhongyi Xie committed
      Summary:
      This PR adds a public `GetStatsHistory` API to retrieve stats history in the form of a std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken; the value is another std map from stats name to stats value (stored in a std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the interval between two snapshots; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0.
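
      The shape of the returned data, a map from timestamp to a map of stat name to value, can be illustrated with a small bounded buffer. `StatsHistoryBuffer` is a hypothetical helper, and evicting the oldest snapshot first is an assumption about how a count limit would be enforced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// stat name -> stat value (stored as a string)
using StatsSnapshot = std::map<std::string, std::string>;
// timestamp in microseconds -> snapshot taken at that time
using StatsHistory = std::map<uint64_t, StatsSnapshot>;

class StatsHistoryBuffer {
 public:
  explicit StatsHistoryBuffer(size_t max_count) : max_count_(max_count) {
    assert(max_count_ > 0);
  }

  // Stores a snapshot and drops the oldest ones beyond the limit.
  void AddSnapshot(uint64_t timestamp_micros, StatsSnapshot snapshot) {
    history_[timestamp_micros] = std::move(snapshot);
    while (history_.size() > max_count_) {
      history_.erase(history_.begin());  // smallest key is the oldest
    }
  }

  const StatsHistory& history() const { return history_; }

 private:
  size_t max_count_;
  StatsHistory history_;
};
```

      Because `std::map` keeps keys ordered, iterating the history visits snapshots from oldest to newest.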
      
      (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748
      
      Differential Revision: D13961471
      
      Pulled By: miasantreble
      
      fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04
  8. 20 Feb, 2019 3 commits
  9. 16 Feb, 2019 3 commits
  10. 15 Feb, 2019 2 commits
    • M
      Apply modernize-use-override (2nd iteration) · ca89ac2b
      Michael Liu committed
      Summary:
      Use C++11’s override and remove virtual where applicable.
      Changes are automatically generated.
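
      For readers unfamiliar with the rewrite, a small example of the before and after shape. The types are invented for illustration and are unrelated to the classes touched by the commit.

```cpp
#include <cassert>
#include <string>

struct Shape {
  virtual ~Shape() = default;
  virtual std::string Name() const { return "shape"; }
};

struct Circle : Shape {
  // Before the cleanup this would read:
  //   virtual std::string Name() const { return "circle"; }
  // With override, the compiler rejects the declaration if it stops
  // matching a base-class virtual, and the redundant virtual is dropped.
  std::string Name() const override { return "circle"; }
};

// Usage: dispatch through the base type still reaches the override.
inline void OverrideExample() {
  Circle circle;
  const Shape& as_base = circle;
  assert(as_base.Name() == "circle");
}
```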
      
      Reviewed By: Orvid
      
      Differential Revision: D14090024
      
      fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
    • A
      Dictionary compression for files written by SstFileWriter (#4978) · c8c8104d
      Andrew Kryczka committed
      Summary:
      If `CompressionOptions::max_dict_bytes` and/or `CompressionOptions::zstd_max_train_bytes` are set, `SstFileWriter` will now generate files respecting those options.
      
      I refactored the logic a bit for deciding when to use dictionary compression. Previously we plumbed `is_bottommost_level` down to the table builder and used that. However it was kind of confusing in `SstFileWriter`'s context since we don't know what level the file will be ingested to. Instead, now the higher-level callers (e.g., flush, compaction, file writer) are responsible for building the right `CompressionOptions` to give the table builder.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4978
      
      Differential Revision: D14060763
      
      Pulled By: ajkr
      
      fbshipit-source-id: dc802c327896df2b319dc162d6acc82b9cdb452a
  11. 14 Feb, 2019 1 commit
  12. 13 Feb, 2019 3 commits
    • Y
      Atomic ingest (#4895) · a69d4dee
      Yanqin Jin committed
      Summary:
      Make file ingestion atomic.
      
      Ingesting external SST files into multiple column families should be atomic. If
      a crash occurs and db reopens, either all column families have successfully
      ingested the files before the crash, or none of the ingestions have any effect
      on the state of the db.
      
      Also add unit tests for atomic ingestion.
      
      Note that the unit test here does not cover the case of incomplete atomic group
      in the MANIFEST, which is covered in VersionSetTest already.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895
      
      Differential Revision: D13718245
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198
    • S
      Stats should be logged in INFO level (#4977) · 49ddd7ec
      Siying Dong committed
      Summary:
      Previously, stats were logged at warning level. This was done because
      people reported that they weren't logged in MyRocks. However, we later learned that
      this was due to a bug in MyRocks, which is fixed in
      https://github.com/facebook/mysql-5.6/commit/79bb705e74b239d7030b724ea6bbd635eceec531
      
      Now we revert the stats logging to INFO level, so that it doesn't pollute the warning
      level logging.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4977
      
      Differential Revision: D14058485
      
      Pulled By: siying
      
      fbshipit-source-id: 19fab323c19d9bc88184287f209551f9a77ca0e6
    • Y
      Avoid fsync on the same directory in atomic flush (#4817) · c5a64cff
      Yanqin Jin committed
      Summary:
      In `DBImpl::AtomicFlushMemTablesToOutputFiles`, we need to call fsync only once
      on the same data directory. If two column families share a common directory for
      their data, we call fsync only once.
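
      The deduplication can be sketched with a set of already-synced directories. `SyncDistinctDirs` and the `sync_dir` callback are hypothetical; the callback stands in for the real directory fsync.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Calls sync_dir once per distinct directory, in first-seen order,
// and returns how many sync calls were made.
inline size_t SyncDistinctDirs(
    const std::vector<std::string>& cf_data_dirs,
    const std::function<void(const std::string&)>& sync_dir) {
  assert(sync_dir != nullptr);
  std::set<std::string> synced;
  size_t calls = 0;
  for (const std::string& dir : cf_data_dirs) {
    // insert() reports whether the directory was newly added.
    if (synced.insert(dir).second) {
      sync_dir(dir);
      ++calls;
    }
  }
  return calls;
}
```

      With three column families where two share a directory, the callback runs twice rather than three times.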
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4817
      
      Differential Revision: D13543689
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4701d77c96a47802fbf6cb9f3337ee65d46b95f5
  13. 12 Feb, 2019 4 commits
    • A
      Reduce scope of compression dictionary to single SST (#4952) · 62f70f6d
      Andrew Kryczka committed
      Summary:
      Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
      
      So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
      
      - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
      - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
      - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
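
      The two-state behaviour in the bullets above can be sketched as a small state machine. `BufferedBuilderSketch` is a simplified stand-in, not `BlockBasedTableBuilder`: dictionary training is reduced to a flag, and "writing" a block just appends it to a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class BufferedBuilderSketch {
 public:
  enum class State { kBuffered, kUnbuffered };

  explicit BufferedBuilderSketch(size_t buffer_limit_bytes)
      : buffer_limit_bytes_(buffer_limit_bytes) {}

  void Add(const std::string& block) {
    if (state_ == State::kBuffered) {
      // Accumulate uncompressed data until the limit is reached.
      buffered_.push_back(block);
      buffered_bytes_ += block.size();
      if (buffered_bytes_ >= buffer_limit_bytes_) EnterUnbuffered();
    } else {
      WriteBlock(block);  // same as before: write as soon as it arrives
    }
  }

  void Finish() {
    if (state_ == State::kBuffered) EnterUnbuffered();
  }

  State state() const { return state_; }
  size_t blocks_written() const { return written_.size(); }
  bool dictionary_trained() const { return dictionary_trained_; }

 private:
  void EnterUnbuffered() {
    assert(state_ == State::kBuffered);
    // A real builder would sample buffered_ and train a dictionary here.
    dictionary_trained_ = true;
    for (const std::string& block : buffered_) WriteBlock(block);
    buffered_.clear();
    buffered_bytes_ = 0;
    state_ = State::kUnbuffered;
  }

  void WriteBlock(const std::string& block) { written_.push_back(block); }

  size_t buffer_limit_bytes_;
  State state_ = State::kBuffered;
  size_t buffered_bytes_ = 0;
  bool dictionary_trained_ = false;
  std::vector<std::string> buffered_;
  std::vector<std::string> written_;
};
```

      The cost of this design is visible in the sketch: nothing is written until the transition, so memory use grows with the buffer limit.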
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
      
      Differential Revision: D13967980
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
    • M
      WritePrepared: relax assert in compaction iterator (#4969) · 576d2d6c
      Maysam Yabandeh committed
      Summary:
      If IsInSnapshot(seq2, snapshot) determines that the snapshot is released, future queries of IsInSnapshot(seq1, snapshot) could still return a definitive answer of true if, for example, seq1 is so old that it is determined to be visible in all snapshots. This violates a recently added assert statement in the compaction iterator. The patch relaxes the assert.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4969
      
      Differential Revision: D14030998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6db53db0e37d0a20e8997ef2c1004b8627614ab9
    • Y
      Checksum properties block for block-based table (#4956) · 2d049ab7
      Yanqin Jin committed
      Summary:
      Always enable properties block checksum verification for block-based tables. For an external SST file ingested with 'write_global_seqno==true', we use 'DecodeEntrySlow' to parse its blocks' contents so that the process will not die upon failing the assertion possibly caused by corruption.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4956
      
      Differential Revision: D14012741
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8b766e6f54b36f8f9e074c0e19e0926ec3cce186
    • S
      Add a unit test to Ignorable manifest record (#4964) · 5d9a623e
      Siying Dong committed
      Summary:
      https://github.com/facebook/rocksdb/pull/4960 introduced the ignorable manifest
      record. Add a test for it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4964
      
      Differential Revision: D14012667
      
      Pulled By: siying
      
      fbshipit-source-id: e5f10ecc68dec2716e178d44f0fe2b76c3d857ef
  14. 09 Feb, 2019 2 commits
  15. 08 Feb, 2019 2 commits
    • S
      Deprecate CompactionFilter::IgnoreSnapshots() = false (#4954) · f48758e9
      Siying Dong committed
      Summary:
      We found that the behavior of CompactionFilter::IgnoreSnapshots() = false isn't
      what we expected. We thought that snapshots would always be preserved.
      However, we just realized that if no snapshot exists when compaction
      starts and a snapshot is created after that, the data seen from the snapshot
      can still be dropped by the compaction. This gives the feature a strange
      behavior that is hard to explain. As documented in the code
      comment, this feature is not very useful with snapshots anyway. The decision
      is to deprecate the feature.
      
      We keep the function to avoid breaking users' code. However, we will fail
      compactions if false is returned.
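
      A sketch of what "keep the function but fail the compaction" could look like. All types here are hypothetical simplifications; in particular, `CompactionResult` and its message are invented for the example and do not reflect the actual status RocksDB returns.

```cpp
#include <cassert>
#include <string>

struct CompactionFilterSketch {
  virtual ~CompactionFilterSketch() = default;
  // Kept in the interface so existing user code still compiles.
  virtual bool IgnoreSnapshots() const = 0;
};

struct SnapshotIgnoringFilter : CompactionFilterSketch {
  bool IgnoreSnapshots() const override { return true; }
};

struct LegacyFilter : CompactionFilterSketch {
  bool IgnoreSnapshots() const override { return false; }  // deprecated
};

struct CompactionResult {
  bool ok;
  std::string message;
};

// Rejects the compaction up front when the filter asks for the
// deprecated behaviour; a null filter means no filtering at all.
inline CompactionResult RunCompactionSketch(
    const CompactionFilterSketch* filter) {
  if (filter != nullptr && !filter->IgnoreSnapshots()) {
    return {false, "IgnoreSnapshots() = false is not supported"};
  }
  return {true, ""};
}

// Usage: a filter that still returns false makes the compaction fail.
inline void IgnoreSnapshotsExample() {
  LegacyFilter legacy;
  assert(!RunCompactionSketch(&legacy).ok);
}
```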
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4954
      
      Differential Revision: D13981900
      
      Pulled By: siying
      
      fbshipit-source-id: 2db8c2c3865acd86a28dca625945d1481b1d1e36
    • S
      Remove cuckoo hash memtable (#4953) · cf3a6717
      Siying Dong committed
      Summary:
      Cuckoo Hash is less useful than we initially expected. Remove it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953
      
      Differential Revision: D13979264
      
      Pulled By: siying
      
      fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed
  16. 06 Feb, 2019 5 commits
    • Z
      exclude test CompactFilesShouldTriggerAutoCompaction from ROCKSDB_LITE (#4950) · 71cae59a
      Zhongyi Xie committed
      Summary:
      This will fix the following build error:
      
      > db/db_test.cc: In member function ‘virtual void rocksdb::DBTest_CompactFilesShouldTriggerAutoCompaction_Test::TestBody()’:
      > db/db_test.cc:5462:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      > db/db_test.cc:5490:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      > db/db_test.cc:5499:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4950
      
      Differential Revision: D13965378
      
      Pulled By: miasantreble
      
      fbshipit-source-id: a975435476fe555b1cd9d5da263ee3da3acdea56
    • Z
      Allow copy for PerfContext objects (#4919) · 00ed41da
      Zhongyi Xie committed
      Summary:
      The existing implementation of PerfContext does not define a copy constructor or assignment operator, which could cause problems when a user creates copies and resets the built-in one. This PR addresses the issue by providing both with deep copy semantics.
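
      Why deep copy matters here can be shown with a simplified class that owns a heap-allocated map. `PerfContextSketch` and its members are assumptions for illustration; the real `PerfContext` has many more fields.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

class PerfContextSketch {
 public:
  using LevelMap = std::map<uint32_t, uint64_t>;

  PerfContextSketch() = default;

  // Deep copy: the new object gets its own map, not a shared pointer.
  PerfContextSketch(const PerfContextSketch& other)
      : get_count(other.get_count),
        per_level_(other.per_level_ ? new LevelMap(*other.per_level_)
                                    : nullptr) {}

  PerfContextSketch& operator=(const PerfContextSketch& other) {
    if (this != &other) {
      // Copy first so this object stays valid if allocation throws.
      LevelMap* copy =
          other.per_level_ ? new LevelMap(*other.per_level_) : nullptr;
      delete per_level_;
      per_level_ = copy;
      get_count = other.get_count;
    }
    return *this;
  }

  ~PerfContextSketch() { delete per_level_; }

  void EnablePerLevel() {
    if (per_level_ == nullptr) per_level_ = new LevelMap();
  }
  void AddLevelHits(uint32_t level, uint64_t hits) {
    assert(per_level_ != nullptr);
    (*per_level_)[level] += hits;
  }
  uint64_t LevelHits(uint32_t level) const {
    if (per_level_ == nullptr) return 0;
    auto it = per_level_->find(level);
    return it == per_level_->end() ? 0 : it->second;
  }
  void Reset() {
    get_count = 0;
    if (per_level_ != nullptr) per_level_->clear();
  }

  uint64_t get_count = 0;

 private:
  LevelMap* per_level_ = nullptr;
};
```

      With a compiler-generated shallow copy, resetting the original would also wipe the copy's per-level data and both destructors would free the same pointer.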
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4919
      
      Differential Revision: D13960406
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 36aab5aaee65d4480f537e4e22148faa45e8e334
    • J
      Fix potential DB hang while using CompactFiles (#4940) · c9a52cbd
      Jay Zhuang committed
      Summary:
      CompactFiles() may block auto compaction, which could cause a DB hang when it
      reaches level0_stop_writes_trigger.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4940
      
      Differential Revision: D13929648
      
      Pulled By: cooldoger
      
      fbshipit-source-id: 10842df38df3bebf862cd1a120a88ce961fdd381
    • S
      BYTES_READ stats miscount for NotFound cases (#4938) · 8fe07332
      Siying Dong committed
      Summary:
      In NotFound cases, the stats BYTES_READ and perf_context.get_read_bytes are still increased. The amount increased will be
      the size of whatever string or PinnableSlice the user passed in as the output data structure. This is wrong. Fix this by not
      increasing these two counters.
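
      The miscount and its fix can be illustrated with a toy lookup. `GetSketch` and `ReadStatsSketch` are hypothetical; the point is only that the byte counter is updated after a successful read, never from the caller's output buffer on a miss.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct ReadStatsSketch {
  uint64_t bytes_read = 0;
};

// Looks up key in db. On a hit, copies the value out and counts its
// size. On a miss, leaves both *value and the counter untouched.
inline bool GetSketch(const std::map<std::string, std::string>& db,
                      const std::string& key, std::string* value,
                      ReadStatsSketch* stats) {
  assert(value != nullptr && stats != nullptr);
  auto it = db.find(key);
  if (it == db.end()) {
    // The bug counted value->size() here, even though *value may
    // hold leftover data from the caller.
    return false;
  }
  *value = it->second;
  stats->bytes_read += value->size();
  return true;
}
```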
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4938
      
      Differential Revision: D13908963
      
      Pulled By: siying
      
      fbshipit-source-id: 60bce42e4fbb9862bba3da36dbc27b2963ea6162
    • Y
      Properly set upper bound of subcompaction output (#4879) (#4898) · 31221bb7
      yangzhijia committed
      Summary:
      Fix the output overlap bug when using subcompactions: the upper bound of the output
      file was extended incorrectly.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4898
      
      Differential Revision: D13736107
      
      Pulled By: ajkr
      
      fbshipit-source-id: 21dca09f81d5f07bf2766bf566f9b50dcab7d8e3
  17. 02 Feb, 2019 2 commits
  18. 01 Feb, 2019 2 commits
    • Y
      Use correct FileMeta for atomic flush result install (#4932) · 842cdc11
      Yanqin Jin committed
      Summary:
      1. This commit fixes our handling of a combination of two separate edge
      cases. If a flush job does not pick any memtable to flush (because another
      flush job has already picked the same memtables), and the column family
      assigned to the flush job is dropped right before RocksDB calls
      rocksdb::InstallMemtableAtomicFlushResults, our original code passes
      a FileMetaData object whose file number is 0, failing the assertion in
      rocksdb::InstallMemtableAtomicFlushResults (assert(m->GetFileNumber() > 0)).
      2. Also piggyback a small change: since we already create a local copy of column family's mutable CF options to eliminate potential race condition with `SetOptions` call, we might as well use the local copy in other function calls in the same scope.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4932
      
      Differential Revision: D13901322
      
      Pulled By: riversand963
      
      fbshipit-source-id: b936580af7c127ea0c6c19ea10cd5fcede9fb0f9
    • M
      Take snapshots once for all cf flushes (#4934) · 35e5689e
      Maysam Yabandeh committed
      Summary:
      FlushMemTablesToOutputFiles calls FlushMemTableToOutputFile for each column family. The patch moves the take-snapshot logic to outside FlushMemTableToOutputFile so that it does it once for all the flushes. This also addresses a deadlock issue for resetting the managed snapshot of job_snapshot in the 2nd call to FlushMemTableToOutputFile.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4934
      
      Differential Revision: D13900747
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f3cd650c5fff24cf95c1aaf8a10c149d42bf042c
  19. 30 Jan, 2019 1 commit