1. 01 Feb 2020 (2 commits)
    • Fix DBTest2.ChangePrefixExtractor LITE build (#6356) · 800d24dd
      Committed by sdong
      Summary:
      DBTest2.ChangePrefixExtractor fails in the LITE build because the LITE build doesn't support adaptive build. Fix it by removing the stats check and checking only correctness.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6356
      
      Test Plan: Run the test with both LITE and non-LITE builds.
      
      Differential Revision: D19669537
      
      fbshipit-source-id: 6d7dd6c8a79f18e80ca1636864b9c71922030d8e
    • Add a unit test for prefix extractor changes (#6323) · ec496347
      Committed by sdong
      Summary:
      Add a unit test for prefix extractor change, including a check that fails due to a bug.
      Also comment out the partitioned filter case which will fail the test too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6323
      
      Test Plan: Run the test and it passes (and fails if the SeekForPrev() part is uncommented)
      
      Differential Revision: D19509744
      
      fbshipit-source-id: 678202ca97b5503e9de73b54b90de9e5ba822b72
  2. 30 Jan 2020 (1 commit)
  3. 29 Jan 2020 (1 commit)
    • Add ReadOptions.auto_prefix_mode (#6314) · 8f2bee67
      Committed by sdong
      Summary:
      Add a new option ReadOptions.auto_prefix_mode. When set to true, the iterator should return the same result as a total order seek, but may choose to do a prefix seek internally, based on the iterator upper bound. Also fix two previous bugs in handling prefix extractor changes: (1) a reverse iterator should not rely on the upper bound to determine the prefix; fix it by skipping the prefix check. (2) the block-based filter is not handled properly.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314
      
      Test Plan: (1) add a unit test; (2) add the check to the stress test and run it to see whether it can pass at least one run.
      
      Differential Revision: D19458717
      
      fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce
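      The safety condition behind auto_prefix_mode can be sketched as follows. This is an illustrative model only, not RocksDB internals: the function name and the fixed-length prefix extractor are assumptions for the sketch. The idea is that a prefix seek can stand in for a total order seek only when the upper bound stays inside the seek key's prefix, so no key from another prefix can be skipped.

```cpp
#include <cassert>
#include <string>

// Illustrative model (not RocksDB code): with a fixed-length prefix
// extractor, prefix seek can safely replace total-order seek only when the
// iterate upper bound shares the seek key's prefix, so every key in
// [seek_key, upper_bound) carries that same prefix.
bool CanUsePrefixSeek(const std::string& seek_key,
                      const std::string& upper_bound, size_t prefix_len) {
  if (seek_key.size() < prefix_len || upper_bound.size() < prefix_len) {
    return false;  // prefix is not defined for one of the keys
  }
  // Equal prefixes on both ends imply all keys in between share the prefix.
  return seek_key.compare(0, prefix_len, upper_bound, 0, prefix_len) == 0;
}
```

      With this check, a range like ["aaa1", "aaa9") can use prefix seek, while ["aaa1", "abc9") must fall back to total order seek.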
  4. 17 Jan 2020 (2 commits)
    • Fix another bug caused by recent hash index fix (#6305) · d87cffae
      Committed by sdong
      Summary:
      The recent bug fix related to the hash index introduced a new bug: the hash index can return NotFound, but that is not handled by BlockBasedTable::Get(). The end result is that Get() stops executing too early. Fix it by ignoring the NotFound code in Get().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6305
      
      Test Plan: A problematic DB used to return NotFound incorrectly, and is now able to return the correct result. Will try to construct a unit test too.
      
      Differential Revision: D19438925
      
      fbshipit-source-id: e751afa8c13728d56511cfeb1bc811ecb99f3217
    • Fix a bug caused by recent fix of Prefix Hash (#6302) · f8b5ef85
      Committed by sdong
      Summary:
      The recent fix to Prefix Hash https://github.com/facebook/rocksdb/pull/6292 caused a bug where the newly created NotFound status in the hash index is never reset. This causes reseek, or implicit reseek, to sometimes return wrong results.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6302
      
      Test Plan:
      Add a unit test that fails without the fix.
      The crash test with the hash index would fail within several seconds. With the fix, it runs for several minutes before failing with a different failure.
      
      Differential Revision: D19424572
      
      fbshipit-source-id: c5276f36a95fd0e2837e30190476d2fe21ed8566
  5. 16 Jan 2020 (1 commit)
    • Fix kHashSearch bug with SeekForPrev (#6297) · d2b4d42d
      Committed by sdong
      Summary:
      When prefix mode is enabled and the prefix of the target does not exist, the expected behavior is for Seek to land on any key larger than the target and for SeekForPrev to land on any key smaller than the target.
      Currently, the prefix index (kHashSearch) returns OK status but sets Invalid() to indicate two cases: (i) the prefix of the searched key does not exist, and (ii) the key is beyond the range of the keys in the SST file. The SeekForPrev implementation in BlockBasedTable thus does not have enough information to know when it should set the index key to the first key (to return a key smaller than the target). The patch fixes that by returning NotFound status for the case where the prefix does not exist. SeekForPrev in BlockBasedTable accordingly calls SeekToFirst instead of SeekToLast on the index iterator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6297
      
      Test Plan: SeekForPrev of a non-existing prefix is added to block_test.cc, and a test case is added in db_test2, which fails without the fix.
      
      Differential Revision: D19404695
      
      fbshipit-source-id: cafbbf95f8f60ff9ede9ccc99d25bfa1cf6fcdc3
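      The expected total-order semantics described above can be modeled on a plain ordered set. This is a reference sketch only (a std::set stands in for the SST file; the helper names are invented for illustration): Seek returns the first key at or after the target, SeekForPrev the last key at or before it, whether or not the target's prefix exists.

```cpp
#include <cassert>
#include <set>
#include <string>

// Reference semantics sketch (std::set stands in for the table):
// Seek -> first key >= target; SeekForPrev -> last key <= target.
// Returns "" when no such key exists.
std::string Seek(const std::set<std::string>& keys, const std::string& target) {
  auto it = keys.lower_bound(target);  // first element not less than target
  return it == keys.end() ? "" : *it;
}

std::string SeekForPrev(const std::set<std::string>& keys,
                        const std::string& target) {
  auto it = keys.upper_bound(target);  // first element greater than target
  return it == keys.begin() ? "" : *std::prev(it);
}
```

      For keys {"aaa1", "ccc1"} and a target "bbb0" whose prefix is absent, Seek must still return "ccc1" and SeekForPrev "aaa1"; the bug above made SeekForPrev lose track of this case.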
  6. 15 Jan 2020 (1 commit)
  7. 14 Jan 2020 (1 commit)
    • Bug when multiple files at one level contains the same smallest key (#6285) · 894c6d21
      Committed by sdong
      Summary:
      The fractional cascading index is not correctly generated when two files at the same level contain the same smallest or largest user key.
      The result is that an assertion is hit in debug mode and lower-level files might be skipped.
      This can cause wrong results when the duplicated user keys are merge operands and Get() is called with that exact user key, because the lower-level files would need to be checked as well.
      The fix is to correct the fractional cascading index.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6285
      
      Test Plan: Add a unit test which triggers the assertion before the fix.
      
      Differential Revision: D19358426
      
      fbshipit-source-id: 39b2b1558075fd95e99491d462a67f9f2298c48e
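      The pitfall can be illustrated with a small sketch (the struct and helper are hypothetical, not RocksDB code): files at one level are ordered by smallest key, but when two files share the same smallest user key, a lookup must return every file whose range contains the key, not just the single file a naive binary search would land on.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FileRange {  // [smallest, largest] user keys covered by one SST file
  std::string smallest, largest;
};

// Hypothetical helper: return every file whose key range contains `key`.
// When duplicate user keys (e.g. merge operands) straddle file boundaries,
// more than one file can qualify, which is what the buggy index missed.
std::vector<size_t> CandidateFiles(const std::vector<FileRange>& files,
                                   const std::string& key) {
  std::vector<size_t> out;
  for (size_t i = 0; i < files.size(); ++i) {
    if (files[i].smallest <= key && key <= files[i].largest) {
      out.push_back(i);  // each such file may hold operands for `key`
    }
  }
  return out;
}
```

      With files [a,a], [a,c], [d,e], the key "a" lives in the first two files; skipping either one drops merge operands.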
  8. 11 Jan 2020 (1 commit)
  9. 08 Jan 2020 (1 commit)
    • Fix test in LITE mode (#6267) · a8b1085a
      Committed by Yanqin Jin
      Summary:
      Currently, the recently-added test DBTest2.SwitchMemtableRaceWithNewManifest
      fails in LITE mode since SetOptions() returns "Not supported". I do not want to
      put `#ifndef ROCKSDB_LITE` because it reduces test coverage. Instead, just
      trigger compaction on a different column family. The bg compaction thread
      calling LogAndApply() may race with the thread calling SwitchMemtable().
      
      Test Plan (dev server):
      make check
      OPT=-DROCKSDB_LITE make check
      
      or run DBTest2.SwitchMemtableRaceWithNewManifest 100 times.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6267
      
      Differential Revision: D19301309
      
      Pulled By: riversand963
      
      fbshipit-source-id: 88cedcca2f985968ed3bb234d324ffa2aa04ca50
  10. 07 Jan 2020 (1 commit)
    • Fix a data race for cfd->log_number_ (#6249) · 1aaa1458
      Committed by Yanqin Jin
      Summary:
      A thread calling LogAndApply may release db mutex when calling
      WriteCurrentStateToManifest() which reads cfd->log_number_. Another thread can
      call SwitchMemtable() and writes to cfd->log_number_.
      Solution is to cache the cfd->log_number_ before releasing mutex in
      LogAndApply.
      
      Test Plan (on devserver):
      ```
      $COMPILE_WITH_TSAN=1 make db_stress
      $./db_stress --acquire_snapshot_one_in=10000 --avoid_unnecessary_blocking_io=1 --block_size=16384 --bloom_bits=16 --bottommost_compression_type=zstd --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compression_max_dict_bytes=16384 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_blackbox --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --enable_pipelined_write=0  --flush_one_in=1000000 --format_version=5 --get_live_files_and_wal_files_one_in=1000000 --index_block_restart_interval=5 --index_type=0 --log2_keys_per_lock=22 --long_running_snapshots=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000000 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 --memtablerep=skip_list --mmap_read=0 --nooverwritepercent=1 --open_files=500000 --ops_per_thread=100000000 --partition_filters=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --readpercent=45 --recycle_log_file_num=0 --reopen=20 --set_options_one_in=10000 --snapshot_hold_ops=100000 --subcompactions=2 --sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=1 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 --write_dbid_to_manifest=1 --writepercent=35
      ```
      Then repeat the following multiple times, e.g. 100 times, after compiling with TSAN.
      ```
      $./db_test2 --gtest_filter=DBTest2.SwitchMemtableRaceWithNewManifest
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6249
      
      Differential Revision: D19235077
      
      Pulled By: riversand963
      
      fbshipit-source-id: 79467b52f48739ce7c27e440caa2447a40653173
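      The fix pattern, caching a shared field before dropping the mutex, can be sketched in a few lines. The types and the function body are a minimal stand-in, not the actual DBImpl code: the point is only that the long-running manifest write uses the local copy, so a concurrent SwitchMemtable() updating the field cannot race with it.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct ColumnFamilyData {
  uint64_t log_number = 0;  // written by SwitchMemtable() under the DB mutex
};

// Minimal sketch of the fix pattern: snapshot the shared field while the
// mutex is held, then work on the local copy after releasing the mutex.
uint64_t WriteCurrentStateToManifestSketch(ColumnFamilyData& cfd,
                                           std::mutex& db_mutex) {
  std::unique_lock<std::mutex> lock(db_mutex);
  const uint64_t cached_log_number = cfd.log_number;  // snapshot under lock
  lock.unlock();
  // ... long-running manifest I/O happens here without the mutex ...
  return cached_log_number;  // use the cached value, never cfd.log_number
}
```

      Reading cfd.log_number again after unlock() is exactly the data race the patch removes.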
  11. 03 Jan 2020 (1 commit)
    • Prevent an incompatible combination of options (#6254) · 48a678b7
      Committed by Maysam Yabandeh
      Summary:
      allow_concurrent_memtable_write is incompatible with a non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led stress tests to fail when this combination is tried via ::SetOptions. The patch fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254
      
      Differential Revision: D19265819
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179
  12. 18 Dec 2019 (1 commit)
    • delete superversions in BackgroundCallPurge (#6146) · 39fcaf82
      Committed by 解轶伦
      Summary:
      I found that CleanupSuperVersion() may block Get() for 30ms+ (with a 256MB MemTable).
      
      Then I found "delete sv" in ~SuperVersion() takes the time.
      
      The backtrace looks like this
      
      DBImpl::GetImpl() -> DBImpl::ReturnAndCleanupSuperVersion() ->
      DBImpl::CleanupSuperVersion() : delete sv; -> ~SuperVersion()
      
      I think it's better to delete it in a background thread; please review it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6146
      
      Differential Revision: D18972066
      
      fbshipit-source-id: 0f7b0b70b9bb1e27ad6fc1c8a408fbbf237ae08c
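      The idea of the change can be sketched as handing ownership of the expensive object to a background thread instead of destroying it on the read path. This is a toy model (the SuperVersion stand-in and SchedulePurge are invented for illustration; the real code routes through BackgroundCallPurge):

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

static std::atomic<int> g_deleted{0};

struct SuperVersion {  // stand-in: destruction is the expensive part
  ~SuperVersion() { g_deleted.fetch_add(1); }
};

// Sketch only: instead of `delete sv` inside Get()'s cleanup, move the
// object to a background thread so the caller returns without paying for
// the teardown.
std::thread SchedulePurge(std::unique_ptr<SuperVersion> sv) {
  return std::thread([owned = std::move(sv)]() mutable {
    owned.reset();  // expensive destruction happens off the read path
  });
}
```

      The read path only pays for moving a pointer; the 30ms+ destructor cost is absorbed by the purge thread.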
  13. 14 Nov 2019 (1 commit)
    • Fix a regression bug on total order seek with prefix enabled and range delete (#6028) · bb23bfe6
      Committed by sdong
      Summary:
      The recent change https://github.com/facebook/rocksdb/pull/5861 mistakenly used "prefix_extractor_ != nullptr" as the condition to determine whether the prefix bloom filter is used. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakenly applied to total order seek, which introduces a bug in the following corner case:
      Because of RangeDelete(), a file's largest key is extended. The seek key falls into the range-deleted file, so the level iterator seeks into the previous file without getting any key. The correct behavior is to place the iterator at the first key of the next file. However, an optimization is triggered that invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added.
      Fix the bug by setting prefix_extractor to null if total order seek is used.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6028
      
      Test Plan: Add a unit test which fails without the fix.
      
      Differential Revision: D18479063
      
      fbshipit-source-id: ac075f013029fcf69eb3a598f14c98cce3e810b3
  14. 25 Oct 2019 (1 commit)
    • Update column families' log number altogether after flushing during recovery (#5856) · 2309fd63
      Committed by Yanqin Jin
      Summary:
      A bug occasionally shows up in crash test, and https://github.com/facebook/rocksdb/issues/5851 reproduces it.
      The bug can surface in the following way.
      1. Database has multiple column families.
      2. Between two DB restarts, the last log file is corrupted in the middle (not at the tail).
      3. During restart, DB crashes between flushing two column families.
      
      Then DB will fail to be opened again with error "SST file is ahead of WALs".
      Solution is to update the log number associated with each column family altogether after flushing all column families' memtables. The version edits should be written to a new MANIFEST. Only after writing all these version edits succeeds does RocksDB (atomically) point the CURRENT file to the new MANIFEST.
      
      Test plan (on devserver):
      ```
      $make all && make check
      ```
      Specifically
      ```
      $make db_test2
      $./db_test2 --gtest_filter=DBTest2.CrashInRecoveryMultipleCF
      ```
      Also checked for compatibility as follows.
      Use this branch, run DBTest2.CrashInRecoveryMultipleCF and preserve the db directory.
      Then checkout 5.4, build ldb, and dump the MANIFEST.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5856
      
      Differential Revision: D17620818
      
      Pulled By: riversand963
      
      fbshipit-source-id: b52ce5969c9a8052cacec2bd805fcfb373589039
  15. 22 Oct 2019 (1 commit)
    • LevelIterator to avoid gap after prefix bloom filters out a file (#5861) · a0cd9200
      Committed by sdong
      Summary:
      Right now, when LevelIterator::Seek() is called, when a file is filtered out by prefix bloom filter, the position is put to the beginning of the next file. This is a confusing internal interface because many keys in the levels are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5861
      
      Test Plan: Add a new unit test to validate the behavior; run all existing tests; run crash_test
      
      Differential Revision: D17918213
      
      fbshipit-source-id: f06b47d937c7cc8919001f18dcc3af5b28c9cdac
  16. 12 Oct 2019 (1 commit)
    • Fix block cache ID uniqueness for Windows builds (#5844) · b00761ee
      Committed by Andrew Kryczka
      Summary:
      Since we do not evict a file's blocks from block cache before that file
      is deleted, we require that a file's cache ID prefix be both unique and
      non-reusable. However, the Windows functionality we were relying on only
      guaranteed uniqueness. That meant a newly created file could be assigned
      the same cache ID prefix as a deleted file. If the newly created file
      had block offsets matching the deleted file, full cache keys could be
      exactly the same, resulting in obsolete data blocks returned from cache
      when trying to read from the new file.
      
      We noticed this when running on FAT32 where compaction was writing out
      of order keys due to reading obsolete blocks from its input files. The
      functionality is documented as behaving the same on NTFS, although I
      wasn't able to repro it there.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844
      
      Test Plan:
      we had a reliable repro of out-of-order keys on FAT32 that
      was fixed by this change
      
      Differential Revision: D17752442
      
      fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
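      Why uniqueness alone is not enough can be shown with a toy key layout (illustrative only; the real RocksDB encoding is binary, not this string form): a block's cache key combines the file's cache ID prefix with the block offset, so a reused prefix plus a matching offset yields the exact key of a dead file's block.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy cache-key layout (not the actual RocksDB encoding): file prefix plus
// block offset. If the OS hands a newly created file the cache ID prefix of
// a deleted file, identical offsets collide and the cache serves the dead
// file's blocks.
std::string BlockCacheKey(const std::string& file_cache_id_prefix,
                          uint64_t block_offset) {
  return file_cache_id_prefix + "#" + std::to_string(block_offset);
}
```

      A non-reusable prefix makes the first assertion below impossible across a delete/create cycle, which is exactly what the fix guarantees.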
  17. 27 Sep 2019 (1 commit)
    • Add a unit test to reproduce a corruption bug (#5851) · 76e951db
      Committed by sdong
      Summary:
      This is a bug that occasionally shows up in the crash test, and this unit test reproduces it. The bug is the following:
      1. Database has multiple CFs.
      2. Between two DB restarts, the last log file is corrupted in the middle (not at the tail).
      3. During restart, DB crashes between flushing two CFs.
      The DB will then fail to be opened again with the error "SST file is ahead of WALs".
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5851
      
      Test Plan: Run the test itself.
      
      Differential Revision: D17614721
      
      fbshipit-source-id: 1b0abce49b203a76a039e38e76bc940429975f20
  18. 20 Sep 2019 (1 commit)
  19. 18 Sep 2019 (1 commit)
  20. 17 Sep 2019 (2 commits)
    • Allow users to stop manual compactions (#3971) · 62268300
      Committed by andrew
      Summary:
      Manual compaction may bring in very high load because sometimes the amount of data involved in a compaction can be large, which may affect online service. So it would be good if a running compaction that is making the server busy could be stopped immediately. In this implementation, the stop-manual-compaction condition is only checked in the slow process; deletion compactions and trivial moves are let through.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3971
      
      Test Plan: add tests at more spots.
      
      Differential Revision: D17369043
      
      fbshipit-source-id: 575a624fb992ce0bb07d9443eb209e547740043c
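      The "check only in the slow process" design can be sketched as a cancellation flag polled inside the per-key loop. This is a hypothetical loop, not the real compaction job: the flag and function names are invented, and cheap work like trivial moves simply never consults the flag.

```cpp
#include <atomic>
#include <cassert>

// Sketch: the slow per-key path polls a cancellation flag so a manual
// compaction can be aborted mid-run. Trivial moves and deletion compactions
// bypass this loop entirely, matching the commit's "let through" behavior.
int CompactKeys(int num_keys, const std::atomic<bool>& canceled) {
  int processed = 0;
  for (int i = 0; i < num_keys; ++i) {
    if (canceled.load(std::memory_order_relaxed)) {
      break;  // stop the expensive work as soon as the user asks
    }
    ++processed;  // placeholder for reading/merging/writing one key
  }
  return processed;
}
```

      Relaxed ordering is enough here because the flag carries no data dependency; it only needs to become visible eventually.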
    • Charge block cache for cache internal usage (#5797) · 638d2395
      Committed by Maysam Yabandeh
      Summary:
      For our default block cache, each additional entry has extra memory overhead. It includes the LRUHandle (currently 72 bytes) and the cache key (two varint64s: file id and offset). The usage is not negligible: for example, with block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charges the cache for the extra usage, reducing untracked memory usage outside the block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
      This PR builds up on https://github.com/facebook/rocksdb/issues/4258
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797
      
      Test Plan:
      - Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata.
      - The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.
      
      Differential Revision: D17396833
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
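      The ~2% figure quoted above is easy to sanity-check: per entry, the metadata is the 72-byte LRUHandle plus a key of two varint64s (up to roughly 20 bytes), charged against a 4 KiB data block. The exact key size here is an assumption for the arithmetic, not a measured value.

```cpp
#include <cassert>

// Back-of-the-envelope check of the commit's ~2% overhead claim.
// handle_bytes: LRUHandle size; key_bytes: two varint64s (assumed ~18B);
// block_bytes: the cached data block itself.
double MetadataOverheadFraction(double handle_bytes, double key_bytes,
                                double block_bytes) {
  return (handle_bytes + key_bytes) / block_bytes;
}
```

      (72 + 18) / 4096 comes out to about 2.2%, consistent with the commit message.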
  21. 10 Aug 2019 (1 commit)
    • Support loading custom objects in unit tests (#5676) · 5d9a67e7
      Committed by Yanqin Jin
      Summary:
      Most existing RocksDB unit tests run on `Env::Default()`. It will be useful to port the unit tests to non-default environments, e.g. `HdfsEnv`, etc.
      This pull request is one step towards this goal. If RocksDB unit tests are built with a static library exposing a function `RegisterCustomObjects()`, then it is possible to implement custom object registrar logic in the library. RocksDB unit test can call `RegisterCustomObjects()` at the beginning.
      By default, `ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS` is not defined, thus this PR has no impact on existing RocksDB because `RegisterCustomObjects()` is a noop.
      Test plan (on devserver):
      ```
      $make clean && COMPILE_WITH_ASAN=1 make -j32 all
      $make check
      ```
      All unit tests must pass.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5676
      
      Differential Revision: D16679157
      
      Pulled By: riversand963
      
      fbshipit-source-id: aca571af3fd0525277cdc674248d0fe06e060f9d
  22. 07 Aug 2019 (1 commit)
    • New API to get all merge operands for a Key (#5604) · d150e014
      Committed by Vijay Nadimpalli
      Summary:
      This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
      1. Update subset of columns and read subset of columns -
      Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
      2. Updating very few attributes in a value which is a JSON-like document -
      Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
      ----------------------------------------------------------------------------------------------------
      API :
      Status GetMergeOperands(
            const ReadOptions& options, ColumnFamilyHandle* column_family,
            const Slice& key, PinnableSlice* merge_operands,
            GetMergeOperandsOptions* get_merge_operands_options,
            int* number_of_operands)
      
      Example usage :
      int size = 100;
      int number_of_operands = 0;
      std::vector<PinnableSlice> values(size);
      GetMergeOperandsOptions merge_operands_info;
      db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);
      
      Description :
      Returns all the merge operands corresponding to the key. If the number of merge operands in the DB is greater than merge_operands_options.expected_max_number_of_operands, no merge operands are returned and the status is Incomplete. Merge operands returned are in the order of insertion.
      merge_operands points to an array of at least merge_operands_options.expected_max_number_of_operands PinnableSlices, and the caller is responsible for allocating it. If the status returned is Incomplete, then number_of_operands will contain the total number of merge operands found in the DB for the key.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604
      
      Test Plan:
      Added unit test and perf test in db_bench that can be run using the command:
      ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist
      
      Differential Revision: D16657366
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
  23. 23 Jul 2019 (1 commit)
    • row_cache to share entry for recent snapshots (#5600) · 66b5613d
      Committed by sdong
      Summary:
      Right now, users cannot take advantage of row cache, unless no snapshot is used, or Get() is repeated for the same snapshots. This limits the usage of row cache.
      This change eliminates this restriction in some cases: if the snapshot used is newer than the largest sequence number in the file, and no write callback function is registered, the same row cache key is used as when no snapshot is given. We still need the callback-function restriction for now because the callback function may filter out different keys for different snapshots even if the snapshots are new.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5600
      
      Test Plan: Add a unit test.
      
      Differential Revision: D16386616
      
      fbshipit-source-id: 6b7d214bd215d191b03ccf55926ad4b703ec2e53
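      The sharing rule above reduces to a one-line predicate. This is an illustrative sketch, not the RocksDB function (the name and parameters are assumptions): a snapshot read may reuse the snapshot-less row cache entry only when the snapshot is at least as new as every sequence number in the file and no write callback is registered.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the decision described in the commit (names are illustrative):
// reuse the no-snapshot row cache entry only when the snapshot covers every
// sequence number in the file and no write callback can filter keys.
bool CanReuseRowCacheEntry(uint64_t snapshot_seq, uint64_t file_largest_seq,
                           bool has_write_callback) {
  return !has_write_callback && snapshot_seq >= file_largest_seq;
}
```

      When the predicate holds, the read sees exactly what a snapshot-less read would see, so the cached row is byte-for-byte identical.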
  24. 31 May 2019 (1 commit)
    • Fix flaky DBTest2.PresetCompressionDict test (#5378) · 1b59a490
      Committed by Sagar Vemuri
      Summary:
      Fix flaky DBTest2.PresetCompressionDict test.
      
      This PR fixes two issues with the test:
      1. Replaces `GetSstFiles` with `TotalSize`, which is based on `DB::GetColumnFamilyMetaData`, so that only the size of the live SST files is taken into consideration when computing the total size of all sst files. Earlier, with `GetSstFiles`, even obsolete files were getting picked up.
      2. In ZSTD compression, it is sometimes possible that using a trained dictionary is not better than using an untrained one. Using a trained dictionary performs well in 99% of the cases, but in the remaining ~1% of cases (out of 10000 runs) using an untrained dictionary gets better compression results.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5378
      
      Differential Revision: D15559100
      
      Pulled By: sagar0
      
      fbshipit-source-id: c35adbf13871f520a2cec48f8bad9ff27ff7a0b4
  25. 02 May 2019 (1 commit)
  26. 17 Apr 2019 (2 commits)
  27. 13 Apr 2019 (1 commit)
    • Still implement StatisticsImpl::measureTime() (#5181) · 85b2bde3
      Committed by Siying Dong
      Summary:
      Since Statistics::measureTime() is deprecated, StatisticsImpl::measureTime() was not implemented. We realized that users might have a wrapped Statistics implementation in which measureTime() is implemented by forwarding to StatisticsImpl, which causes an assert failure. In order to make the change less intrusive, we implement StatisticsImpl::measureTime(). We will revisit whether we need to remove it after several releases.
      
      Also, add a test to make sure that a Statistics implementation using the old interface still works.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5181
      
      Differential Revision: D14907089
      
      Pulled By: siying
      
      fbshipit-source-id: 29b6202fd04e30ed6f6adcaeb1000e87f10d1e1a
  28. 12 Apr 2019 (1 commit)
    • Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165) · ed9f5e21
      Committed by Siying Dong
      Summary:
      Change the behavior of OptimizeForSmallDb() so that it is less likely to go out of memory.
      Change the behavior of OptimizeForPointLookup() to take advantage of the new memtable whole key filter, and move away from prefix extractor as well as hash-based indexing, as they are prone to misuse.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5165
      
      Differential Revision: D14880709
      
      Pulled By: siying
      
      fbshipit-source-id: 9af30e3c9e151eceea6d6b38701a58f1f9fb692d
  29. 04 Apr 2019 (1 commit)
  30. 03 Apr 2019 (1 commit)
    • WriteUnPrepared: less virtual in iterator callback (#5049) · 14b3f683
      Committed by Maysam Yabandeh
      Summary:
      WriteUnPrepared adds a virtual function, MaxUnpreparedSequenceNumber, to ReadCallback, which returns 0 unless WriteUnPrepared is enabled and the transaction has uncommitted data written to the DB. Together with snapshot sequence number, this determines the last sequence that is visible to reads.
      The patch clarifies the guarantees of the GetIterator API in WriteUnPrepared transactions and makes use of that to statically initialize the read callback and thus avoid the virtual call.
      Furthermore, it increases the minimum value for min_uncommitted from 0 to 1, as seq 0 is used only for last-level keys that are committed in all snapshots.
      
      The following benchmark shows +0.26% higher throughput in seekrandom benchmark.
      
      Benchmark:
      ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench
      
      ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20355 ops/sec;  225.2 MB/sec
      seekrandom [MEDIAN 10 runs] : 20425 ops/sec;  225.9 MB/sec
      
      ./db_bench_lessvirtual3 --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20409 ops/sec;  225.8 MB/sec
      seekrandom [MEDIAN 10 runs] : 20487 ops/sec;  226.6 MB/sec
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5049
      
      Differential Revision: D14366459
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ebaff8908332a5ae9af7defeadabcb624be660ef
  31. 27 Mar 2019 (1 commit)
  32. 20 Mar 2019 (1 commit)
  33. 27 Feb 2019 (1 commit)
    • WritePrepared: optimize read path by avoiding virtual (#5018) · a661c0d2
      Committed by Maysam Yabandeh
      Summary:
      The read path includes a callback function, ReadCallback, which eventually calls IsInSnapshot to figure out whether a particular seq is in the reading snapshot or not. This callback is virtual, which adds the cost of multiple virtual function calls to each read. The first few checks in IsInSnapshot, however, are quite trivial and take care of the majority of cases. The patch moves those to a non-virtual function in the parent class, ReadCallback, to lower the virtual callback cost.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5018
      
      Differential Revision: D14226562
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6feed5b34f3b082e52092c5ef143e29b49c46b44
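      The devirtualization pattern described above looks roughly like this. The class is modeled on, but not identical to, RocksDB's ReadCallback (method names and the visibility checks are simplified for illustration): a non-virtual method handles the trivial cases inline and only falls through to the virtual slow path when needed.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the pattern: trivial checks run in a non-virtual method; only
// the remaining cases pay for virtual dispatch. Names are modeled on, but
// not identical to, RocksDB's ReadCallback.
class ReadCallbackSketch {
 public:
  explicit ReadCallbackSketch(uint64_t snapshot_seq)
      : snapshot_seq_(snapshot_seq) {}
  virtual ~ReadCallbackSketch() = default;

  // Non-virtual fast path: common cases resolved without a vcall.
  bool IsVisible(uint64_t seq) {
    if (seq == 0) return true;              // trivially visible
    if (seq > snapshot_seq_) return false;  // newer than the snapshot
    return IsVisibleFullCheck(seq);         // rare cases go virtual
  }

 protected:
  // Virtual slow path, overridden by transaction-specific callbacks.
  virtual bool IsVisibleFullCheck(uint64_t /*seq*/) { return true; }
  uint64_t snapshot_seq_;
};
```

      Callers always invoke IsVisible(); most reads never reach the vtable, which is where the saved cost comes from.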
  34. 20 Feb 2019 (1 commit)
  35. 15 Feb 2019 (1 commit)
    • Apply modernize-use-override (2nd iteration) · ca89ac2b
      Committed by Michael Liu
      Summary:
      Use C++11’s override and remove virtual where applicable.
      Changes are automatically generated.
      
      Reviewed By: Orvid
      
      Differential Revision: D14090024
      
      fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
  36. 12 Feb 2019 (1 commit)
    • Reduce scope of compression dictionary to single SST (#4952) · 62f70f6d
      Committed by Andrew Kryczka
      Summary:
      Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
      
      So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
      
      - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
      - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
      - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
      
      Differential Revision: D13967980
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
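      The kBuffered/kUnbuffered transition described above can be modeled with a toy builder. This is a sketch only: the real class is BlockBasedTableBuilder, and here training/compression is reduced to a counter so the state machine itself is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the two-state builder: while "buffered", Add() accumulates
// raw blocks; once the buffer limit is hit (or Finish() is called), the
// builder "enters unbuffered", flushes everything, and writes subsequent
// blocks immediately. Dictionary training is elided.
class DictTableBuilderSketch {
 public:
  explicit DictTableBuilderSketch(size_t buffer_limit) : limit_(buffer_limit) {}

  void Add(const std::string& block) {
    if (buffered_) {
      buffer_.push_back(block);  // kBuffered: accumulate uncompressed data
      if (BufferedBytes() >= limit_) EnterUnbuffered();
    } else {
      ++blocks_written_;  // kUnbuffered: compress/write as blocks fill up
    }
  }

  void Finish() {
    if (buffered_) EnterUnbuffered();  // flush even if the limit wasn't hit
  }

  bool buffered() const { return buffered_; }
  size_t blocks_written() const { return blocks_written_; }

 private:
  size_t BufferedBytes() const {
    size_t n = 0;
    for (const auto& b : buffer_) n += b.size();
    return n;
  }
  void EnterUnbuffered() {
    // The real builder samples buffer_, trains a dictionary, then
    // compresses and writes all buffered blocks; we just count them.
    buffered_ = false;
    blocks_written_ += buffer_.size();
    buffer_.clear();
  }

  bool buffered_ = true;
  size_t limit_;
  std::vector<std::string> buffer_;
  size_t blocks_written_ = 0;
};
```

      Buffering whole uncompressed blocks is what lets the trainer see realistic units of compression instead of the old 64-byte KV samples.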