1. 19 Apr 2022, 2 commits
  2. 17 Apr 2022, 1 commit
  3. 16 Apr 2022, 7 commits
    • S
      Add Aggregation Merge Operator (#9780) · 4f9c0fd0
      Committed by sdong
      Summary:
      Add a merge operator that allows users to register specific aggregation functions so that they can perform per-key aggregation using different aggregation types.
      See the comments of CreateAggMergeOperator() for actual usage.
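      The idea can be sketched with a self-contained toy (hypothetical names, not the actual RocksDB `CreateAggMergeOperator()` API): a registry maps an aggregation type to a function that folds all merge operands for a key.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <functional>
      #include <map>
      #include <string>
      #include <vector>

      // Toy registry of named aggregation functions: each key's operands are
      // folded with the aggregation type registered for it.
      using AggFunc = std::function<int64_t(const std::vector<int64_t>&)>;

      std::map<std::string, AggFunc>& AggRegistry() {
        static std::map<std::string, AggFunc> registry{
            {"sum",
             [](const std::vector<int64_t>& v) {
               int64_t s = 0;
               for (int64_t x : v) s += x;
               return s;
             }},
            {"max",
             [](const std::vector<int64_t>& v) {
               int64_t m = v.front();
               for (int64_t x : v) m = m > x ? m : x;
               return m;
             }}};
        return registry;
      }

      // Merge all operands accumulated for a key using its registered function.
      int64_t MergeForKey(const std::string& agg_type,
                          const std::vector<int64_t>& operands) {
        return AggRegistry().at(agg_type)(operands);
      }
      ```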
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9780
      
      Test Plan: Add a unit test to cover various cases.
      
      Reviewed By: ltamasi
      
      Differential Revision: D35267444
      
      fbshipit-source-id: 5b02f31c4f3e17e96dd4025cdc49fca8c2868628
      4f9c0fd0
    • L
      Propagate errors from UpdateBoundaries (#9851) · db536ee0
      Committed by Levi Tamasi
      Summary:
      In `FileMetaData`, we keep track of the lowest-numbered blob file
      referenced by the SST file in question for the purposes of BlobDB's
      garbage collection in the `oldest_blob_file_number` field, which is
      updated in `UpdateBoundaries`. However, with the current code,
      `BlobIndex` decoding errors (or invalid blob file numbers) are swallowed
      in this method. The patch changes this by propagating these errors
      and failing the corresponding flush/compaction. (Note that since blob
      references are generated by the BlobDB code and also parsed by
      `CompactionIterator`, in reality this can only happen in the case of
      memory corruption.)
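      A self-contained sketch of the behavioral change (toy types and a hypothetical `blob#<n>` reference format, not the actual RocksDB code): the boundary update now surfaces a decode failure as an error status instead of swallowing it.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <optional>
      #include <string>

      // Toy status type standing in for rocksdb::Status.
      struct ToyStatus {
        bool ok;
        std::string msg;
      };

      // Hypothetical stand-in for BlobIndex decoding: accept only a
      // well-formed "blob#<n>" reference and report the blob file number.
      std::optional<uint64_t> DecodeBlobRef(const std::string& ref) {
        const std::string prefix = "blob#";
        if (ref.rfind(prefix, 0) != 0) return std::nullopt;
        uint64_t n = 0;
        for (size_t i = prefix.size(); i < ref.size(); ++i) {
          if (ref[i] < '0' || ref[i] > '9') return std::nullopt;
          n = n * 10 + static_cast<uint64_t>(ref[i] - '0');
        }
        return n;
      }

      // Analogue of the UpdateBoundaries fix: propagate decode errors (and
      // invalid blob file numbers) instead of ignoring them, while tracking
      // the oldest referenced blob file number.
      ToyStatus UpdateOldestBlobFile(const std::string& ref, uint64_t& oldest) {
        std::optional<uint64_t> n = DecodeBlobRef(ref);
        if (!n || *n == 0) {
          return {false, "corrupt blob reference: " + ref};
        }
        if (*n < oldest) oldest = *n;
        return {true, ""};
      }
      ```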
      
      This change necessitated updating some unit tests that involved
      fake/corrupt `BlobIndex` objects. Some of these just used a dummy string like
      `"blob_index"` as a placeholder; these were replaced with real `BlobIndex`es.
      Some were relying on the earlier behavior to simulate corruption; these
      were replaced with `SyncPoint`-based test code that corrupts a valid
      blob reference at read time.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9851
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D35683671
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f7387af9945c48e4d5c4cd864f1ba425c7ad51f6
      db536ee0
    • Y
      Add a `fail_if_not_bottommost_level` to IngestExternalFileOptions (#9849) · be81609b
      Committed by Yanqin Jin
      Summary:
      This new option allows applications to specify that files must be
      ingested to the bottommost level; otherwise the ingestion fails instead
      of silently ingesting to a non-bottommost level.
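      A toy model of the option's semantics (hypothetical names, not the RocksDB implementation):

      ```cpp
      #include <cassert>

      // If fail_if_not_bottommost_level is set and the picked target level is
      // not the bottommost one, ingestion fails instead of proceeding silently.
      struct ToyIngestOptions {
        bool fail_if_not_bottommost_level = false;
      };

      // Returns true if ingestion may proceed for the chosen target level.
      bool MayIngest(const ToyIngestOptions& opts, int target_level,
                     int bottommost_level) {
        if (opts.fail_if_not_bottommost_level &&
            target_level != bottommost_level) {
          return false;  // caller surfaces this as an ingestion failure
        }
        return true;
      }
      ```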
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9849
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D35680307
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01cf54ef6c76198f7654dc06b5544631dea1be1e
      be81609b
    • A
      Make initial auto readahead_size configurable (#9836) · 0c7f455f
      Committed by Akanksha Mahajan
      Summary:
      Make initial auto readahead_size configurable
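      For context, the auto-readahead growth policy this change makes tunable can be sketched as follows (simplified; the real FilePrefetchBuffer logic has more conditions): the readahead size starts at a configurable initial value and doubles on subsequent sequential reads, capped at the maximum readahead size.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Doubling policy: grow the current readahead size, capped at the max.
      size_t NextReadaheadSize(size_t current, size_t max_readahead_size) {
        size_t next = current * 2;
        return next > max_readahead_size ? max_readahead_size : next;
      }
      ```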
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9836
      
      Test Plan:
      Added new unit test
      Ran regression:
      Without change:
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      With this change:
      ```
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Set seed to 1649895440554504 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.2
      Date:       Wed Apr 13 17:17:20 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      ... finished 100 ops
      seekrandom   :  476892.488 micros/op 2 ops/sec;  344.6 MB/s (252 of 252 found)
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D35632815
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: c8057a88f9294c9d03b1d434b03affe02f74d796
      0c7f455f
    • S
      Upgrade development environment. (#9843) · d5dfa8c6
      Committed by sdong
      Summary:
      This is to support Meta's internal platform010 environment. gcc still doesn't work, but USE_CLANG=1 should.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9843
      
      Test Plan: Try `make` and `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_CLANG=1 make`
      
      Reviewed By: pdillinger
      
      Differential Revision: D35652507
      
      fbshipit-source-id: a4a14b2fa4a2d6ca6fbf1b65060e81c39f079363
      d5dfa8c6
    • J
      Remove flaky servicelab metrics DBPut P95/P99 (#9844) · e91ec64c
      Committed by Jay Zhuang
      Summary:
      The P95 and P99 metrics are flaky, similar to the DBGet ones, which were
      removed in https://github.com/facebook/rocksdb/issues/9742.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9844
      
      Test Plan: `$ ./buckifier/buckify_rocksdb.py`
      
      Reviewed By: ajkr
      
      Differential Revision: D35655531
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: c1409f0fba4e23d461a65f988c27ac5e2ae85d13
      e91ec64c
    • Y
      Add option --decode_blob_index to dump_live_files command (#9842) · 082eb042
      Committed by yuzhangyu
      Summary:
      This change only adds decode blob index support to the dump_live_files command; it is part of a task to add blob support to a few commands.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842
      
      Reviewed By: ltamasi
      
      Differential Revision: D35650167
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14
      082eb042
  4. 15 Apr 2022, 4 commits
    • Y
      Add checks to GetUpdatesSince (#9459) · fe63899d
      Committed by Yanqin Jin
      Summary:
      Make `DB::GetUpdatesSince` return early if told to scan WALs generated by transactions
      with a write-prepared or write-unprepared policy (`seq_per_batch` is true), as indicated by
      the API comment.
      
      Also add checks to `TransactionLogIterator` to clarify some conditions.
      
      No API change.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9459
      
      Test Plan:
      make check
      
      Closing https://github.com/facebook/rocksdb/issues/1565
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D33821243
      
      Pulled By: riversand963
      
      fbshipit-source-id: c8b155d020ce0980e2d3b3b1da40b96e65b48d79
      fe63899d
    • Y
      CompactionIterator sees consistent view of which keys are committed (#9830) · 0bd4dcde
      Committed by Yanqin Jin
      Summary:
      **This PR does not affect the functionality of `DB` and write-committed transactions.**
      
      `CompactionIterator` uses `KeyCommitted(seq)` to determine if a key in the database is committed.
      As the name 'write-committed' implies, if write-committed policy is used, a key exists in the database only if
      it is committed. In fact, the implementation of `KeyCommitted()` is as follows:
      
      ```
      inline bool KeyCommitted(SequenceNumber seq) {
        // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
        return snapshot_checker_ == nullptr ||
               snapshot_checker_->CheckInSnapshot(seq, kMaxSequence) == SnapshotCheckerResult::kInSnapshot;
      }
      ```
      
      With that being said, we focus on write-prepared/write-unprepared transactions.
      
      A few notes:
      - A key can exist in the db even if it's uncommitted. Therefore, we rely on `snapshot_checker_` to determine data visibility. We also require that all writes go through transaction API instead of the raw `WriteBatch` + `Write`, thus at most one uncommitted version of one user key can exist in the database.
      - `CompactionIterator` outputs a key as long as the key is uncommitted.
      
      Due to the above reasons, it is possible that `CompactionIterator` decides to output an uncommitted key without
      doing further checks on the key (`NextFromInput()`). By the time the key is being prepared for output, the key becomes
      committed because the `snapshot_checker_(seq, kMaxSequence)` becomes true in the implementation of `KeyCommitted()`.
      Then `CompactionIterator` will try to zero out its sequence number and hit an assertion error if the key is a tombstone.
      
      To fix this issue, we should make the `CompactionIterator` see a consistent view of the input keys. Note that
      for write-prepared/write-unprepared, the background flush/compaction jobs already take a "job snapshot" before starting
      processing keys. The job snapshot is released only after the entire flush/compaction finishes. We can use this snapshot
      to determine whether a key is committed or not with minor change to `KeyCommitted()`.
      
      ```
      inline bool KeyCommitted(SequenceNumber sequence) {
        // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
        return snapshot_checker_ == nullptr ||
               snapshot_checker_->CheckInSnapshot(sequence, job_snapshot_) ==
                   SnapshotCheckerResult::kInSnapshot;
      }
      ```
      
      As a result, whether a key is committed or not remains constant throughout the compaction, causing no trouble
      for `CompactionIterator`'s assertions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9830
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D35561162
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0e00d200c195240341cfe6d34cbc86798b315b9f
      0bd4dcde
    • J
      Fix minimum libzstd version that supports ZSTD_STREAMING (#9841) · 844a3510
      Committed by Jonathan Albrecht
      Summary:
      The minimum libzstd version that has `ZSTD_compressStream2` is
      1.4.0 so only define ZSTD_STREAMING in that case.
      
      Fixes building on Ubuntu 18.04 which has libzstd 1.3.3 as its
      repository version.
      
      Fixes https://github.com/facebook/rocksdb/issues/9795
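      libzstd encodes its version as MAJOR*100*100 + MINOR*100 + RELEASE (this is how ZSTD_VERSION_NUMBER is defined), so the gate amounts to a check like this sketch (standalone helpers, not the actual build-system macro):

      ```cpp
      #include <cassert>

      // Reproduce libzstd's version-number encoding: 1.4.0 -> 10400.
      constexpr int ZstdVersionNumber(int major, int minor, int release) {
        return major * 100 * 100 + minor * 100 + release;
      }

      // ZSTD_STREAMING requires ZSTD_compressStream2, first shipped in 1.4.0.
      constexpr bool HasZstdStreaming(int version_number) {
        return version_number >= ZstdVersionNumber(1, 4, 0);
      }
      ```

      Ubuntu 18.04's libzstd 1.3.3 (10303) falls below the threshold, which is why the build previously broke there.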
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9841
      
      Test Plan:
      Build and test on Ubuntu 18.04 with:
        apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev \
          libzstd-dev libgflags-dev g++ make curl
      
      Reviewed By: ajkr
      
      Differential Revision: D35648738
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 2a9e969bcc17a7dc10172f3817283409de885811
      844a3510
    • A
      Expose `CacheEntryRole` and map keys for block cache stat collections (#9838) · d6e016be
      Committed by Andrew Kryczka
      Summary:
      This gives users the ability to examine the map populated by `GetMapProperty()` with property `kBlockCacheEntryStats`. It also sets us up for a possible future where cache reservations are configured according to `CacheEntryRole`s rather than flags coupled to roles.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9838
      
      Test Plan:
      - migrated test DBBlockCacheTest.CacheEntryRoleStats to use this API. That test verifies some of the contents are as expected
      - added a DBPropertiesTest to verify the public map keys are present, and nothing else
      
      Reviewed By: hx235
      
      Differential Revision: D35629493
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5c4356b8560e85d1f881fd32c44c15960b02fc68
      d6e016be
  5. 14 4月, 2022 6 次提交
  6. 13 4月, 2022 4 次提交
  7. 12 4月, 2022 6 次提交
    • A
      Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_during_recovery set true (#9634) · ae82d914
      Committed by Akanksha Mahajan
      Summary:
      1) In the case of a non-TransactionDB with avoid_flush_during_recovery = true, RocksDB won't
      flush the data from the WAL to L0 for all column families if possible. As a
      result, not all column families can increase their log_numbers, and
      min_log_number_to_keep won't change.
      2) For a transaction DB (allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions, and min_log_number_to_keep won't change.
      
      If we persist a new MANIFEST with
      advanced log_numbers for some column families, then during a second
      crash after persisting the MANIFEST, RocksDB will see some column
      families' log_numbers larger than the corrupted WAL, and the "column family inconsistency" error will be hit, causing recovery to fail.
      
      As a solution:
      1. Corrupted WALs whose numbers are larger than the
      corrupted WAL and smaller than the new WAL are moved to the archive folder.
      2. Currently, RocksDB DB::Open() may create and write to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST only after recovery is successful.
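      Point 1 can be sketched as a self-contained selection rule (toy code, not the actual recovery logic): WALs numbered after the corrupted WAL and before the newly created WAL are selected for the archive folder rather than deletion.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <iterator>
      #include <vector>

      // Select WAL numbers strictly between the corrupted WAL and the new WAL.
      std::vector<uint64_t> WalsToArchive(
          const std::vector<uint64_t>& wal_numbers, uint64_t corrupted_wal,
          uint64_t new_wal) {
        std::vector<uint64_t> out;
        std::copy_if(wal_numbers.begin(), wal_numbers.end(),
                     std::back_inserter(out), [&](uint64_t n) {
                       return n > corrupted_wal && n < new_wal;
                     });
        return out;
      }
      ```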
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9634
      
      Test Plan:
      1. Added new unit tests
      2. make crash_test -j
      
      Reviewed By: riversand963
      
      Differential Revision: D34463666
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e233d3af0ed4e2028ca0cf051e5a334a0fdc9d19
      ae82d914
    • A
      Enable async prefetching for ReadOptions.readahead_size (#9827) · 63e68a4e
      Committed by Akanksha Mahajan
      Summary:
      Currently async prefetching is enabled for implicit internal auto readahead in FilePrefetchBuffer if `ReadOptions.async_io` is set. This PR enables async prefetching for `ReadOptions.readahead_size` when `ReadOptions.async_io` is set true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9827
      
      Test Plan: Update unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D35552129
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: d9f9a96672852a591375a21eef15355cf3289f5c
      63e68a4e
    • M
      Plugin Registry (#7949) · b7db7eae
      Committed by mrambacher
      Summary:
      Added a Plugin class to the ObjectRegistry.  Enabled compile-time and program-time addition of plugins to the Registry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7949
      
      Reviewed By: mrambacher
      
      Differential Revision: D33517674
      
      Pulled By: pdillinger
      
      fbshipit-source-id: c3e3270aab76a489bfa9e85d78cdfca951912557
      b7db7eae
    • G
      Prevent double caching in the compressed secondary cache (#9747) · f241d082
      Committed by gitbw95
      Summary:
      When both the LRU Cache and CompressedSecondaryCache are configured together, some data blocks may be cached twice.
      
      **Changes include:**
      1. Rename IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusion.
      2. Update SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle has been erased from the secondary cache. The caller can then decide whether to call SetIsInSecondaryCache().
      3. Rename LRUSecondaryCache to CompressedSecondaryCache.
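      Point 2 can be illustrated with a self-contained toy (hypothetical types, not the RocksDB classes): an entry should be flagged as resident in the secondary cache only if it was not erased from it, so each block lives in exactly one tier.

      ```cpp
      #include <cassert>

      // Toy handle tracking whether the looked-up entry was erased from the
      // secondary cache during the lookup.
      struct ToySecondaryHandle {
        bool erased_from_secondary;
        bool IsErasedFromSecondaryCache() const { return erased_from_secondary; }
      };

      // Decide the primary entry's "is in secondary cache" flag: only set it
      // when the entry still resides in the secondary cache.
      bool ShouldSetIsInSecondaryCache(const ToySecondaryHandle& h) {
        return !h.IsErasedFromSecondaryCache();
      }
      ```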
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747
      
      Test Plan:
      **Test Scripts:**
      1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB.
      ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1
      
      2. overwrite it to a stable state:
      ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1
      
      3. Run read tests with different cache settings:
      
      T1:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000  --statistics -db=/db_bench_1
      
      T2:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      T3:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      T4:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      **Before this PR**
      | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
      |------------|-------------------------------------|----------------|
      |520 MB | 0 MB | 85.5% |
      |320 MB | 400 MB | 96.2% |
      |520 MB | 400 MB | 98.3% |
      |20 MB | 500 MB | 98.8% |
      
      **After this PR**
      | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
      |------------|-------------------------------------|----------------|
      |520 MB | 0 MB | 85.5% |
      |320 MB | 400 MB | 99.9% |
      |520 MB | 400 MB | 99.9% |
      |20 MB | 500 MB | 99.2% |
      
      Reviewed By: anand1976
      
      Differential Revision: D35117499
      
      Pulled By: gitbw95
      
      fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
      f241d082
    • A
      Fix stress test failure in ReadAsync. (#9824) · f3bcac39
      Committed by Akanksha Mahajan
      Summary:
      Fix stress test failure in ReadAsync by ignoring errors
      injected during async read by FaultInjectionFS.
      Failure:
      ```
       WARNING: prefix_size is non-zero but memtablerep != prefix_hash
      Didn't get expected error from MultiGet.
      num_keys 14 Expected 1 errors, seen 0
      Callstack that injected the fault
      Injected error type = 32538
      Message: error;
      #0   ./db_stress() [0x6f7dd4] rocksdb::port::SaveStack(int*, int)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/port/stack_trace.cc:152
      #1   ./db_stress() [0x7f2bda] rocksdb::FaultInjectionTestFS::InjectThreadSpecificReadError(rocksdb::FaultInjectionTestFS::ErrorOperation, rocksdb::Slice*, bool, char*, bool, bool*)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:891
      #2   ./db_stress() [0x7f2e78] rocksdb::TestFSRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, rocksdb::IODebugContext*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:367
      #3   ./db_stress() [0x6483d7] rocksdb::(anonymous namespace)::CompositeRandomAccessFileWrapper::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/env/composite_env.cc:61
      #4   ./db_stress() [0x654564] rocksdb::(anonymous namespace)::LegacyRandomAccessFileWrapper::Read(unsigned long, unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, rocksdb::IODebugContext*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/env/env.cc:152
      #5   ./db_stress() [0x659b3b] rocksdb::FSRandomAccessFile::ReadAsync(rocksdb::FSReadRequest&, rocksdb::IOOptions const&, std::function<void (rocksdb::FSReadRequest const&, void*)>, void*, void**, std::function<void (void*)>*, rocksdb::IODebugContext*)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/./include/rocksdb/file_system.h:896
      #6   ./db_stress() [0x8b8bab] rocksdb::RandomAccessFileReader::ReadAsync(rocksdb::FSReadRequest&, rocksdb::IOOptions const&, std::function<void (rocksdb::FSReadRequest const&, void*)>, void*, void**, std::function<void (void*)>*, rocksdb::Env::IOPriority)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/random_access_file_reader.cc:459
      #7   ./db_stress() [0x8b501f] rocksdb::FilePrefetchBuffer::ReadAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, rocksdb::Env::IOPriority, unsigned long, unsigned long, unsigned long, unsigned int)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:124
      #8   ./db_stress() [0x8b55fc] rocksdb::FilePrefetchBuffer::PrefetchAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, unsigned long, unsigned long, unsigned long, rocksdb::Env::IOPriority, bool&)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:363
      #9   ./db_stress() [0x8b61f8] rocksdb::FilePrefetchBuffer::TryReadFromCacheAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, unsigned long, unsigned long, rocksdb::Slice*, rocksdb::Status*, rocksdb::Env::IOPriority, bool)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:482
      #10  ./db_stress() [0x745e04] rocksdb::BlockFetcher::TryGetFromPrefetchBuffer()	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/table/block_fetcher.cc:76
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9824
      
      Test Plan:
      ```
      ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=0 --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 -- backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=5.037629726741734 --bottommost_compression_type=lz4hc --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=100 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/home/akankshamahajan/dev/shm/rocksdb/rocksdb_crashtest_blackbox --db_write_buffer_size=8388608 --delpercent=0 --delrangepercent=0 --destroy_db_initially=0 - detect_filter_construct_corruption=1 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=/home/akankshamahajan/dev/shm/rocksdb/rocksdb_crashtest_expected --experimental_mempurge_threshold=8.772789063014715 --fail_if_options_file_error=0 --file_checksum_impl=crc32c --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_prefix_bloom_size_ratio=0.001 
--memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=32 --readpercent=100 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=1 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=2 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=1 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=0
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D35514566
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e2a868fdd7422604774c1419738f9926a21e92a4
      f3bcac39
    • Y
      Remove dead code (#9825) · 0ad9ee30
      Committed by Yanqin Jin
      Summary:
      Options `preserve_deletes` and `iter_start_seqnum` have been removed since 7.0.
      
      This PR removes dead code related to these two removed options.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9825
      
      Test Plan: make check
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D35517950
      
      Pulled By: riversand963
      
      fbshipit-source-id: 86282ce5ec4087acb94a06a42a1b6d55b1715482
      0ad9ee30
  8. 09 Apr 2022, 1 commit
  9. 08 Apr 2022, 2 commits
  10. 07 Apr 2022, 7 commits
    • A
      Fix resetting of async_read_in_progress_ variable in FilePrefetchBuffer to call Poll API (#9815) · 7ea26abb
      Committed by Akanksha Mahajan
      Summary:
      Currently, RocksDB resets async_read_in_progress_ in the callback,
      due to which an underlying file system relying on the Poll API won't be
      called, leading to stale memory access.
      To fix this, async_read_in_progress_ is now reset after the Poll API
      is called, to make sure the underlying file system waiting on Poll can clear
      its state or take appropriate action.
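      A minimal model of the ordering fix (hypothetical names, not the actual FilePrefetchBuffer): the in-progress flag must stay set until after Poll() runs, so the file system can observe the outstanding request and clean up.

      ```cpp
      #include <cassert>

      struct ToyPrefetchBuffer {
        bool async_read_in_progress = false;
        bool poll_saw_in_progress = false;

        void StartAsyncRead() { async_read_in_progress = true; }

        // The fix: Poll observes the flag, and only afterwards is it cleared.
        void PollAndComplete() {
          poll_saw_in_progress = async_read_in_progress;  // Poll() stand-in
          async_read_in_progress = false;                 // reset AFTER Poll
        }
      };
      ```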
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9815
      
      Test Plan: CircleCI tests
      
      Reviewed By: anand1976
      
      Differential Revision: D35451534
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: b70ef6251a7aa9ed4876ba5e5100baa33d7d474c
      7ea26abb
    • S
      L0 Subcompaction to trim input files (#9802) · e03f8a0c
      Committed by sdong
      Summary:
      When subcompactions are chosen for an L0->L1 compaction, in most cases all L0 files are involved in all subcompactions. However, that is not always the case: when files are generally (but not strictly) inserted in sequential order, only a subset of the L0 files may be involved. Yet RocksDB always opens all those L0 files, builds an iterator, and reads many of the files' first or last blocks with expensive readahead. We trim some input files to reduce the overhead a little bit.
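      The trimming idea can be sketched as follows (simplified toy; real compaction uses internal keys and comparators): drop L0 input files whose key range does not overlap the subcompaction's boundary range.

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      struct ToyFile {
        std::string smallest;
        std::string largest;
      };

      // Keep only files whose [smallest, largest] range overlaps [start, end].
      std::vector<ToyFile> TrimL0Inputs(const std::vector<ToyFile>& files,
                                        const std::string& start,
                                        const std::string& end) {
        std::vector<ToyFile> kept;
        for (const ToyFile& f : files) {
          if (f.largest < start || f.smallest > end) continue;  // no overlap
          kept.push_back(f);
        }
        return kept;
      }
      ```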
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9802
      
      Test Plan: Add a unit test to cover this case and manually validate the behavior while running the test.
      
      Reviewed By: ajkr
      
      Differential Revision: D35371031
      
      fbshipit-source-id: 701ed7375b5cbe41672e93b38fe8a1503dad08b6
      e03f8a0c
    • P
      Tests for filter compatibility (#9773) · 8ce7cea9
      Committed by Peter Dillinger
      Summary:
      This change adds two unit tests that would each catch the
      regression fixed in https://github.com/facebook/rocksdb/issues/9736
      
      * TableMetaIndexKeys - detects any churn in metaindex block keys
      generated by SST files using standard db_test_util configurations.
      * BloomFilterCompatibility - this detects if any common built-in
      FilterPolicy configurations fail to read filters generated by another.
      (The regression bug caused NewRibbonFilterPolicy not to read filters
      from NewBloomFilterPolicy and vice-versa.) This replaces some previous
      tests that didn't really appear to be testing much of anything except
      basic data correctness, which doesn't tell you a filter is being used.
      
      Light refactoring in meta_blocks.cc/h to support inspecting metaindex
      keys.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9773
      
      Test Plan:
      this is the test. Verified that 7.0.2 fails both tests and 7.0.3 passes.
      With backporting for intentional API changes in 7.0, 6.29 also passes.
      
      Reviewed By: ajkr
      
      Differential Revision: D35236248
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 493dfe9ad7e27524bf7c6c1af8a4b8c31bc6ef5a
      8ce7cea9
    • A
      Add WAL compression to stress tests (#9811) · c3d7e162
      Committed by anand76
      Summary:
      Add the WAL compression feature to the stress test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9811
      
      Reviewed By: riversand963
      
      Differential Revision: D35414316
      
      Pulled By: anand1976
      
      fbshipit-source-id: 0c17b1ec55679a52f088ad368798b57139bd921a
      c3d7e162
    • P
      Remove public rocksdb-lego-determinator (#9803) · ad32646e
      Committed by Peter Dillinger
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9803
      
      Only the Meta-internal version is used now; precommit_checker.py is also obsolete.
      
      `make commit_prereq` will be brought back in follow-up work.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35372283
      
      fbshipit-source-id: 7428438ca51f878802c301d0d5591675e551a113
      ad32646e
    • A
      Update stats for Read and ReadAsync in random_access_file_reader for async prefetching (#9810) · 0b8f8859
      Committed by Akanksha Mahajan
      Summary:
      Update stats in random_access_file_reader for the Read and
      ReadAsync APIs to take into account the read latency of async
      prefetching.
      
      It also fixes the ERROR_HANDLER_AUTORESUME_RETRY_COUNT stat, whose
      value was incorrect in portal.h.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9810
      
      Test Plan: Update unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D35433081
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: aeec3901270e58a003ce6b5214bd25ddcb3a12a9
      0b8f8859
    • H
      Account memory of big memory users in BlockBasedTable in global memory limit (#9748) · 49623f9c
      Committed by Hui Xiao
      Summary:
      **Context:**
      Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g., with `max_open_files = -1`). This memory is currently untracked, unconstrained, and not cache-evictable. As a first step to improve this, similar to https://github.com/facebook/rocksdb/pull/8428, this PR tracks an estimate of each `BlockBasedTableReader` object's memory in the block cache and fails future creation if the memory usage would exceed the cache's available space at creation time.
      
      **Summary:**
      - Approximate the memory usage of the big memory users (`BlockBasedTable::Rep` and `TableProperties`) in addition to the existing estimates (filter block / index block / uncompression dictionary)
      - Charge all of this memory to the block cache in `BlockBasedTable::Open()` and release it in `~BlockBasedTable()`, as there is no memory usage fluctuation of concern in between
      - Refactor `CacheReservationManager` (and its call sites) to add the concurrency support needed by `BlockBasedTable` in this PR
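      
      Enabling the feature can be sketched as follows (a sketch based on the `reserve_table_reader_memory` option referenced in the test plan below; later RocksDB releases reworked this option, and the path and cache size are arbitrary):
      
      ```cpp
      #include <cassert>
      
      #include "rocksdb/cache.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"
      
      using namespace ROCKSDB_NAMESPACE;
      
      int main() {
        BlockBasedTableOptions bbto;
        // Table reader memory is charged against this cache; opening a new
        // table fails if the charge would exceed the cache's free capacity.
        bbto.block_cache = NewLRUCache(64 << 20 /* 64 MiB */);
        bbto.reserve_table_reader_memory = true;  // option name as of this PR
      
        Options options;
        options.create_if_missing = true;
        options.max_open_files = -1;  // readers stay open, so tracking matters
        options.table_factory.reset(NewBlockBasedTableFactory(bbto));
      
        DB* db = nullptr;
        assert(DB::Open(options, "/tmp/reader_mem_demo", &db).ok());
        delete db;
        return 0;
      }
      ```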
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748
      
      Test Plan:
      - New unit tests
      - db bench: `OpenDb`: **-0.52% in ms**
        - Setup `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
        - Repeated run with pre-change w/o feature and post-change with feature, benchmark `OpenDb`:  `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true| egrep 'OpenDb:'`
      
      #-run | (feature-off) avg milliseconds | std milliseconds | (feature-on) avg milliseconds | std milliseconds | change (%)
      -- | -- | -- | -- | -- | --
      10 | 11.4018 | 5.95173 | 9.47788 | 1.57538 | -16.87382694
      20 | 9.23746 | 0.841053 | 9.32377 | 1.14074 | 0.9343477536
      40 | 9.0876 | 0.671129 | 9.35053 | 1.11713 | 2.893283155
      80 | 9.72514 | 2.28459 | 9.52013 | 1.0894 | -2.108041632
      160 | 9.74677 | 0.991234 | 9.84743 | 1.73396 | 1.032752389
      320 | 10.7297 | 5.11555 | 10.547 | 1.97692 | **-1.70275031**
      640 | 11.7092 | 2.36565 | 11.7869 | 2.69377 | **0.6635807741**
      
      -  db bench on write with cost to cache in WriteBufferManager (just in case this PR's CRM refactoring accidentally slows down anything in WBM) : `fillseq` : **+0.54% in micros/op**
      `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`
      
      #-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  avg micros/op | std micros/op | change (%)
      -- | -- | -- | -- | -- | --
      10 | 6.15 | 0.260187 | 6.289 | 0.371192 | 2.260162602
      20 | 7.28025 | 0.465402 | 7.37255 | 0.451256 | 1.267813605
      40 | 7.06312 | 0.490654 | 7.13803 | 0.478676 | **1.060579461**
      80 | 7.14035 | 0.972831 | 7.14196 | 0.92971 | **0.02254791432**
      
      -  filter bench: `bloom filter`: **-0.78% in ns/key**
          - ` ./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`
      
      #-run | (pre-PR) avg ns/key | std ns/key | (post-PR) avg ns/key | std ns/key | change (%)
      -- | -- | -- | -- | -- | --
      10 | 26.4369 | 0.442182 | 26.3273 | 0.422919 | **-0.4145720565**
      20 | 26.4451 | 0.592787 | 26.1419 | 0.62451 | **-1.1465262**
      
      - Crash test `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal
      
      Reviewed By: ajkr
      
      Differential Revision: D35136549
      
      Pulled By: hx235
      
      fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28
      49623f9c