1. 30 Mar, 2022 2 commits
  2. 29 Mar, 2022 1 commit
  3. 26 Mar, 2022 5 commits
    • A
      Fix some errors in async prefetching in FilePrefetchBuffer (#9734) · 33f8a08a
      Akanksha Mahajan committed
      Summary:
      With the `async_io` ReadOption, which prefetches data asynchronously, db_bench and db_stress runs were failing with "Error: Checksum mismatched" because the wrong data was prefetched. The wrong data was copied because the buffer capacity was smaller than the size actually needed. This PR fixes that.
      
      Async and sync prefetching use separate methods, and these changes are confined to the async prefetching methods, so they should not affect normal prefetching. I ran the regressions to make sure normal prefetching is fine.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9734
      
      Test Plan:
      1. CircleCI jobs
      
      2.  Ran db_bench
      ```
      ./db_bench -use_existing_db=true
      -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32
      -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680
      -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1
      
      ```
      3. Ran db_stress test
      ```
      export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1"
      make crash_test -j
      ```
      
      4. Run regressions for async_io disabled.
      
      Old flow without any async changes:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      With the async prefetching changes and async_io disabled, to confirm there is no regression in normal prefetching:
       ```
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 --async_io=0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.1
      Date:       Wed Mar 23 15:56:37 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  481819.816 micros/op 2 ops/sec;  340.2 MB/s (250 of 250 found)
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D35058471
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 9233a1e6d97cea0c7a8111bfb9e8ac3251c341ce
    • M
      Correctly set ThreadState::tid (#9757) · 37de4e1d
      Mark Callaghan committed
      Summary:
      Fixes a bug introduced by me in https://github.com/facebook/rocksdb/pull/9733
      That PR added a counter so that the per-thread seeds in ThreadState would
      be unique even when --benchmarks had more than one test. But it incorrectly
      used this counter as the value for ThreadState::tid as well.
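
The distinction can be illustrated with a minimal Python sketch. It is not the db_bench code: `ThreadState`, `make_threads`, and `base_seed` are illustrative names, and the seed formula is an assumption. A process-wide counter keeps seeds unique across benchmarks, while `tid` restarts at 0 for every benchmark.

```python
import itertools
from dataclasses import dataclass

# Process-wide counter: it keeps growing across benchmarks so seeds never repeat.
_seed_counter = itertools.count()


@dataclass
class ThreadState:
    tid: int   # index of the thread within the current benchmark (0..n-1)
    seed: int  # unique per thread across all benchmarks in this process


def make_threads(n, base_seed):
    """Create per-thread state for one benchmark run."""
    return [ThreadState(tid=i, seed=base_seed + next(_seed_counter)) for i in range(n)]


first = make_threads(2, base_seed=1000)
second = make_threads(2, base_seed=1000)
print([(t.tid, t.seed) for t in first])   # [(0, 1000), (1, 1001)]
print([(t.tid, t.seed) for t in second])  # [(0, 1002), (1, 1003)]
```

Using the counter value for `tid` as well would give the second benchmark thread ids 2 and 3 instead of 0 and 1, which is the mistake described above.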
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9757
      
      Test Plan:
      Confirm that unexpectedly good QPS results on the regression tests return
      to normal with this fix. I have confirmed that the QPS increase starts with
      the PR 9733 diff.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35149303
      
      Pulled By: mdcallag
      
      fbshipit-source-id: dee5cc36b7faaba6c3be6d6a253d3c2eaad72864
    • H
      Clarify Options::rate_limiter api doc for #9607 Rate-limit automatic WAL flush... · e2cb9aa2
      Hui Xiao committed
      Clarify Options::rate_limiter api doc for #9607 Rate-limit automatic WAL flush after each user write (#9745)
      
      Summary:
      As title for https://github.com/facebook/rocksdb/pull/9607
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9745
      
      Test Plan: No code change
      
      Reviewed By: ajkr
      
      Differential Revision: D35096901
      
      Pulled By: hx235
      
      fbshipit-source-id: 6bd3671baecfdc04579b0a81a957bfaa7bed81e1
    • J
      jni: uniformly use GetByteArrayRegion() to copy bytes (#9380) · b83263bb
      Jermy Li committed
      Summary:
      Uniformly use GetByteArrayRegion() instead of GetByteArrayElements()
      to copy bytes.
      This also avoids an inefficient ReleaseByteArrayElements()
      call.
      Some benefits of GetByteArrayRegion() are described at:
      https://stackoverflow.com/a/2480493
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9380
      
      Reviewed By: ajkr
      
      Differential Revision: D35135474
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a32c1774d37f2d22b9bcd105d83e0bb984b71b54
    • M
      db_bench should use a good seed when --seed is not set or set to 0 (#9740) · 1a130fa3
      Mark Callaghan committed
      Summary:
      This is for https://github.com/facebook/rocksdb/issues/9737
      
      I have wasted more than a few hours running db_bench benchmarks where --seed was not set
      and getting better than expected results because cache hit rates are great because
      multiple invocations of db_bench used the same value for --seed or did not set it,
      and then all used 0. The result is that all see the same sequence of keys.
      
      Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.
      
      A good way to avoid this is to set it to the equivalent of gettimeofday() when either
      --seed is not set or it is set to 0 (the default).
      
      With this change the actual seed is printed when it was 0 at process start:
        Set seed to 1647992570365606 because --seed was 0
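
A minimal Python sketch of this fallback, assuming a microsecond clock as the stand-in for gettimeofday(); `effective_seed` is a hypothetical helper, not a db_bench function:

```python
import time


def effective_seed(seed_flag):
    """Return the seed to use; 0 (the default) means "derive one from the clock"."""
    if seed_flag != 0:
        return seed_flag
    # Microseconds since the epoch, the resolution gettimeofday() offers.
    seed = time.time_ns() // 1000
    print(f"Set seed to {seed} because --seed was 0")
    return seed


print(effective_seed(42))  # an explicit seed is used unchanged
print(effective_seed(0))   # a clock-derived seed, different on each invocation
```

Printing the chosen seed keeps a run reproducible: the value can be passed back through --seed later.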
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9740
      
      Test Plan:
      Perf results:
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
        readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
        readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
        readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)
      
      ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
        readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35145361
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267
  4. 25 Mar, 2022 9 commits
    • M
      Enable READ_BLOCK_COMPACTION_MICROS to track stats (#9722) · 98130c5a
      myasuka committed
      Summary:
      After commit [d642c60b](https://github.com/facebook/rocksdb/commit/d642c60bdc100f7509ca77b383cd47b51d80d810), the `READ_BLOCK_COMPACTION_MICROS` stat cannot record any compaction read duration, and it always reports zero.
      
      This PR distinguishes `READ_BLOCK_COMPACTION_MICROS` from `READ_BLOCK_GET_MICROS` so that `READ_BLOCK_COMPACTION_MICROS` records the correct stats.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9722
      
      Reviewed By: ajkr
      
      Differential Revision: D35021870
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: f1a804994265e51465de64c2a08f2e0eeb6fc5a3
    • J
      Fix make clean fail after java build (#9710) · 81d1cdca
      Jay Zhuang committed
      Summary:
      The clean-rocksjava and clean-rocks targets appear to conflict.
      Also remove an unnecessary step in the Java CI build; otherwise the code is
      rebuilt, because the Java make sample step does a clean first.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9710
      
      Test Plan: `make rocksdbjava && make clean` should return success
      
      Reviewed By: riversand963
      
      Differential Revision: D35122872
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 2a15b83e7a763c0fc0e42e1f35aac9551f951ece
    • M
      Add --slow_usecs option to determine when long op message is printed (#9732) · 409635cb
      Mark Callaghan committed
      Summary:
      This adds the --slow_usecs option with a default value of 1M. Operations that
      take this much time have a message printed when --histogram=1, --stats_interval=0
      and --stats_interval_seconds=0. The current code hardwired this to 20,000 usecs
      and for some stress tests that reduced throughput by 20% or more.
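
The threshold check can be sketched in Python as follows. This is an illustration only: `timed_op`, its `report` callback, and the message text are assumptions, and the sketch ignores the --histogram and --stats_interval conditions.

```python
import time


def timed_op(op, slow_usecs=1_000_000, report=print):
    """Run op() and report it only when it takes at least slow_usecs microseconds."""
    start = time.perf_counter()
    result = op()
    elapsed_usecs = (time.perf_counter() - start) * 1e6
    if elapsed_usecs >= slow_usecs:
        report(f"long op: {elapsed_usecs:.1f} micros")
    return result


# With a low threshold a 10 ms operation is reported; with the 1M default it is not.
timed_op(lambda: time.sleep(0.01), slow_usecs=1000)
timed_op(lambda: time.sleep(0.01))
```

A higher default means fewer messages are printed on the hot path, which is where the throughput loss mentioned above came from.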
      
      This is for https://github.com/facebook/rocksdb/issues/9620
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9732
      
      Test Plan:
      ./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100 --histogram=1
      ./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100000 --histogram=1
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35121522
      
      Pulled By: mdcallag
      
      fbshipit-source-id: daf27f937efd748980545d6395db332712fc078b
    • P
      Fix heap use-after-free race with DropColumnFamily (#9730) · cad80997
      Peter Dillinger committed
      Summary:
      Although ColumnFamilySet comments say that DB mutex can be
      freed during iteration, as long as you hold a ref while releasing DB
      mutex, this is not quite true because UnrefAndTryDelete might delete cfd
      right before it is needed to get ->next_ for the next iteration of the
      loop.
      
      This change solves the problem by making a wrapper class that makes such
      iteration easier while handling the tricky details of UnrefAndTryDelete
      on the previous cfd only after getting next_ in operator++.
      
      FreeDeadColumnFamilies should already have been obsolete; this removes
      it for good. Similarly, ColumnFamilySet::iterator doesn't need to check
      for cfd with 0 refs, because those are immediately deleted.
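
The ordering the wrapper enforces can be sketched in Python. This is a simplified model, not the RocksDB class: `Node` stands in for a column family entry, deletion is modeled by clearing fields, and locking is omitted.

```python
class Node:
    """Stand-in for a column family entry in an intrusive linked list."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # the list itself holds one reference
        self.next = None
        self.deleted = False

    def ref(self):
        self.refs += 1

    def unref_and_try_delete(self):
        self.refs -= 1
        if self.refs == 0:
            # After deletion the node's fields must not be read any more.
            self.deleted = True
            self.next = None


def iterate_ref_safe(head):
    """Yield each node with a reference held; unref a node only after its
    successor has been read from it."""
    current = head
    while current is not None:
        current.ref()
        yield current                   # the caller may release the lock here
        successor = current.next        # read next while our ref keeps the node alive
        current.unref_and_try_delete()  # only now is it safe to let go
        current = successor


a, b, c = Node("a"), Node("b"), Node("c")
a.next, b.next = b, c
visited = []
for node in iterate_ref_safe(a):
    visited.append(node.name)
    if node.name == "a":
        node.unref_and_try_delete()  # simulate a concurrent drop of "a"
print(visited)  # ['a', 'b', 'c']
```

If the iterator called `unref_and_try_delete()` before reading `next`, node "a" would already be deleted when its successor is needed, and the walk would stop (or, in C++, touch freed memory).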
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9730
      
      Test Plan:
      was reported with ASAN on unit tests like
      DBLogicalBlockSizeCacheTest.CreateColumnFamily (very rare); keep watching
      
      Reviewed By: ltamasi
      
      Differential Revision: D35038143
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 0a5478d5be96c135343a00603711b7df43ae19c9
    • A
      Extend Java RocksDB iterators to support indirect Byte Buffers (#9222) · dec144f1
      Alan Paxton committed
      Summary:
      Extend Java RocksDB iterators to support indirect byte buffers, to add to the existing support for direct byte buffers.
      The direct/indirect distinction is made in Java, and a second, separate JNI call is implemented to support indirect
      buffers. Indirect support passes the contained buffers as byte[].
      
      There are some Java subclasses of iterator (WBWIIterator, SstFileReaderIterator) which also now have parallel JNI support functions implemented, along with direct/indirect switches in Java methods.
      
      Closes https://github.com/facebook/rocksdb/issues/6282
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9222
      
      Reviewed By: ajkr
      
      Differential Revision: D35115283
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: f8d5d20b975aef700560fbcc99f707bb028dc42e
    • A
      Add new checksum type kXXH3 to Java API (#9749) · 8ae0c33a
      Alan Paxton committed
      Summary:
      Fix https://github.com/facebook/rocksdb/issues/9720
      
      And make a couple of incidental tests test the thing they were meant to test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9749
      
      Reviewed By: ajkr
      
      Differential Revision: D35115298
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: d687d1f070d29216be9693601c71131bbea87c79
    • M
      db_bench should fail on bad values for --compaction_fadvice and... · f219e3d5
      Mark Callaghan committed
      db_bench should fail on bad values for --compaction_fadvice and --value_size_distribution_type (#9741)
      
      Summary:
      db_bench quietly parses and ignores bad values for --compaction_fadvice and --value_size_distribution_type
      I prefer that it fail for them as it does for bad option values in most other cases. Otherwise a benchmark
      result will be provided for the wrong configuration and the result will be misleading.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9741
      
      Test Plan:
      These now fail:
      ./db_bench --compaction_fadvice=noney
      Unknown compaction fadvice:noney
      
      ./db_bench --value_size_distribution_type=norma
      Cannot parse distribution type 'norma'
      
      While correct values continue to work:
       ./db_bench --value_size_distribution_type=normal
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      
      ./db_bench --compaction_fadvice=none
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      
      Reviewed By: siying
      
      Differential Revision: D35115973
      
      Pulled By: mdcallag
      
      fbshipit-source-id: c2b10de5c2d1ea7c7539e676f5bd556351f5d370
    • Y
      Add two new targets to determinator (#9753) · 862304a1
      Yanqin Jin committed
      Summary:
      Test plan
      ```
      build_tools/rocksdb-lego-determinator stress_crash_with_multiops_wc_txn
      build_tools/rocksdb-lego-determinator stress_crash_with_multiops_wp_txn
      ```
      
      Spot check the printed job spec.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9753
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35117116
      
      Pulled By: riversand963
      
      fbshipit-source-id: a7ed82e8cb9bc2fd13f4f00291c6a39457415fb0
    • J
      Remove DBGet P95/P99 benchmark metrics (#9742) · 18463f8c
      Jay Zhuang committed
      Summary:
      DBGet p95 and p99 have high variation, so remove them for now.
      Also increase the iteration count to 3 to avoid false positives.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9742
      
      Test Plan: Internal CI
      
      Reviewed By: ajkr
      
      Differential Revision: D35082820
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: facc1d56b94e54aa8c8852c207aae2ae4e4924b0
  5. 24 Mar, 2022 8 commits
    • M
      Avoid seed reuse when --benchmarks has more than one test (#9733) · d583d23d
      Mark Callaghan committed
      Summary:
      When --benchmarks has more than one test then the threads in one benchmark
      will use the same set of seeds as the threads in the previous benchmark.
      This diff fixes that.
      
      This fixes https://github.com/facebook/rocksdb/issues/9632
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9733
      
      Test Plan:
      For this command line the block cache is 8MB, so it caches at most 1024 8KB blocks. Note that without
      this diff the second run of readrandom has a much better response time because seed reuse means the
      second run reads the same 1000 blocks as the first run and they are cached at that point. But with
      this diff that does not happen.
      
      ./db_bench --benchmarks=fillseq,flush,compact0,waitforcompaction,levelstats,readrandom,readrandom --compression_type=zlib --num=10000000 --reads=1000 --block_size=8192
      
      ...
      
      ```
      Level Files Size(MB)
      --------------------
        0        0        0
        1       11      238
        2        9      253
        3        0        0
        4        0        0
        5        0        0
        6        0        0
      ```
      
       --- perf results without this diff
      
      DB path: [/tmp/rocksdbtest-2260/dbbench]
      readrandom   :      46.212 micros/op 21618 ops/sec;    2.4 MB/s (1000 of 1000 found)
      
      DB path: [/tmp/rocksdbtest-2260/dbbench]
      readrandom   :      21.963 micros/op 45450 ops/sec;    5.0 MB/s (1000 of 1000 found)
      
       --- perf results with this diff
      
      DB path: [/tmp/rocksdbtest-2260/dbbench]
      readrandom   :      47.213 micros/op 21126 ops/sec;    2.3 MB/s (1000 of 1000 found)
      
      DB path: [/tmp/rocksdbtest-2260/dbbench]
      readrandom   :      42.880 micros/op 23299 ops/sec;    2.6 MB/s (1000 of 1000 found)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35089763
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 1b50143a07afe876b8c8e5fa50dd94a8ce57fc6b
    • P
      Revise history of 7.1.0 for patch (#9746) · 727d11ce
      Peter Dillinger committed
      Summary:
      This updates main branch with a HISTORY update going into
      7.1.fb branch before tagging 7.1.0.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9746
      
      Test Plan: HISTORY.md only
      
      Reviewed By: ajkr, hx235
      
      Differential Revision: D35099194
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b74ea8b626118dac235e387038420829850b8da2
    • Y
      Add new determinators for multiops transactions stress test (#9708) · c18c4a08
      Yanqin Jin committed
      Summary:
      Add determinators for multiops transactions stress test with
      write-committed and write-prepared policies.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9708
      
      Test Plan: Internal CI
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D34967263
      
      Pulled By: riversand963
      
      fbshipit-source-id: 170a0842d56dccb6ed6bc0c5adfd33849acd6b31
    • Y
      Fix a race condition in WAL tracking causing DB open failure (#9715) · e0c84aa0
      Yanqin Jin committed
      Summary:
      There is a race condition if WAL tracking in the MANIFEST is enabled in a database that disables 2PC.
      
      The race condition is between two background flush threads trying to install flush results to the MANIFEST.
      
      Consider an example database with two column families: "default" (cfd0) and "cf1" (cfd1). Initially,
      both column families have one mutable (active) memtable whose data is backed by 6.log.
      
      1. Trigger a manual flush for "cf1", creating a 7.log
      2. Insert another key to "default", and trigger flush for "default", creating 8.log
      3. BgFlushThread1 finishes writing 9.sst
      4. BgFlushThread2 finishes writing 10.sst
      
      ```
      Time  BgFlushThread1                                    BgFlushThread2
       |    mutex_.Lock()
       |    precompute min_wal_to_keep as 6
       |    mutex_.Unlock()
       |                                                     mutex_.Lock()
       |                                                     precompute min_wal_to_keep as 6
       |                                                     join MANIFEST write queue and mutex_.Unlock()
       |    write to MANIFEST
       |    mutex_.Lock()
       |    cfd1->log_number = 7
       |    Signal bg_flush_2 and mutex_.Unlock()
       |                                                     wake up and mutex_.Lock()
       |                                                     cfd0->log_number = 8
       |                                                     FindObsoleteFiles() with job_context->log_number == 7
       |                                                     mutex_.Unlock()
       |                                                     PurgeObsoleteFiles() deletes 6.log
       V
      ```
      
      As shown above, BgFlushThread2 thinks that the min wal to keep is 6.log because "cf1" has unflushed data in 6.log (cf1.log_number=6).
      Similarly, BgFlushThread1 thinks that min wal to keep is also 6.log because "default" has unflushed data (default.log_number=6).
      No WAL deletion will be written to MANIFEST because 6 is equal to `versions_->wals_.min_wal_number_to_keep`,
      due to https://github.com/facebook/rocksdb/blob/7.1.fb/db/memtable_list.cc#L513:L514.
      The bg flush thread that finishes last will perform file purging. `job_context.log_number` will be evaluated as 7, i.e.
      the min wal that contains unflushed data, causing 6.log to be deleted. However, MANIFEST thinks 6.log should still exist.
      If you close the db at this point, you won't be able to re-open it if `track_and_verify_wal_in_manifest` is true.
      
      We must handle the case of multiple bg flush threads, and it is difficult for one bg flush thread to know
      the correct min wal number until the other bg flush threads have finished committing to the manifest and updated
      the `cfd::log_number`.
      To fix this issue, we rename an existing variable `min_log_number_to_keep_2pc` to `min_log_number_to_keep`,
      and use it to track WAL file deletion in non-2pc mode as well.
      This variable is updated only 1) during recovery with mutex held, or 2) in the MANIFEST write thread.
      `min_log_number_to_keep` means RocksDB will delete WALs below it, although there may be WALs
      above it which are also obsolete. Formally, we will have [min_wal_to_keep, max_obsolete_wal]. During recovery, we
      make sure that only WALs above max_obsolete_wal are checked and added back to `alive_log_files_`.
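
The arithmetic of the race can be replayed with a small Python model. It is a sketch under simplifying assumptions: column family state is a plain dict, `min_wal_after_flush` is a hypothetical helper, and threads are replaced by a fixed order of steps.

```python
def min_wal_after_flush(log_numbers, flushed_cf, new_log_number):
    """Smallest WAL still holding unflushed data once flushed_cf moves to new_log_number."""
    updated = {**log_numbers, flushed_cf: new_log_number}
    return min(updated.values())


# Both column families start with unflushed data in 6.log.
initial = {"default": 6, "cf1": 6}

# Racy version: each flush thread precomputes from the same starting state.
racy_t1 = min_wal_after_flush(initial, "cf1", 7)      # 6
racy_t2 = min_wal_after_flush(initial, "default", 8)  # 6
recorded_in_manifest = max(racy_t1, racy_t2)          # MANIFEST still expects 6.log
purge_below = min({"default": 8, "cf1": 7}.values())  # 7, so 6.log is deleted on disk

# Serialized version: each result is applied before the next one is computed,
# as happens when the value is only updated in the MANIFEST write thread.
state = dict(initial)
serialized = []
for cf, new_log in [("cf1", 7), ("default", 8)]:
    serialized.append(min_wal_after_flush(state, cf, new_log))
    state[cf] = new_log

print(recorded_in_manifest, purge_below)  # 6 7
print(serialized)                         # [6, 7]
```

In the racy version the MANIFEST never learns that 6.log is obsolete even though the purge removes it; in the serialized version the second write sees the first one's update and records 7.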
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9715
      
      Test Plan:
      ```
      make check
      ```
      Also ran stress test below (with asan) to make sure it completes successfully.
      ```
      TEST_TMPDIR=/dev/shm/rocksdb OPT=-g ASAN_OPTIONS=disable_coredump=0 \
      CRASH_TEST_EXT_ARGS=--compression_type=zstd SKIP_FORMAT_BUCK_CHECKS=1 \
      make J=52 -j52 blackbox_asan_crash_test
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D34984412
      
      Pulled By: riversand963
      
      fbshipit-source-id: c7b21a8d84751bb55ea79c9f387103d21b231005
    • Y
      Return invalid argument if batch is null (#9744) · 29bec740
      Yanqin Jin committed
      Summary:
      Previously, `DBImpl::WriteImpl(batch...)` returned a corruption status if batch was
      null. This is inaccurate since there is no data corruption.
      Return `Status::InvalidArgument()` instead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9744
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D35086268
      
      Pulled By: riversand963
      
      fbshipit-source-id: 677397b007a53bc25210eac0178d49c9797b5951
    • M
      db_bench should fail when an option uses an invalid compression type (#9729) · 6904fd0c
      Mark Callaghan committed
      Summary:
      This changes db_bench to fail at startup for invalid compression types. It had been
      changing them to Snappy. For other invalid options it fails at startup.
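
The fail-fast behavior can be sketched in Python. The set of names below is an illustrative subset and `parse_compression_type` is a hypothetical helper, not the db_bench parser:

```python
KNOWN_COMPRESSION_TYPES = {"none", "snappy", "zlib", "lz4", "zstd"}  # illustrative subset


def parse_compression_type(value):
    """Return the type if it is known; raise instead of silently falling back."""
    if value not in KNOWN_COMPRESSION_TYPES:
        raise ValueError(f"Cannot parse compression type '{value}'")
    return value


print(parse_compression_type("lz4"))
try:
    parse_compression_type("lz44")
except ValueError as err:
    print(err)  # Cannot parse compression type 'lz44'
```

Failing at startup means a typo cannot produce a benchmark result for a configuration other than the one requested.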
      
      This is for https://github.com/facebook/rocksdb/issues/9621
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9729
      
      Test Plan:
      This continues to work:
      ./db_bench --benchmarks=fillrandom --compression_type=lz4
      
      This now fails rather than changing the compression type to Snappy
      ./db_bench --benchmarks=fillrandom --compression_type=lz44
      Cannot parse compression type 'lz44'
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35081323
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 9b38c835abddce11aa7feb235df63f53cf829981
    • P
      Fix a major performance bug in 7.0 re: filter compatibility (#9736) · 91687d70
      Peter Dillinger committed
      Summary:
      Bloom filters generated by pre-7.0 releases are not read by
      7.0.x releases (and vice-versa) due to changes to FilterPolicy::Name()
      in https://github.com/facebook/rocksdb/issues/9590. This can severely impact read performance and read I/O on
      upgrade or downgrade with existing DB, but not data correctness.
      
      To fix, we go back to using the old, unified name in SST metadata but (for
      a while anyway) recognize the aliases that could be generated by early
      7.0.x releases. This unfortunately requires a public API change to avoid
      interfering with all the good changes from https://github.com/facebook/rocksdb/issues/9590, but the API change
      only affects users with custom FilterPolicy, which should be very few.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9736
      
      Test Plan:
      manual
      
      Generate DBs with
      ```
      ./db_bench.7.0 -db=/dev/shm/rocksdb.7.0 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0
      ```
      and similar. Compare with
      ```
      for IMPL in 6.29 7.0 fixed; do for DB in 6.29 7.0 fixed; do echo "Testing $IMPL on $DB:"; ./db_bench.$IMPL -db=/dev/shm/rocksdb.$DB -use_existing_db -readonly -bloom_bits=10 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=10 2>&1 | grep micros/op; done; done
      ```
      
      Results:
      ```
      Testing 6.29 on 6.29:
      readrandom   :      34.381 micros/op 29085 ops/sec;    3.2 MB/s (291999 of 291999 found)
      Testing 6.29 on 7.0:
      readrandom   :     190.443 micros/op 5249 ops/sec;    0.6 MB/s (52999 of 52999 found)
      Testing 6.29 on fixed:
      readrandom   :      40.148 micros/op 24907 ops/sec;    2.8 MB/s (249999 of 249999 found)
      Testing 7.0 on 6.29:
      readrandom   :     229.430 micros/op 4357 ops/sec;    0.5 MB/s (43999 of 43999 found)
      Testing 7.0 on 7.0:
      readrandom   :      33.348 micros/op 29986 ops/sec;    3.3 MB/s (299999 of 299999 found)
      Testing 7.0 on fixed:
      readrandom   :     152.734 micros/op 6546 ops/sec;    0.7 MB/s (65999 of 65999 found)
      Testing fixed on 6.29:
      readrandom   :      32.024 micros/op 31224 ops/sec;    3.5 MB/s (312999 of 312999 found)
      Testing fixed on 7.0:
      readrandom   :      33.990 micros/op 29390 ops/sec;    3.3 MB/s (294999 of 294999 found)
      Testing fixed on fixed:
      readrandom   :      28.714 micros/op 34825 ops/sec;    3.9 MB/s (348999 of 348999 found)
      ```
      
      Just paying attention to order of magnitude of ops/sec (short test
      durations, lots of noise), it's clear that with the fix we can read <= 6.29
      & >= 7.0 at full speed, where neither 6.29 nor 7.0 can on both. And 6.29
      release can properly read fixed DB at full speed.
      
      Reviewed By: siying, ajkr
      
      Differential Revision: D35057844
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a46893a6af4bf084375ebe4728066d00eb08f050
    • M
      Add number of running flushes & compactions to --stats_per_interval output (#9726) · d71e5a5b
      Mark Callaghan committed
      Summary:
      This is for https://github.com/facebook/rocksdb/issues/9709 and adds two lines to the end of DB Stats
      for num-running-compactions and num-running-flushes.
      
      For example ...
      
      ** DB Stats **
      Uptime(secs): 6.0 total, 1.0 interval
      Cumulative writes: 915K writes, 915K keys, 915K commit groups, 1.0 writes per commit group, ingest: 0.11 GB, 18.95 MB/s
      Cumulative WAL: 915K writes, 0 syncs, 915000.00 writes per sync, written: 0.11 GB, 18.95 MB/s
      Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
      Interval writes: 133K writes, 133K keys, 133K commit groups, 1.0 writes per commit group, ingest: 16.62 MB, 16.53 MB/s
      Interval WAL: 133K writes, 0 syncs, 133000.00 writes per sync, written: 0.02 GB, 16.53 MB/s
      Interval stall: 00:00:0.000 H:M:S, 0.0 percent
      num-running-compactions: 0
      num-running-flushes: 0
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9726
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35066759
      
      Pulled By: mdcallag
      
      fbshipit-source-id: c161fadd3c15c5aa715a820dab6bfedb46dc099b
  6. 23 Mar, 2022 4 commits
    • Y
      Print information about all column families when using ldb (#9719) · 3bd150c4
      Yanqin Jin committed
      Summary:
      Before this PR, the following command prints only the default column
      family's information in the end:
      ```
      ldb --db=. --hex manifest_dump --verbose
      ```
      
      We should print all column families instead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9719
      
      Test Plan:
      `make check` makes sure nothing breaks.
      
      Generate a DB, use the above command to verify all column families are
      printed.
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D34992453
      
      Pulled By: riversand963
      
      fbshipit-source-id: de1d38c4539cd89f74e1a6240ad7a6e2416bf198
    • A
      Add async_io read option in db_bench (#9735) · f07eec1b
      Akanksha Mahajan committed
      Summary:
      Add async_io Read option in db_bench
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9735
      
      Test Plan:
      ./db_bench -use_existing_db=true
      -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32
      -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680
      -duration=120 -ops_between_duration_checks=1 -async_io=1
      
      Reviewed By: riversand963
      
      Differential Revision: D35058482
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 1522b638c79f6d85bb7408c67f6ab76dbabeeee7
      f07eec1b
    • M
      For db_bench --benchmarks=fillseq with --num_multi_db load databases … (#9713) · 63a284a6
      Mark Callaghan committed
      Summary:
      …in order
      
      This fixes https://github.com/facebook/rocksdb/issues/9650
      For db_bench --benchmarks=fillseq --num_multi_db=X it loads databases in sequence
      rather than randomly choosing a database per Put. The benefits are:
      1) avoids long delays between flushing memtables
      2) avoids flushing memtables for all of them at the same point in time
      3) puts same number of keys per database so that query tests will find keys as expected
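A minimal sketch of the sequential assignment described above, assuming each database receives `keys_per_db` Puts before the loader moves on to the next one. The helper name `DbIndexForOp` and its parameters are hypothetical and are not taken from db_bench.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not db_bench code): returns the index of the database
// that receives the op_index-th Put when databases are loaded one after
// another, keys_per_db keys each.
inline uint64_t DbIndexForOp(uint64_t op_index, uint64_t keys_per_db,
                             uint64_t num_dbs) {
  assert(keys_per_db > 0 && num_dbs > 0);
  uint64_t db = op_index / keys_per_db;
  // Clamp so that any trailing operations land in the last database.
  return db < num_dbs ? db : num_dbs - 1;
}
```

With `keys_per_db = 10000000` and `num_dbs = 4`, the index changes at operations 10000000, 20000000 and 30000000, which matches the switch points printed in the test plan.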
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9713
      
      Test Plan:
      Using db_bench.1 without the change and db_bench.2 with the change:
      
      for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq --num_multi_db=4 --num=10000000; du -hs /data/m/rx ; done
      
       --- without the change
          fillseq      :       3.188 micros/op 313682 ops/sec;   34.7 MB/s
          real    2m7.787s
          user    1m52.776s
          sys     0m46.549s
          2.7G    /data/m/rx
      
       --- with the change
      
          fillseq      :       3.149 micros/op 317563 ops/sec;   35.1 MB/s
          real    2m6.196s
          user    1m51.482s
          sys     0m46.003s
          2.7G    /data/m/rx
      
          Also, temporarily added a printf to confirm that the code switches to the next database at the right time
          ZZ switch to db 1 at 10000000
          ZZ switch to db 2 at 20000000
          ZZ switch to db 3 at 30000000
      
      for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq,readrandom --num_multi_db=4 --num=100000; du -hs /data/m/rx ; done
      
       --- without the change, smaller database, note that not all keys are found by readrandom because databases have < and > --num keys
      
          fillseq      :       3.176 micros/op 314805 ops/sec;   34.8 MB/s
          readrandom   :       1.913 micros/op 522616 ops/sec;   57.7 MB/s (99873 of 100000 found)
      
       --- with the change, smaller database, note that all keys are found by readrandom
      
          fillseq      :       3.110 micros/op 321566 ops/sec;   35.6 MB/s
          readrandom   :       1.714 micros/op 583257 ops/sec;   64.5 MB/s (100000 of 100000 found)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35030168
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 2a18c4ec571d954cf5a57b00a11802a3608823ee
      63a284a6
    • G
      Update Cache::Release param from force_erase to erase_if_last_ref (#9728) · 8102690a
      gitbw95 committed
      Summary:
      The parameter name force_erase may be misleading: even when it is set to true, the handle is erased only if this is the last reference.
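The semantics behind the rename can be illustrated with a toy reference-counted handle. This is only a sketch under stated assumptions: `ToyHandle` and this free function `Release` are hypothetical and do not reproduce RocksDB's Cache implementation.

```cpp
#include <cassert>

// Hypothetical toy handle, not RocksDB's Cache handle.
struct ToyHandle {
  int refs = 1;
  bool erased = false;
};

// Drops one reference. The entry is erased only when this was the last
// reference AND erase_if_last_ref is true. Returns whether it was erased.
inline bool Release(ToyHandle& h, bool erase_if_last_ref) {
  assert(h.refs > 0);
  --h.refs;
  if (h.refs == 0 && erase_if_last_ref) {
    h.erased = true;
  }
  return h.erased;
}
```

Passing `true` while another reference is still held does not erase the entry, which is why a name like `force_erase` could mislead.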
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9728
      
      Reviewed By: pdillinger
      
      Differential Revision: D35038673
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 0d16d1e8fed17b97eba7fb53207119332f659a5f
      8102690a
  7. 22 Mar, 2022 5 commits
    • H
      Update HISTORY.md and version.h for 7.1 release (#9727) · b360d25d
      Hui Xiao committed
      Summary:
      As title
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9727
      
      Test Plan: no code change
      
      Reviewed By: ajkr
      
      Differential Revision: D35034541
      
      Pulled By: hx235
      
      fbshipit-source-id: ae839f23db1bdb9e5f787ca653a7685beb2ada68
      b360d25d
    • M
      Make mixgraph easier to use (#9711) · 1ca1562e
      Mark Callaghan committed
      Summary:
      Changes:
      * improves monitoring by displaying average size of a Put value and average scan length
      * forces the minimum value size to be 10. Before this it was 0 if you didn't set the distribution parameters.
      * uses reasonable defaults for the distribution parameters that determine value size and scan length
      * includes seeks in "reads ... found" message, before this they were missing
      
      This is for https://github.com/facebook/rocksdb/issues/9672
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9711
      
      Test Plan:
      Before this change:
      
      ./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236
      fillseq      :       4.289 micros/op 233138 ops/sec;   25.8 MB/s
      mixgraph     :      18.461 micros/op 54166 ops/sec;  755.0 MB/s ( Gets:50164 Puts:24919 Seek:24917 of 50164 in 75081 found)
      
      After this change:
      
      ./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236
      fillseq      :       3.974 micros/op 251553 ops/sec;   27.8 MB/s
      mixgraph     :      16.722 micros/op 59795 ops/sec;  833.5 MB/s ( Gets:50164 Puts:24919 Seek:24917, reads 75081 in 75081 found, avg size: 36.0 value, 504.9 scan)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35030190
      
      Pulled By: mdcallag
      
      fbshipit-source-id: d8f555f28d869f752ddb674a524108884511b151
      1ca1562e
    • K
      Fix a bug in PosixClock (#9695) · cb4d188a
      KNOEEE committed
      Summary:
      Multiplier here should be 1e6 to get microseconds.
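As an illustration of the unit arithmetic, here is a minimal sketch that converts a POSIX `timespec` to microseconds. The helper `TimespecToMicros` is hypothetical; the commit message does not say which PosixClock method was fixed, so this is not the patched code.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical helper (not RocksDB's code): converts a timespec to
// microseconds.
inline uint64_t TimespecToMicros(const timespec& ts) {
  assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
  // Seconds to microseconds multiplies by 1e6; nanoseconds to microseconds
  // divides by 1e3.
  return static_cast<uint64_t>(ts.tv_sec) * 1000000ULL +
         static_cast<uint64_t>(ts.tv_nsec) / 1000ULL;
}
```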
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9695
      
      Reviewed By: ajkr
      
      Differential Revision: D34897086
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9c1d0811ea740ba0a007edc2da199edbd000b88b
      cb4d188a
    • D
      fix a bug, c api, if enable inplace_update_support, and use create sn… (#9471) · cbe303c1
      duyuqi committed
      Summary:
      In the C API, releasing a snapshot core dumps when inplace_update_support is enabled and a snapshot is created.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9471
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D34965103
      
      Pulled By: riversand963
      
      fbshipit-source-id: c3aeeb9ea7126c2eda1466102794fecf57b6ab77
      cbe303c1
    • J
      Enable detect_stack_use_after_return for ASAN (#9714) · 661e0329
      Jay Zhuang committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9714
      
      Reviewed By: ajkr
      
      Differential Revision: D34983675
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0252ec6ee38a0b960df4c92791c7c2bcbfba5ad8
      661e0329
  8. 21 Mar, 2022 1 commit
    • A
      Provide implementation to prefetch data asynchronously in FilePrefetchBuffer (#9674) · 49a10feb
      Akanksha Mahajan committed
      Summary:
      In FilePrefetchBuffer, if reads are sequential, the ReadAsync API is called after a synchronous prefetch to prefetch more data asynchronously, so that the data is already available for the next prefetch. The amount of data prefetched asynchronously is readahead_size/2. Two buffers are used, one for synchronous prefetching and one for asynchronous prefetching. If the requested data overlaps both buffers, it is copied from both into a third buffer to make it contiguous.
      This feature is enabled through ReadOptions::async_io and is experimental.
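The copy into a third buffer can be sketched as follows. This is an illustration only, assuming the second buffer begins exactly where the first one ends; the `Buf` type and the `Stitch` function are hypothetical and are not FilePrefetchBuffer's actual members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical buffer: a run of bytes plus the file offset of its first byte.
struct Buf {
  uint64_t offset = 0;
  std::string data;
  uint64_t end() const { return offset + data.size(); }
};

// Copies the file range [offset, offset + len) into a new buffer when the
// range may start in `first` and continue into `second`.
// Assumption: `second` starts exactly where `first` ends.
inline Buf Stitch(const Buf& first, const Buf& second, uint64_t offset,
                  size_t len) {
  assert(first.end() == second.offset);
  assert(offset >= first.offset && offset + len <= second.end());
  Buf out;
  out.offset = offset;
  const uint64_t stop = offset + len;
  if (offset < first.end()) {
    // Part of the request is served from the first buffer.
    const uint64_t n = std::min<uint64_t>(stop, first.end()) - offset;
    out.data.append(first.data, static_cast<size_t>(offset - first.offset),
                    static_cast<size_t>(n));
  }
  if (stop > second.offset) {
    // The remainder is served from the second buffer.
    const uint64_t start = std::max<uint64_t>(offset, second.offset);
    out.data.append(second.data, static_cast<size_t>(start - second.offset),
                    static_cast<size_t>(stop - start));
  }
  return out;
}
```

The result is one contiguous buffer, at the cost of an extra copy whenever a request crosses the boundary between the two buffers.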
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9674
      
      Test Plan:
      1. Add new unit tests
      2. Run **db_stress** to make sure nothing crashes.
      
          -   Normal prefetch without `async_io` ran successfully:
      ```
      export CRASH_TEST_EXT_ARGS=" --async_io=0"
       make crash_test -j
       ```
      
      3. **Run Regressions**.
         i) Main branch without any change for normal prefetching with async_io disabled:
      
       ```
       ./db_bench -db=/tmp/prefix_scan_prefetch_main -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216
       ```
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
        ii) Normal prefetching after the changes, with async_io disabled:
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_withchange -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 14:11:31 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_withchange]
      seekrandom   :  471347.227 micros/op 2 ops/sec;  348.1 MB/s (255 of 255 found)
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D34731543
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8e23aa93453d5fe3c672b9231ad582f60207937f
      49a10feb
  9. 19 Mar, 2022 3 commits
    • P
      Add manifest fix-up utility for file temperatures (#9683) · a8a422e9
      Peter Dillinger committed
      Summary:
      The goal of this change is to allow changes to the "current" (in
      FileSystem) file temperatures to feed back into DB metadata, so that
      they can inform decisions and stats reporting. In part because of
      modular code factoring, it doesn't seem easy to do this automagically,
      where opening an SST file and observing current Temperature different
      from expected would trigger a change in metadata and DB manifest write
      (essentially giving the deep read path access to the write path). It is also
      difficult to do this while the DB is open because of the limitations of
      LogAndApply.
      
      This change allows updating file temperature metadata on a closed DB
      using an experimental utility function UpdateManifestForFilesState()
      or `ldb update_manifest --update_temperatures`. This should suffice for
      "migration" scenarios where outside tooling has placed or re-arranged DB
      files into a (different) tiered configuration without going through
      RocksDB itself (currently, only compaction can change temperature
      metadata).
      
      Some details:
      * Refactored and added unit test for `ldb unsafe_remove_sst_file` because
      of shared functionality
      * Pulled in autovector.h changes from https://github.com/facebook/rocksdb/issues/9546 to fix SuperVersionContext
      move constructor (related to an older draft of this change)
      
      Possible follow-up work:
      * Support updating the manifest with file checksums, such as when a
      new checksum function is used and we want existing DB metadata updated
      for it.
      * It's possible that for some repair scenarios, lighter weight than
      full repair, we might want to support UpdateManifestForFilesState() to
      modify critical file details like size or checksum using same
      algorithm. But let's make sure these are differentiated from modifying
      file details in ways that don't suspect corruption (or require extreme
      trust).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9683
      
      Test Plan: unit tests added
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D34798828
      
      Pulled By: pdillinger
      
      fbshipit-source-id: cfd83e8fb10761d8c9e7f9c020d68c9106a95554
      a8a422e9
    • Y
      Fix assertion error by doing comparison with mutex (#9717) · b2aacaf9
      Yanqin Jin committed
      Summary:
      On CircleCI MacOS instances, we have been seeing the following assertion error:
      ```
      Assertion failed: (alive_log_files_tail_ == alive_log_files_.rbegin()), function WriteToWAL, file /Users/distiller/project/db/db_impl/db_impl_write.cc, line 1213.
      Received signal 6 (Abort trap: 6)
      #0   0x1
      #1   abort (in libsystem_c.dylib) + 120
      #2   err (in libsystem_c.dylib) + 0
      #3   rocksdb::DBImpl::WriteToWAL(rocksdb::WriteBatch const&, rocksdb::log::Writer*, unsigned long long*, unsigned long long*, rocksdb::Env::IOPriority, bool, bool) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:1213)
      #4   rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long long*, bool, bool, unsigned long long) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:1251)
      #5   rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long long*, unsigned long long, bool, unsigned long long*, unsigned long, rocksdb::PreReleaseCallback*) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:421)
      #6   rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:109)
      #7   rocksdb::DB::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:2159)
      #8   rocksdb::DBImpl::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) (in librocksdb.7.0.0.dylib) (db_impl_write.cc:37)
      #9   rocksdb::DB::Put(rocksdb::WriteOptions const&, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) (in librocksdb.7.0.0.dylib) (db.h:382)
      #10  rocksdb::DBBasicTestWithTimestampPrefixSeek_IterateWithPrefix_Test::TestBody() (in db_with_timestamp_basic_test) (db_with_timestamp_basic_test.cc:2926)
      #11  void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in db_with_timestamp_basic_test) (gtest-all.cc:3899)
      #12  void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in db_with_timestamp_basic_test) (gtest-all.cc:3935)
      #13  testing::Test::Run() (in db_with_timestamp_basic_test) (gtest-all.cc:3980)
      #14  testing::TestInfo::Run() (in db_with_timestamp_basic_test) (gtest-all.cc:4153)
      #15  testing::TestCase::Run() (in db_with_timestamp_basic_test) (gtest-all.cc:4266)
      #16  testing::internal::UnitTestImpl::RunAllTests() (in db_with_timestamp_basic_test) (gtest-all.cc:6632)
      #17  bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in db_with_timestamp_basic_test) (gtest-all.cc:3899)
      #18  bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in db_with_timestamp_basic_test) (gtest-all.cc:3935)
      #19  testing::UnitTest::Run() (in db_with_timestamp_basic_test) (gtest-all.cc:6242)
      #20  RUN_ALL_TESTS() (in db_with_timestamp_basic_test) (gtest.h:22110)
      #21  main (in db_with_timestamp_basic_test) (db_with_timestamp_basic_test.cc:3150)
      #22  start (in libdyld.dylib) + 1
      ```
      
      It's likely caused by concurrent, unprotected access to the deque, even though `back()` is never popped
      and we are only comparing `rbegin()` with a cached `riterator`. To be safe, do the comparison only while holding the mutex.
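A minimal sketch of the pattern, assuming a deque that is mutated under a mutex and a cached reverse iterator that is refreshed after each mutation. `LogTracker` and its members are hypothetical stand-ins, not DBImpl's real fields.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the state described above.
struct LogTracker {
  std::mutex mu;
  std::deque<int> alive_log_files;
  std::deque<int>::reverse_iterator tail;

  void Add(int log_number) {
    std::lock_guard<std::mutex> lock(mu);
    alive_log_files.push_back(log_number);
    // push_back invalidates deque iterators, so refresh the cached one.
    tail = alive_log_files.rbegin();
  }

  // The comparison reads the deque, so it is performed only while the mutex
  // is held.
  bool TailIsCurrent() {
    std::lock_guard<std::mutex> lock(mu);
    assert(!alive_log_files.empty());
    return tail == alive_log_files.rbegin();
  }
};
```

Calling `rbegin()` without the lock while another thread runs `push_back` is a data race on the deque's internals, even if the element at the back is never removed.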
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9717
      
      Test Plan:
      One example: SSH to a CircleCI MacOS instance and run:
      ```
      gtest-parallel -r 1000 -w 8 ./db_test --gtest_filter=DBTest.FlushesInParallelWithCompactRange
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D34990696
      
      Pulled By: riversand963
      
      fbshipit-source-id: 62dd48ae6fedbda53d0a64d73de9b948b4c26eee
      b2aacaf9
    • P
      New backup meta schema, with file temperatures (#9660) · cff0d1e8
      Peter Dillinger committed
      Summary:
      The primary goal of this change is to add support for backing up and
      restoring (applying on restore) file temperature metadata, without
      committing to either the DB manifest or the FS reported "current"
      temperatures being exclusive "source of truth".
      
      To achieve this goal, we need to add temperature information to backup
      metadata, which requires updated backup meta schema. Fortunately I
      prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version
      6.19.0 for this kind of schema update. (Previously, backup meta schema
      was not extensible! Making this schema update public will allow some
      other "nice to have" features like taking backups with hard links, and
      avoiding crc32c checksum computation when another checksum is already
      available.) While schema version 2 is newly public, the default schema
      version is still 1. Until we change the default, users will need to set
      to 2 to enable features like temperature data backup+restore. New
      metadata like temperature information will be ignored with a warning
      in versions before this change and since 6.19.0. The metadata is
      considered ignorable because a functioning DB can be restored without
      it.
      
      Some detail:
      * Some renaming because "future schema" is now just public schema 2.
      * Initialize some atomics in TestFs (linter reported)
      * Add temperature hint support to SstFileDumper (used by BackupEngine)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660
      
      Test Plan:
      The related unit test is substantially updated for the new functionality,
      including some shared testing support for tracking temperatures in a FS.

      Some other tests and testing hooks into production code are also updated
      to make the backup meta schema change public.
      
      Reviewed By: ajkr
      
      Differential Revision: D34686968
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a
      cff0d1e8
  10. 18 Mar, 2022 2 commits
    • Y
      Fix race condition caused by concurrent accesses to forceMmapOff_ when opening... · 3bdbf67e
      Yanqin Jin committed
      Fix race condition caused by concurrent accesses to forceMmapOff_ when opening Posix WritableFile (#9685)
      
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9685
      
      TSAN reports a race condition when running the following test:
      ```
      gtest-parallel -r 100 ./external_sst_file_test --gtest_filter=ExternalSSTFileTest.MultiThreaded
      ```
      The run produces the following report:
      
      ```
      WARNING: ThreadSanitizer: data race (pid=2683148)
        Write of size 1 at 0x556fede63340 by thread T7:
          #0 rocksdb::(anonymous namespace)::PosixFileSystem::OpenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, bool, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:334 (external_sst_file_test+0xb61ac4)
          #1 rocksdb::(anonymous namespace)::PosixFileSystem::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:382 (external_sst_file_test+0xb5ba96)
          #2 rocksdb::CompositeEnv::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/env/composite_env.cc:334 (external_sst_file_test+0xa6ab7f)
          #3 rocksdb::EnvWrapper::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/include/rocksdb/env.h:1428 (external_sst_file_test+0x561f3e)
      Previous read of size 1 at 0x556fede63340 by thread T4:
          #0 rocksdb::(anonymous namespace)::PosixFileSystem::OpenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, bool, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:328 (external_sst_file_test+0xb61a70)
          #1 rocksdb::(anonymous namespace)::PosixFileSystem::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
      ...
      ```
      
      Fix by making sure the following block gets executed only once:
      ```
            if (!checkedDiskForMmap_) {
              // this will be executed once in the program's lifetime.
              // do not use mmapWrite on non ext-3/xfs/tmpfs systems.
              if (!SupportsFastAllocate(fname)) {
                forceMmapOff_ = true;
              }
              checkedDiskForMmap_ = true;
            }
      ```
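One way to guarantee that such a check runs only once, and without a data race, is to guard the flag with a mutex. The sketch below assumes this approach for illustration; the commit message does not state which mechanism the PR uses. `MmapProbe`, `ForceMmapOff` and `probe_calls` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the file system object that owns the two flags.
struct MmapProbe {
  std::mutex mu;
  bool checked_disk_for_mmap = false;
  bool force_mmap_off = false;
  int probe_calls = 0;  // counts how often the probe actually ran

  // supports_fast_allocate is any callable returning bool; it stands in for
  // the SupportsFastAllocate(fname) check in the snippet above.
  template <typename Fn>
  bool ForceMmapOff(Fn supports_fast_allocate) {
    std::lock_guard<std::mutex> lock(mu);
    if (!checked_disk_for_mmap) {
      ++probe_calls;
      if (!supports_fast_allocate()) {
        force_mmap_off = true;
      }
      checked_disk_for_mmap = true;
    }
    return force_mmap_off;
  }
};
```

Because both the read and the write of the flags happen under the same lock, two threads opening files concurrently can no longer race on them, and the probe runs at most once.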
      
      Reviewed By: pdillinger
      
      Differential Revision: D34780308
      
      fbshipit-source-id: b761f66b24c8b5b8389d86ea371c8542b8d869d5
      3bdbf67e
    • J
      Deflake DeleteSchedulerTest.StartBGEmptyTrashMultipleTimes (#9706) · f0fca81f
      Jay Zhuang committed
      Summary:
      The designed sync point may not be hit if trash files are generated faster
      than they are deleted. In that case the file is deleted directly instead of
      waiting for the background trash-emptying thread to do it.
      Increase the SstFileManager Trash/DB ratio to avoid that.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9706
      
      Test Plan:
      `gtest-parallel ./delete_scheduler_test
      --gtest_filter=DeleteSchedulerTest.StartBGEmptyTrashMultipleTimes -r
      10000 -w 100`
      It was likely to happen on one of the hosts.
      
      Reviewed By: riversand963
      
      Differential Revision: D34964735
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: bb78015489b5f6b3f11783aae7e5853ea197702c
      f0fca81f