1. 26 May 2022, 2 commits
  2. 25 May 2022, 8 commits
    • Y
      Enable checkpoint and backup in db_stress when timestamp is enabled (#10047) · 9901e7f6
      Committed by Yanqin Jin
      Summary:
      After https://github.com/facebook/rocksdb/issues/10030 and https://github.com/facebook/rocksdb/issues/10004, we can enable checkpoint and backup in stress tests when
      user-defined timestamp is enabled.
      
      This PR has no production risk.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10047
      
      Test Plan:
      ```
      TEST_TMPDIR=/dev/shm make crash_test_with_ts
      ```
      
      Reviewed By: jowlyzhang
      
      Differential Revision: D36641565
      
      Pulled By: riversand963
      
      fbshipit-source-id: d86c9d87efcc34c32d1aa176af691d32b897644a
      9901e7f6
    • L
      Fix potential ambiguities in/around port/sys_time.h (#10045) · af7ae912
      Committed by Levi Tamasi
      Summary:
      There are some time-related POSIX APIs that are not available on Windows
      (e.g. `localtime_r`), which we have worked around by providing our own
      implementations in `port/sys_time.h`. This workaround actually relies on
      some ambiguity: on Windows, a call to `localtime_r` calls
      `ROCKSDB_NAMESPACE::port::localtime_r` (which is pulled into
      `ROCKSDB_NAMESPACE` by a using-declaration), while on other platforms
      it calls the global `localtime_r`. This works fine as long as there is only one
      candidate function; however, it breaks down when there is more than one
      `localtime_r` visible in a scope.
      
      The patch fixes this by introducing `ROCKSDB_NAMESPACE::port::{TimeVal, GetTimeOfDay, LocalTimeR}`
      to eliminate any ambiguity.
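      The wrapper approach can be sketched as follows. This is an illustrative toy, not RocksDB's actual port code; the wrapper name mirrors the one the patch introduces, but the body is a simplified stand-in.

      ```cpp
      // Illustrative sketch (not RocksDB's actual code): give the portability
      // wrapper a distinct name so it can never be ambiguous with a global
      // localtime_r pulled into scope by another header.
      #include <time.h>

      namespace port {

      // One name, one candidate function, no overload ambiguity across platforms.
      inline struct tm* LocalTimeR(const time_t* t, struct tm* result) {
      #ifdef _WIN32
        return localtime_s(result, t) == 0 ? result : nullptr;  // Windows variant
      #else
        return localtime_r(t, result);  // POSIX variant
      #endif
      }

      }  // namespace port
      ```

      Callers then use `port::LocalTimeR` explicitly, so lookup never has to choose between a namespace-local and a global `localtime_r`.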
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10045
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D36639372
      
      Pulled By: ltamasi
      
      fbshipit-source-id: fc13dbfa421b7c8918111a6d9e24ce77e91a7c50
      af7ae912
    • J
      Fix ApproximateOffsetOfCompressed test (#10048) · a96a4a2f
      Committed by Jay Zhuang
      Summary:
      https://github.com/facebook/rocksdb/issues/9857 introduced a new option, `use_zstd_dict_trainer`, which
      is stored in the SST file as text, e.g.:
      ```
      ...  zstd_max_train_bytes=0; enabled=0;...
      ```
      This increased the SST size slightly and caused the
      `ApproximateOffsetOfCompressed` test to fail:
      ```
      Value 7053 is not in range [4000, 7050]
      table/table_test.cc:4019: Failure
      Value of: Between(c.ApproximateOffsetOf("xyz"), 4000, 7050)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10048
      
      Test Plan: verified that the test passes after the change
      
      Reviewed By: cbi42
      
      Differential Revision: D36643688
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: bf12d211f6ae71937259ef21b1226bd06e8da717
      a96a4a2f
    • J
      Skip ZSTD dict tests if the version doesn't support it (#10046) · 23f34c7a
      Committed by Jay Zhuang
      Summary:
      For example, the default ZSTD version on Ubuntu 20.04 is 1.4.4, which
      fails the test `PresetCompressionDict`:
      
      ```
      db/db_test_util.cc:607: Failure
      Invalid argument: zstd finalizeDictionary cannot be used because ZSTD 1.4.5+ is not linked with the binary.
      terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException'
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10046
      
      Test Plan: test passes with an old zstd
      
      Reviewed By: cbi42
      
      Differential Revision: D36640067
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b1c49fb7295f57f4515ce4eb3a52ae7d7e45da86
      23f34c7a
    • S
      Avoid malloc_usable_size() call inside LRU Cache mutex (#10026) · c78a87cd
      Committed by sdong
      Summary:
      Inside the LRU cache mutex, we sometimes call malloc_usable_size() to calculate the memory used by a metadata object. We avoid this by saving the charge plus the metadata size, rather than just the charge, inside the metadata itself. Within the mutex, usually only the total charge is needed, so we don't need to recompute it.
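      The idea can be sketched with a toy handle struct (hypothetical names, not RocksDB's actual LRUHandle): the overhead is computed once, outside the mutex, and stored together with the charge.

      ```cpp
      // Simplified sketch: compute metadata overhead once and store
      // charge + overhead as a single total. Code running under the cache
      // mutex then reads a plain field instead of calling malloc_usable_size().
      #include <cstddef>

      struct Handle {
        size_t total_charge = 0;  // user charge + metadata size, computed once

        // Stand-in for malloc_usable_size(this); called outside the mutex.
        static size_t MetaOverhead() { return sizeof(Handle); }

        void SetCharge(size_t charge) { total_charge = charge + MetaOverhead(); }

        // Safe to call under the cache mutex: no allocator query, just a load.
        size_t TotalCharge() const { return total_charge; }
      };
      ```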
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10026
      
      Test Plan: Run existing tests.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36556253
      
      fbshipit-source-id: f60c96d13cde3af77732e5548e4eac4182fa9801
      c78a87cd
    • Y
      Add timestamp support to CompactedDBImpl (#10030) · d4081bf0
      Committed by Yu Zhang
      Summary:
      This PR is the second and final part of adding user-defined timestamp support to the read-only DB. Specifically, the changes in this PR include:
      
      - `options.timestamp` is respected by `CompactedDBImpl::Get` and `CompactedDBImpl::MultiGet` to return results visible up to that timestamp.
      - `CompactedDBImpl::Get(..., std::string* timestamp)` and `CompactedDBImpl::MultiGet(std::vector<std::string>* timestamps)` return the timestamp(s) associated with the key(s).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10030
      
      Test Plan:
      ```
      COMPILE_WITH_ASAN=1 make -j24 all
      ./db_readonly_with_timestamp_test --gtest_filter="DBReadOnlyTestWithTimestamp.CompactedDB*"
      ./db_basic_test --gtest_filter="DBBasicTest.CompactedDB*"
      make all check
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D36613926
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 5b7ed7fef822708c12e2caf7a8d2deb6a696f0f0
      d4081bf0
    • C
      Support read rate-limiting in SequentialFileReader (#9973) · 8515bd50
      Committed by Changyu Bi
      Summary:
      Added a rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites of SequentialFileReader::Read with the appropriate IO priority (or left a TODO and specified IO_TOTAL for now).
      
      The PR is separated into four commits: the first added the rate-limiting support, but with some fixes in the unit test since the number of bytes requested from the rate limiter in SequentialFileReader was not accurate (there is an overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check the file size and determine how many bytes are left to read. The third commit added benchmark-related code. The fourth commit moved the logic of using the file size to avoid overcharging the rate limiter into the backup engine (the main user of SequentialFileReader).
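      A minimal sketch of the chunked, rate-limited read loop, assuming a toy RateLimiter; the names and interfaces here are illustrative, not RocksDB's actual classes.

      ```cpp
      // Sketch: split a large sequential read into chunks and request each
      // chunk's bytes from a rate limiter before reading.
      #include <algorithm>
      #include <cstddef>

      class RateLimiter {
       public:
        explicit RateLimiter(size_t max_bytes_per_request)
            : max_bytes_per_request_(max_bytes_per_request) {}
        // A real limiter would block until tokens are available; this toy one
        // just caps how much may be read in one chunk.
        size_t Request(size_t bytes) {
          return std::min(bytes, max_bytes_per_request_);
        }
       private:
        size_t max_bytes_per_request_;
      };

      // Reads n bytes in rate-limited chunks; *chunks counts the requests made.
      size_t RateLimitedRead(RateLimiter& limiter, size_t n, size_t* chunks) {
        size_t done = 0;
        *chunks = 0;
        while (done < n) {
          size_t allowed = limiter.Request(n - done);
          done += allowed;  // a real reader would issue file->Read(allowed) here
          ++(*chunks);
        }
        return done;
      }
      ```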
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973
      
      Test Plan:
      - `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter.
      - Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart.
        - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb`
        - Benchmark:
      ```
      strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db
      strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db
      ```
      - db bench on backup and restore to ensure no performance regression.
        - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%)
        - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%)
      
      ```
      # Set up
      ./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000
      
      # benchmark
      TEST_TMPDIR=/tmp/test_rocksdb
      NUM_RUN=50
      for ((j=0;j<$NUM_RUN;j++))
      do
         ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db | egrep 'backup'
        # Restore
        #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db
      done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D36327418
      
      Pulled By: cbi42
      
      fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691
      8515bd50
    • J
      Fix failed VerifySstUniqueIds unittests (#10043) · fd24e447
      Committed by Jay Zhuang
      Summary:
      The tests should use `UniqueId64x2` instead of string.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10043
      
      Test Plan: unittest
      
      Reviewed By: pdillinger
      
      Differential Revision: D36620366
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: cf937a1da362018472fa4396848225e48893848b
      fd24e447
  3. 24 May 2022, 7 commits
    • A
      Expose unix time in rocksdb::Snapshot (#9923) · 700d597b
      Committed by Akanksha Mahajan
      Summary:
      A RocksDB snapshot already has a member `unix_time_` that is set when the
      snapshot is taken. It is now exposed through the `GetSnapshotTime()` API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9923
      
      Test Plan: Update unit tests
      
      Reviewed By: riversand963
      
      Differential Revision: D36048275
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 825210ec287deb0bc3aaa9b8e1f079f07ad686fa
      700d597b
    • A
      Fix fbcode internal build failure (#10041) · 8e9d9156
      Committed by anand76
      Summary:
      The build failed due to different namespaces for coroutines (std::experimental vs std) based on compiler version.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10041
      
      Reviewed By: ltamasi
      
      Differential Revision: D36617212
      
      Pulled By: anand1976
      
      fbshipit-source-id: dfb25320788d32969317d5651173059e2cbd8bd5
      8e9d9156
    • L
      Update version on main to 7.4 and add 7.3 to the format compatibility checks (#10038) · 253ae017
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10038
      
      Reviewed By: riversand963
      
      Differential Revision: D36604533
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 54ccd0a4b32a320b5640a658ea6846ee897065d1
      253ae017
    • A
      Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged"... · a479c2c2
      Committed by Akanksha Mahajan
      Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled (#10032)
      
      Summary:
      In the case of non-sequential reads with `async_io`, `FilePRefetchBuffer::TryReadFromCacheAsync` can be called for previous blocks with `offset < bufs_[curr_].offset_`, which wasn't handled correctly, resulting in wrong data being returned from the buffer.
      
      Since `FilePRefetchBuffer::PrefetchAsync` can be called for any data block, it sets `prev_len_` to 0, signaling `FilePRefetchBuffer::TryReadFromCacheAsync` to proceed with prefetching even though `offset < bufs_[curr_].offset_`. This is because async prefetching is always done in the second buffer (to avoid the mutex), even when `curr_` is empty, leading to `offset < bufs_[curr_].offset_` in some cases.
      If `prev_len_` is nonzero, `TryReadFromCacheAsync` returns false when `offset < bufs_[curr_].offset_ && prev_len_ != 0`, indicating that the reads are not sequential and the previous call wasn't `PrefetchAsync`.
      
      - This PR also simplifies `FilePRefetchBuffer::TryReadFromCacheAsync`, as it was getting complicated covering different scenarios based on `async_io` being enabled/disabled. If `for_compaction` is true, it now calls `FilePRefetchBuffer::TryReadFromCache`, following the synchronous flow as before. This is decided in BlockFetcher.cc.
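      The sequentiality check described above can be sketched with toy types (hypothetical names; the real FilePrefetchBuffer logic is more involved): a backward jump relative to the current buffer is only allowed right after an async prefetch, signaled here by `prev_len == 0`.

      ```cpp
      // Sketch of the fix: refuse a non-sequential read from the buffer unless
      // the previous call was PrefetchAsync (which sets prev_len to 0).
      #include <cstddef>
      #include <cstdint>

      struct PrefetchBufferSketch {
        uint64_t buf_offset = 0;  // offset of the currently buffered data
        size_t prev_len = 0;      // set to 0 by PrefetchAsync, nonzero otherwise

        bool TryReadFromCacheAsync(uint64_t offset) const {
          if (offset < buf_offset && prev_len != 0) {
            // Non-sequential read and the previous call wasn't PrefetchAsync:
            // bail out instead of serving wrong data from the buffer.
            return false;
          }
          return true;  // proceed (serve from buffer or prefetch)
        }
      };
      ```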
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10032
      
      Test Plan:
      1. export CRASH_TEST_EXT_ARGS=" --async_io=1" followed by make crash_test -j completed successfully locally.
      2. make crash_test -j completed successfully locally.
      3. Reran CircleCi mini crashtest job 4 - 5 times.
      4. Updated prefetch_test for more coverage.
      
      Reviewed By: anand1976
      
      Differential Revision: D36579858
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 0c428d62b45e12e082a83acf533a5e37a584bedf
      a479c2c2
    • S
      Move three info logging within DB Mutex to use log buffer (#10029) · bea5831b
      Committed by sdong
      Summary:
      Info logging while holding the DB mutex could potentially invoke I/O and cause performance issues. Move three of the cases to use the log buffer.
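      The log-buffer pattern can be sketched as follows (toy types, not RocksDB's actual LogBuffer API): messages are appended in memory while holding the mutex, and flushed to the logger only after the lock is released.

      ```cpp
      // Sketch: buffer log lines under the mutex (cheap appends, no syscalls),
      // flush them to the potentially I/O-performing logger afterwards.
      #include <mutex>
      #include <string>
      #include <vector>

      struct LogBuffer {
        std::vector<std::string> lines;
        void Add(std::string s) { lines.push_back(std::move(s)); }  // no I/O
        size_t FlushToLogger() {   // called outside the mutex; real code would
          size_t n = lines.size(); // write each buffered line to the info log
          lines.clear();
          return n;
        }
      };

      size_t DoWorkUnderMutex(std::mutex& mu) {
        LogBuffer buf;
        {
          std::lock_guard<std::mutex> lock(mu);
          buf.Add("flush started");   // cheap append while holding the lock
          buf.Add("flush finished");
        }
        return buf.FlushToLogger();   // I/O happens here, lock-free
      }
      ```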
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10029
      
      Test Plan: Run existing tests.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36561694
      
      fbshipit-source-id: cabb93fea299001a6b4c2802fcba3fde27fa062c
      bea5831b
    • P
      Java build: finish compiling before testing (etc) (#10034) · 1e4850f6
      Committed by Peter Dillinger
      Summary:
      Lack of ordering dependencies could lead to random
      build-linux-java failures with "Truncated class file" because tests
      started before compilation was finished. (Fix to java/Makefile)
      
      Also:
      * export SHA256_CMD to save copy-paste
      * Actually fail if Java sample build fails--which it was in CircleCI
      * Don't require Snappy for Java sample build (for more compatibility)
      * Remove check_all_python from jtest because it's running in `make
      check` builds in CircleCI
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10034
      
      Test Plan: CI, some manual
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36596541
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 230d79db4b7ae93a366871ff09d0a88e8e1c8af3
      1e4850f6
    • T
      Add plugin header install in CMakeLists.txt (#10025) · cb858600
      Committed by Tom Blamer
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9987.
      - Plugin specific headers can be specified by setting ${PLUGIN_NAME}_HEADERS in ${PLUGIN_NAME}.mk in the plugin directory.
      - This is supported by the Makefile based build, but was missing from CMakeLists.txt.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10025
      
      Test Plan:
      - Add a plugin with ${PLUGIN_NAME}_HEADERS set in both ${PLUGIN_NAME}.mk and CMakeLists.txt
      - Run Makefile based install and CMake based install and verify installed headers match
      
      Reviewed By: riversand963
      
      Differential Revision: D36584908
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5ea0137205ccbf0d36faacf45d712c5604065bb5
      cb858600
  4. 23 May 2022, 1 commit
    • A
      Minimum macOS version needed to build v7.2.2 and up is 10.13 (#9976) · 56ce3aef
      Committed by Adam Retter
      Summary:
      Some C++ code changes between versions 7.1.2 and 7.2.2 now seem to require at least macOS 10.13 (2017) to build successfully; previously we needed 10.12 (2016). I haven't been able to identify the exact commit.
      
      **NOTE**: This needs to be merged to both `main` and `7.2.fb` branches.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9976
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36303226
      
      Pulled By: ajkr
      
      fbshipit-source-id: 589ce3ecf821db3402b0876e76d37b407896c945
      56ce3aef
  5. 21 May 2022, 7 commits
    • L
      Update HISTORY for 7.3 release (#10031) · bed40e72
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10031
      
      Reviewed By: riversand963
      
      Differential Revision: D36567741
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 058f8cc856d276db6e1aed07a89ac0b7118c4435
      bed40e72
    • Y
      Point libprotobuf-mutator to the latest verified commit hash (#10028) · 899db56a
      Committed by Yanqin Jin
      Summary:
      Recent updates to https://github.com/google/libprotobuf-mutator have caused link errors for the RocksDB
      CircleCI job 'build-fuzzers'. This PR points the CI to a specific, recently verified commit hash.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10028
      
      Test Plan: watch for CI to finish.
      
      Reviewed By: pdillinger, jay-zhuang
      
      Differential Revision: D36562517
      
      Pulled By: riversand963
      
      fbshipit-source-id: ba5ef0f9ed6ea6a75aa5dd2768bd5f389ac14f46
      899db56a
    • Y
      Fix a bug of not setting enforce_single_del_contracts (#10027) · f648915b
      Committed by Yanqin Jin
      Summary:
      Before this PR, BuildDBOptions() did not set a newly-added option, i.e.
      enforce_single_del_contracts, causing OPTIONS files to contain incorrect
      information.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10027
      
      Test Plan:
      make check
      Manually check OPTIONS file.
      
      Reviewed By: ltamasi
      
      Differential Revision: D36556125
      
      Pulled By: riversand963
      
      fbshipit-source-id: e1074715b22c328b68c19e9ad89aa5d67d864bb5
      f648915b
    • A
      Seek parallelization (#9994) · 2db6a4a1
      Committed by Akanksha Mahajan
      Summary:
      The RocksDB iterator is a hierarchy of iterators. MergingIterator maintains a heap of LevelIterators, one for each L0 file and for each non-zero level. The Seek() operation naturally lends itself to parallelization, as it involves positioning every LevelIterator on the correct data block in the correct SST file. It looks up a target key in each level to find the first key that's >= the target key. This typically involves reading one data block that is likely to contain the target key, then scanning forward to find the first valid key. The forward scan may read more data blocks. In order to find the right data block, the iterator may read some metadata blocks (required for opening a file and searching the index).
      This flow can be parallelized.
      
      Design: Seek will be called two times under the async_io option. The first seek sends asynchronous requests to prefetch the data blocks at each level; the second seek follows the normal flow, and in FilePrefetchBuffer::TryReadFromCacheAsync it waits for Poll() to get the results and adds the iterator to the min-heap.
      - Status::TryAgain is passed down from FilePrefetchBuffer::PrefetchAsync to block_iter_.Status, indicating an asynchronous request has been submitted.
      - If for some reason the asynchronous request returns an error on submission, it falls back to sequential reading of blocks in one pass.
      - If the data already exists in prefetch_buffer, it is returned without further prefetching and treated as a single-pass seek.
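      The two-pass Seek flow above can be sketched with toy types; the Status enum, the prefetch submission, and the driver loop here are simplified stand-ins, not the real RocksDB interfaces.

      ```cpp
      // Sketch of the two-pass Seek: the first pass submits async prefetches
      // and reports TryAgain; the second pass positions each iterator.
      enum class Status { kOk, kTryAgain };

      struct LevelIterator {
        bool prefetch_submitted = false;
        bool positioned = false;

        Status Seek() {
          if (!prefetch_submitted) {
            prefetch_submitted = true;  // first pass: PrefetchAsync(...) sent
            return Status::kTryAgain;   // bubbled up via block_iter_.Status
          }
          positioned = true;            // second pass: Poll() done, position
          return Status::kOk;
        }
      };

      // MergingIterator-style driver: the first Seek fans out async prefetches,
      // the second waits for the results and rebuilds the min-heap.
      Status SeekAll(LevelIterator* iters, int n) {
        Status result = Status::kOk;
        for (int i = 0; i < n; ++i) {
          if (iters[i].Seek() == Status::kTryAgain) result = Status::kTryAgain;
        }
        return result;
      }
      ```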
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9994
      
      Test Plan:
      - **Run Regressions.**
      ```
      ./db_bench -db=/tmp/prefix_scan_prefetch_main -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216
      ```
      i) Previous release 7.0 run for normal prefetching with async_io disabled:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      ii) normal prefetching after changes with async_io disable:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Set seed to 1652922591315307 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.3
      Date:       Wed May 18 18:09:51 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483080.466 micros/op 2 ops/sec 120.287 seconds 249 operations;  340.8 MB/s (249 of 249 found)
      ```
      iii) db_bench with async_io enabled completed successfully
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1
      Set seed to 1652924062021732 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.3
      Date:       Wed May 18 18:34:22 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  553913.576 micros/op 1 ops/sec 120.199 seconds 217 operations;  293.6 MB/s (217 of 217 found)
      ```
      
      - db_stress with async_io disabled completed successfully
      ```
       export CRASH_TEST_EXT_ARGS=" --async_io=0"
       make crash_test -j
      ```
      
      **In Progress**: db_stress with async_io is failing; debugging/fixing it is in progress.
      
      Reviewed By: anand1976
      
      Differential Revision: D36459323
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: abb1cd944abe712bae3986ae5b16704b3338917c
      2db6a4a1
    • A
      Fix crash due to MultiGet async IO and direct IO (#10024) · e015206d
      Committed by anand76
      Summary:
      MultiGet with async IO is not officially supported with Posix yet. Avoid a crash by using synchronous MultiRead when direct IO is enabled.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10024
      
      Test Plan: Run db_crashtest.py manually
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36551053
      
      Pulled By: anand1976
      
      fbshipit-source-id: 72190418fa92dd0397e87825df618b12c9bdecda
      e015206d
    • C
      Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857) · cc23b46d
      Committed by Changyu Bi
      Summary:
      An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API ZDICT_finalizeDictionary() can improve such a dictionary's effectiveness at low cost. This PR changes how the dictionary is created: it calls the ZSTD ZDICT_finalizeDictionary() API instead of creating a raw content dictionary (when max_dict_buffer_bytes > 0), passing in all buffered uncompressed data blocks as samples.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
      
      Test Plan:
      #### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
      Set up: change the parameter [here](https://github.com/facebook/rocksdb/blob/fb9a167a55e0970b1ef6f67c1600c8d9c4c6114f/tools/db_bench_tool.cc#L1766) to 16384 to make synthetic data more compressible.
      ```
      # linked local ZSTD with version 1.5.2
      # DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1  EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench
      
      dict_bytes=16384
      train_bytes=1048576
      echo "========== No Dictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== Raw Content Dictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== FinalizeDictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== TrainDictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      # Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory.
      # before compression data size: 1.2GB
      dict_bytes=16384
      max_dict_buffer_bytes =  1048576
                          space   cpu/memory
      No Dictionary       468M    14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k
      Raw Dictionary      251M    15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k
      FinalizeDictionary  236M    11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k
      TrainDictionary     84M     7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k
      ```
      
      #### Benchmark on 10 sample SST files for spacing saving and CPU time on compression:
      FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression.
      ```
      dict_bytes=16384
      train_bytes=1048576
      
      for sst_file in `ls ../temp/myrock-sst/`
      do
        echo "********** $sst_file **********"
        echo "========== No Dictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD
      
        echo "========== Raw Content Dictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes
      
        echo "========== FinalizeDictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict
      
        echo "========== TrainDictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes
      done
      
                               010240.sst (Size/Time) 011029.sst              013184.sst              021552.sst              185054.sst              185137.sst              191666.sst              7560381.sst             7604174.sst             7635312.sst
      No Dictionary           28165569 / 2614419      32899411 / 2976832      32977848 / 3055542      31966329 / 2004590      33614351 / 1755877      33429029 / 1717042      33611933 / 1776936      33634045 / 2771417      33789721 / 2205414      33592194 / 388254
      Raw Content Dictionary  28019950 / 2697961      33748665 / 3572422      33896373 / 3534701      26418431 / 2259658      28560825 / 1839168      28455030 / 1846039      28494319 / 1861349      32391599 / 3095649      33772142 / 2407843      33592230 / 474523
      FinalizeDictionary      27896012 / 2650029      33763886 / 3719427      33904283 / 3552793      26008225 / 2198033      28111872 / 1869530      28014374 / 1789771      28047706 / 1848300      32296254 / 3204027      33698698 / 2381468      33592344 / 517433
      TrainDictionary         28046089 / 2740037      33706480 / 3679019      33885741 / 3629351      25087123 / 2204558      27194353 / 1970207      27234229 / 1896811      27166710 / 1903119      32011041 / 3322315      32730692 / 2406146      33608631 / 570593
      ```
      
      #### Decompression/Read test:
With FinalizeDictionary/TrainDictionary, some data structures used for decompression are stored in the dictionary, so decompression/reads are expected to be faster.
      ```
      dict_bytes=16384
      train_bytes=1048576
      echo "No Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 | grep MB/s
      
      echo "Raw Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd  -compression_max_dict_bytes=$dict_bytes 2>&1 | grep MB/s
      
      echo "FinalizeDict"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false  > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 | grep MB/s
      
      echo "Train Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 | grep MB/s
      
      No Dictionary
      readrandom   :      12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations;    9.1 MB/s (1000000 of 1000000 found)
      Raw Dictionary
      readrandom   :      12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations;    9.0 MB/s (1000000 of 1000000 found)
      FinalizeDict
      readrandom   :       9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations;   11.3 MB/s (1000000 of 1000000 found)
      Train Dictionary
      readrandom   :       9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations;   11.4 MB/s (1000000 of 1000000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D35720026
      
      Pulled By: cbi42
      
      fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f
      cc23b46d
    • D
      Bump nokogiri from 1.13.4 to 1.13.6 in /docs (#10019) · 6255ac72
dependabot[bot] committed
      Summary:
      Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.4 to 1.13.6.
      <details>
      <summary>Release notes</summary>
      <p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p>
      <blockquote>
      <h2>1.13.6 / 2022-05-08</h2>
      <h3>Security</h3>
      <ul>
      <li>[CRuby] Address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29181">CVE-2022-29181</a>, improper handling of unexpected data types, related to untrusted inputs to the SAX parsers. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-xh29-r2w5-wx8m">GHSA-xh29-r2w5-wx8m</a> for more information.</li>
      </ul>
      <h3>Improvements</h3>
      <ul>
      <li><code>{HTML4,XML}::SAX::{Parser,ParserContext}</code> constructor methods now raise <code>TypeError</code> instead of segfaulting when an incorrect type is passed.</li>
      </ul>
      <hr />
      <p>sha256:</p>
      <pre><code>58417c7c10f78cd1c0e1984f81538300d4ea98962cfd3f46f725efee48f9757a  nokogiri-1.13.6-aarch64-linux.gem
      a2b04ec3b1b73ecc6fac619b41e9fdc70808b7a653b96ec97d04b7a23f158dbc  nokogiri-1.13.6-arm64-darwin.gem
      4437f2d03bc7da8854f4aaae89e24a98cf5c8b0212ae2bc003af7e65c7ee8e27  nokogiri-1.13.6-java.gem
      99d3e212bbd5e80aa602a1f52d583e4f6e917ec594e6aa580f6aacc253eff984  nokogiri-1.13.6-x64-mingw-ucrt.gem
      a04f6154a75b6ed4fe2d0d0ff3ac02f094b54e150b50330448f834fa5726fbba  nokogiri-1.13.6-x64-mingw32.gem
      a13f30c2863ef9e5e11240dd6d69ef114229d471018b44f2ff60bab28327de4d  nokogiri-1.13.6-x86-linux.gem
      63a2ca2f7a4f6bd9126e1695037f66c8eb72ed1e1740ef162b4480c57cc17dc6  nokogiri-1.13.6-x86-mingw32.gem
      2b266e0eb18030763277b30dc3d64337f440191e2bd157027441ac56a59d9dfe  nokogiri-1.13.6-x86_64-darwin.gem
      3fa37b0c3b5744af45f9da3e4ae9cbd89480b35e12ae36b5e87a0452e0b38335  nokogiri-1.13.6-x86_64-linux.gem
      b1512fdc0aba446e1ee30de3e0671518eb363e75fab53486e99e8891d44b8587  nokogiri-1.13.6.gem
      </code></pre>
      <h2>1.13.5 / 2022-05-04</h2>
      <h3>Security</h3>
      <ul>
      <li>[CRuby] Vendored libxml2 is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29824">CVE-2022-29824</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-cgx6-hpwq-fhv5">GHSA-cgx6-hpwq-fhv5</a> for more information.</li>
      </ul>
      <h3>Dependencies</h3>
      <ul>
      <li>[CRuby] Vendored libxml2 is updated from v2.9.13 to <a href="https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.9.14">v2.9.14</a>.</li>
      </ul>
      <h3>Improvements</h3>
      <ul>
      <li>[CRuby] The libxml2 HTML4 parser no longer exhibits quadratic behavior when recovering some broken markup related to start-of-tag and bare <code>&lt;</code> characters.</li>
      </ul>
      <h3>Changed</h3>
      <ul>
      <li>[CRuby] The libxml2 HTML4 parser in v2.9.14 recovers from some broken markup differently. Notably, the XML CDATA escape sequence <code>&lt;![CDATA[</code> and incorrectly-opened comments will result in HTML text nodes starting with <code>&amp;lt;!</code> instead of skipping the invalid tag. This behavior is a direct result of the <a href="https://gitlab.gnome.org/GNOME/libxml2/-/commit/798bdf1">quadratic-behavior fix</a> noted above. The behavior of downstream sanitizers relying on this behavior will also change. Some tests describing the changed behavior are in <a href="https://github.com/sparklemotion/nokogiri/blob/3ed5bf2b5a367cb9dc6e329c5a1c512e1dd4565d/test/html4/test_comments.rb#L187-L204"><code>test/html4/test_comments.rb</code></a>.</li>
      </ul>
      
      </blockquote>
      <p>... (truncated)</p>
      </details>
      <details>
      <summary>Changelog</summary>
      <p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md">nokogiri's changelog</a>.</em></p>
      <blockquote>
      <h2>1.13.6 / 2022-05-08</h2>
      <h3>Security</h3>
      <ul>
      <li>[CRuby] Address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29181">CVE-2022-29181</a>, improper handling of unexpected data types, related to untrusted inputs to the SAX parsers. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-xh29-r2w5-wx8m">GHSA-xh29-r2w5-wx8m</a> for more information.</li>
      </ul>
      <h3>Improvements</h3>
      <ul>
      <li><code>{HTML4,XML}::SAX::{Parser,ParserContext}</code> constructor methods now raise <code>TypeError</code> instead of segfaulting when an incorrect type is passed.</li>
      </ul>
      <h2>1.13.5 / 2022-05-04</h2>
      <h3>Security</h3>
      <ul>
      <li>[CRuby] Vendored libxml2 is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29824">CVE-2022-29824</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-cgx6-hpwq-fhv5">GHSA-cgx6-hpwq-fhv5</a> for more information.</li>
      </ul>
      <h3>Dependencies</h3>
      <ul>
      <li>[CRuby] Vendored libxml2 is updated from v2.9.13 to <a href="https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.9.14">v2.9.14</a>.</li>
      </ul>
      <h3>Improvements</h3>
      <ul>
      <li>[CRuby] The libxml2 HTML parser no longer exhibits quadratic behavior when recovering some broken markup related to start-of-tag and bare <code>&lt;</code> characters.</li>
      </ul>
      <h3>Changed</h3>
      <ul>
      <li>[CRuby] The libxml2 HTML parser in v2.9.14 recovers from some broken markup differently. Notably, the XML CDATA escape sequence <code>&lt;![CDATA[</code> and incorrectly-opened comments will result in HTML text nodes starting with <code>&amp;lt;!</code> instead of skipping the invalid tag. This behavior is a direct result of the <a href="https://gitlab.gnome.org/GNOME/libxml2/-/commit/798bdf1">quadratic-behavior fix</a> noted above. The behavior of downstream sanitizers relying on this behavior will also change. Some tests describing the changed behavior are in <a href="https://github.com/sparklemotion/nokogiri/blob/3ed5bf2b5a367cb9dc6e329c5a1c512e1dd4565d/test/html4/test_comments.rb#L187-L204"><code>test/html4/test_comments.rb</code></a>.</li>
      </ul>
      </blockquote>
      </details>
      <details>
      <summary>Commits</summary>
      <ul>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/b7817b6a62ac210203a451d1a691a824288e9eab"><code>b7817b6</code></a> version bump to v1.13.6</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/61b1a395cd512af2e0595a8e369465415e574fe8"><code>61b1a39</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2530">https://github.com/facebook/rocksdb/issues/2530</a> from sparklemotion/flavorjones-check-parse-memory-ty...</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/83cc451c3f29df397caa890afc3b714eae6ab8f7"><code>83cc451</code></a> fix: {HTML4,XML}::SAX::{Parser,ParserContext} check arg types</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/22c9e5b300c27a377fdde37c17eb9d07dd7322d0"><code>22c9e5b</code></a> version bump to v1.13.5</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/615588192572f7cfcb43eabbb070a6e07bf9e731"><code>6155881</code></a> doc: update CHANGELOG for v1.13.5</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/c519a47ab11f5e8fce77328fcb01a7b3befc2b9e"><code>c519a47</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2527">https://github.com/facebook/rocksdb/issues/2527</a> from sparklemotion/2525-update-libxml-2_9_14-v1_13_x</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/66c2886e78f6801def83a549c3e6581ac48e61e8"><code>66c2886</code></a> dep: update libxml2 to v2.9.14</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/b7c4cc35de38fcfdde4da1203d79ae38bc4324bf"><code>b7c4cc3</code></a> test: unpend the LIBXML_LOADED_VERSION test on freebsd</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/eac793487183a5e72464e53cccd260971d5f29b5"><code>eac7934</code></a> dev: require yaml</li>
      <li><a href="https://github.com/sparklemotion/nokogiri/commit/f3521ba3d38922d76dd5ed59705eab3988213712"><code>f3521ba</code></a> style(rubocop): pend Style/FetchEnvVar for now</li>
      <li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.13.4...v1.13.6">compare view</a></li>
      </ul>
      </details>
      <br />
      
      [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.13.4&new-version=1.13.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
      
      Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.
      
      [//]: # (dependabot-automerge-start)
      [//]: # (dependabot-automerge-end)
      
       ---
      
      <details>
      <summary>Dependabot commands and options</summary>
      <br />
      
      You can trigger Dependabot actions by commenting on this PR:
      - `dependabot rebase` will rebase this PR
      - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
      - `dependabot merge` will merge this PR after your CI passes on it
      - `dependabot squash and merge` will squash and merge this PR after your CI passes on it
      - `dependabot cancel merge` will cancel a previously requested merge and block automerging
      - `dependabot reopen` will reopen this PR if it is closed
      - `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
      - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
      - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
      - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
      - `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
      - `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
      - `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
      - `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
      
      You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).
      
      </details>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10019
      
      Reviewed By: riversand963
      
      Differential Revision: D36536897
      
      Pulled By: ajkr
      
      fbshipit-source-id: 368c24e86d5d39f0a3adc08a397ae074b1b18b1a
      6255ac72
  6. 20 May, 2022 5 commits
    • Y
      Add timestamp support to DBImplReadOnly (#10004) · 16bdb1f9
      Yu Zhang committed
      Summary:
      This PR adds timestamp support to a read only DB instance opened as `DBImplReadOnly`. A follow up PR will add the same support to `CompactedDBImpl`.
      
      With this, a read-only database has these timestamp-related APIs:
      
      `ReadOptions.timestamp` : a read returns the latest data visible at the specified timestamp
      `Iterator::timestamp()` : returns the timestamp associated with the current key/value
      `DB::Get(..., std::string* timestamp)` : returns the timestamp associated with the key/value in `timestamp`
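      The visibility rule behind `ReadOptions.timestamp` can be illustrated with a small self-contained model (hypothetical class, not the RocksDB API): each key keeps versions ordered by timestamp, and a read at `read_ts` returns the newest version whose timestamp is at or below `read_ts`.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <optional>
      #include <string>
      
      // Toy model of timestamped point lookup. Get(key, read_ts) returns the
      // latest value whose timestamp is <= read_ts, mirroring the semantics of
      // ReadOptions.timestamp described above.
      class TimestampedKV {
       public:
        void Put(const std::string& key, uint64_t ts, const std::string& value) {
          versions_[key][ts] = value;
        }
        // Like DB::Get(..., std::string* timestamp), also reports the timestamp
        // actually associated with the returned value.
        std::optional<std::string> Get(const std::string& key, uint64_t read_ts,
                                       uint64_t* ts_out) const {
          auto it = versions_.find(key);
          if (it == versions_.end()) return std::nullopt;
          // upper_bound yields the first version with ts > read_ts; step back one.
          auto vit = it->second.upper_bound(read_ts);
          if (vit == it->second.begin()) return std::nullopt;  // nothing visible yet
          --vit;
          if (ts_out) *ts_out = vit->first;
          return vit->second;
        }
       private:
        std::map<std::string, std::map<uint64_t, std::string>> versions_;
      };
      ```
      
      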
      
      Test plan (on devserver):
      
      ```
      $COMPILE_WITH_ASAN=1 make -j24 all
      $./db_with_timestamp_basic_test --gtest_filter=DBBasicTestWithTimestamp.ReadOnlyDB*
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10004
      
      Reviewed By: riversand963
      
      Differential Revision: D36434422
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 5d949e65b1ffb845758000e2b310fdd4aae71cfb
      16bdb1f9
    • A
      Multi file concurrency in MultiGet using coroutines and async IO (#9968) · 57997dda
      anand76 committed
      Summary:
      This PR implements a coroutine version of batched MultiGet in order to concurrently read from multiple SST files in a level using async IO, thus reducing the latency of the MultiGet. The API from the user perspective is still synchronous and single threaded, with the RocksDB part of the processing happening in the context of the caller's thread. In Version::MultiGet, the decision is made whether to call synchronous or coroutine code.
      
      A good way to review this PR is to review the first 4 commits in order - de773b3, 70c2f70, 10b50e1, and 377a597 - before reviewing the rest.
      
      TODO:
      1. Figure out how to build it in CircleCI (requires some dependencies to be installed)
      2. Do some stress testing with coroutines enabled
      
      No regression in synchronous MultiGet between this branch and main -
      ```
      ./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,multireadrandom" -key_size=32 -value_size=512 -num=5000000 -batch_size=64 -multiread_batched=true -use_direct_reads=false -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=true -threads=16 -cache_size=10485760000 -async_io=false -multiread_stride=40000 -statistics
      ```
      Branch - ```multireadrandom :       4.025 micros/op 3975111 ops/sec 60.001 seconds 238509056 operations; 2062.3 MB/s (14767808 of 14767808 found)```
      
      Main - ```multireadrandom :       3.987 micros/op 4013216 ops/sec 60.001 seconds 240795392 operations; 2082.1 MB/s (15231040 of 15231040 found)```
      
      More benchmarks in various scenarios are given below. The measurements were taken with ```async_io=false``` (no coroutines) and ```async_io=true``` (use coroutines). For an IO bound workload (with every key requiring an IO), the coroutines version shows a clear benefit, being ~2.6X faster. For CPU bound workloads, the coroutines version has ~6-15% higher CPU utilization, depending on how many keys overlap an SST file.
      
      1. Single thread IO bound workload on remote storage with sparse MultiGet batch keys (~1 key overlap/file) -
      No coroutines - ```multireadrandom :     831.774 micros/op 1202 ops/sec 60.001 seconds 72136 operations;    0.6 MB/s (72136 of 72136 found)```
      Using coroutines - ```multireadrandom :     318.742 micros/op 3137 ops/sec 60.003 seconds 188248 operations;    1.6 MB/s (188248 of 188248 found)```
      
      2. Single thread CPU bound workload (all data cached) with ~1 key overlap/file -
      No coroutines - ```multireadrandom :       4.127 micros/op 242322 ops/sec 60.000 seconds 14539384 operations;  125.7 MB/s (14539384 of 14539384 found)```
      Using coroutines - ```multireadrandom :       4.741 micros/op 210935 ops/sec 60.000 seconds 12656176 operations;  109.4 MB/s (12656176 of 12656176 found)```
      
      3. Single thread CPU bound workload with ~2 key overlap/file -
      No coroutines - ```multireadrandom :       3.717 micros/op 269000 ops/sec 60.000 seconds 16140024 operations;  139.6 MB/s (16140024 of 16140024 found)```
      Using coroutines - ```multireadrandom :       4.146 micros/op 241204 ops/sec 60.000 seconds 14472296 operations;  125.1 MB/s (14472296 of 14472296 found)```
      
      4. CPU bound multi-threaded (16 threads) with ~4 key overlap/file -
      No coroutines - ```multireadrandom :       4.534 micros/op 3528792 ops/sec 60.000 seconds 211728728 operations; 1830.7 MB/s (12737024 of 12737024 found) ```
      Using coroutines - ```multireadrandom :       4.872 micros/op 3283812 ops/sec 60.000 seconds 197030096 operations; 1703.6 MB/s (12548032 of 12548032 found) ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9968
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36348563
      
      Pulled By: anand1976
      
      fbshipit-source-id: c0ce85a505fd26ebfbb09786cbd7f25202038696
      57997dda
    • B
      Address comments for PR #9988 and #9996 (#10020) · 5be1579e
      Bo Wang committed
      Summary:
      1. The latest change of DecideRateLimiterPriority in https://github.com/facebook/rocksdb/pull/9988 is reverted.
      2. For https://github.com/facebook/rocksdb/blob/main/db/builder.cc#L345-L349
        2.1. Remove `we will regrad this verification as user reads` from the comments.
        2.2. Do not set `read_options.rate_limiter_priority` to `Env::IO_USER`. Flush should be a background job.
        2.3. Update db_rate_limiter_test.cc.
      3. In IOOptions, mark `prio` as deprecated for future removal.
      4. In `file_system.h`, mark `IOPriority` as deprecated for future removal.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10020
      
      Test Plan: Unit tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D36525317
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 011ba421822f8a124e6d25a2661c4e242df6ad36
      5be1579e
    • P
      Fix auto_prefix_mode performance with partitioned filters (#10012) · 280b9f37
      Peter Dillinger committed
      Summary:
      Essentially refactored the RangeMayExist implementation in
      FullFilterBlockReader to FilterBlockReaderCommon so that it applies to
      partitioned filters as well. (The function is not called for the
      block-based filter case.) RangeMayExist is essentially a series of checks
      around a possible PrefixMayExist, and I'm confident those checks should
      be the same for partitioned as for full filters. (I think it's likely
      that bugs remain in those checks, but this change is overall a simplifying
      one.)
      
      Added auto_prefix_mode support to db_bench
      
      Other small fixes as well
      
      Fixes https://github.com/facebook/rocksdb/issues/10003
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10012
      
      Test Plan:
      Expanded unit test that uses statistics to check for filter
      optimization, fails without the production code changes here
      
      Performance: populate two DBs with
      ```
      TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8
      TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters
      ```
      
      Observe no measurable change in non-partitioned performance
      ```
      TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
      ```
      Before: seekrandom [AVG 15 runs] : 11798 (± 331) ops/sec
      After: seekrandom [AVG 15 runs] : 11724 (± 315) ops/sec
      
      Observe big improvement with partitioned (also supported by bloom use statistics)
      ```
      TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
      ```
      Before: seekrandom [AVG 12 runs] : 2942 (± 57) ops/sec
      After: seekrandom [AVG 12 runs] : 7489 (± 184) ops/sec
      
      Reviewed By: siying
      
      Differential Revision: D36469796
      
      Pulled By: pdillinger
      
      fbshipit-source-id: bcf1e2a68d347b32adb2b27384f945434e7a266d
      280b9f37
    • J
      Track SST unique id in MANIFEST and verify (#9990) · c6d326d3
      Jay Zhuang committed
      Summary:
      Start tracking SST unique id in MANIFEST, which is used to verify with
      SST properties to make sure the SST file is not overwritten or
      misplaced. A DB option `try_verify_sst_unique_id` is introduced to
      enable/disable the verification (default is false). If enabled, it opens
      all SST files during DB open to read the unique_id from table
      properties, so it's recommended to use it with `max_open_files = -1` to
      pre-open the files.
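      The check itself can be sketched as follows (hypothetical helper and types, not the RocksDB implementation): the MANIFEST records a unique id per SST file number, and at DB open the id read back from each file's table properties must match, otherwise the file was overwritten or misplaced.
      
      ```cpp
      #include <array>
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>
      
      // RocksDB's unique id is a multi-word value; model it as three 64-bit words.
      using UniqueId = std::array<uint64_t, 3>;
      
      // Hypothetical helper: compare the ids recorded in the MANIFEST against
      // the ids read from each SST file's table properties at DB open.
      bool VerifySstUniqueIds(
          const std::map<uint64_t, UniqueId>& manifest_ids,
          const std::map<uint64_t, UniqueId>& ids_from_table_properties,
          std::string* mismatch_msg) {
        for (const auto& [file_number, expected] : manifest_ids) {
          auto it = ids_from_table_properties.find(file_number);
          if (it == ids_from_table_properties.end() || it->second != expected) {
            if (mismatch_msg) {
              *mismatch_msg = "SST " + std::to_string(file_number) +
                              ": unique id mismatch (overwritten or misplaced?)";
            }
            return false;  // refuse to open the DB with a mismatched file
          }
        }
        return true;
      }
      ```
      
      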
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990
      
      Test Plan: unittests, format-compatible test, mini-crash
      
      Reviewed By: anand1976
      
      Differential Revision: D36381863
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f
      c6d326d3
  7. 19 May, 2022 6 commits
    • H
      Mark old reserve* option deprecated (#10016) · dde774db
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      https://github.com/facebook/rocksdb/pull/9926 removed the inefficient `reserve*` option APIs but forgot to mark them deprecated in `block_based_table_type_info` for table format compatibility.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10016
      
      Test Plan: build-format-compatible
      
      Reviewed By: pdillinger
      
      Differential Revision: D36484247
      
      Pulled By: hx235
      
      fbshipit-source-id: c41b90cc99fb7ab7098934052f0af7290b221f98
      dde774db
    • G
      Set Read rate limiter priority dynamically and pass it to FS (#9996) · 4da34b97
      gitbw95 committed
      Summary:
      ### Context:
      Background compactions and flushes generate large reads and writes, and can be long running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users.
      
      ### Solution
      User, Flush, and Compaction reads share some code path. For this task, we update the rate_limiter_priority in ReadOptions for code paths (e.g. FindTable (mainly in BlockBasedTable::Open()) and various iterators), and eventually update the rate_limiter_priority in IOOptions for FSRandomAccessFile.
      
      **This PR is for the Read path.** The dynamic **Read** priorities for the different states are listed as follows:
      
      | State | Normal | Delayed | Stalled |
      | ----- | ------ | ------- | ------- |
      |  Flush (verification read in BuildTable()) | IO_USER | IO_USER | IO_USER |
      |  Compaction | IO_LOW  | IO_USER | IO_USER |
      |  User | User provided | User provided | User provided |
      
      We will respect the read_options that the user provided and will not override it.
      The only SST read for Flush is the verification read in BuildTable(), which is regarded as a user read.
      
      **Details**
      1. Set read_options.rate_limiter_priority dynamically:
      - User: Do not update the read_options. Use the read_options that the user provided.
      - Compaction: Update read_options in CompactionJob::ProcessKeyValueCompaction().
      - Flush: Update read_options in BuildTable().
      
      2. Pass the rate limiter priority to FSRandomAccessFile functions:
      - After calling the FindTable(), read_options is passed through GetTableReader(table_cache.cc), BlockBasedTableFactory::NewTableReader(block_based_table_factory.cc), and BlockBasedTable::Open(). The Open() needs some updates for the ReadOptions variable and the updates are also needed for the called functions,  including PrefetchTail(), PrepareIOOptions(), ReadFooterFromFile(), ReadMetaIndexblock(), ReadPropertiesBlock(), PrefetchIndexAndFilterBlocks(), and ReadRangeDelBlock().
      - In RandomAccessFileReader, the functions to be updated include Read(), MultiRead(), ReadAsync(), and Prefetch().
      - Update the downstream functions of NewIndexIterator(), NewDataBlockIterator(), and BlockBasedTableIterator().
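      The priority table above can be encoded as a small pure decision function. This is a hedged sketch with hypothetical enum and function names, not the actual RocksDB code:
      
      ```cpp
      #include <cassert>
      
      // Toy encoding of the read-path priority table above.
      enum class IOPriority { kLow, kHigh, kUser, kUserProvided };
      enum class WriteState { kNormal, kDelayed, kStalled };
      enum class ReadCaller { kUser, kFlushVerification, kCompaction };
      
      IOPriority DecideReadRateLimiterPriority(ReadCaller caller, WriteState state) {
        switch (caller) {
          case ReadCaller::kUser:
            // Respect whatever the user put in ReadOptions; never override it.
            return IOPriority::kUserProvided;
          case ReadCaller::kFlushVerification:
            // The verification read in BuildTable() is treated as a user read.
            return IOPriority::kUser;
          case ReadCaller::kCompaction:
            // Compaction reads are background work (IO_LOW) unless user writes
            // are delayed or stalled, in which case they are promoted to IO_USER.
            return state == WriteState::kNormal ? IOPriority::kLow
                                                : IOPriority::kUser;
        }
        return IOPriority::kUserProvided;  // unreachable
      }
      ```
      
      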
      
      ### Test Plans
      Add unit tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9996
      
      Reviewed By: anand1976
      
      Differential Revision: D36452483
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 60978204a4f849bb9261cb78d9bc1cb56d6008cf
      4da34b97
    • S
      Remove two tests from platform dependent tests (#10017) · f1303bf8
      sdong committed
      Summary:
      Platform-dependent tests sometimes run too long and cause timeouts in Travis. Remove two tests that are less likely to be platform dependent.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10017
      
      Test Plan: Watch Travis tests.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36486734
      
      fbshipit-source-id: 2a3ad1746791c893a790c2a69a3b70f81e7de260
      f1303bf8
    • Y
      Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015) · 0a43061f
      Yaroslav Stepanchuk committed
      Summary:
      The ROCKSDB_SUPPORT_THREAD_LOCAL definition has been removed.
      `__thread`(#define) has been replaced with `thread_local`(C++ keyword) across the code base.
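      For illustration, a minimal self-contained example of the C++11 `thread_local` keyword that replaced the GCC-specific `__thread` extension (example names are hypothetical): each thread gets its own independently initialized copy of the variable.
      
      ```cpp
      #include <cassert>
      #include <thread>
      
      // Each thread sees its own zero-initialized copy of tls_counter.
      thread_local int tls_counter = 0;
      
      int BumpAndGet() { return ++tls_counter; }
      
      // Run BumpAndGet() once on a brand-new thread and return its result.
      int RunInFreshThread() {
        int result = 0;
        std::thread t([&result] { result = BumpAndGet(); });
        t.join();
        return result;
      }
      ```
      
      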
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10015
      
      Reviewed By: siying
      
      Differential Revision: D36485491
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6522d212514ee190b90b4e2750c80c7e34013c78
      0a43061f
    • Y
      Avoid overwriting options loaded from OPTIONS (#9943) · e3a3dbf2
      Yanqin Jin committed
      Summary:
      This is similar to https://github.com/facebook/rocksdb/issues/9862, including the following fixes/refactoring:
      
      1. If OPTIONS file is specified via `-options_file`, majority of options will be loaded from the file. We should not
      overwrite options that have been loaded from the file. Instead, we configure only fields of options which are
      shared objects and not set by the OPTIONS file. We also configure a few fields, e.g. `create_if_missing` necessary
      for stress test to run.
      
      2. Refactor options initialization into three functions, `InitializeOptionsFromFile()`, `InitializeOptionsFromFlags()`
      and `InitializeOptionsGeneral()` similar to db_bench. I hope they can be shared in the future. The high-level logic is
      as follows:
      ```cpp
      if (!InitializeOptionsFromFile(...)) {
        InitializeOptionsFromFlags(...);
      }
      InitializeOptionsGeneral(...);
      ```
      
      3. Currently, the setting for `block_cache_compressed` does not seem correct because by default it specifies a
      size of `numeric_limits<size_t>::max()` ((size_t)-1). According to code comments, `-1` indicates the default value,
      which should be referring to the `num_shard_bits` argument.
      
      4. Clarify `fail_if_options_file_error`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9943
      
      Test Plan:
      1. make check
      2. Run stress tests, and manually check generated OPTIONS file and compare them with input OPTIONS files
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36133769
      
      Pulled By: riversand963
      
      fbshipit-source-id: 35dacdc090a0a72c922907170cd132b9ecaa073e
      e3a3dbf2
    • S
      Log error message when LinkFile() is not supported when ingesting files (#10010) · a74f14b5
      sdong committed
      Summary:
      Right now, it is opaque to users whether moving a file was skipped because LinkFile() is not supported. Add a log message to help users debug.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10010
      
      Test Plan: Run existing test. Manual test verify the log message printed out.
      
      Reviewed By: riversand963
      
      Differential Revision: D36463237
      
      fbshipit-source-id: b00bd5041bd5c11afa4e326819c8461ee2c98a91
      a74f14b5
  8. 18 May 2022, 4 commits
    • G
      Set Write rate limiter priority dynamically and pass it to FS (#9988) · 05c678e1
      gitbw95 committed
      Summary:
      ### Context:
      Background compactions and flushes generate large reads and writes and can be long-running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users.
      
      From the RocksDB perspective, there can be two kinds of rate limiters, the internal (native) one and the external one.
      - The internal (native) rate limiter is introduced in [the wiki](https://github.com/facebook/rocksdb/wiki/Rate-Limiter). Currently, only IO_LOW and IO_HIGH are used and they are set statically.
      - For the external rate limiter, in FSWritableFile functions, IOOptions is open for end users to set and get rate_limiter_priority for their own rate limiter. Currently, RocksDB doesn’t pass the rate_limiter_priority through IOOptions to the file system.
      
      ### Solution
      During user reads, flush writes, and compaction reads/writes, the WriteController is used to determine whether DB writes are stalled or slowed down. The rate limiter priority (Env::IOPriority) can be determined accordingly. We decided to always pass the priority in IOOptions. What the file system does with it should be a contract between the user and the file system. We would like to set the rate limiter priority at the file level, since the Flush/Compaction job level may be too coarse with multiple files, and the block IO level is too granular.
      
      **This PR is for the Write path.** The dynamic write priorities for the different states are listed as follows:
      
      | State | Normal | Delayed | Stalled |
      | ----- | ------ | ------- | ------- |
      |  Flush | IO_HIGH | IO_USER | IO_USER |
      |  Compaction | IO_LOW | IO_USER | IO_USER |
      
      Flush and Compaction writes share the same call path through BlockBasedTableBuilder, WritableFileWriter, and FSWritableFile. When a new FSWritableFile object is created, its io_priority_ can be set dynamically based on the state of the WriteController. In WritableFileWriter, before the call sites of FSWritableFile functions, WritableFileWriter::DecideRateLimiterPriority() determines the rate_limiter_priority. The options (IOOptions) argument of FSWritableFile functions will be updated with the rate_limiter_priority.
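
      The decision in the table above can be sketched as follows. The enums and the helper are simplified stand-ins for `Env::IOPriority`, the WriteController state, and `WritableFileWriter::DecideRateLimiterPriority()`, not the real signatures:

      ```cpp
      // Simplified stand-ins for Env::IOPriority and WriteController state.
      enum class IOPriority { IO_LOW, IO_HIGH, IO_USER };
      enum class WriteState { kNormal, kDelayed, kStalled };
      enum class JobKind { kFlush, kCompaction };

      IOPriority DecideRateLimiterPriority(JobKind job, WriteState state) {
        if (state != WriteState::kNormal) {
          // Foreground writes are delayed or stalled, so background IO is
          // boosted to the top priority to let compaction/flush catch up.
          return IOPriority::IO_USER;
        }
        // Normal state: flushes are latency-sensitive, compactions are not.
        return job == JobKind::kFlush ? IOPriority::IO_HIGH : IOPriority::IO_LOW;
      }
      ```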
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9988
      
      Test Plan: Add unit tests.
      
      Reviewed By: anand1976
      
      Differential Revision: D36395159
      
      Pulled By: gitbw95
      
      fbshipit-source-id: a7c82fc29759139a1a07ec46c37dbf7e753474cf
      05c678e1
    • J
      Add table_properties_collector_factories override (#9995) · b84e3363
      Jay Zhuang committed
      Summary:
      Add table_properties_collector_factories override on the remote
      side.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9995
      
      Test Plan: unit test added
      
      Reviewed By: ajkr
      
      Differential Revision: D36392623
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 3ba031294d90247ca063d7de7b43178d38e3f66a
      b84e3363
    • P
      Adjust public APIs to prefer 128-bit SST unique ID (#10009) · 0070680c
      Peter Dillinger committed
      Summary:
      128 bits should suffice almost always, including for tracking in the manifest.
      
      Note that this changes the output of sst_dump --show_properties to only show 128 bits.
      
      Also introduces InternalUniqueIdToHumanString for presenting internal IDs for debugging purposes.
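
      For illustration, a 128-bit ID can be rendered as a 32-digit hex string roughly like the sketch below. This is a hypothetical formatter; the actual output of sst_dump and `InternalUniqueIdToHumanString` may differ:

      ```cpp
      #include <cstdint>
      #include <cstdio>
      #include <string>

      // Format a 128-bit ID (two 64-bit halves) as 32 uppercase hex digits.
      std::string UniqueId128ToString(uint64_t hi, uint64_t lo) {
        char buf[33];
        std::snprintf(buf, sizeof(buf), "%016llX%016llX",
                      static_cast<unsigned long long>(hi),
                      static_cast<unsigned long long>(lo));
        return std::string(buf);
      }
      ```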
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10009
      
      Test Plan: unit tests updated
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36458189
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 93ebc4a3b6f9c73ee154383a1f8b291a5d6bbef5
      0070680c
    • X
      fix: build on risc-v (#9215) · 8b1df101
      XieJiSS committed
      Summary:
      Patch is modified from ~~https://reviews.llvm.org/file/data/du5ol5zctyqw53ma7dwz/PHID-FILE-knherxziu4tl4erti5ab/file~~
      
      Tested on Arch Linux riscv64gc (qemu)
      
      UPDATE: Seems like the above link is broken, so I tried to find the original merge request. It turns out that the LLVM folks are cherry-picking from `google/benchmark`, and the upstream should be this:
      
      https://github.com/google/benchmark/blob/808571a52fd6cc7e9f0788e08f71f0f4175b6673/src/cycleclock.h#L190
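
      The RISC-V path in the linked cycleclock.h reads the cycle CSR with the `rdcycle` instruction. A sketch is below; the non-RISC-V fallback to a steady clock is my own addition so the function stays usable for illustration on other architectures, and is not what google/benchmark does:

      ```cpp
      #include <chrono>
      #include <cstdint>

      // Sketch of a per-arch cycle counter in the spirit of
      // google/benchmark's cycleclock.h.
      inline uint64_t CycleClockNow() {
      #if defined(__riscv)
        // Read the RISC-V cycle CSR (may require counter access to be
        // enabled for user mode by the kernel/firmware).
        uint64_t cycles;
        asm volatile("rdcycle %0" : "=r"(cycles));
        return cycles;
      #else
        // Portable monotonic fallback for this sketch only.
        return static_cast<uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
      #endif
      }
      ```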
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9215
      
      Reviewed By: siying, jay-zhuang
      
      Differential Revision: D34170586
      
      Pulled By: riversand963
      
      fbshipit-source-id: 41b16b9f7f3bb0f3e7b26bb078eb575499c0f0f4
      8b1df101