1. 02 Jun, 2021 3 commits
    • P
      Fix "Interval WAL" bytes to say GB instead of MB (#8350) · 2655477c
      PiyushDatta committed
      Summary:
      Reference: https://github.com/facebook/rocksdb/issues/7201
      
      Before fix:
      `/tmp/rocksdb_test_file/LOG.old.1622492586055679:Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s`
      
      After fix:
      `/tmp/rocksdb_test_file/LOG:Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s`
      
      Tests:
      ```
      Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
      ETA: 0s Left: 0 AVG: 0.05s  local:0/7720/100%/0.0s
      rm -rf /dev/shm/rocksdb.CLRh
      /usr/bin/python3 tools/check_all_python.py
      No syntax errors in 34 .py files
      /usr/bin/python3 tools/ldb_test.py
      Running testCheckConsistency...
      .Running testColumnFamilies...
      .Running testCountDelimDump...
      .Running testCountDelimIDump...
      .Running testDumpLiveFiles...
      .Running testDumpLoad...
      Warning: 7 bad lines ignored.
      .Running testGetProperty...
      .Running testHexPutGet...
      .Running testIDumpBasics...
      .Running testIngestExternalSst...
      .Running testInvalidCmdLines...
      .Running testListColumnFamilies...
      .Running testManifestDump...
      .Running testMiscAdminTask...
      Sequence,Count,ByteSize,Physical Offset,Key(s)
      .Running testSSTDump...
      .Running testSimpleStringPutGet...
      .Running testStringBatchPut...
      .Running testTtlPutGet...
      .Running testWALDump...
      .
      ----------------------------------------------------------------------
      Ran 19 tests in 15.945s
      
      OK
      sh tools/rocksdb_dump_test.sh
      make check-format
      make[1]: Entering directory '/home/piydatta/Documents/rocksdb'
      $DEBUG_LEVEL is 1
      Makefile:176: Warning: Compiling in debug mode. Don't use the resulting binary in production
      build_tools/format-diff.sh -c
      Checking format of uncommitted changes...
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8350
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28790567
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: dcb1e4c124361156435122f21f0a288335b2c8c8
      2655477c
    • J
      Fix cmake build failure with gflags (#8324) · eda83eaa
      Jay Zhuang committed
      Summary:
      - Fix cmake build failure with gflags.
      - Add CI tests for both gflags 2.1 and 2.2.
      - Fix ctest config with gtest.
      - Add CI to run test with ctest.
      
      One benefit of ctest is that it supports timeouts; ours is set to 5 minutes in CI, so we will know which test hangs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8324
      
      Test Plan: CI pass
      
      Reviewed By: ajkr
      
      Differential Revision: D28762517
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 09063c5af5f9f33abfcdeb48593acbd9826cd199
      eda83eaa
    • S
      Kill whitebox crash test if it is 15 minutes over the limit (#8341) · ab718b41
      sdong committed
      Summary:
      The whitebox crash test can run significantly over the time limit, either because the test is slow or because no kill point is hit. Such an unbounded job can cause problems when the test is scheduled periodically. Instead, kill the job if it is 15 minutes over the limit.
      Refactor the code slightly to consolidate the code for executing commands for white and black box tests.
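
The kill-after-grace-period idea can be sketched as follows. This is a minimal illustration, not the code from the crash test script: `run_with_hard_deadline`, `SLOW_CMD`, and the parameter names are hypothetical, and only the 15-minute grace period comes from the summary above.

```python
import subprocess
import sys

# Example command: a child Python process that sleeps far longer than allowed.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_hard_deadline(cmd, time_limit_sec, grace_sec=15 * 60):
    """Run cmd and kill it if it is still alive grace_sec after time_limit_sec.

    Returns (killed, returncode).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Wait for the nominal limit plus the grace period.
        proc.wait(timeout=time_limit_sec + grace_sec)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop: the job overran limit + grace
        proc.wait()  # reap the child so it does not linger as a zombie
        return True, proc.returncode
```

Calling `run_with_hard_deadline(SLOW_CMD, 0.2, 0.2)` kills the sleeping child after roughly 0.4 seconds, while a command that finishes on its own is reported with `killed == False`.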
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8341
      
      Test Plan: Run both black-box and white-box tests with both natural and explicit kill conditions.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28756170
      
      fbshipit-source-id: f253149890e62ace78f871be927e093e9b12f49b
      ab718b41
  2. 01 Jun, 2021 2 commits
  3. 28 May, 2021 4 commits
    • S
      Add a new blog post for online validation (#8338) · 1c88f66f
      sdong committed
      Summary:
      A new blog post to introduce recent development related to online validation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8338
      
      Test Plan: Local test with "bundle exec jekyll serve"
      
      Reviewed By: ltamasi
      
      Differential Revision: D28757134
      
      fbshipit-source-id: 42268e1af8dc0c6a42ae62ea61568409b7ce10e4
      1c88f66f
    • S
      Use bloom filter to speed up sync point (#8337) · cda79231
      sdong committed
      Summary:
      SyncPoint is now used in the crash test but can significantly slow down the run. Add a bloom filter check before each process call to speed it up.
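
The general technique, screening each lookup with a Bloom-style bit array so that the common "nothing registered for this name" case skips the lock and the map, might look like the sketch below. It is not RocksDB's SyncPoint code: `SyncPointRegistry`, its methods, and the hash scheme are assumptions made for illustration, and the unlocked bit reads are a simplification.

```python
import hashlib
import threading


class SyncPointRegistry:
    """Toy registry: a Bloom-style bit array screens names before the locked lookup."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = bytearray(num_bits)  # one byte per bit, for clarity
        self._callbacks = {}
        self._lock = threading.Lock()
        self.slow_path_hits = 0  # how often process() had to take the lock

    def _positions(self, name):
        # Derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(name.encode()).digest()
        for i in range(self._num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "little") % self._num_bits

    def set_callback(self, name, fn):
        with self._lock:
            self._callbacks[name] = fn
            for pos in self._positions(name):
                self._bits[pos] = 1

    def process(self, name, arg=None):
        # Fast path: any unset bit proves the name was never registered.
        if not all(self._bits[pos] for pos in self._positions(name)):
            return False
        # Slow path: possible match (or false positive); confirm under the lock.
        with self._lock:
            self.slow_path_hits += 1
            fn = self._callbacks.get(name)
        if fn is None:
            return False
        fn(arg)
        return True
```

A Bloom filter can report false positives but never false negatives, so the slow path remains the source of truth and the filter only removes work.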
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8337
      
      Test Plan: Run all existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D28730282
      
      fbshipit-source-id: a187377a9d47877a36c5649e4b1f67d5e3033238
      cda79231
    • A
      Blog post about SecondaryCache (#8339) · b53e3d2a
      anand76 committed
      Summary:
      Blog post about SecondaryCache
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8339
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28753501
      
      Pulled By: anand1976
      
      fbshipit-source-id: d3241b746a9266fb523e13ad45fd0288083f7470
      b53e3d2a
    • P
      Do not truncate WAL if in read_only mode (#8313) · c75ef03e
      Peter (Stig) Edwards committed
      Summary:
      I noticed ```openat``` system call with ```O_WRONLY``` flag and ```sync_file_range``` and ```truncate``` on WAL file when using ```rocksdb::DB::OpenForReadOnly``` by way of ```db_bench --readonly=true --benchmarks=readseq --use_existing_db=1 --num=1 ...```
      
      Noticed in ```strace``` after seeing the last modification time of the WAL file change after each run (with ```--readonly=true```).
      
        I think introduced by https://github.com/facebook/rocksdb/commit/7d7f14480e135a4939ed6903f46b3f7056aa837a from https://github.com/facebook/rocksdb/pull/8122
      
      I added a test to catch the WAL file being truncated and the modification time on it changing.
      I am not sure if a mock filesystem with mock clock could be used to avoid having to sleep 1.1s.
      The test could also check the set of files is the same and that the sizes are also unchanged.
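
The shape of such a check can be sketched outside RocksDB. The helpers below are hypothetical and operate on an ordinary file rather than a real WAL. Backdating the modification time with `os.utime` is one possible way to avoid the sleep mentioned above; it is an assumption of this sketch, not what the added test does.

```python
import os
import tempfile

OLD_MTIME_NS = 1_000_000_000 * 10**9  # an arbitrary timestamp well in the past


def make_wal_like_file(path):
    """Write 16 payload bytes plus 16 bytes of padding, then backdate the mtime."""
    with open(path, "wb") as f:
        f.write(b"x" * 16 + b"\0" * 16)
    os.utime(path, ns=(OLD_MTIME_NS, OLD_MTIME_NS))


def read_only_open(path):
    with open(path, "rb") as f:
        f.read()


def truncating_open(path):
    # Mimics the bug: a write-mode open that trims the padding.
    with open(path, "r+b") as f:
        f.truncate(16)


def mtime_unchanged_after(path, action):
    before = os.stat(path).st_mtime_ns
    action(path)
    return os.stat(path).st_mtime_ns == before
```

With a file created by `make_wal_like_file`, `mtime_unchanged_after(path, read_only_open)` holds, while the truncating variant changes both the size and the modification time.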
      
      Before:
      
      ```
      [ RUN      ] DBBasicTest.ReadOnlyReopenMtimeUnchanged
      db/db_basic_test.cc:182: Failure
      Expected equality of these values:
        file_mtime_after_readonly_reopen
          Which is: 1621611136
        file_mtime_before_readonly_reopen
          Which is: 1621611135
        file is: 000010.log
      [  FAILED  ] DBBasicTest.ReadOnlyReopenMtimeUnchanged (1108 ms)
      ```
      
      After:
      
      ```
      [ RUN      ] DBBasicTest.ReadOnlyReopenMtimeUnchanged
      [       OK ] DBBasicTest.ReadOnlyReopenMtimeUnchanged (1108 ms)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8313
      
      Reviewed By: pdillinger
      
      Differential Revision: D28656925
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: ea9e215cb53e7c830e76bc5fc75c45e21f12a1d6
      c75ef03e
  4. 27 May, 2021 2 commits
  5. 26 May, 2021 1 commit
  6. 25 May, 2021 1 commit
  7. 24 May, 2021 1 commit
  8. 22 May, 2021 6 commits
    • J
      Fix clang-analyze: use uninitiated variable (#8325) · 55853de6
      Jay Zhuang committed
      Summary:
      Error:
      ```
      db/db_compaction_test.cc:5211:47: warning: The left operand of '*' is a garbage value
      uint64_t total = (l1_avg_size + l2_avg_size * 10) * 10;
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8325
      
      Test Plan: `$ make analyze`
      
      Reviewed By: pdillinger
      
      Differential Revision: D28620916
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: f6d58ab84eefbcc905cda45afb9522b0c6d230f8
      55853de6
    • Z
      Use new Insert and Lookup APIs in table reader to support secondary cache (#8315) · 7303d02b
      Zhichao Cao committed
      Summary:
      A secondary cache is implemented to provide a secondary cache tier for the block cache. New Insert and Lookup APIs were introduced in https://github.com/facebook/rocksdb/issues/8271. To support and use the secondary cache in the block-based table reader, this PR introduces the corresponding callback functions that will be used by the secondary cache, and updates the Insert and Lookup APIs accordingly.
      
      benchmarking:
      ./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true
      
      ./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000
      
      master benchmarking results:
      readrandom   :       3.923 micros/op 254881 ops/sec;   33.4 MB/s (23849796 of 50000000 found)
      rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064
      
      Current PR benchmarking results
      readrandom   :       4.083 micros/op 244925 ops/sec;   32.1 MB/s (23849796 of 50000000 found)
      rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053
      
      About 3.9% throughput reduction.
      Latency: P50 increased by 5.2% and P95 by 2.1%, while P99 improved by 4.77%.
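
The percentages can be recomputed from the figures quoted above with a few lines of arithmetic. The helper name `pct_change` is hypothetical; the numbers are copied from the two benchmark runs.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (master, this PR) pairs taken from the benchmark output above.
RESULTS = {
    "ops/sec": (254881, 244925),
    "P50 micros": (2.820992, 2.967687),
    "P95 micros": (5.636716, 5.754916),
    "P99 micros": (16.450553, 15.665912),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {pct_change(before, after):+.2f}%")
```

A negative value for ops/sec is a throughput loss, while a negative value for a latency percentile is an improvement.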
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8315
      
      Test Plan: added the testing case
      
      Reviewed By: anand1976
      
      Differential Revision: D28599774
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed
      7303d02b
    • J
      Use large macos instance (#8320) · 6c7c3e8c
      Jay Zhuang committed
      Summary:
      Macos build is taking more than 1 hour, bump the instance type from the
      default medium to large (large macos instance was not available before).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8320
      
      Test Plan: watch CI pass
      
      Reviewed By: ajkr
      
      Differential Revision: D28589456
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: cff78dae5aaf9de90ade3468469290176de5ff32
      6c7c3e8c
    • P
      Add table properties for number of entries added to filters (#8323) · 3469d60f
      Peter Dillinger committed
      Summary:
      With Ribbon filter work and possible variance in actual bits
      per key (or prefix; general term "entry") to achieve certain FP rates,
      I've received a request to be able to track actual bits per key in
      generated filters. This change adds a num_filter_entries table
      property, which can be combined with filter_size to get bits per key
      (entry).
      
      This can vary from num_entries in at least these ways:
      * Different versions of same key are only counted once in filters.
      * With prefix filters, several user keys map to the same filter entry.
      * A single filter can include both prefixes and user keys.
      
      Note that FilterBlockBuilder::NumAdded() didn't do anything useful
      except distinguish empty from non-empty.
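
Combining the two properties is simple arithmetic. The sketch below assumes `filter_size` is reported in bytes; the function name is hypothetical.

```python
def bits_per_filter_entry(filter_size_bytes, num_filter_entries):
    """Actual bits per key (or prefix) for one table's filter."""
    if num_filter_entries == 0:
        return 0.0  # empty filter: avoid dividing by zero
    return filter_size_bytes * 8 / num_filter_entries
```

For example, a 1250-byte filter holding 1000 entries works out to 10 bits per entry.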
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8323
      
      Test Plan: basic unit test included, others updated
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28596210
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 529a111f3c84501e5a470bc84705e436ee68c376
      3469d60f
    • J
      Fix manual compaction `max_compaction_bytes` under-calculated issue (#8269) · 6c865435
      Jay Zhuang committed
      Summary:
      Fix a bug where, for manual compaction, `max_compaction_bytes` only
      limits the SST files from the input level, but not the overlapping files on the
      output level.
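
The difference between the two accounting rules can be illustrated with a toy picker. This is not RocksDB's compaction picker: `SstFile`, `overlapping`, and `pick_manual_compaction_inputs` are hypothetical names, keys are plain strings, and the input level is assumed to be sorted and non-overlapping.

```python
from typing import List, NamedTuple


class SstFile(NamedTuple):
    smallest: str
    largest: str
    size: int


def overlapping(files: List[SstFile], smallest: str, largest: str) -> List[SstFile]:
    """Files whose key range intersects [smallest, largest]."""
    return [f for f in files if f.smallest <= largest and f.largest >= smallest]


def pick_manual_compaction_inputs(input_level, output_level, max_compaction_bytes):
    """Take input files in key order while input + overlapped output bytes fit."""
    picked = []
    for f in input_level:
        candidate = picked + [f]
        lo, hi = candidate[0].smallest, candidate[-1].largest
        total = sum(x.size for x in candidate)
        # Count the output-level files this key range would pull in as well.
        total += sum(x.size for x in overlapping(output_level, lo, hi))
        if picked and total > max_compaction_bytes:
            break  # always keep at least one file so the compaction makes progress
        picked = candidate
    return picked
```

With three 10-byte input files and a 100-byte limit, counting input bytes alone would admit all three files; once a 100-byte overlapping output file is counted, the picker stops earlier.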
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8269
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D28231044
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9d7d03004f30cc4b1b9819830141436907554b7c
      6c865435
    • S
      Try to build with liburing by default. (#8322) · bd3d080e
      sdong committed
      Summary:
      By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat as 1, which means RocksDB will try to build with liburing. For cmake, add WITH_LIBURING to control it, with default on.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322
      
      Test Plan: Build using cmake and make.
      
      Reviewed By: anand1976
      
      Differential Revision: D28586498
      
      fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
      bd3d080e
  9. 21 May, 2021 2 commits
    • S
      Compare memtable insert and flush count (#8288) · 2f1984dd
      sdong committed
      Summary:
      When a memtable is flushed, it validates the number of entries it reads by comparing it with the number of entries inserted into the memtable. This serves as a sanity check against memory corruption. This change will also allow more counters to be added in the future for better validation.
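
A toy version of this insert-versus-flush comparison is sketched below. `CountingMemtable` is a hypothetical stand-in and does not mirror RocksDB's memtable or flush job classes.

```python
class CountingMemtable:
    """Keeps an insert counter next to the data and checks it at flush time."""

    def __init__(self):
        self._entries = []
        self.num_inserted = 0

    def insert(self, key, value):
        self._entries.append((key, value))
        self.num_inserted += 1

    def flush(self):
        flushed = list(self._entries)  # stands in for iterating the memtable
        if len(flushed) != self.num_inserted:
            raise RuntimeError(
                f"flush read {len(flushed)} entries, "
                f"but {self.num_inserted} were inserted"
            )
        return flushed
```

The counter is maintained independently of the container, so a lost or duplicated entry shows up as a mismatch at flush time.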
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288
      
      Test Plan: Pass all existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D28369194
      
      fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad
      2f1984dd
    • J
      Deflake ExternalSSTFileTest.PickedLevelBug (#8307) · 94b4faa0
      Jay Zhuang committed
      Summary:
      The test wants to make sure there's no compaction during `AddFile`
      (between `DBImpl::AddFile:MutexLock` and `DBImpl::AddFile:MutexUnlock`)
      but the mutex could be unlocked by `EnterUnbatched()`.
      Move the lock start point after bumping the ingest file number.
      
      Also fix the deadlock when an ASSERT fails.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8307
      
      Reviewed By: ajkr
      
      Differential Revision: D28479849
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b3c50f66aa5d5f59c5c27f815bfea189c4cd06cb
      94b4faa0
  10. 20 May, 2021 7 commits
  11. 19 May, 2021 3 commits
    • A
      Sync ingested files only if reopen is supported by the FS (#8296) · 9d61a085
      anand76 committed
      Summary:
      Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e., POSIX). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported().
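
The control flow can be sketched as follows. RocksDB reports this condition through a `Status` value; the sketch models it with a Python exception, and every class and function name here is hypothetical.

```python
class NotSupported(Exception):
    """Stands in for a Status::NotSupported() result."""


class LocalFS:
    """File system that can reopen a file for writing and sync it."""

    def __init__(self):
        self.synced = []

    def reopen_writable_file(self, path):
        return _Handle(self, path)


class _Handle:
    def __init__(self, fs, path):
        self._fs = fs
        self._path = path

    def sync(self):
        self._fs.synced.append(self._path)


class DistributedFS:
    """File system where reopening for write is not available."""

    def reopen_writable_file(self, path):
        raise NotSupported(path)


def sync_ingested_file(fs, path):
    """Sync the ingested file only when the FS can reopen it; return whether it synced."""
    try:
        handle = fs.reopen_writable_file(path)
    except NotSupported:
        return False  # skip the sync instead of failing the ingestion
    handle.sync()
    return True
```

Other errors are deliberately not swallowed in this sketch: only the "not supported" case is treated as a reason to skip the sync.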
      
      Tests:
      Add a new unit test in external_sst_file_basic_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8296
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28420865
      
      Pulled By: anand1976
      
      fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c
      9d61a085
    • S
      Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311) · 60e5af83
      sdong committed
      Summary:
      Right now, the return codes of io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled, which is not good practice. Although these two functions are not supposed to return non-zero values in normal execution, people suspect that they might return a non-zero value when an interruption happens, and that the code might then hang.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311
      
      Test Plan: Make sure at least normal test cases still pass.
      
      Reviewed By: anand1976
      
      Differential Revision: D28500828
      
      fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211
      60e5af83
    • M
      Fix MultiGet with PinnableSlices and Merge for WBWI (#8299) · 6b0a22a4
      mrambacher committed
      Summary:
      The MultiGetFromBatchAndDB would fail if the PinnableSlice value being returned was pinned.  This could happen if the value was retrieved from the DB (not memtable) or potentially if the values were reused (and a previous iteration returned a slice that was pinned).
      
      This change resets the pinnable value to clear it prior to attempting to use it, thereby eliminating the problem with the value already being pinned.
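
The failure mode and the fix can be modeled with a small stand-in for a pinnable value. `PinnableValue` and `multi_get` are hypothetical and only imitate the "must not already be pinned" precondition described above.

```python
class PinnableValue:
    """Toy value holder that refuses to be pinned twice without a reset."""

    def __init__(self):
        self.pinned = False
        self.data = b""

    def pin(self, data):
        if self.pinned:
            raise RuntimeError("value is already pinned")
        self.data = data
        self.pinned = True

    def reset(self):
        self.data = b""
        self.pinned = False


def multi_get(lookup, keys, values):
    """Fill values[i] for keys[i], clearing any earlier pin first."""
    for key, value in zip(keys, values):
        value.reset()  # the fix: clear the value before attempting to use it
        value.pin(lookup[key])
    return values
```

Reusing the same `values` list across two calls works because each slot is reset before it is pinned again.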
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8299
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28455426
      
      Pulled By: mrambacher
      
      fbshipit-source-id: a34d7d983ec9b6bb4c8a2b4892f72858d43e6972
      6b0a22a4
  12. 18 May, 2021 3 commits
    • S
      Expose CompressionOptions::parallel_threads through C API (#8302) · 83d1a665
      Stanislav Tkach committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8302
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28499262
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7b17b79af871d874dfca76db9bca0d640a6cd854
      83d1a665
    • L
      Make it possible to apply only a subrange of table property collectors (#8298) · d83542ca
      Levi Tamasi committed
      Summary:
      This patch does two things:
      1) Introduces some aliases in order to eliminate/prevent long-winded type names
      w/r/t the internal table property collectors (see e.g.
      `std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`).
      2) Makes it possible to apply only a subrange of table property collectors during
      table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories`
      from a pointer to a `vector` into a range (i.e. a pair of iterators).
      
      Rationale: I plan to introduce a BlobDB related table property collector, which
      should only be applied during table creation if blob storage is enabled at the moment
      (which can be changed dynamically). This change will make it possible to include/
      exclude the BlobDB related collector as needed without having to introduce
      a second `vector` of collectors in `ColumnFamilyData` with pretty much the same
      contents.
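
The idea of handing the table builder a range over one shared list, instead of a second list, can be sketched like this. All names are hypothetical, the "blob:" value prefix is invented for the example, and `itertools.islice` plays the role of the iterator pair.

```python
from itertools import islice


class CountCollector:
    name = "num_entries"

    def __init__(self):
        self.n = 0

    def add(self, key, value):
        self.n += 1

    def finish(self):
        return self.n


class BlobRefCollector:
    name = "num_blob_refs"

    def __init__(self):
        self.n = 0

    def add(self, key, value):
        if value.startswith(b"blob:"):
            self.n += 1

    def finish(self):
        return self.n


# One shared list of factories; the blob-related one is kept last.
ALL_FACTORIES = [CountCollector, BlobRefCollector]


def collector_range(blob_enabled):
    """Return a lazy range over the shared list, excluding the blob collector if disabled."""
    end = len(ALL_FACTORIES) if blob_enabled else len(ALL_FACTORIES) - 1
    return islice(ALL_FACTORIES, 0, end)


def build_table(entries, collector_factories):
    collectors = [make() for make in collector_factories]
    for key, value in entries:
        for c in collectors:
            c.add(key, value)
    return {c.name: c.finish() for c in collectors}
```

Because the range is computed per table build, toggling blob storage only changes the end of the range, not the list itself.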
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28430910
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e
      d83542ca
    • S
      Write file temperature information to manifest (#8284) · 0ed8cb66
      sdong committed
      Summary:
      As part of tiered storage, writing temperature information to the manifest is needed so that after DB recovery RocksDB still has the tiering information, which is required to implement further functionality.
      
      Also fix some issues in simulated hybrid FS.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8284
      
      Test Plan: Add a new unit test to validate that the information is indeed written and read back.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28335801
      
      fbshipit-source-id: 56aeb2e6ea090be0200181dd968c8a7278037def
      0ed8cb66
  13. 14 May, 2021 2 commits
    • A
      Initial support for secondary cache in LRUCache (#8271) · feb06e83
      anand76 committed
      Summary:
      Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr<SecondaryCache>. An item is initially inserted into the LRU (primary) cache. When it ages out and is evicted from memory, it is inserted into the secondary cache. On an LRU cache miss and a successful lookup in the secondary cache, the item is promoted to the LRU cache. Only synchronous lookup is supported currently. The secondary cache would be used to implement a persistent (flash) cache or a compressed cache.
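
The demote-on-eviction and promote-on-hit flow described above can be modeled in a few lines. This is a toy, not the `SecondaryCache` interface: `TieredLRUCache` is a hypothetical name, a plain dict stands in for the secondary tier, and leaving a promoted item in the secondary tier is an assumption of the sketch.

```python
from collections import OrderedDict


class TieredLRUCache:
    """LRU primary cache that spills evicted items into an optional secondary tier."""

    def __init__(self, capacity, secondary=None):
        self._capacity = capacity
        self._primary = OrderedDict()  # least recently used item first
        self._secondary = secondary    # any dict-like store, or None

    def insert(self, key, value):
        self._primary[key] = value
        self._primary.move_to_end(key)
        while len(self._primary) > self._capacity:
            old_key, old_value = self._primary.popitem(last=False)
            if self._secondary is not None:
                self._secondary[old_key] = old_value  # demote on eviction

    def lookup(self, key):
        if key in self._primary:
            self._primary.move_to_end(key)
            return self._primary[key]
        if self._secondary is not None and key in self._secondary:
            value = self._secondary[key]  # synchronous secondary lookup
            self.insert(key, value)       # promote back into the primary tier
            return value
        return None

    def primary_keys(self):
        return list(self._primary)
```

With a capacity of two, inserting three items pushes the oldest into the secondary tier; looking it up again promotes it and demotes whichever item is then least recently used.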
      
      Tests:
      Results from cache_bench and db_bench don't show any regression due to these changes.
      
      cache_bench results before and after this change -
      Command
      ```./cache_bench -ops_per_thread=10000000 -threads=1```
      Before
      ```Complete in 40.688 s; QPS = 245774```
      ```Complete in 40.486 s; QPS = 246996```
      ```Complete in 42.019 s; QPS = 237989```
      After
      ```Complete in 40.672 s; QPS = 245869```
      ```Complete in 44.622 s; QPS = 224107```
      ```Complete in 42.445 s; QPS = 235599```
      
      db_bench results before this change, and with this change + https://github.com/facebook/rocksdb/issues/8213 and https://github.com/facebook/rocksdb/issues/8191 -
      Commands
      ```./db_bench  --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true```
      
      ```./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300```
      Before
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      80.702 micros/op 198104 ops/sec;   54.4 MB/s (3708999 of 3708999 found)
      ```
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      87.124 micros/op 183625 ops/sec;   50.4 MB/s (3439999 of 3439999 found)
      ```
      After
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      77.653 micros/op 206025 ops/sec;   56.6 MB/s (3866999 of 3866999 found)
      ```
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      84.962 micros/op 188299 ops/sec;   51.7 MB/s (3535999 of 3535999 found)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8271
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28357511
      
      Pulled By: anand1976
      
      fbshipit-source-id: d1cfa236f00e649a18c53328be10a8062a4b6da2
      feb06e83
    • J
      Refactor Option obj address from char* to void* (#8295) · d15fbae4
      Jay Zhuang committed
      Summary:
      And replace `reinterpret_cast` with `static_cast` or no cast.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8295
      
      Test Plan: `make check`
      
      Reviewed By: mrambacher
      
      Differential Revision: D28420303
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 645be123a0df624dc2bea37cd54a35403fc494fa
      d15fbae4
  14. 13 May, 2021 3 commits