  1. 28 Aug, 2020 · 3 commits
    • Add buffer prefetch support for non directIO usecase (#7312) · c2485f2d
      Jay Zhuang committed
      Summary:
      A new file interface `SupportPrefetch()` is added. When the user overrides it to return `false`, an internal prefetch buffer is used for readahead. This is useful for non-direct-I/O use cases where the file system has no readahead support.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7312
      
      Reviewed By: anand1976
      
      Differential Revision: D23329847
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 71cd4ce6f4a820840294e4e6aec111ab76175527
    • Add a blob file builder class that can be used in background jobs (#7306) · 50439606
      Levi Tamasi committed
      Summary:
      The patch adds a class called `BlobFileBuilder` that can be used to build
      and cut blob files in background jobs (flushes/compactions). The class
      enforces a value size threshold (`min_blob_size`; smaller blobs will be inlined
      in the LSM tree itself), and supports specifying a blob file size limit (`blob_file_size`),
      as well as compression (`blob_compression_type`) and checksums for blob files.
      It also keeps track of the generated blob files and their associated `BlobFileAddition`
      metadata, which can be applied as part of the background job's `VersionEdit`.
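
      The size threshold and the file size limit described above can be sketched as follows. This is only an illustration under stated assumptions: `BlobSizeOptionsSketch` and `BlobFileBuilderSketch` are hypothetical names, not the classes added by the patch, and the rule of cutting a file once it reaches `blob_file_size` is an assumed interpretation of the limit.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for the two size-related options named above.
      struct BlobSizeOptionsSketch {
        uint64_t min_blob_size = 0;   // values smaller than this stay inline
        uint64_t blob_file_size = 0;  // assumed: close the file once it reaches this
      };

      // Toy builder that tracks only the payload bytes of each blob file.
      class BlobFileBuilderSketch {
       public:
        explicit BlobFileBuilderSketch(const BlobSizeOptionsSketch& opts)
            : opts_(opts) {
          assert(opts_.blob_file_size > 0);
        }

        // Returns true if the value was written to a blob file and false if it
        // should be inlined in the LSM tree instead.
        bool Add(const std::string& value) {
          if (value.size() < opts_.min_blob_size) {
            return false;
          }
          if (!file_open_) {
            file_sizes_.push_back(0);  // open a new blob file
            file_open_ = true;
          }
          file_sizes_.back() += value.size();
          if (file_sizes_.back() >= opts_.blob_file_size) {
            file_open_ = false;  // cut the file; the next blob starts a new one
          }
          return true;
        }

        const std::vector<uint64_t>& file_sizes() const { return file_sizes_; }

       private:
        BlobSizeOptionsSketch opts_;
        std::vector<uint64_t> file_sizes_;
        bool file_open_ = false;
      };
      ```

      In this sketch, the list returned by `file_sizes()` plays the role of the per-file metadata that a background job would later record.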
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7306
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23298817
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 38f35d81dab1ba81f15236240612ec173d7f21b5
    • Store FSRandomAccessPtr object in RandomAccessFileReader (#7192) · 8e0df905
      Akanksha Mahajan committed
      Summary:
      Replace the FSRandomAccessFile pointer in RandomAccessFileReader with an
      FSRandomAccessFilePtr object that wraps the FSRandomAccessFile pointer.

      Objective: If tracing is enabled, FSRandomAccessFilePtr returns an
      FSRandomAccessFileTracingWrapper pointer, which records all necessary
      information in an IORecord, calls the underlying FileSystem, and invokes
      IOTracer to dump that record to a binary file. If tracing is disabled,
      the underlying FileSystem pointer is returned directly. The
      FSRandomAccessFilePtr wrapper class is added to bypass the
      FSRandomAccessFileWrapper when tracing is disabled.

      Test Plan: make check -j64
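
      The pointer-wrapper idea in the summary above can be sketched roughly as follows. All names here (`FileSketch`, `InMemoryFile`, `TracingFile`, `FilePtrSketch`) are hypothetical stand-ins, and the trace is kept in an in-memory vector rather than a binary file; the sketch only shows how a holder can hand out either a tracing wrapper or the raw file.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      // Simplified stand-in for a random-access file interface.
      class FileSketch {
       public:
        virtual ~FileSketch() = default;
        virtual std::string Read(std::size_t offset, std::size_t n) const = 0;
      };

      class InMemoryFile : public FileSketch {
       public:
        explicit InMemoryFile(std::string data) : data_(std::move(data)) {}
        std::string Read(std::size_t offset, std::size_t n) const override {
          return offset < data_.size() ? data_.substr(offset, n) : std::string();
        }

       private:
        std::string data_;
      };

      // Records each read in a log, then forwards to the wrapped file.
      class TracingFile : public FileSketch {
       public:
        TracingFile(FileSketch* target, std::vector<std::string>* log)
            : target_(target), log_(log) {
          assert(target_ != nullptr && log_ != nullptr);
        }
        std::string Read(std::size_t offset, std::size_t n) const override {
          log_->push_back("Read@" + std::to_string(offset) + "+" +
                          std::to_string(n));
          return target_->Read(offset, n);
        }

       private:
        FileSketch* target_;
        std::vector<std::string>* log_;
      };

      // Owns the file and hands out the tracing wrapper only when tracing is on,
      // so the untraced path has no extra indirection.
      class FilePtrSketch {
       public:
        FilePtrSketch(std::unique_ptr<FileSketch> file,
                      std::vector<std::string>* log)
            : file_(std::move(file)), tracer_(file_.get(), log) {}

        void set_tracing(bool on) { tracing_ = on; }

        const FileSketch* get() const {
          if (tracing_) {
            return &tracer_;
          }
          return file_.get();
        }

       private:
        std::unique_ptr<FileSketch> file_;  // declared first: tracer_ wraps it
        TracingFile tracer_;
        bool tracing_ = false;
      };
      ```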
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7192
      
      Reviewed By: anand1976
      
      Differential Revision: D23356867
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 48f31168166a17a7444b40be44a9a9d4a5c7182c
  2. 27 Aug, 2020 · 1 commit
    • Real fix for race in backup custom checksum checking (#7309) · 9aad24da
      Peter Dillinger committed
      Summary:
      This is a "real" fix for the issue worked around in https://github.com/facebook/rocksdb/issues/7294.
      To get DB checksum info for live files, we now read the manifest file
      that will become part of the checkpoint/backup. This requires a little
      extra handling in taking a custom checkpoint, including only reading the
      manifest file up to the size prescribed by the checkpoint.
      
      This moves GetFileChecksumsFromManifest from backup code to
      file_checksum_helper.{h,cc} and removes apparently unnecessary checking
      related to column families.
      
      Updated HISTORY.md and warned potential future users of
      DB::GetLiveFilesChecksumInfo()
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7309
      
      Test Plan: updated unit test, before and after
      
      Reviewed By: ajkr
      
      Differential Revision: D23311994
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 741e30a2dc1830e8208f7648fcc8c5f000d4e2d5
  3. 26 Aug, 2020 · 3 commits
    • Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297) · 722814e3
      sdong committed
      Summary:
      Right now, all I/O failures under PartitionIndexReader::CacheDependencies() are swallowed. This doesn't impact correctness, but we've decided that any I/O error in the read path should now be returned to users for awareness. Return errors in those cases instead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7297
      
      Test Plan: Add a new unit test that injects errors in this code path and verifies that Get() fails. Only one I/O path is hit in PartitionIndexReader::CacheDependencies(). Several option changes were attempted but could not trigger other pread paths. It is unclear whether other failure cases are even possible, so we rely on continuous stress testing to validate them.
      
      Reviewed By: anand1976
      
      Differential Revision: D23257950
      
      fbshipit-source-id: 859dbc92fa239996e1bb378329344d3d54168c03
    • Parameterize DBBasicTest.CompactBetweenSnapshots (#7301) · cecdd5d2
      sdong committed
      Summary:
      DBBasicTest.CompactBetweenSnapshots can time out on some slow-I/O hosts. Parameterize it so that each individual test run is shorter.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7301
      
      Test Plan: Run the test and verify, in a hacky way, that different runs use different configurations.
      
      Reviewed By: ltamasi
      
      Differential Revision: D23277733
      
      fbshipit-source-id: 1f717b4131322d175abf9e211131fe7e9b1ef758
    • Pass SST file checksum information through OnTableFileCreated (#7108) · d51f88c9
      Zhichao Cao committed
      Summary:
      When an SST file is created, the application can learn the file information through the OnTableFileCreated callback in LogAndNotifyTableFileCreationFinished. Since file checksum information can be useful to the application when the SST file is created, we add file_checksum and file_checksum_func_name to TableFileCreationInfo, which is passed through OnTableFileCreated.
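
      As an illustration of how an application might consume the two new fields, here is a minimal sketch. `TableFileCreationInfoSketch` and `ChecksumCollectorSketch` are hypothetical stand-ins that model only `file_checksum`, `file_checksum_func_name`, and an assumed `file_path` field; the sketch does not use the real listener interface.

      ```cpp
      #include <cassert>
      #include <map>
      #include <string>

      // Simplified stand-in for the info object passed to the callback.
      struct TableFileCreationInfoSketch {
        std::string file_path;                // assumed field for this sketch
        std::string file_checksum;            // field named in the summary
        std::string file_checksum_func_name;  // field named in the summary
      };

      // Remembers "<func name>:<checksum>" for every file it is told about.
      class ChecksumCollectorSketch {
       public:
        void OnTableFileCreated(const TableFileCreationInfoSketch& info) {
          assert(!info.file_path.empty());
          checksums_[info.file_path] =
              info.file_checksum_func_name + ":" + info.file_checksum;
        }

        // Returns an empty string for files that were never reported.
        std::string Lookup(const std::string& path) const {
          auto it = checksums_.find(path);
          return it == checksums_.end() ? std::string() : it->second;
        }

       private:
        std::map<std::string, std::string> checksums_;
      };
      ```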
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7108
      
      Test Plan: make check, listener_test.
      
      Reviewed By: ajkr
      
      Differential Revision: D22470240
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 92c20344d9b986eadfe3480f3769bf4add0dbaae
  4. 25 Aug, 2020 · 6 commits
  5. 22 Aug, 2020 · 3 commits
    • Shutdown timer in destructor (#7292) · e500c730
      Jay Zhuang committed
      Summary:
      Make sure deleting a running timer works fine.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7292
      
      Test Plan: unittest and an invalid benchmark command: `./db_bench --db=/tmp --use_existing_db=false --benchmarks=fred --compression_type=none`
      
      Reviewed By: riversand963
      
      Differential Revision: D23248500
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 04111681b389a9aa23a439db4568d5ca351f1144
    • Bug Fix for memtables not trimmed down. (#7296) · 38446126
      Akanksha Mahajan committed
      Summary:
      When a memtable is trimmed in MemTableListVersion, it is added to the
      delete list only if that is the last reference. However, it is not the
      last reference, because the super version also holds one, and the super
      version is not switched if the delete list is empty. So the memtable is
      never destroyed and memory usage increases beyond write_buffer_size +
      max_write_buffer_size_to_maintain.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7296
      
      Test Plan:
      1.  ./db_bench -benchmarks=randomtransaction
      -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1
      -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000
      --transaction_set_snapshot
      
      Reviewed By: ltamasi
      
      Differential Revision: D23267395
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 3a8d437fe9f4015f851ff84c0e29528aa946b650
    • Add test function MockTimeEnv.SleepForMicroseconds() (#7293) · 187964a0
      Jay Zhuang committed
      Summary:
      Also change the internal time value from seconds to microseconds.
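
      A mock clock that stores its time in microseconds can be sketched as below. `MockClockSketch` is a hypothetical class written for illustration and is not the actual `MockTimeEnv`; it only shows why a microsecond-resolution counter makes a sleep function for short intervals straightforward.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical mock clock: "sleeping" advances the clock without waiting.
      class MockClockSketch {
       public:
        void SleepForMicroseconds(uint64_t micros) {
          assert(now_micros_ + micros >= now_micros_);  // guard against overflow
          now_micros_ += micros;
        }

        uint64_t NowMicros() const { return now_micros_; }

        // Seconds are derived from the microsecond counter, so sub-second
        // sleeps are not lost as they would be with a seconds-only counter.
        uint64_t NowSeconds() const { return now_micros_ / 1000000; }

       private:
        uint64_t now_micros_ = 0;
      };
      ```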
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7293
      
      Reviewed By: pdillinger
      
      Differential Revision: D23253751
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 36aa9376b8801b85bd10163173590a17cf4f3a3a
  6. 21 Aug, 2020 · 5 commits
  7. 20 Aug, 2020 · 3 commits
  8. 19 Aug, 2020 · 3 commits
    • Add initial set of options for integrated blob write path (#7280) · b9bb59d4
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7280
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23195192
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 743b382de391963e62ba86119e9fbd0233ea3b3a
    • Store FSSequentialFilePtr object in SequenceFileReader (#7190) · cc24ac14
      Akanksha Mahajan committed
      Summary:
      This diff replaces the `FSSequentialFile` pointer in `SequenceFileReader` with an `FSSequentialFilePtr` object that wraps the `FSSequentialFile` pointer.

      Objective: If tracing is enabled, `FSSequentialFilePtr` returns an `FSSequentialFileTracingWrapper` pointer, which records all necessary information in an `IORecord`, calls the underlying FileSystem, and invokes `IOTracer` to dump that record to a binary file. If tracing is disabled, the underlying `FileSystem` pointer is returned directly. The `FSSequentialFilePtr` wrapper class is added to bypass the `FSSequentialFileTracingWrapper` when tracing is disabled.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7190
      
      Test Plan:
      make check -j64
      COMPILE_WITH_TSAN=1 make check -j64
      
      Reviewed By: anand1976
      
      Differential Revision: D23059616
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 1564b94dd1297cd0fbfe2ed5c9cc3e20f7395301
    • fix doc about kTolerateCorruptedTailRecords recovery (#7270) · e6e2f369
      Andrew Kryczka committed
      Summary:
      - Made it clear only one record in the tail is allowed to have a problem
      - Added detail about the valid use case instead of calling it legacy behavior
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7270
      
      Reviewed By: riversand963
      
      Differential Revision: D23169075
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2a4b45aa8641f17efa104523fbad765012a98fb0
  9. 18 Aug, 2020 · 11 commits
    • Fix some flaky tests in BackupableDBTest with intentional flushing (#7273) · 7d0ecab5
      Peter Dillinger committed
      Summary:
      Some tests like BackupableDBTest.FileCollision and
      ShareTableFilesWithChecksumsNewNaming are intermittently failing,
      probably due to unpredictable flushing with FillDB. This change
      should fix the failures seen and help to prevent similar flakiness in
      future tests in the file.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7273
      
      Test Plan: make check, and with valgrind
      
      Reviewed By: siying
      
      Differential Revision: D23176947
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 654b73a64db475f2b9b065ed53a889a8b9083c59
    • db_bench should be linked with thirdparty libs (#7264) · c073b7fa
      Jay Zhuang committed
      Summary:
      `db_bench` is not linked with third-party libs in cmake, even when
      `-DWITH_*` is specified.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7264
      
      Test Plan:
      `$ mkdir build; cd build; cmake .. -DWITH_SNAPPY=1; make db_bench; ./db_bench`
      `$ cmake .. -DWITH_SNAPPY=1 -DWITH_LZ4=1; make db_bench; ./db_bench -compression_type=lz4`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23165077
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9c6fead31c41664a5c75ecd6469f47402fcb7d62
    • Whole DBTest to skip fsync (#7274) · b194c21b
      sdong committed
      Summary:
      After https://github.com/facebook/rocksdb/pull/7036, we still see additional DBTest cases that can time out when running 10 or 20 in parallel. Expand skip-fsync mode to the whole of DBTest. To be conservative, other tests still do not use this mode.
      
      This commit reinstates https://github.com/facebook/rocksdb/issues/7049, whose un-revert was lost in an automatic
      infrastructure mis-merge.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7274
      
      Test Plan: Run all existing files.
      
      Reviewed By: pdillinger
      
      Differential Revision: D23177444
      
      fbshipit-source-id: 1f61690b2ac6333c3b2c87176fef6b2cba086b33
    • Disable `recycle_log_file_num` with `kTolerateCorruptedTailRecords` (#7271) · 5d5ff824
      Andrew Kryczka committed
      Summary:
      The two features are naturally incompatible. WAL recycling expects the recovery to succeed upon encountering a corrupt record at the point where new data ends and recycled data remains at the tail. However, `WALRecoveryMode::kTolerateCorruptedTailRecords` must fail upon encountering any such corrupt record, as it cannot differentiate between this and a real corruption, which would cause committed updates to be truncated.
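
      One way to picture the incompatibility handling is an option-sanitizing step that turns WAL recycling off when the tolerant recovery mode is selected. The sketch below assumes that "disable" means resetting `recycle_log_file_num` to zero; `WalRecoveryModeSketch`, `DbOptionsSketch`, and `SanitizeOptionsSketch` are hypothetical stand-ins, and only the mode named in this commit is handled.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Stand-in enum mirroring the recovery mode names.
      enum class WalRecoveryModeSketch {
        kTolerateCorruptedTailRecords,
        kAbsoluteConsistency,
        kPointInTimeRecovery,
        kSkipAnyCorruptedRecords,
      };

      struct DbOptionsSketch {
        WalRecoveryModeSketch wal_recovery_mode =
            WalRecoveryModeSketch::kPointInTimeRecovery;
        std::size_t recycle_log_file_num = 0;
      };

      // Returns a copy of the options with WAL recycling turned off when the
      // recovery mode cannot tell recycled tail data from real corruption.
      inline DbOptionsSketch SanitizeOptionsSketch(DbOptionsSketch opts) {
        if (opts.wal_recovery_mode ==
            WalRecoveryModeSketch::kTolerateCorruptedTailRecords) {
          opts.recycle_log_file_num = 0;  // assumed meaning of "disable"
        }
        assert(opts.wal_recovery_mode !=
                   WalRecoveryModeSketch::kTolerateCorruptedTailRecords ||
               opts.recycle_log_file_num == 0);
        return opts;
      }
      ```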
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7271
      
      Reviewed By: riversand963
      
      Differential Revision: D23169923
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2cf8a3bcd2c9a0ecb0055a84725047a10fd4db50
    • Add a new EntryType for deletion with timestamp (#7195) · 92593d51
      Yanqin Jin committed
      Summary:
      Add `kEntryDeleteWithTimestamp` to `EntryType` which is a public API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7195
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D22914704
      
      Pulled By: riversand963
      
      fbshipit-source-id: 886f73c6b70c527cad1c8fc9fc8d3afe60e1ea39
    • Build blob file reader/writer classes in LITE mode as well (#7272) · 9b083cb1
      Levi Tamasi committed
      Summary:
      The patch makes sure that the functionality required for the new integrated
      BlobDB implementation (most importantly, the classes related to reading and
      writing blob files) is also built in LITE mode by removing the corresponding
      `#ifndef`s.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7272
      
      Test Plan: Ran `make check` in both regular and LITE mode.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23173280
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 1596bd1a76409a8a6d83d8f1dbfe08bfdea7ffe6
    • CompactRange() refit level should confirm destination level is not empty (#7261) · 17606375
      sdong committed
      Summary:
      There is a potential data race in CompactRange() with level refitting. Between the compaction step and the refitting step, an automatic compaction could put data into the destination level and cause the DB to be corrupted. Fix the bug by checking that the target level is empty.
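
      The guard described in the summary can be sketched as follows. The level layout (`LevelFilesSketch`, a vector of file-name lists) and the function `RefitLevelSketch` are hypothetical simplifications; a real implementation would perform the check while holding the appropriate DB lock so that no compaction can slip in between the check and the move.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // levels[i] holds the names of the files currently in level i.
      using LevelFilesSketch = std::vector<std::vector<std::string>>;

      // Moves every file from level `from` to level `to`, but only if the
      // destination is still empty. Returns false and leaves the levels
      // untouched when another job has already placed files in `to`.
      inline bool RefitLevelSketch(LevelFilesSketch& levels, std::size_t from,
                                   std::size_t to) {
        assert(from < levels.size() && to < levels.size());
        if (from == to) {
          return true;
        }
        if (!levels[to].empty()) {
          return false;  // refitting here could create overlapping ranges
        }
        levels[to] = std::move(levels[from]);
        levels[from].clear();
        return true;
      }
      ```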
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7261
      
      Test Plan: Add a unit test, which would fail with "Corruption: L1 have overlapping ranges '666F6F' seq:6, type:1 vs. '626172' seq:2, type:1", and now it succeeds.
      
      Reviewed By: ajkr
      
      Differential Revision: D23142269
      
      fbshipit-source-id: 28bc14d5ac934c192260b23a4ce3f10a95e3ee91
    • Re-enable param tests for backup engine (#7260) · 500eeb6f
      Zitan Chen committed
      Summary:
      The param tests previously had no effect. This PR re-enables them.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7260
      
      Test Plan: Some manual tests and `./backupable_db_test`.
      
      Reviewed By: siying
      
      Differential Revision: D23140902
      
      Pulled By: pdillinger
      
      fbshipit-source-id: cd62b11b926affed25127d9074fa97a1c7f748c4
    • Populate cf_id member of CompactionJobInfo for OnCompactionBegin (#6938) · 2ad88cea
      matthewvon committed
      Summary:
      Looks like somebody simply missed initializing a member variable. The column family ID, cf_id, is not set during OnCompactionBegin. But it is set properly in the next function for OnCompactionCompleted. Need this cf_id for tracking progress of a Stardog optimize since there may be multiple compactions required for a given column family.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6938
      
      Reviewed By: siying
      
      Differential Revision: D23153235
      
      Pulled By: ajkr
      
      fbshipit-source-id: 932938de3a4ebbc7ac89702f655583862587d251
    • Add a file system parameter: --fs_uri to db_stress and db_bench (#6878) · 2a0d3c70
      Hans Holmberg committed
      Summary:
      This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system.
      
      This makes it easier to develop and test new RocksDB FileSystems.
      
      The pull request also registers the posix file system for testing purposes.
      
      Examples:
      ```
      $./db_bench --fs_uri=posix:// --benchmarks=fillseq
      
      $./db_stress --fs_uri=zenfs://nullb1
      ```
      
      zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878
      
      Reviewed By: siying
      
      Differential Revision: D23023063
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247
    • Generate and install a pkg-config file (#7244) · 59ebab65
      John Goerzen committed
      Summary:
      pkg-config files are quite useful for communicating to users of a
      library how to compile against it. This commit generates and installs
      a pkg-config file that can be used for both static and dynamic builds
      against the RocksDB library. This should make life easier for developers
      of client programs, language bindings, etc.
      
      Example usage:
      
      ```
      g++ `pkg-config --cflags rocksdb` -o simple_example simple_example.cc `pkg-config --libs rocksdb`
      
      g++ `pkg-config --cflags --static rocksdb` -static \
         -o simple_example simple_example.cc `pkg-config --libs --static rocksdb`
      ```
      
      The commit also adds the generated file to .gitignore, to the uninstall
      target, and to clean.
      
      No additional dependencies are added to RocksDB itself, and this does
      not make RocksDB use pkg-config as part of its build process.
      
      Resolves https://github.com/facebook/rocksdb/issues/4452
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7244
      
      Reviewed By: siying
      
      Differential Revision: D23146153
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3045aa650d68bd5ac42d40ed709570e9584ef004
  10. 15 Aug, 2020 · 2 commits
    • Introduce a global StatsDumpScheduler for stats dumping (#7223) · 69760b4d
      Jay Zhuang committed
      Summary:
      Introduce a global StatsDumpScheduler that handles stats dumping for all DB instances, including `DumpStats()` and `PersistStats()`. Previously, every DB instance had two dedicated threads, one for `DumpStats()` and one for `PersistStats()`, which could create a large number of threads when there are hundreds of DB instances.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7223
      
      Reviewed By: riversand963
      
      Differential Revision: D23056737
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
    • Get() with timestamp should respect snapshot (#7227) · d758273c
      Yanqin Jin committed
      Summary:
      If user-defined timestamp is enabled, current implementation can expose
      newer data to queries even if an older sequence number is specified via
      read_options.snapshot. This PR makes Get() respect sequence-number-based
      snapshot.
      
      The solution is simple. Besides using <ukey, ts, seq> to search the index for the key,
      we also verify that the candidate result's seq is smaller than or equal to the
      snapshot's seq. This requires passing a seq via `GetContext`, which accounts for
      the majority of the code change in this PR.
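
      The visibility rule can be sketched as a lookup that applies both bounds. This is a simplified model, not the actual read path: `VersionSketch` and `GetSketch` are hypothetical, and the versions of a single user key are assumed to be ordered newest first.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      // One stored version of a user key.
      struct VersionSketch {
        uint64_t ts;   // user-defined timestamp
        uint64_t seq;  // sequence number assigned at write time
        std::string value;
      };

      // Returns the newest version that is visible at `read_ts` AND was written
      // at or before the snapshot's sequence number. `versions` is assumed to be
      // sorted newest first. Returns false if no version qualifies.
      inline bool GetSketch(const std::vector<VersionSketch>& versions,
                            uint64_t read_ts, uint64_t snapshot_seq,
                            std::string* value) {
        assert(value != nullptr);
        for (const auto& v : versions) {
          if (v.ts <= read_ts && v.seq <= snapshot_seq) {
            *value = v.value;
            return true;
          }
        }
        return false;
      }
      ```

      Without the `v.seq <= snapshot_seq` condition, a read at a high timestamp would return the newest version even when it was written after the snapshot was taken.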
      
      Also added a few unit tests to demonstrate standard visibility during point lookup
      and range scan when timestamp and snapshot are both present.
      
      Test plan (devserver):
      ```
      make check
      $./db_bench --benchmarks=fillseq,readrandom -cache_size=$[64*1024*1024]
      ```
      Result
      this PR: readrandom   :       4.827 micros/op 207180 ops/sec;   22.9 MB/s (1000000 of 1000000 found)
      master:  readrandom   :       4.936 micros/op 202610 ops/sec;   22.4 MB/s (1000000 of 1000000 found)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7227
      
      Reviewed By: ltamasi
      
      Differential Revision: D23015242
      
      Pulled By: riversand963
      
      fbshipit-source-id: ea7b85a728654553ba357d2e6a207b5e40f7376a