1. 12 Jul 2021 (2 commits)
    • Correct CVS -> CSV typo (#8513) · 5afd1e30
      Committed by Adam Retter
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8513
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29654066
      
      Pulled By: mrambacher
      
      fbshipit-source-id: b8f492fe21edd37fe1f1c5a4a0e9153f58bbf3e2
    • Avoid passing existing BG error to WriteStatusCheck (#8511) · d1b70b05
      Committed by anand76
      Summary:
      In ```DBImpl::WriteImpl()```, we call ```PreprocessWrite()``` which, among other things, checks the BG error and returns it if set. This return status is later passed to ```WriteStatusCheck()```, which calls ```SetBGError()```. This results in a spurious call, and info logs, on every user write request. We should avoid passing the ```PreprocessWrite()``` return status to ```WriteStatusCheck()```, as the former would already have called ```SetBGError()``` if it had encountered any new error, such as a failure to create a new WAL file.
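
      The sketch below is a simplified, hypothetical illustration of the pattern described above (not the actual DBImpl code): only the I/O status of the write itself is fed into the background-error check, never a pre-existing background error that the preprocessing step merely reported back.

      ```
      #include <iostream>

      // Toy stand-ins for RocksDB types; the names mirror the description
      // above, but the implementation is illustrative only.
      struct Status {
        bool ok;
        static Status OK() { return {true}; }
        static Status IOError() { return {false}; }
      };

      Status PreprocessWrite() {
        // May return a pre-existing background error that was already recorded.
        return Status::OK();
      }

      Status DoWrite() {
        // Performs this request's own WAL/memtable writes.
        return Status::OK();
      }

      void WriteStatusCheck(const Status& s) {
        if (!s.ok) {
          std::cout << "recording new background error\n";  // SetBGError() analogue
        }
      }

      int main() {
        Status pre = PreprocessWrite();
        if (!pre.ok) {
          return 1;  // fail the write, but do NOT pass `pre` to WriteStatusCheck()
        }
        Status io = DoWrite();
        WriteStatusCheck(io);  // only the status of this write's own I/O
        return io.ok ? 0 : 1;
      }
      ```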
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8511
      
      Test Plan: Run existing tests
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29639917
      
      Pulled By: anand1976
      
      fbshipit-source-id: 19234163969e1645dbeb273712aaf5cd9ea2b182
  2. 10 Jul 2021 (2 commits)
    • Make mempurge a background process (equivalent to in-memory compaction). (#8505) · 837705ad
      Committed by Baptiste Lemaire
      Summary:
      In https://github.com/facebook/rocksdb/issues/8454, I introduced a new process named `MemPurge` (memtable garbage collection). This PR builds upon that mempurge prototype.
      In this PR, I made the `mempurge` process a background task, which provides superior performance since the mempurge process no longer holds the db_mutex, and addresses severe restrictions from the past iteration (including a scenario where the past mempurge was failing when a memtable was mempurged but was still referred to by an iterator/snapshot/...).
      The mempurge process now resembles an in-memory compaction process: the stack of immutable memtables is filtered, and the useful payload is used to populate an output memtable. If the output memtable is filled to more than 60% capacity (an arbitrary heuristic), the mempurge process is aborted and a regular flush process takes place; otherwise the output memtable is kept in the immutable memtable stack. Note that adding this output memtable to the `imm()` memtable stack does not trigger another flush process, so the flush thread can go to sleep at the end of a successful mempurge.
      MemPurge is activated by setting the `experimental_allow_mempurge` flag to `true`. When activated, the `MemPurge` process will always happen when the flush reason is `kWriteBufferFull`.
      Three unit tests confirm that this process supports the `Put`, `Get`, `Delete`, and `DeleteRange` operators and is compatible with `Iterators` and `CompactionFilters`.
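
      A minimal sketch of opting in, assuming the `experimental_allow_mempurge` option named in this commit (the flag was experimental at the time and may have changed in later releases); the database path is arbitrary:

      ```
      #include <cassert>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        ROCKSDB_NAMESPACE::Options options;
        options.create_if_missing = true;
        // Opt in to MemPurge: flushes triggered by a full write buffer
        // (kWriteBufferFull) first attempt in-memory garbage collection of the
        // immutable memtables instead of writing an SST file.
        options.experimental_allow_mempurge = true;

        ROCKSDB_NAMESPACE::DB* db = nullptr;
        ROCKSDB_NAMESPACE::Status s =
            ROCKSDB_NAMESPACE::DB::Open(options, "/tmp/mempurge_demo", &db);
        assert(s.ok());

        s = db->Put(ROCKSDB_NAMESPACE::WriteOptions(), "key", "value");
        assert(s.ok());

        delete db;
        return 0;
      }
      ```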
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8505
      
      Reviewed By: pdillinger
      
      Differential Revision: D29619283
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 8a99bee76b63a8211bff1a00e0ae32360aaece95
    • Add ribbon filter to C API (#8486) · bb485e98
      Committed by qieqieplus
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8486
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29625501
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e6e2a455ae62a71f3a202278a751b9bba17ad03c
  3. 09 Jul 2021 (3 commits)
  4. 08 Jul 2021 (2 commits)
    • FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() to recover… (#8501) · b1a53db3
      Committed by sdong
      Summary:
      … small overwritten files.
      If a file is overwritten via rename and the parent directory is not synced, FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() will delete the file. However, RocksDB relies on file renaming being atomic regardless of whether the parent directory is synced, and the current behavior breaks that assumption and caused some false positives: https://github.com/facebook/rocksdb/pull/8489
      
      Since atomic renaming is used for CURRENT files, to fix the problem, FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() now recovers the state of an overwritten file if the file is small.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8501
      
      Test Plan: Run the stress test for a while and see that it doesn't break.
      
      Reviewed By: anand1976
      
      Differential Revision: D29594384
      
      fbshipit-source-id: 589b5c2f0a9d2aca53752d7bdb0231efa5b3ae92
    • Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) · ed8eb436
      Committed by Andrew Kryczka
      Summary:
      Various tests had disabled valgrind because it slows down, and currently times out, the CI runs. Where a test was disabled with no comment, I assumed slowness was the cause. For these tests that were slow under valgrind, as well as the ones identified in https://github.com/facebook/rocksdb/issues/8352, this PR moves them behind the compiler flag `-DROCKSDB_FULL_VALGRIND_RUN`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8475
      
      Test Plan: running `make full_valgrind_test`, `make valgrind_test`, `make check`; will verify they appear to work correctly
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29504843
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2aac90749cfbd30d5ce11cb29a07a1b9314eeea7
  5. 07 Jul 2021 (6 commits)
  6. 02 Jul 2021 (6 commits)
    • Memtable "MemPurge" prototype (#8454) · 9dc887ec
      Committed by Baptiste Lemaire
      Summary:
      Implement an experimental feature called "MemPurge", which consists of purging "garbage" bytes out of a memtable and reusing the memtable struct instead of making it immutable and eventually flushing its content to storage.
      The prototype is deactivated by default and is not intended for use. It is intended for correctness and validation testing. At the moment, the "MemPurge" feature can be switched on by using the `options.experimental_allow_mempurge` flag. For this early stage, when the allow_mempurge flag is set to `true`, all flush operations will be rerouted to perform a MemPurge. This is a temporary design decision that will give us time to explore meaningful heuristics for using MemPurge at the right time for relevant workloads. Moreover, the current MemPurge operation only supports `Puts`, `Deletes`, and `DeleteRange` operations, and handles `Iterators` as well as `CompactionFilter`s that are invoked at flush time.
      Three unit tests are added to `db_flush_test.cc` to check that MemPurge works correctly (and that the previously mentioned operations are fully supported and thoroughly tested).
      One noticeable design decision is the timing of the MemPurge operation in the memtable workflow: for this prototype, the mempurge happens when the memtable is switched (and usually made immutable). This is an inefficient process because it implies that the entirety of the MemPurge operation happens while holding the db_mutex. Future commits will make the MemPurge operation a background task (akin to the regular flush operation) and aim at drastically enhancing the performance of this operation. The MemPurge is also not fully "WAL-compatible" yet, but when the WAL is full, or when the regular MemPurge operation fails (or when the purged memtable still needs to be flushed), a regular flush operation takes place. Later commits will also correct these behaviors.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8454
      
      Reviewed By: anand1976
      
      Differential Revision: D29433971
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 6af48213554e35048a7e03816955100a80a26dc5
    • Call OnCompactionCompleted API in case of DisableManualCompaction (#8469) · c76778e2
      Committed by Akanksha Mahajan
      Summary:
      Call the OnCompactionCompleted API in the case of
      DisableManualCompaction(), with the status updated to Status::Incomplete.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8469
      
      Reviewed By: ajkr
      
      Differential Revision: D29475517
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: a1726c5e6ee18c0b5097ea04f5e6975fbe108055
    • Add -report_open_timing to db_bench (#8464) · b2073770
      Committed by Peter (Stig) Edwards
      Summary:
      Hello and thanks for RocksDB,
      
      This PR adds support for ```-report_open_timing true``` to ```db_bench```.
      It can be useful when tuning RocksDB on a filesystem/env with high latencies for file-level operations (create/delete/rename...) seen during ```((Optimistic)Transaction)DB::Open```.
      
      Some examples:
      
      ```
      > db_bench -benchmarks updaterandom -num 1 -db /dev/shm/db_bench
      > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 | grep OpenDb
      OpenDb:     3.90133 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 | grep OpenDb
      OpenDb:     3.33414 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true 2>&1 | grep -A1 OpenDb
      OpenDb:     6.05423 milliseconds
      
      > db_bench -benchmarks updaterandom -num 1
      > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -readonly true 2>&1 | grep OpenDb
      OpenDb:     4.06859 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 | grep OpenDb
      OpenDb:     2.85794 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true 2>&1 | grep OpenDb
      OpenDb:     6.46376 milliseconds
      
      > db_bench -benchmarks updaterandom -num 1 -db /clustered_fs/db_bench
      > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 | grep OpenDb
      OpenDb:     3.79805 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 | grep OpenDb
      OpenDb:     3.00174 milliseconds
      > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true 2>&1 | grep OpenDb
      OpenDb:     24.8732 milliseconds
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8464
      
      Reviewed By: hx235
      
      Differential Revision: D29398096
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 8f05dc3284f084612a3f30234e39e1c37548f50c
    • Inject fatal write failures to db_stress when DB is running (#8479) · a95a776d
      Committed by Zhichao Cao
      Summary:
      Add the `injest_error_severity` option to control whether the injected error is a retryable IO error or a fatal/unrecoverable error. A flag is used to indicate when a fatal error arrives; once it is set, the DB is stopped (but not corrupted).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8479
      
      Test Plan: run  ./db_stress --reopen=0 --read_fault_one_in=1000 --write_fault_one_in=5 --disable_wal=true --write_buffer_size=3000000 -writepercent=5 -readpercent=50 --injest_error_severity=2 --column_families=1, make check
      
      Reviewed By: anand1976
      
      Differential Revision: D29524271
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 1aa9fb9b5655b0adba6f5ad12005ca8c074c795b
    • Enable crash test to run using fbcode components (#8471) · 41d32152
      Committed by anand76
      Summary:
      Add a new test ```fbcode_crash_test``` to rocksdb-lego-determinator. This test allows the crash test to be run on Facebook Sandcastle infra using fbcode components. Also use the default Env in db_stress to access the expected values path as it requires a memory mapped file and may not work with custom Envs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8471
      
      Reviewed By: ajkr
      
      Differential Revision: D29474722
      
      Pulled By: anand1976
      
      fbshipit-source-id: 7d086d82dd7091ae48e08cb4ace763ce3e3b87ef
    • Fix TSAN issue (#8477) · d45b8377
      Committed by mrambacher
      Summary:
      Added a mutex to fix a TSAN issue.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8477
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29517053
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 661ccb1f495b7d34874a79e0a3d7aea1123d6047
  7. 01 Jul 2021 (3 commits)
    • Stress Test to inject write failures in reopen (#8474) · ba224b75
      Committed by sdong
      Summary:
      Previously, the stress test could inject metadata write failures when reopening a DB. We extend it to file appends too, in the same way.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8474
      
      Test Plan: Manually run the crash test with various settings and make sure the failures are triggered as expected.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29503116
      
      fbshipit-source-id: e73a446e80ccbd09301a579280e56ff949381fab
    • Fix PrepareOptions for Customizable Classes (#8468) · 41c4b665
      Committed by mrambacher
      Summary:
      Added the Customizable::ConfigureNewObject method.  The method will configure the object if options are found and invoke PrepareOptions if the flag is set properly.
      
      Added tests to verify that PrepareOptions is properly called and to cover the case where PrepareOptions fails.
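
      As a hedged sketch (not from this PR), a user-defined `Customizable` extension might override `PrepareOptions()` to validate and finalize its options once they have been applied; the class and member names below are hypothetical:

      ```
      #include <cstddef>

      #include "rocksdb/convenience.h"   // ConfigOptions
      #include "rocksdb/customizable.h"
      #include "rocksdb/status.h"

      namespace {

      class MyExtension : public ROCKSDB_NAMESPACE::Customizable {
       public:
        const char* Name() const override { return "MyExtension"; }

        // Invoked after the object's options have been applied; a natural place
        // to validate them and build derived state.
        ROCKSDB_NAMESPACE::Status PrepareOptions(
            const ROCKSDB_NAMESPACE::ConfigOptions& opts) override {
          if (buffer_size_ == 0) {
            return ROCKSDB_NAMESPACE::Status::InvalidArgument(
                "buffer_size must be non-zero");
          }
          // Chain to the base class so any nested customizable objects are
          // prepared as well.
          return Customizable::PrepareOptions(opts);
        }

       private:
        size_t buffer_size_ = 0;  // hypothetical option backing field
      };

      }  // namespace
      ```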
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8468
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29494703
      
      Pulled By: mrambacher
      
      fbshipit-source-id: d5767dee5d7a98620ac66190262101cd0aa9d2b7
    • Fix assertion failure when releasing a handle after secondary cache lookup fails (#8470) · a0cbb694
      Committed by anand76
      Summary:
      When the secondary cache lookup fails, we may still allocate a handle and charge the cache for metadata usage. If the cache is full, this can cause the usage to go over capacity. Later, when an (unrelated) handle is released, it trips an assertion that checks that usage is less than capacity. To prevent this assertion failure, don't charge the cache for a failed secondary cache lookup.
      
      Tests:
      Run crash_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8470
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29474713
      
      Pulled By: anand1976
      
      fbshipit-source-id: 27191969c95470a7b070d292b458efce71395bf2
  8. 30 Jun 2021 (3 commits)
  9. 29 Jun 2021 (3 commits)
  10. 28 Jun 2021 (2 commits)
    • Add BlobMetaData retrieval methods (#8273) · be219089
      Committed by mrambacher
      Summary:
      Added BlobMetaData to ColumnFamilyMetaData, along with LiveBlobMetaData and a DB API, GetLiveBlobMetaData, to retrieve it.
      
      First pass at the struct. More tests and maybe more fields to come...
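
      A hedged sketch of reading the new blob file metadata, assuming it is surfaced through `ColumnFamilyMetaData` as described above; the `blob_files` member and the `BlobMetaData` field names used below are assumptions and may differ in detail from the final API:

      ```
      #include <cassert>
      #include <iostream>

      #include "rocksdb/db.h"
      #include "rocksdb/metadata.h"

      int main() {
        ROCKSDB_NAMESPACE::Options options;
        options.create_if_missing = true;
        options.enable_blob_files = true;  // integrated BlobDB

        ROCKSDB_NAMESPACE::DB* db = nullptr;
        ROCKSDB_NAMESPACE::Status s =
            ROCKSDB_NAMESPACE::DB::Open(options, "/tmp/blob_meta_demo", &db);
        assert(s.ok());

        // Column family metadata for the default column family, now including
        // per-blob-file information (assumed field names below).
        ROCKSDB_NAMESPACE::ColumnFamilyMetaData cf_meta;
        db->GetColumnFamilyMetaData(&cf_meta);

        for (const auto& blob : cf_meta.blob_files) {
          std::cout << "blob file #" << blob.blob_file_number
                    << " size=" << blob.blob_file_size
                    << " garbage_bytes=" << blob.garbage_blob_bytes << "\n";
        }

        delete db;
        return 0;
      }
      ```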
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8273
      
      Reviewed By: ltamasi
      
      Differential Revision: D29102400
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 8a2383a4446328be6b91dced9841fdd3dfc80b73
    • Allow db_stress to use a secondary cache (#8455) · 6f9ed59b
      Committed by anand76
      Summary:
      Add a ```-secondary_cache_uri``` option to db_stress to allow the user to specify a custom ```SecondaryCache``` object from the object registry. Also allow db_crashtest.py to be run with an alternate db_stress location. Together, these changes will allow us to run db_stress using FB internal components.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8455
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29371972
      
      Pulled By: anand1976
      
      fbshipit-source-id: dd1b1fd80ebbedc11aa63d9246ea6ae49edb77c4
  11. 26 Jun 2021 (1 commit)
  12. 25 Jun 2021 (6 commits)
    • Using existing crc32c checksum in checksum handoff for Manifest and WAL (#8412) · a904c62d
      Committed by Zhichao Cao
      Summary:
      In PR https://github.com/facebook/rocksdb/issues/7523, checksum handoff was introduced in RocksDB for WAL, Manifest, and SST files. When the user enables checksum handoff for a certain type of file, before the data is written to the lower-layer storage system, we calculate the checksum (crc32c) of each piece of data and pass the checksum down with the data, so that data verification can be done by the lower-layer storage system if it has the capability. However, it cannot cover the whole lifetime of the data in memory, and it potentially introduces extra checksum calculation overhead.
      
      In this PR, we introduce a new interface in WritableFileWriter::Append which allows the caller to pass the data and the checksum (crc32c) together. In this way, WritableFileWriter can directly use the passed-in checksum (crc32c) to generate the checksum of the data being passed down to the storage system. This saves calculation overhead and achieves higher protection coverage. When a new checksum is added with the data, we use Crc32cCombine (https://github.com/facebook/rocksdb/issues/8305) to combine the existing checksum and the new checksum. To avoid the rate limiter segmenting the data before it is stored, the rate limiter is called enough times to accumulate enough credits for a certain write. At the current stage, this design only supports the Manifest and WAL, which use log_writer.
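
      As a hedged illustration of the "combine rather than recompute" idea (using RocksDB's internal `util/crc32c.h` helpers, so it builds inside the RocksDB source tree; not the `WritableFileWriter` code itself), assuming `Crc32cCombine` takes the two checksums and the length of the second buffer:

      ```
      #include <cassert>
      #include <cstdint>
      #include <string>

      #include "util/crc32c.h"

      int main() {
        using namespace ROCKSDB_NAMESPACE::crc32c;

        const std::string a = "payload-checksummed-by-the-caller-";
        const std::string b = "data-appended-by-the-writer";

        // Checksums computed separately, e.g. one handed in alongside the data
        // and one computed for bytes appended later.
        uint32_t crc_a = Value(a.data(), a.size());
        uint32_t crc_b = Value(b.data(), b.size());

        // Combine them instead of re-reading and re-hashing the concatenation.
        uint32_t combined = Crc32cCombine(crc_a, crc_b, b.size());

        // Sanity check: matches the checksum of the concatenated buffer.
        const std::string ab = a + b;
        assert(combined == Value(ab.data(), ab.size()));
        return 0;
      }
      ```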
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8412
      
      Test Plan: make check, add new testing cases.
      
      Reviewed By: anand1976
      
      Differential Revision: D29151545
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 75e2278c5126cfd58393c67b1efd18dcc7a30772
    • add missing fields to `GetLiveFilesMetaData()` (#8460) · 3d844dff
      Committed by Andrew Kryczka
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8460
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29381865
      
      Pulled By: ajkr
      
      fbshipit-source-id: 47ba54c25f3cc039d72ea32e1df20875795683b3
    • Add support for Merge with base value during Compaction in IntegratedBlobDB (#8445) · 95d0ee95
      Committed by Akanksha Mahajan
      Summary:
      Provide support for Merge operation with base values during
      Compaction in IntegratedBlobDB.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8445
      
      Test Plan: Add new unit test
      
      Reviewed By: ltamasi
      
      Differential Revision: D29343949
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 844f6f02f93388a11e6e08bda7bb3a2a28e47c70
    • Update HISTORY.md for PR 8450 (#8458) · 66b62a12
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8458
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29378728
      
      Pulled By: ltamasi
      
      fbshipit-source-id: d5a40b1414500f53823763be5c2bfce8db04daf8
    • Log the amount of blob garbage generated by compactions in the MANIFEST (#8450) · 68d8b283
      Committed by Levi Tamasi
      Summary:
      The patch builds on `BlobGarbageMeter` and `BlobCountingIterator`
      (introduced in https://github.com/facebook/rocksdb/issues/8426 and
      https://github.com/facebook/rocksdb/issues/8443 respectively)
      and ties it all together. It measures the amount of garbage
      generated by a compaction and logs the corresponding `BlobFileGarbage`
      records as part of the compaction job's `VersionEdit`. Note: in order
      to have accurate results, `kRemoveAndSkipUntil` for compaction filters
      is implemented using iteration.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8450
      
      Test Plan: Ran `make check` and the crash test script.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29338207
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 4381c432ac215139439f6d6fb801a6c0e4d8c128
    • Add more ops to: db_bench -report_file_operations (#8448) · 75741eb0
      Committed by Peter (Stig) Edwards
      Summary:
      Hello and thanks for RocksDB,
      
      Here is a PR to add file deletes, renames, and ```Flush()```, ```Sync()```, ```Fsync()``` and ```Close()``` to the file ops report.
      
      The reason is to help tune RocksDB options when using an env/filesystem with high latencies for file-level ("metadata") operations, typically seen during ```DB::Open``` (```db_bench -num 0```; also see https://github.com/facebook/rocksdb/pull/7203, where IOTracing does not trace ```DB::Open```).
      
      Before:
      ```
      > db_bench -benchmarks updaterandom -num 0 -report_file_operations true
      ...
      Entries:    0
      ...
      Num files opened: 12
      Num Read(): 6
      Num Append(): 8
      Num bytes read: 6216
      Num bytes written: 6289
      ```
      After:
      ```
      > db_bench -benchmarks updaterandom -num 0 -report_file_operations true
      ...
      Entries:    0
      ...
      Num files opened: 12
      Num files deleted: 3
      Num files renamed: 4
      Num Flush(): 10
      Num Sync(): 5
      Num Fsync(): 1
      Num Close(): 2
      Num Read(): 6
      Num Append(): 8
      Num bytes read: 6216
      Num bytes written: 6289
      ```
      
      Before:
      ```
      > db_bench -benchmarks updaterandom -report_file_operations true
      ...
      Entries:    1000000
      ...
      Num files opened: 18
      Num Read(): 396339
      Num Append(): 1000058
      Num bytes read: 892030224
      Num bytes written: 187569238
      ```
      After:
      ```
      > db_bench -benchmarks updaterandom -report_file_operations true
      ...
      Entries:    1000000
      ...
      Num files opened: 18
      Num files deleted: 5
      Num files renamed: 4
      Num Flush(): 1000068
      Num Sync(): 9
      Num Fsync(): 1
      Num Close(): 6
      Num Read(): 396339
      Num Append(): 1000058
      Num bytes read: 892030224
      Num bytes written: 187569238
      ```
      
      Another example showing how using ```DB::OpenForReadOnly``` reduces file operations compared to ```((Optimistic)Transaction)DB::Open```:
      
      ```
      > db_bench -benchmarks updaterandom -num 1
      > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -readonly true -report_file_operations true
      ...
      Entries:    0
      ...
      Num files opened: 8
      Num files deleted: 0
      Num files renamed: 0
      Num Flush(): 0
      Num Sync(): 0
      Num Fsync(): 0
      Num Close(): 0
      Num Read(): 13
      Num Append(): 0
      Num bytes read: 374
      Num bytes written: 0
      ```
      
      ```
      > db_bench -benchmarks updaterandom -num 1
      > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_file_operations true
      ...
      Entries:    0
      ...
      Num files opened: 14
      Num files deleted: 3
      Num files renamed: 4
      Num Flush(): 14
      Num Sync(): 5
      Num Fsync(): 1
      Num Close(): 3
      Num Read(): 11
      Num Append(): 10
      Num bytes read: 7291
      Num bytes written: 7357
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8448
      
      Reviewed By: anand1976
      
      Differential Revision: D29333818
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a06a8c87f799806462319115195b3e94faf5f542
  13. 24 Jun 2021 (1 commit)