1. 24 Apr, 2020 1 commit
  2. 30 Mar, 2020 1 commit
  3. 12 Mar, 2020 1 commit
    • C
      Cache result of GetLogicalBufferSize in Linux (#6457) · 2d9efc9a
      Cheng Chang authored
      Summary:
      On Linux, when reopening a DB with many SST files, profiling shows 100% system CPU time being spent in `GetLogicalBufferSize` for a couple of seconds. This slows down MyRocks' recovery when a site is down.
      
      This PR introduces two new APIs:
      1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` let `DB` tell the env when it starts or stops using its database directories. The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
      2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.
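
The register/unregister idea above can be sketched as a small reference-counted cache. This is only a minimal illustration under stated assumptions: `BlockSizeCacheSketch`, its methods, and the injected `probe` callback are hypothetical names, not the classes added by the PR, and the real lookup of a directory's logical block size is replaced by the callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not the RocksDB class): a reference-counted cache
// from directory path to logical block size.
class BlockSizeCacheSketch {
 public:
  // `probe` stands in for the expensive per-directory lookup.
  explicit BlockSizeCacheSketch(
      std::function<std::size_t(const std::string&)> probe)
      : probe_(std::move(probe)) {}

  // Called when a DB starts using `dirs`; each new directory is probed once.
  void Register(const std::vector<std::string>& dirs) {
    std::lock_guard<std::mutex> lock(mu_);
    for (const auto& dir : dirs) {
      auto it = cache_.find(dir);
      if (it == cache_.end()) {
        cache_[dir] = Entry{probe_(dir), 1};
      } else {
        ++it->second.refs;
      }
    }
  }

  // Called when a DB stops using `dirs`; unreferenced entries are dropped.
  void Unregister(const std::vector<std::string>& dirs) {
    std::lock_guard<std::mutex> lock(mu_);
    for (const auto& dir : dirs) {
      auto it = cache_.find(dir);
      if (it == cache_.end()) continue;
      assert(it->second.refs > 0);
      if (--it->second.refs == 0) cache_.erase(it);
    }
  }

  // Returns the cached size, or probes directly on a cache miss.
  std::size_t Get(const std::string& dir) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = cache_.find(dir);
      if (it != cache_.end()) return it->second.size;
    }
    return probe_(dir);
  }

 private:
  struct Entry {
    std::size_t size;
    int refs;
  };
  std::function<std::size_t(const std::string&)> probe_;
  std::mutex mu_;
  std::map<std::string, Entry> cache_;
};
```

With a cache like this, opening many files in a registered directory costs one probe in total instead of one probe per file.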
      
      Other modifications:
      1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
      2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
      3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457
      
      Test Plan:
      1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
      2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.
      
      `make check`
      
      Differential Revision: D20131243
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
  4. 22 Feb, 2020 1 commit
    • S
      Handle io_uring partial results (#6441) · 942eaba0
      sdong authored
      Summary:
      The logic that handles io_uring partial results was wrong. Fix it by putting the partial result into a queue and continuing to read.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441
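
As a rough illustration of the queue-and-continue idea, the sketch below re-queues any request that came back short and keeps reading until it is complete or end of file is reached. It does not use io_uring at all: `ReadReq`, `PartialReader`, and `ReadAll` are hypothetical names, and the reader callback merely simulates a primitive that may return fewer bytes than requested.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// One read request; `done` counts the bytes already copied into `out`.
struct ReadReq {
  std::size_t offset = 0;
  std::size_t len = 0;
  std::string out;  // filled by ReadAll
  std::size_t done = 0;
};

// A read primitive that may return fewer bytes than asked for.
// Returning 0 means end of file.
using PartialReader =
    std::function<std::size_t(std::size_t offset, std::size_t len, char* buf)>;

// Serves every request, re-queueing the remainder after a partial result.
inline void ReadAll(std::vector<ReadReq>& reqs, const PartialReader& read) {
  std::deque<ReadReq*> pending;
  for (auto& r : reqs) {
    r.out.assign(r.len, '\0');
    r.done = 0;
    if (r.len > 0) pending.push_back(&r);
  }
  while (!pending.empty()) {
    ReadReq* r = pending.front();
    pending.pop_front();
    const std::size_t want = r->len - r->done;
    const std::size_t got = read(r->offset + r->done, want, &r->out[r->done]);
    assert(got <= want);
    r->done += got;
    // Partial result: put the request back and continue with the rest later.
    if (got > 0 && r->done < r->len) pending.push_back(r);
  }
  // Trim requests that hit end of file before being fully served.
  for (auto& r : reqs) r.out.resize(r.done);
}
```

A test can simulate partial results by passing a reader that caps every call at a few bytes and checking that the assembled output is still correct.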
      
      Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct.
      
      Differential Revision: D20018616
      
      fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62
  5. 21 Feb, 2020 1 commit
    • S
      Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong authored
      Summary:
      When two binaries are dynamically linked together, different builds of RocksDB from the two sources might cause errors. To give users a way to solve this problem, the RocksDB namespace is changed to a flag that can be overridden at build time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
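
A minimal sketch of how a namespace name can be turned into a build-time flag is shown below. The default value and the `#ifndef` guard illustrate the general technique; `DEMO_STR`, `DEMO_STR_IMPL`, and `NamespaceName` are hypothetical helpers added only so the effect can be observed.

```cpp
#include <cassert>
#include <string>

// Default namespace name. A build can override it, for example by passing
// -DROCKSDB_NAMESPACE=my_rocksdb on the compiler command line.
#ifndef ROCKSDB_NAMESPACE
#define ROCKSDB_NAMESPACE rocksdb
#endif

// Hypothetical helper macros (demo only) that turn the macro's expansion
// into a string literal.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

namespace ROCKSDB_NAMESPACE {
// Hypothetical function standing in for any library symbol.
inline std::string NamespaceName() { return DEMO_STR(ROCKSDB_NAMESPACE); }
}  // namespace ROCKSDB_NAMESPACE
```

Because every symbol lives in `ROCKSDB_NAMESPACE`, two builds compiled with different values of the flag export differently named symbols and no longer collide at link time.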
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  6. 14 Dec, 2019 1 commit
    • A
      Introduce a new storage specific Env API (#5761) · afa2420c
      anand76 authored
      Summary:
      The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable or not, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
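
The delegation and status translation described above can be sketched with simplified stand-in types. Everything here is hypothetical (`IOStatusSketch`, `StatusSketch`, `FileSystemSketch`, `CompositeEnvSketch`, `FlakyFileSystem`); the real RocksDB classes have far richer interfaces, and the sketch covers only the rule that a retryable IO error becomes a soft error.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for the types named in the summary.
enum class Severity { kNoError, kSoftError };

struct IOStatusSketch {
  bool ok = true;
  bool retryable = false;  // extra metadata that a plain status lacks
  std::string msg;
};

struct StatusSketch {
  bool ok = true;
  Severity severity = Severity::kNoError;
  std::string msg;
};

// Translates the richer IO status; retryable errors become soft errors.
inline StatusSketch ToStatus(const IOStatusSketch& ios) {
  StatusSketch s;
  s.ok = ios.ok;
  s.msg = ios.msg;
  if (!ios.ok && ios.retryable) s.severity = Severity::kSoftError;
  return s;
}

// Hypothetical storage interface returning the richer status.
class FileSystemSketch {
 public:
  virtual ~FileSystemSketch() = default;
  virtual IOStatusSketch DeleteFile(const std::string& name) = 0;
};

// Env-like wrapper: forwards storage calls to the file system and
// translates the result for callers that only understand StatusSketch.
class CompositeEnvSketch {
 public:
  explicit CompositeEnvSketch(FileSystemSketch* fs) : fs_(fs) {
    assert(fs_ != nullptr);
  }
  StatusSketch DeleteFile(const std::string& name) {
    return ToStatus(fs_->DeleteFile(name));
  }

 private:
  FileSystemSketch* fs_;
};

// Fake file system for the demo: every delete fails with a retryable error.
class FlakyFileSystem : public FileSystemSketch {
 public:
  IOStatusSketch DeleteFile(const std::string& name) override {
    IOStatusSketch ios;
    ios.ok = false;
    ios.retryable = true;
    ios.msg = "transient failure deleting " + name;
    return ios;
  }
};
```

Callers keep programming against the Env-like wrapper, so existing code does not need to know that a separate storage layer now sits behind it.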
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
  7. 10 Dec, 2019 1 commit
    • S
      Fix an asan warning caused by the recent io_uring change (#6135) · d1ae2c3f
      sdong authored
      Summary:
      ASAN reports:
      
      internal_repo_rocksdb/repo:db_test - MultiThreaded/MultiThreadedDBTest.MultiThreaded/43: fatal
      ==2692739==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6130000500ca at pc 0x0000006be780 bp 0x7efef85ccd20 sp 0x7efef85cc4d0
      [CONTEXT] === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
      [CONTEXT] READ of size 331 at 0x6130000500ca thread T195
      [CONTEXT]      #0 db_test_bin+0x6be77f                     __interceptor_strlen.part.35
      [CONTEXT]      #1 internal_repo_rocksdb/repo/include/rocksdb/slice.h:55 rocksdb::Slice::Slice(char const*)
      [CONTEXT]      #2 internal_repo_rocksdb/repo/env/io_posix.cc:522 rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::ReadRequest*, unsigned long)
      
      I looked at env/io_posix.cc:522 but don't see a reason why the line needs to be there at all, because the value it sets is overwritten before it is used. It must have been put there by mistake, so remove it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6135
      
      Test Plan: Rerun the same test which passes after the fix. Run all the tests and make sure they all pass.
      
      Differential Revision: D18880251
      
      fbshipit-source-id: 3b84ac6a05b67b529c4202e0ceb4c047460f44f2
  8. 08 Dec, 2019 1 commit
    • S
      PosixRandomAccessFile::MultiRead() to use I/O uring if supported (#5881) · e3a82bb9
      sdong authored
      Summary:
      Right now, PosixRandomAccessFile::MultiRead() executes read requests sequentially. In this PR, it leverages the io_uring library to run them in parallel, even when the page cache is enabled. This function falls back to the previous behavior if the kernel version doesn't support io_uring.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881
      
      Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes.
      
      Differential Revision: D17742266
      
      fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a
  9. 06 Sep, 2019 1 commit
  10. 17 Aug, 2019 1 commit
  11. 31 Jul, 2019 1 commit
  12. 10 Jul, 2019 1 commit
  13. 14 Jun, 2019 1 commit
    • A
      Dynamic test whether sync_file_range returns ENOSYS (#5416) · 2c9df9f9
      Andrew Kryczka authored
      Summary:
      `sync_file_range` returns `ENOSYS` on Windows Subsystem for Linux even
      when using a supposedly supported filesystem like ext4. To handle this
      case we can do a dynamic check that a no-op `sync_file_range`
      invocation, which is accomplished by passing zero for the `flags`
      argument, succeeds.
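
A minimal sketch of such a dynamic check follows. The names `RangeSyncFn` and `RangeSyncSupported` are hypothetical, and the range-sync call is injected as a callback so the logic can be exercised without a real file. Treating only `ENOSYS` as "unsupported" is an assumption based on the commit title. On Linux, a thin overload forwards to the real `sync_file_range`.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

#if defined(__linux__)
#include <fcntl.h>  // sync_file_range (a GNU extension on Linux)
#endif

// Callback with the same shape as sync_file_range(fd, offset, nbytes, flags).
using RangeSyncFn =
    std::function<int(int fd, long offset, long nbytes, unsigned flags)>;

// Issues a no-op call (flags == 0) and reports "unsupported" only when it
// fails with ENOSYS. Other failures are assumed to have unrelated causes.
inline bool RangeSyncSupported(int fd, const RangeSyncFn& range_sync) {
  assert(fd >= 0);
  errno = 0;
  const int ret = range_sync(fd, /*offset=*/0, /*nbytes=*/0, /*flags=*/0);
  return !(ret == -1 && errno == ENOSYS);
}

#if defined(__linux__)
// Convenience overload that probes the real system call.
inline bool RangeSyncSupported(int fd) {
  return RangeSyncSupported(
      fd, [](int f, long offset, long nbytes, unsigned flags) {
        return ::sync_file_range(f, offset, nbytes, flags);
      });
}
#endif
```

Injecting the call keeps the decision logic testable on any platform, including ones where `sync_file_range` does not exist.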
      
      Also I rearranged the function and comments to hopefully make it more
      easily understandable.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5416
      
      Differential Revision: D15807061
      
      fbshipit-source-id: d31d94e1f228b7850ea500e6199f8b5daf8cfbd3
  14. 01 Jun, 2019 1 commit
  15. 31 May, 2019 2 commits
  16. 16 May, 2019 1 commit
  17. 23 Apr, 2019 1 commit
    • A
      Optionally wait on bytes_per_sync to smooth I/O (#5183) · 8272a6de
      Andrew Kryczka authored
      Summary:
      The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyway.
      
      My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.
      
      Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.
      
      There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).
      
      The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183
      
      Differential Revision: D14953553
      
      Pulled By: siying
      
      fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
  18. 26 Mar, 2019 1 commit
    • Z
      remove incorrect assert in `GetUniqueIdFromFile` (#5102) · 3c5eed5e
      Zhongyi Xie authored
      Summary:
      User reports have shown that `BlockBasedTable::SetupCacheKeyPrefix` would sometimes assert when trying to generate an id from the file. The actual cause seems to be hardware related, but we might be better off without the incorrect assertion.
      See T42178927 for more information.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5102
      
      Differential Revision: D14604677
      
      Pulled By: miasantreble
      
      fbshipit-source-id: fcb09207ebdc4fa66e941afbc0523d84797e7ad7
  19. 22 Mar, 2019 1 commit
    • R
      Make it easier for users to load options from option file and set shared block cache. (#5063) · a4396f92
      Rashmi Sharma authored
      Summary:
      [RocksDB] Make it easier for users to load options from option file and set shared block cache.
      Right now, users need several dynamic casts to set the shared block cache on the option structs loaded from the option file.
      If people don't do that, every CF of every DB will generate its own 8MB block cache, which is not a usable setting. So we are dragging every user who loads options from the file into such a mess.
      Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063
      
      Differential Revision: D14518584
      
      Pulled By: rashmishrm
      
      fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
  20. 05 Mar, 2019 1 commit
    • A
      Use `fallocate` even if hole-punching unsupported (#5023) · 186b3afa
      Andrew Kryczka authored
      Summary:
      The compiler flag `-DROCKSDB_FALLOCATE_PRESENT` was only set when
      `fallocate`, `FALLOC_FL_KEEP_SIZE`, and `FALLOC_FL_PUNCH_HOLE` were all
      present. However, the last of the three is not really necessary for the
      primary `fallocate` use case; furthermore, it was introduced only in later
      Linux kernel versions (2.6.38+).
      
      This PR changes the flag `-DROCKSDB_FALLOCATE_PRESENT` to only require
      `fallocate` and `FALLOC_FL_KEEP_SIZE` to be present. There is a separate
      check for `FALLOC_FL_PUNCH_HOLE` only in the place where it is used.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5023
      
      Differential Revision: D14248487
      
      Pulled By: siying
      
      fbshipit-source-id: a10ed0b902fa755988e957bd2dcec9081ec0502e
  21. 01 Feb, 2019 1 commit
    • Y
      fix for nvme device path (#4866) · 4091597c
      Young Tack Jin authored
      Summary:
      An NVMe device path, such as "nvme/nvme0/nvme0n1" or
      "nvme/nvme0/nvme0n1/nvme0n1p1", doesn't contain a "block" directory.
      The last directory, such as "nvme0n1p1", should be removed if the NVMe drive is partitioned.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866
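
The path rule described above can be sketched as a small string helper. `WholeDeviceDir` is a hypothetical name, the example paths are illustrative, and the rule for spotting an NVMe partition (a `p` in the last component) is an assumption inferred from the names quoted in the summary. The input is assumed to have no trailing slash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given a resolved sysfs device directory, return the
// directory of the whole device, dropping a trailing partition component.
inline std::string WholeDeviceDir(const std::string& device_dir) {
  const std::size_t child_pos = device_dir.rfind('/');
  if (child_pos == std::string::npos || child_pos == 0) return device_dir;
  const std::size_t parent_pos = device_dir.rfind('/', child_pos - 1);
  if (parent_pos == std::string::npos) return device_dir;

  const std::string parent =
      device_dir.substr(parent_pos + 1, child_pos - parent_pos - 1);
  const std::string child = device_dir.substr(child_pos + 1);

  // ".../block/sda": the parent is "block", so this is a whole disk.
  if (parent == "block") return device_dir;

  // ".../nvme/nvme0/nvme0n1": an NVMe namespace without a partition suffix.
  const bool is_nvme = child.compare(0, 4, "nvme") == 0;
  if (is_nvme && child.find('p') == std::string::npos) return device_dir;

  // Otherwise the last component is a partition such as "sda1" or
  // "nvme0n1p1"; drop it.
  return device_dir.substr(0, child_pos);
}
```

For example, `WholeDeviceDir("/sys/devices/x/nvme/nvme0/nvme0n1/nvme0n1p1")` returns `"/sys/devices/x/nvme/nvme0/nvme0n1"`.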
      
      Differential Revision: D13627824
      
      Pulled By: riversand963
      
      fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317
  22. 22 Jun, 2018 1 commit
  23. 26 May, 2018 1 commit
  24. 25 May, 2018 1 commit
    • D
      Catchup with posix features · 3db8504c
      Dmitri Smirnov authored
      Summary:
      Catch up with Posix features
        NewWritableRWFile must fail when the file does not exist
        Implement Env::Truncate()
        Adjust Env options optimization functions
        Implement MemoryMappedBuffer on Windows.
      Closes https://github.com/facebook/rocksdb/pull/3857
      
      Differential Revision: D8053610
      
      Pulled By: ajkr
      
      fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
  25. 01 May, 2018 1 commit
    • A
      Second attempt at db_stress crash-recovery verification · 46152d53
      Andrew Kryczka authored
      Summary:
      - Original commit: a4fb1f8c
      - Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db
      
      This PR includes the contents of the original commit plus two bug fixes, which are:
      
      - In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
      - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
      Closes https://github.com/facebook/rocksdb/pull/3793
      
      Differential Revision: D7811671
      
      Pulled By: ajkr
      
      fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
  26. 28 Apr, 2018 1 commit
  27. 25 Apr, 2018 1 commit
    • A
      Add crash-recovery correctness check to db_stress · a4fb1f8c
      Andrew Kryczka authored
      Summary:
      Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.
      
      In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.
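
The lock-free assumption mentioned above can be checked at compile time, as in the sketch below. `ExpectedValuesSketch` is a hypothetical class that lays `std::atomic<uint32_t>` slots over a caller-provided buffer. Here the buffer is ordinary memory standing in for an `mmap`'d file, and the constructor zero-initializes every slot as for a freshly created file. Reattaching to existing contents is outside this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

#if __cplusplus >= 201703L
// C++17: fail the build if a store could be implemented with a lock,
// which would break atomicity as seen by a process killed mid-update.
static_assert(std::atomic<std::uint32_t>::is_always_lock_free,
              "uint32_t atomics must be lock-free");
#endif

// Hypothetical view over a raw buffer holding `n` atomic 32-bit slots.
class ExpectedValuesSketch {
 public:
  using Slot = std::atomic<std::uint32_t>;

  // `base` must be suitably aligned and at least n * sizeof(Slot) bytes.
  ExpectedValuesSketch(void* base, std::size_t n) : n_(n) {
    assert(base != nullptr);
    assert(reinterpret_cast<std::uintptr_t>(base) % alignof(Slot) == 0);
    unsigned char* bytes = static_cast<unsigned char*>(base);
    for (std::size_t i = 0; i < n_; ++i) {
      new (bytes + i * sizeof(Slot)) Slot(0);  // construct each slot in place
    }
    slots_ = static_cast<Slot*>(base);
  }

  // Each update is a single atomic store, never a partially written value.
  void Put(std::size_t i, std::uint32_t v) {
    assert(i < n_);
    slots_[i].store(v);
  }

  std::uint32_t Get(std::size_t i) const {
    assert(i < n_);
    return slots_[i].load();
  }

 private:
  Slot* slots_ = nullptr;
  std::size_t n_;
};
```

The `static_assert` turns the "in theory" concern into a build failure on any platform where the assumption does not hold.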
      
      For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.
      
      `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
      Closes https://github.com/facebook/rocksdb/pull/3629
      
      Differential Revision: D7463144
      
      Pulled By: ajkr
      
      fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
  28. 16 Apr, 2018 1 commit
  29. 13 Apr, 2018 1 commit
  30. 06 Mar, 2018 1 commit
  31. 23 Feb, 2018 2 commits
  32. 03 Feb, 2018 1 commit
  33. 16 Nov, 2017 1 commit
    • Y
      Suppress valgrind "unimplemented functionality" error · bbcd3b0b
      Yi Wu authored
      Summary:
      Add a ROCKSDB_VALGRIND_RUN macro and suppress the false-positive "unimplemented functionality" error thrown by valgrind for stream hints.

      Another approach would be to add a valgrind suppression file. Valgrind is supposed to print the suppression when given the "--gen-suppressions=all" param, which is supposed to be the content for the suppression file. But it doesn't print it.
      Closes https://github.com/facebook/rocksdb/pull/3174
      
      Differential Revision: D6338786
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa
  34. 11 Nov, 2017 1 commit
  35. 22 Jul, 2017 2 commits
  36. 16 Jul, 2017 1 commit
  37. 13 Jul, 2017 1 commit