1. March 28, 2020 (1 commit)
  • Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) · 42468881
    Committed by Zhichao Cao
      Summary:
      In the current code base, we use Status to get and store the returned status from a call. Specifically, for IO-related functions, Status cannot reflect IO error details such as the error scope, whether the error is retryable, and so on. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have a new wrapper for IO which returns IOStatus instead of Status. However, the IOStatus is currently discarded at the lower levels of the write path and converted to a plain Status.
      
      The first job of this PR is to pass the IOStatus up through the write path (flush, WAL write, and compaction). The second job is to classify a retryable IO error as a HardError and set bg_error_ accordingly. In this case, the DB instance becomes read-only; the user is informed via the returned Status and needs to take action to deal with it, e.g., call db->Resume() (see the sketch below).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
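      A minimal sketch of how an application might react to this behavior; the key/value and error handling are illustrative, and db->Resume() should only be called after the underlying IO problem has been addressed:
      ```
      #include <iostream>
      #include "rocksdb/db.h"

      // Illustrative only: a retryable IO error hit by a background flush or
      // compaction is now recorded as a hard background error, so the DB stays
      // read-only until the application resumes it.
      void WriteWithRecovery(rocksdb::DB* db) {
        rocksdb::Status s = db->Put(rocksdb::WriteOptions(), "key", "value");
        if (!s.ok()) {
          std::cerr << "write failed: " << s.ToString() << std::endl;
          // Once the underlying IO issue is resolved, try to clear the
          // background error and make the DB writable again.
          rocksdb::Status r = db->Resume();
          if (!r.ok()) {
            std::cerr << "resume failed: " << r.ToString() << std::endl;
          }
        }
      }
      ```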
      
      Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D20685017
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2. March 27, 2020 (3 commits)
  • Add blob files to VersionStorageInfo/VersionBuilder (#6597) · 6f62322f
    Committed by Levi Tamasi
      Summary:
      The patch adds a couple of classes to represent metadata about
      blob files: `SharedBlobFileMetaData` contains the information elements
      that are immutable (once the blob file is closed), e.g. blob file number,
      total number and size of all blobs in the file, checksum method/value, while
      `BlobFileMetaData` contains attributes that can vary across versions like
      the amount of garbage in the file. There is a single `SharedBlobFileMetaData`
      for each blob file, which is jointly owned by the `BlobFileMetaData` objects
      that point to it; `BlobFileMetaData` objects, in turn, are owned by `Version`s
      and can also be shared if the (immutable _and_ mutable) state of the blob file
      is the same in two versions.
      
      In addition, the patch adds the blob file metadata to `VersionStorageInfo`, and extends
      `VersionBuilder` so that it can apply blob file related `VersionEdit`s (i.e. those
      containing `BlobFileAddition`s and/or `BlobFileGarbage`), and save blob file metadata
      to a new `VersionStorageInfo`. Consistency checks are also extended to ensure
      that table files point to blob files that are part of the `Version`, and that all blob files
      that are part of any given `Version` have at least some _non_-garbage data in them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6597
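      A rough sketch of the ownership structure described above; the member and accessor names are illustrative, not the actual RocksDB definitions:
      ```
      #include <cstdint>
      #include <memory>
      #include <string>
      #include <utility>

      // Illustrative sketch, not the real classes.
      class SharedBlobFileMetaData {  // immutable once the blob file is closed
       public:
        SharedBlobFileMetaData(uint64_t file_number, uint64_t blob_count,
                               uint64_t blob_bytes, std::string checksum_method,
                               std::string checksum_value)
            : blob_file_number_(file_number),
              total_blob_count_(blob_count),
              total_blob_bytes_(blob_bytes),
              checksum_method_(std::move(checksum_method)),
              checksum_value_(std::move(checksum_value)) {}

        uint64_t blob_file_number() const { return blob_file_number_; }
        uint64_t total_blob_count() const { return total_blob_count_; }
        uint64_t total_blob_bytes() const { return total_blob_bytes_; }

       private:
        uint64_t blob_file_number_;
        uint64_t total_blob_count_;
        uint64_t total_blob_bytes_;
        std::string checksum_method_;
        std::string checksum_value_;
      };

      // Owned by Versions; can be shared between versions when both the
      // immutable and the mutable state of the blob file are the same.
      class BlobFileMetaData {
       public:
        BlobFileMetaData(std::shared_ptr<SharedBlobFileMetaData> shared,
                         uint64_t garbage_blob_count, uint64_t garbage_blob_bytes)
            : shared_meta_(std::move(shared)),
              garbage_blob_count_(garbage_blob_count),
              garbage_blob_bytes_(garbage_blob_bytes) {}

        const SharedBlobFileMetaData& shared() const { return *shared_meta_; }
        uint64_t garbage_blob_count() const { return garbage_blob_count_; }
        uint64_t garbage_blob_bytes() const { return garbage_blob_bytes_; }

       private:
        std::shared_ptr<SharedBlobFileMetaData> shared_meta_;  // joint ownership
        uint64_t garbage_blob_count_;
        uint64_t garbage_blob_bytes_;
      };
      ```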
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D20656803
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f1f74d135045b3b42d0146f03ee576ef0a4bfd80
  • Use function objects as deleters in the block cache (#6545) · 6301dbe7
    Committed by Levi Tamasi
      Summary:
      As the first step of reintroducing eviction statistics for the block
      cache, the patch switches from using simple function pointers as deleters
      to function objects implementing an interface. This will enable using
      deleters that have state, like a smart pointer to the statistics object
      that is to be updated when an entry is removed from the cache. For now,
      the patch adds a deleter template class `SimpleDeleter`, which simply
      casts the `value` pointer to its original type and calls `delete` or
      `delete[]` on it as appropriate. Note: to prevent object lifecycle
      issues, deleters must outlive the cache entries referring to them;
      `SimpleDeleter` ensures this by using the ("leaky") Meyers singleton
      pattern.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6545
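      A condensed, hedged sketch of the idea (a deleter interface plus a Meyers-singleton deleter); the key type and class layout are simplified and do not reproduce the exact patch:
      ```
      #include <string>

      class Cache {
       public:
        // Function-object deleter interface replacing the raw function pointer.
        class Deleter {
         public:
          virtual ~Deleter() = default;
          virtual void operator()(const std::string& key, void* value) = 0;
        };
      };

      template <typename T>
      class SimpleDeleter : public Cache::Deleter {
       public:
        // "Leaky" Meyers singleton: the instance is never destroyed, so it is
        // guaranteed to outlive every cache entry that refers to it.
        static SimpleDeleter* GetInstance() {
          static auto* const instance = new SimpleDeleter;
          return instance;
        }

        void operator()(const std::string& /* key */, void* value) override {
          delete static_cast<T*>(value);  // cast back to the original type
        }
        // The patch also provides an array variant that calls delete[].

       private:
        SimpleDeleter() = default;
      };
      ```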
      
      Test Plan: `make asan_check`
      
      Reviewed By: siying
      
      Differential Revision: D20475823
      
      Pulled By: ltamasi
      
      fbshipit-source-id: fe354c33dd96d9bafc094605462352305449a22a
  • Fix iterator reading filter block despite read_tier == kBlockCacheTier (#6562) · 963af52f
    Committed by Mike Kolupaev
      Summary:
      We're seeing iterators with `ReadOptions::read_tier == kBlockCacheTier` sometimes doing file reads. Stack trace:
      
      ```
      rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*, bool) const
      rocksdb::BlockFetcher::ReadBlockContents()
      rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*) const
      rocksdb::Status rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, bool, bool) const
      rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::ReadFilterBlock(rocksdb::BlockBasedTable const*, rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*)
      rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::GetOrReadFilterBlock(bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*) const
      rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*) const
      rocksdb::FullFilterBlockReader::RangeMayExist(rocksdb::Slice const*, rocksdb::Slice const&, rocksdb::SliceTransform const*, rocksdb::Comparator const*, rocksdb::Slice const*, bool*, bool, rocksdb::BlockCacheLookupContext*)
      rocksdb::BlockBasedTable::PrefixMayMatch(rocksdb::Slice const&, rocksdb::ReadOptions const&, rocksdb::SliceTransform const*, bool, rocksdb::BlockCacheLookupContext*) const
      rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::SeekImpl(rocksdb::Slice const*)
      rocksdb::ForwardIterator::SeekInternal(rocksdb::Slice const&, bool)
      rocksdb::DBIter::Seek(rocksdb::Slice const&)
      ```
      
      `BlockBasedTableIterator::CheckPrefixMayMatch` was missing a check for `kBlockCacheTier`. This PR adds it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6562
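      For reference, a small usage sketch of the option involved; the comments describe the documented contract of kBlockCacheTier rather than this patch's internals:
      ```
      #include <memory>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      // Illustrative usage: an iterator restricted to the block cache must not
      // issue file reads, including for the filter (prefix) check fixed here.
      void CacheOnlyScan(rocksdb::DB* db) {
        rocksdb::ReadOptions read_options;
        read_options.read_tier = rocksdb::kBlockCacheTier;  // no IO allowed
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
        for (it->Seek("prefix"); it->Valid(); it->Next()) {
          // Only data already resident in the block cache is visited; a read
          // that would require IO ends the scan with status() == Incomplete().
        }
      }
      ```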
      
      Test Plan: deployed it to a logdevice test cluster and looked at logdevice's IO tracing.
      
      Reviewed By: siying
      
      Differential Revision: D20529368
      
      Pulled By: al13n321
      
      fbshipit-source-id: 65bf33964b1951464415c900336635fb20919611
3. March 25, 2020 (3 commits)
  • CompactRange() to use bottom pool when goes to bottommost level (#6593) · 6fd0ed49
    Committed by sdong
      Summary:
      In automatic compaction, if a compaction is bottommost, it goes to the bottom thread pool. We should do the same for manual compaction too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6593
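      For context, a sketch of how an application reserves bottom-priority threads that bottommost compactions (automatic, and manual CompactRange() after this change) can use; the thread counts are arbitrary examples:
      ```
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"

      // Illustrative: reserve threads in the bottom-priority pool so that
      // bottommost compactions run there instead of competing with regular
      // compactions in the low-priority pool.
      void ConfigureBottomPool(rocksdb::Options* options) {
        options->env->SetBackgroundThreads(2, rocksdb::Env::Priority::BOTTOM);
        options->env->SetBackgroundThreads(4, rocksdb::Env::Priority::LOW);
      }
      ```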
      
      Test Plan: Add a unit test. See all existing tests pass.
      
      Reviewed By: ajkr
      
      Differential Revision: D20637408
      
      fbshipit-source-id: cb03031e8f895085f7acf6d2d65e69e84c9ddef3
  • multiget support for timestamps (#6483) · a6ce5c82
    Committed by Huisheng Liu
      Summary:
      Add timestamp support for MultiGet().
      The timestamp from ReadOptions is honored, and timestamps can be returned along with values.
      
      A MultiReadRandom perf test (10 minutes) on the same development machine's RAM drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
      base line (commit 17bef7d3):
        multireadrandom :     104.173 micros/op 307167 ops/sec; (5462999 of 5462999 found)
      This PR:
        multireadrandom :     104.199 micros/op 307095 ops/sec; (5307999 of 5307999 found)
      
      .\db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=multireadrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6483
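      A hedged sketch of the usage pattern; the MultiGet() overload taking a `timestamps` output vector is assumed from the description above rather than quoted from the header:
      ```
      #include <string>
      #include <vector>
      #include "rocksdb/db.h"

      // Hedged sketch: read two keys as of a given timestamp and collect the
      // timestamp of each returned value.
      void MultiGetWithTimestamp(rocksdb::DB* db, rocksdb::Slice read_ts) {
        rocksdb::ReadOptions read_opts;
        read_opts.timestamp = &read_ts;  // read at this timestamp

        std::vector<rocksdb::ColumnFamilyHandle*> cfs(2, db->DefaultColumnFamily());
        std::vector<rocksdb::Slice> keys{"k1", "k2"};
        std::vector<std::string> values;
        std::vector<std::string> timestamps;  // per-key timestamps of the results

        std::vector<rocksdb::Status> statuses =
            db->MultiGet(read_opts, cfs, keys, &values, &timestamps);
        (void)statuses;  // check each status in real code
      }
      ```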
      
      Reviewed By: anand1976
      
      Differential Revision: D20498373
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8505f22bc40fd791bc7dd05e48d7e67c91edb627
  • Fix bug that number of table loading threads is set as a boolean (#6576) · 921cdd37
    Committed by sdong
      Summary:
      When applying a new version in the non-DB-open case, optimize_filters_for_hits is passed as max_threads, which is clearly a bug. It is not clear what the intended value was in the first place, but the value 1 makes sense here, as it would create no extra threads. This bug is not expected to cause user-visible problems, since C++ implicitly converts bool to 0 or 1.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6576
      
      Test Plan: Run all existing tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D20602467
      
      fbshipit-source-id: 40b2cd8619aba09ae9242b36c415464db3c9b737
4. March 24, 2020 (4 commits)
  • Simplify migration to FileSystem API (#6552) · a9d168cf
    Committed by anand76
      Summary:
      The current Env/FileSystem API separation has a couple of issues -
      1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
      2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
      
      This PR solves the above issues and simplifies the migration in the following ways -
      1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
      1a. - This also makes it more robust by ensuring that even if RocksDB
        has some stray calls to Env APIs rather than FileSystem, they will go
        through the same object and thus there is no risk of getting out of
        sync.
      2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
      PosixEnv with a custom FileSystem implementation. This eliminates an
      indirection to call Env APIs, and relieves the FileSystem developer of
      the burden of having to implement wrappers for the Env APIs.
      3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
      ```NewLogger()```
      
      Tests:
      1. New unit tests
      2. make check and make asan_check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
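      A hedged sketch of the intended usage; NewCompositeEnv() is assumed here to take a shared_ptr<FileSystem> and hand back an owning Env, and the FileSystem passed in stands for a user-provided implementation:
      ```
      #include <memory>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/file_system.h"

      // Hedged sketch: wrap a custom FileSystem in an Env so existing code,
      // which only sets Options::env, picks it up without further changes.
      void OpenWithCustomFileSystem(const std::string& dbname,
                                    std::shared_ptr<rocksdb::FileSystem> my_fs) {
        std::unique_ptr<rocksdb::Env> env = rocksdb::NewCompositeEnv(my_fs);

        rocksdb::Options options;
        options.create_if_missing = true;
        options.env = env.get();  // all IO now goes through my_fs

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, dbname, &db);
        // ... use db; keep `env` and `my_fs` alive while the DB is open ...
        delete db;
      }
      ```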
      
      Reviewed By: riversand963
      
      Differential Revision: D20592038
      
      Pulled By: anand1976
      
      fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
  • Fix the MultiGet testing failure in Circleci (#6578) · d300d109
    Committed by Zhichao Cao
      Summary:
      The MultiGet test in db_basic_test fails on CircleCI vs2019. The reason is that even when Snappy compression is enabled, the first compression type in the list is still kNoCompression. This PR checks the list and ensures that compression is only enabled when it is supported and the compression type is valid, so the combined read test in MultiGet no longer fails.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6578
      
      Test Plan: make check, db_basic_test.
      
      Reviewed By: anand1976
      
      Differential Revision: D20607529
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: dcead264d5c2da105912c18caad34b8510bb04b0
  • Fix LITE build (#6575) · 617f4792
    Committed by Yanqin Jin
      Summary:
      Fix LITE build by excluding some unit tests that use features not supported in LITE.
      ```
      db/db_basic_test.cc:1778:8: error: ‘void rocksdb::{anonymous}::TableFileListener::OnTableFileCreated(const rocksdb::TableFileCreationInfo&)’ marked ‘override’, but does not override
         void OnTableFileCreated(const TableFileCreationInfo& info) override {
              ^~~~~~~~~~~~~~~~~~
      make: *** [db/db_basic_test.o] Error 1
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6575
      
      Reviewed By: ltamasi
      
      Differential Revision: D20598598
      
      Pulled By: riversand963
      
      fbshipit-source-id: 367f7cb2500360ad57030b138a94c0f731a04339
  • Revert "Added the safe-to-ignore tag to version_edit (#6530)" (#6569) · 5c6346c4
    Committed by Zhichao Cao
      Summary:
      This reverts commit e10553f2.
      
      Pass make asan_check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6569
      
      Reviewed By: riversand963
      
      Differential Revision: D20574319
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ce36981a21596f5f2e14da6a59a2bb3619509a8b
5. March 21, 2020 (2 commits)
  • Attempt to recover from db with missing table files (#6334) · fb09ef05
    Committed by Yanqin Jin
      Summary:
      There are situations when RocksDB tries to recover but the db is in an inconsistent state because SST files referenced in the MANIFEST are missing. In this case, RocksDB previously just failed the recovery and returned a non-OK status.
      This PR enables another possibility. During recovery, RocksDB checks the available MANIFEST files and tries to recover to the most recent state that does not reference any missing table file. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when that version does not reference any missing table file. After processing the entire MANIFEST, the version created last is the latest version.
      `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
      To use this capability, set `options.best_efforts_recovery = true` when opening the db (see the sketch below). Best-efforts recovery is currently incompatible with atomic flush.
      
      Test plan (on devserver):
      ```
      $make check
      $COMPILE_WITH_ASAN=1 make all && make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334
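      A minimal sketch of opting in to this capability; the option name comes from the description above and everything else is illustrative:
      ```
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      // Minimal sketch: opt in to best-efforts recovery so a DB with missing
      // SST files opens at the most recent consistent state instead of failing.
      // Remember that WAL replay is skipped and atomic flush is not supported.
      rocksdb::Status OpenBestEfforts(const std::string& dbname, rocksdb::DB** db) {
        rocksdb::Options options;
        options.create_if_missing = false;
        options.best_efforts_recovery = true;  // option introduced by this PR
        return rocksdb::DB::Open(options, dbname, db);
      }
      ```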
      
      Reviewed By: anand1976
      
      Differential Revision: D19778960
      
      Pulled By: riversand963
      
      fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
  • Get block size only in direct IO mode (#6522) · 5fd152b7
    Committed by Cheng Chang
      Summary:
      When `use_direct_reads` and `use_direct_writes` are `false`, the `logical_sector_size_` inside the various `*File` implementations is not actually used, so `GetLogicalBlockSize` does not need to be called to compute `logical_sector_size_`; a default page size is used instead.
      
      This is a follow up PR for https://github.com/facebook/rocksdb/pull/6457.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6522
      
      Test Plan: make check
      
      Reviewed By: siying
      
      Differential Revision: D20408885
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f2d3808f41265237e7fa2c0be9f084f8fa97fe3d
6. March 20, 2020 (2 commits)
  • Added the safe-to-ignore tag to version_edit (#6530) · e10553f2
    Committed by Zhichao Cao
      Summary:
      Each time RocksDB switches to a new MANIFEST file from an old one, it calls WriteCurrentStateToManifest(), which writes a 'snapshot' of the current in-memory state of versions to the beginning of the new manifest as a series of version edits. We can distinguish these version edits from the ones written during normal operation with a custom, safe-to-ignore tag.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6530
      
      Test Plan: added test to version_edit_test, pass make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D20524516
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f1de102f5499bfa88dae3caa2f32c7f42cf904db
  • Clean up VersionBuilder a bit (#6556) · 44240455
    Committed by Levi Tamasi
      Summary:
      The whole point of the pimpl idiom is to hide implementation details.
      Internal helper methods like `CheckConsistency`, `CheckConsistencyForDeletes`,
      and `MaybeAddFile` do not belong in the public interface of the class.
      In addition, the patch switches to `unique_ptr` for the implementation
      object instead of using a raw `delete`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6556
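      A generic sketch of the resulting pattern (pimpl plus unique_ptr ownership); the class and method names are placeholders, not the actual VersionBuilder code:
      ```
      #include <memory>

      // Internal helpers live on the hidden Rep class, and the impl object is
      // owned by a unique_ptr instead of being freed with a raw delete.
      class VersionBuilderLike {
       public:
        VersionBuilderLike() : rep_(new Rep()) {}

        void Apply() { rep_->CheckConsistency(); }

       private:
        struct Rep {
          // Helpers such as CheckConsistency()/MaybeAddFile() stay out of the
          // public interface.
          void CheckConsistency() {}
        };
        std::unique_ptr<Rep> rep_;  // cleaned up automatically
      };
      ```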
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D20523568
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 5bbb0ccebd0c47a33b815398c7f9cfe13bd775ac
7. March 18, 2020 (1 commit)
8. March 14, 2020 (3 commits)
9. March 13, 2020 (2 commits)
10. March 12, 2020 (4 commits)
  • Cache result of GetLogicalBufferSize in Linux (#6457) · 2d9efc9a
    Committed by Cheng Chang
      Summary:
      On Linux, when reopening a DB with many SST files, profiling shows 100% system CPU time spent in `GetLogicalBufferSize` for a couple of seconds. This slows down MyRocks' recovery time when a site is down.
      
      This PR introduces two new APIs:
      1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` let `DB` tell the env when it starts or stops using its database directories. The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
      2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.
      
      Other modifications:
      1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
      2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
      3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457
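      A conceptual sketch of the caching idea (not RocksDB's actual LogicalBlockSizeCache): the block size for a directory is probed once and reused for every file opened under it:
      ```
      #include <cstddef>
      #include <map>
      #include <mutex>
      #include <string>

      // Conceptual only: cache the logical block size per database directory so
      // that reopening many SST files does not probe the filesystem each time.
      class BlockSizeCache {
       public:
        size_t GetOrCompute(const std::string& dir) {
          std::lock_guard<std::mutex> lock(mu_);
          auto it = cache_.find(dir);
          if (it != cache_.end()) {
            return it->second;  // cache hit: no syscall needed
          }
          size_t size = QueryLogicalBlockSize(dir);  // expensive probe, done once
          cache_.emplace(dir, size);
          return size;
        }

       private:
        static size_t QueryLogicalBlockSize(const std::string& /* dir */) {
          return 4096;  // placeholder for the real ioctl/sysfs lookup
        }

        std::mutex mu_;
        std::map<std::string, size_t> cache_;
      };
      ```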
      
      Test Plan:
      1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
      2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.
      
      `make check`
      
      Differential Revision: D20131243
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
  • Include more information in file lock failure (#6507) · 331e6199
    Committed by sdong
      Summary:
      When users fail to open a DB due to a file lock failure, it is sometimes hard to debug. We now include the time the lock was acquired and the ID of the thread that acquired it, to help users debug problems like this. The default Env's thread ID is used.
      
      Since the type of lockedFiles has changed, it is also renamed to follow the naming convention.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6507
      
      Test Plan: Add a unit test and improve an existing test to validate the case.
      
      Differential Revision: D20378333
      
      fbshipit-source-id: 312fe0e9733fd1d1e9969c321b90ce523cf4708a
  • Disambiguate CustomFieldTags for the unity build (#6513) · 37a635cf
    Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6513
      
      Test Plan: `make unity_test`
      
      Differential Revision: D20388919
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 88dbceab0723a54ee3939e1644e13dc9a4c70420
  • Add ppc64le builds to Travis (#6144) · 8fc20ac4
    Committed by Adam Retter
      Summary:
      Let's see how this goes...
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6144
      
      Differential Revision: D20387515
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ba2669348c267141dfddff910b4c2224a22cbb38
11. March 11, 2020 (1 commit)
  • Split BlobFileState into an immutable and a mutable part (#6502) · f5bc3b99
    Committed by Levi Tamasi
      Summary:
      It's never too soon to refactor something. The patch splits the recently
      introduced (`VersionEdit` related) `BlobFileState` into two classes
      `BlobFileAddition` and `BlobFileGarbage`. The idea is that once blob files
      are closed, they are immutable, and the only thing that changes is the
      amount of garbage in them. In the new design, `BlobFileAddition` contains
      the immutable attributes (currently, the count and total size of all blobs, checksum
      method, and checksum value), while `BlobFileGarbage` contains the mutable
      GC-related information elements (count and total size of garbage blobs). This is a
      better fit for the GC logic and is more consistent with how SST files are handled.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6502
      
      Test Plan: `make check`
      
      Differential Revision: D20348352
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ff93f0121e80ab15e0e0a6525ba0d6af16a0e008
12. March 10, 2020 (1 commit)
  • Support options.max_open_files != -1 with FIFO compaction (#6503) · fd1da221
    Committed by Yanqin Jin
      Summary:
      Allow user to specify options.max_open_files != -1 with FIFO compaction.
      If max_open_files != -1, not all table files are kept open.
      In the past, FIFO-style compaction required all table files to be open in order
      to read the file creation time from table properties. Later, we added file creation
      time to the MANIFEST, making it possible to read the file creation time without
      opening the file.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6503
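      An illustrative FIFO configuration that now works with a bounded open-file limit; the sizes and limits are arbitrary examples:
      ```
      #include <string>
      #include "rocksdb/advanced_options.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      // Illustrative FIFO-compaction setup that no longer requires keeping
      // every table file open.
      rocksdb::Status OpenFifoDb(const std::string& dbname, rocksdb::DB** db) {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compaction_style = rocksdb::kCompactionStyleFIFO;
        options.compaction_options_fifo.max_table_files_size =
            10ull * 1024 * 1024 * 1024;  // keep at most ~10 GB of SST files
        options.max_open_files = 5000;   // previously FIFO required -1 (all open)
        return rocksdb::DB::Open(options, dbname, db);
      }
      ```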
      
      Test Plan: make check
      
      Differential Revision: D20353758
      
      Pulled By: riversand963
      
      fbshipit-source-id: ba5c61a648419e47e9ef6d74e0e280e3ee24f296
13. March 7, 2020 (1 commit)
  • Iterator with timestamp (#6255) · d93812c9
    Committed by Yanqin Jin
      Summary:
      Preliminary support for iterators with a user timestamp. The current implementation does not consider the merge operator or reverse iteration. Auto compaction is also disabled in the unit tests.
      
      Create an iterator with timestamp.
      ```
      ...
      read_opts.timestamp = &ts;
      auto* iter = db->NewIterator(read_opts);
      // target is key without timestamp.
      for (iter->Seek(target); iter->Valid(); iter->Next()) {}
      for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
      delete iter;
      read_opts.timestamp = &ts1;
      // lower_bound and upper_bound are without timestamp.
      read_opts.iterate_lower_bound = &lower_bound;
      read_opts.iterate_upper_bound = &upper_bound;
      auto* iter1 = db->NewIterator(read_opts);
      // Do Seek or SeekToFirst()
      delete iter1;
      ```
      
      Test plan (dev server)
      ```
      $make check
      ```
      
      Simple benchmarking (dev server)
      1. The overhead introduced by this PR even when timestamp is disabled.
      key size: 16 bytes
      value size: 100 bytes
      Entries: 1000000
      Data reside in main memory, and try to stress iterator.
      Repeated three times on master and this PR.
      - Seek without next
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
      ```
      master: 159047.0 ops/sec
      this PR: 158922.3 ops/sec (2% drop in throughput)
      - Seek and next 10 times
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
      ```
      master: 109539.3 ops/sec
      this PR: 107519.7 ops/sec (2% drop in throughput)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255
      
      Differential Revision: D19438227
      
      Pulled By: riversand963
      
      fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
14. March 6, 2020 (1 commit)
15. March 5, 2020 (2 commits)
  • Skip high levels with no key falling in the range in CompactRange (#6482) · afb97094
    Committed by Cheng Chang
      Summary:
      In CompactRange, if there is no key in memtable falling in the specified range, then flush is skipped.
      This PR extends this skipping logic to SST file levels: compaction starts from the highest level (beginning with L0) that has files with keys falling in the specified range, instead of always starting from L0.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6482
      
      Test Plan:
      A new test ManualCompactionTest::SkipLevel is added.
      
      Also updated a test related to statistics of index block cache hit in db_test2, the index cache hit is increased by 1 in this PR because when checking overlap for the key range in L0, OverlapWithLevelIterator will do a seek in the table cache iterator, which will read from the cached index.
      
      Also updated db_compaction_test and db_test to use correct range for full compaction.
      
      Differential Revision: D20251149
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f822157cf4796972bd5035d9d7178d8dfb7af08b
  • Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414) · e62fe506
    Committed by Zhichao Cao
      Summary:
      In the current code base, we can use FaultInjectionTestEnv to simulate env issues such as file write/read errors, and it is used in most of the tests. PR https://github.com/facebook/rocksdb/issues/5761 introduced FileSystem as a new Env API. This PR implements FaultInjectionTestFS, which can be used to simulate the file system having issues such as IO errors. The user can specify any IOStatus error as input, such that the corresponding FS operations return that error to the caller.
      
      A set of ErrorHandlerFSTests is introduced for testing.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414
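      A conceptual sketch of the injection idea (this is not FaultInjectionTestFS's actual interface): the test configures an IOStatus, and selected file-system operations return it instead of performing real IO:
      ```
      #include <string>
      #include <utility>
      #include "rocksdb/io_status.h"

      // Conceptual only: a wrapper that returns a configured IOStatus from
      // write operations so error-handling paths can be exercised.
      class InjectingFS {
       public:
        void SetInjectedError(rocksdb::IOStatus error) {
          injected_ = std::move(error);
        }
        void ClearInjectedError() { injected_ = rocksdb::IOStatus::OK(); }

        rocksdb::IOStatus Append(const std::string& /* fname */,
                                 const std::string& /* data */) {
          if (!injected_.ok()) {
            return injected_;  // pretend the write failed with the set error
          }
          // ... forward to the real file system here ...
          return rocksdb::IOStatus::OK();
        }

       private:
        rocksdb::IOStatus injected_ = rocksdb::IOStatus::OK();
      };
      ```
      A test would, for example, call SetInjectedError(rocksdb::IOStatus::IOError("simulated")) before triggering a flush and then verify how the error handler reacts.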
      
      Test Plan: pass make asan_check, pass error_handler_fs_test.
      
      Differential Revision: D20252421
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
16. March 4, 2020 (1 commit)
  • s/const auto/const auto&/ when doing loop (#6477) · 03dbd11e
    Committed by Kefu Chai
      Summary:
      This silences the following warning from clang-11:
      ```
      rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:21: warning: loop variable 'newf' of type 'const std::pair<int, rocksdb::FileMetaData>' creates a copy from type 'const
      std::pair<int\
      , rocksdb::FileMetaData>' [-Wrange-loop-analysis]
          for (const auto newf : c->edit()->GetNewFiles()) {
                          ^
      rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:10: note: use reference type 'const std::pair<int, rocksdb::FileMetaData> &' to prevent copying
          for (const auto newf : c->edit()->GetNewFiles()) {
               ^~~~~~~~~~~~~~~~~
                          &
      ```
      Signed-off-by: Kefu Chai <tchaikov@gmail.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6477
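      A tiny self-contained example of the before/after; FileMetaData here is just a stand-in for the large struct copied in the real loop:
      ```
      #include <utility>
      #include <vector>

      struct FileMetaData { /* large, non-trivially-copyable in the real code */ };

      void Iterate(const std::vector<std::pair<int, FileMetaData>>& new_files) {
        // Before: copies every pair on each iteration (what clang-11 warns about).
        for (const auto newf : new_files) {
          (void)newf;
        }
        // After: binds a reference, avoiding the per-iteration copy.
        for (const auto& newf : new_files) {
          (void)newf;
        }
      }
      ```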
      
      Differential Revision: D20211850
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 3e89e13a12bba79f1b934d46b7c4c0576cdafb01
17. March 3, 2020 (3 commits)
  • Fix data race of GetCreationTimeOfOldestFile() (#6473) · 17bef7d3
    Committed by sdong
      Summary:
      When DBImpl::GetCreationTimeOfOldestFile() calls Version::GetCreationTimeOfOldestFile(), the version is not directly or indirectly referenced, so an event like compaction can race with the operation and cause DBImpl::GetCreationTimeOfOldestFile() to access deallocated data. This was caught by an ASAN run:
      
      ==268==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000b7d198 at pc 0x000018332913 bp 0x7f391510d310 sp 0x7f391510d308
      READ of size 8 at 0x612000b7d198 thread T845 (store_load-33)
      SCARINESS: 51 (8-byte-read-heap-use-after-free)
          #0 0x18332912 in rocksdb::Version::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/version_set.cc:1488
          #1 0x1803ddaa in rocksdb::DBImpl::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/db_impl/db_impl.cc:4499
          #2 0xe24ca09 in rocksdb::StackableDB::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/utilities/stackable_db.h:392
          ......
      0x612000b7d198 is located 216 bytes inside of 296-byte region [0x612000b7d0c0,0x612000b7d1e8)
      freed by thread T28 here:
          ......
          #5 0x1832c73f in std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> >::~vector() third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/stl_vector.h:435
          #6 0x1832c73f in rocksdb::VersionStorageInfo::~VersionStorageInfo() rocksdb/src/db/version_set.cc:734
          #7 0x1832cf42 in rocksdb::Version::~Version() rocksdb/src/db/version_set.cc:758
          #8 0x9d1bb5 in rocksdb::Version::Unref() rocksdb/src/db/version_set.cc:2869
          #9 0x183e7631 in rocksdb::Compaction::~Compaction() rocksdb/src/db/compaction/compaction.cc:275
          #10 0x9e6de6 in std::default_delete<rocksdb::Compaction>::operator()(rocksdb::Compaction*) const third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:78
          #11 0x9e6de6 in std::unique_ptr<rocksdb::Compaction, std::default_delete<rocksdb::Compaction> >::reset(rocksdb::Compaction*) third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:376
          #12 0x9e6de6 in rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2826
          #13 0x9ac3b8 in rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2320
          #14 0x9abff7 in rocksdb::DBImpl::BGWorkCompaction(void*) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2096
          ......
      
      Fix the issue by referencing the super version and using the referenced version from it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6473
      
      Test Plan: Run ASAN for all existing tests.
      
      Differential Revision: D20196416
      
      fbshipit-source-id: 5f4a7918110fc7b8dd7841932d376bc9d1e59d6f
  • Replace Directory with FSDirectory in DB (#6468) · 8d73137a
    Committed by Zhichao Cao
      Summary:
      In the current code base, we can use Directory from Env to manage directories (e.g., Fsync()). PR https://github.com/facebook/rocksdb/issues/5761 introduced FileSystem as a new Env API. We now further replace the Directory class in DB with FSDirectory, so that we can get more IO information from the IOStatus returned by FSDirectory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6468
      
      Test Plan: pass make asan_check
      
      Differential Revision: D20195261
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 93962cb9436852bfcfb76e086d9e7babd461cbe1
  • return timestamp from get (#6409) · 904a60ff
    Committed by Huisheng Liu
      Summary:
      Added new Get() methods that return a timestamp. A dummy implementation is provided so that classes derived from DB do not need to be touched to provide their own implementation. MultiGet is not included.
      
      A ReadRandom perf test (10 minutes) on the same development machine's RAM drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
          base line (commit 72ee067b):
              101.712 micros/op 314602 ops/sec;   36.0 MB/s (5658999 of 5658999 found)
          This PR:
              100.288 micros/op 319071 ops/sec;   36.5 MB/s (5674999 of 5674999 found)
      
      ./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409
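      A hedged sketch of reading a value together with its timestamp; the exact overload (PinnableSlice value plus std::string timestamp output) is assumed from the description above:
      ```
      #include <string>
      #include "rocksdb/db.h"

      // Hedged sketch of the new Get() overload that also returns the entry's
      // timestamp.
      void GetWithTimestamp(rocksdb::DB* db, rocksdb::Slice read_ts) {
        rocksdb::ReadOptions read_opts;
        read_opts.timestamp = &read_ts;  // read at this timestamp

        rocksdb::PinnableSlice value;
        std::string ts;  // receives the timestamp of the returned value
        rocksdb::Status s = db->Get(read_opts, db->DefaultColumnFamily(), "key",
                                    &value, &ts);
        (void)s;  // check the status in real code
      }
      ```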
      
      Differential Revision: D20200086
      
      Pulled By: riversand963
      
      fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50
18. February 29, 2020 (1 commit)
19. February 26, 2020 (1 commit)
  • Fix range deletion tombstone ingestion with global seqno (#6429) · 69679e73
    Committed by Andrew Kryczka
      Summary:
      Original author: jeffrey-xiao
      
      If we are writing a global seqno for an ingested file, the range
      tombstone metablock gets accessed and put into the cache during
      ingestion preparation. At the time, the global seqno of the ingested
      file has not yet been determined, so the cached block will not have a
      global seqno. When the file is ingested and we read its range tombstone
      metablock, it will be returned from the cache with no global seqno. In
      that case, we use the actual seqnos stored in the range tombstones,
      which are all zero, so the tombstones cover nothing.
      
      This commit removes the global_seqno_ variable from Block. When iterating
      over a block, the global seqno for the block is determined by the
      iterator instead of being stored as a mutable attribute in Block.
      Additionally, this commit adds a regression test to check that keys are
      deleted when ingesting a file with a global seqno and range deletion
      tombstones.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6429
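      A hedged sketch of the regression scenario being tested: an external file containing only a range deletion is built and ingested with a global seqno; the paths, keys, and option values are illustrative:
      ```
      #include <string>
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"
      #include "rocksdb/sst_file_writer.h"

      // After this fix, the ingested range tombstone actually covers the
      // existing keys in [key000, key100).
      rocksdb::Status IngestRangeDeletion(rocksdb::DB* db, const std::string& path) {
        rocksdb::Options options;
        rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
        rocksdb::Status s = writer.Open(path);
        if (!s.ok()) return s;
        s = writer.DeleteRange("key000", "key100");  // range tombstone in the file
        if (!s.ok()) return s;
        s = writer.Finish();
        if (!s.ok()) return s;

        rocksdb::IngestExternalFileOptions ingest_opts;
        // Writing a global seqno is the case that used to lose the tombstones.
        ingest_opts.write_global_seqno = true;
        return db->IngestExternalFile({path}, ingest_opts);
      }
      ```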
      
      Differential Revision: D19961563
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5cf777397fa3e452401f0bf0364b0750492487b7
20. February 25, 2020 (1 commit)
  • Add blob file state to VersionEdit (#6416) · d87c10c6
    Committed by Levi Tamasi
      Summary:
      BlobDB currently does not keep track of blob files: no records are written to
      the manifest when a blob file is added or removed, and upon opening a database,
      the list of blob files is populated simply based on the contents of the blob directory.
      This means that lost blob files cannot be detected at the moment. We plan to solve
      this issue by making blob files a part of `Version`; as a first step, this patch makes
      it possible to store information about blob files in `VersionEdit`. Currently, this information
      includes blob file number, total number and size of all blobs, and total number and size
      of garbage blobs. However, the format is extensible: new fields can be added in
      both a forward compatible and a forward incompatible manner if needed (similarly
      to `kNewFile4`).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6416
      
      Test Plan: `make check`
      
      Differential Revision: D19894234
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f9753e1f2aedf6dadb70c09b345207cb9c58c329
21. February 22, 2020 (2 commits)