1. 20 Sep 2020 (1 commit)
    • fix the flaky test failure (#7415) · 485fd9d9
      Committed by Zhichao Cao
      Summary:
      Fix the flaky test failure in error_handler_fs_test by adding a sync point to resolve the timing dependency.
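      As a rough illustration (not the actual sync points used by the test), a SyncPoint
      dependency forces one code path to wait for another, which is how this kind of
      flakiness is typically fixed; the point names below are hypothetical:
      ```cpp
      #include "test_util/sync_point.h"

      using ROCKSDB_NAMESPACE::SyncPoint;

      // Hypothetical names: require the error injection to happen before
      // recovery is allowed to start, removing the race.
      SyncPoint::GetInstance()->LoadDependency(
          {{"ErrorHandlerFSTest:ErrorInjected",
            "ErrorHandlerFSTest:BeforeRecovery"}});
      SyncPoint::GetInstance()->EnableProcessing();
      // ... run the test body ...
      SyncPoint::GetInstance()->DisableProcessing();
      ```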
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7415
      
      Test Plan: make asan_check, ~/gtest-parallel/gtest-parallel -r 100 ./error_handler_fs_test
      
      Reviewed By: siying
      
      Differential Revision: D23804330
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 5175108651f7652e47e15978f2a9c1669ef59d80
  2. 17 Sep 2020 (1 commit)
  3. 15 Sep 2020 (1 commit)
    • Integrate blob file writing with the flush logic (#7345) · b0e78341
      Committed by Levi Tamasi
      Summary:
      The patch adds support for writing blob files during flush by integrating
      `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and
      `CompactionIterator`. If `enable_blob_files` is set, large values are extracted
      to blob files and replaced with references. The resulting blob files are then
      logged to the MANIFEST as part of the flush job's `VersionEdit` and
      added to the `Version`, similarly to table files. Errors related to writing
      blob files fail the flush, and any blob files written by such jobs are immediately
      deleted (again, similarly to how SST files are handled). In addition, the patch
      extends the logging and statistics around flushes to account for the presence
      of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is
      used for calculating write amplification, now considers the blob files as well).
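      A minimal sketch of turning the feature on, assuming the option names from the
      commit text (the wider set of blob options landed over several PRs):
      ```cpp
      #include "rocksdb/options.h"

      ROCKSDB_NAMESPACE::Options options;
      options.create_if_missing = true;
      options.enable_blob_files = true;  // extract large values to blob files
      options.min_blob_size = 1024;      // values >= 1 KB become blob references
      // Smaller values stay inline in the SST files, as before.
      ```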
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D23506369
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
  4. 09 Sep 2020 (1 commit)
    • Store FSWritableFilePtr object in WritableFileWriter (#7193) · b175eceb
      Committed by Akanksha Mahajan
      Summary:
      Replace the raw FSWritableFile pointer in WritableFileWriter with an
      FSWritableFilePtr object that wraps it.

      Objective: if tracing is enabled, FSWritableFilePtr returns an
      FSWritableFileTracingWrapper pointer that captures the necessary
      information in an IORecord, calls the underlying FileSystem, and invokes
      the IOTracer to dump that record to a binary file. If tracing is
      disabled, the underlying FileSystem pointer is returned directly,
      bypassing the tracing wrapper entirely.
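      A simplified, hypothetical sketch of the wrapper-pointer idea (the real
      FSWritableFilePtr and FSWritableFileTracingWrapper differ in detail):
      ```cpp
      // Stand-in types, for illustration only.
      struct Writable { virtual ~Writable() {} /* Append(), Flush(), ... */ };
      struct TracingWrapper : Writable {
        explicit TracingWrapper(Writable* target) : target_(target) {}
        Writable* target_;  // records an IORecord, then forwards each call
      };

      class WritablePtr {
       public:
        WritablePtr(Writable* file, bool tracing_enabled)
            : file_(file), wrapper_(file), tracing_enabled_(tracing_enabled) {}
        Writable* operator->() {
          // Tracing on: route calls through the wrapper so each IO is
          // recorded; tracing off: hand back the file directly, no overhead.
          return tracing_enabled_ ? &wrapper_ : file_;
        }
       private:
        Writable* file_;
        TracingWrapper wrapper_;
        bool tracing_enabled_;
      };
      ```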
      
          Test Plan: make check -j64
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193
      
      Reviewed By: anand1976
      
      Differential Revision: D23355915
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
  5. 26 Aug 2020 (1 commit)
    • Pass SST file checksum information through OnTableFileCreated (#7108) · d51f88c9
      Committed by Zhichao Cao
      Summary:
      When an SST file is created, the application can learn about it through the OnTableFileCreated callback invoked from LogAndNotifyTableFileCreationFinished. Since the file checksum can be useful to the application at that point, this PR adds file_checksum and file_checksum_func_name to TableFileCreationInfo, which is passed through OnTableFileCreated.
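      A sketch of a listener consuming the new fields (the field names are the ones
      this PR introduces; registration via options.listeners is standard):
      ```cpp
      #include <iostream>
      #include "rocksdb/listener.h"

      class ChecksumListener : public ROCKSDB_NAMESPACE::EventListener {
       public:
        void OnTableFileCreated(
            const ROCKSDB_NAMESPACE::TableFileCreationInfo& info) override {
          std::cout << "file: " << info.file_path
                    << " checksum: " << info.file_checksum
                    << " func: " << info.file_checksum_func_name << "\n";
        }
      };
      // options.listeners.emplace_back(std::make_shared<ChecksumListener>());
      ```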
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7108
      
      Test Plan: make check, listener_test.
      
      Reviewed By: ajkr
      
      Differential Revision: D22470240
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 92c20344d9b986eadfe3480f3769bf4add0dbaae
  6. 07 Aug 2020 (1 commit)
  7. 04 Aug 2020 (1 commit)
    • dedup ReadOptions in iterator hierarchy (#7210) · a4a4a2da
      Committed by Andrew Kryczka
      Summary:
      Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator`
      and every `LevelIterator`. This redundancy consumes extra memory,
      resulting in the `Arena` making more allocations, and iteration
      observing worse cache performance.
      
      This PR migrates callers of `NewInternalIterator()` and
      `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to
      outlive the returned iterator. When the iterator's lifetime will be managed by the
      user, this lifetime guarantee is achieved by storing the `ReadOptions`
      value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and
      `MakeInputIterator()` can hold a reference-to-const `ReadOptions`.
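      A hypothetical distillation of the ownership pattern (not RocksDB's actual
      classes): one owner stores the ReadOptions by value and must outlive the
      sub-iterators, which hold only a reference-to-const:
      ```cpp
      #include "rocksdb/options.h"

      using ROCKSDB_NAMESPACE::ReadOptions;

      struct SubIter {
        explicit SubIter(const ReadOptions& ro) : read_options(ro) {}
        const ReadOptions& read_options;  // no per-iterator copy
      };

      struct OwningIter {  // analogous to ArenaWrappedDBIter's role
        explicit OwningIter(const ReadOptions& ro) : read_options(ro) {}
        SubIter MakeChild() const { return SubIter(read_options); }
        ReadOptions read_options;  // the single stored copy
      };
      ```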
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210
      
      Test Plan:
      - `make check` under ASAN and valgrind
      - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes.
      
      Reviewed By: anand1976
      
      Differential Revision: D22861323
      
      Pulled By: ajkr
      
      fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
  8. 23 Jul 2020 (1 commit)
  9. 16 Jul 2020 (1 commit)
    • Auto resume the DB from Retryable IO Error (#6765) · a10f12ed
      Committed by Zhichao Cao
      Summary:
      In the current codebase, if a retryable IO error happens in the write path, SetBGError is called; the retryable IO error is converted to a hard error, the DB enters read-only mode, and the user or application has to resume it manually. With this PR, when a retryable IO error happens, SetBGError creates a new thread that calls Resume (auto resume). options.max_bgerror_resume_count controls whether auto resume is enabled (if max_bgerror_resume_count <= 0, it is disabled). options.bgerror_resume_retry_interval controls the time interval before Resume is called again if the previous attempt fails due to another retryable IO error. If a non-retryable error happens during resume, auto resume terminates.
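      A minimal sketch of configuring the two knobs (both are DBOptions; the interval
      is in microseconds):
      ```cpp
      #include "rocksdb/options.h"

      ROCKSDB_NAMESPACE::Options options;
      // Allow up to 5 automatic Resume() attempts after a retryable IO error;
      // a value <= 0 disables auto resume.
      options.max_bgerror_resume_count = 5;
      // Wait 1 second between failed resume attempts.
      options.bgerror_resume_retry_interval = 1000000;  // microseconds
      ```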
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765
      
      Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D21916789
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b
  10. 18 Jun 2020 (1 commit)
    • Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Committed by Zitan Chen
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
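      A sketch of reading the new identifiers back from a live DB via the
      table-properties API (assumes `db` is an open DB*):
      ```cpp
      #include <cstdio>
      #include "rocksdb/db.h"

      ROCKSDB_NAMESPACE::TablePropertiesCollection props;
      ROCKSDB_NAMESPACE::Status s = db->GetPropertiesOfAllTables(&props);
      if (s.ok()) {
        for (const auto& entry : props) {
          const auto& tp = *entry.second;
          std::printf("%s: db_id=%s db_session_id=%s\n", entry.first.c_str(),
                      tp.db_id.c_str(), tp.db_session_id.c_str());
        }
      }
      ```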
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
  11. 10 Jun 2020 (1 commit)
  12. 16 Apr 2020 (1 commit)
    • Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) · e45673de
      Committed by Mike Kolupaev
      Summary:
      Context: Index type `kBinarySearchWithFirstKey` added the ability for the sst file iterator to sometimes report a key from the index without reading the corresponding data block. This is useful when sst blocks are cut at meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report an error. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and the kBinarySearchWithFirstKey implementation was considered a prototype.
      
      Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling.
      
      It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas.
      
      Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now.
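      A sketch of the resulting internal-iterator contract (assumes `iter` is an
      InternalIterator* created with allow_unprepared_value=true):
      ```cpp
      #include "rocksdb/slice.h"

      // Values may be lazily loaded until PrepareValue() is called.
      for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
        // PrepareValue() performs the deferred block read and surfaces the
        // IO error that value() itself has no way to report.
        if (!iter->PrepareValue()) {
          assert(!iter->status().ok());  // e.g. IO error / Status::Incomplete
          break;
        }
        ROCKSDB_NAMESPACE::Slice v = iter->value();  // now safe to use
        (void)v;
      }
      ```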
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621
      
      Test Plan: make -j5 check. Will also deploy to some logdevice test clusters and look at stats.
      
      Reviewed By: siying
      
      Differential Revision: D20786930
      
      Pulled By: al13n321
      
      fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
  13. 03 Apr 2020 (1 commit)
  14. 30 Mar 2020 (1 commit)
    • Use FileChecksumGenFactory for SST file checksum (#6600) · e8d332d9
      Committed by Zhichao Cao
      Summary:
      In the current implementation, the sst file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply here. With this implementation, each sst file has its own checksum generator object, created by FileChecksumGenFactory. Users implement their own FileChecksumGenerator and factory to plug in their checksum calculation method.
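      A sketch of the plugin surface as introduced here (hash internals omitted; a
      real generator would wrap e.g. a SHA1 implementation):
      ```cpp
      #include <memory>
      #include <string>
      #include "rocksdb/file_checksum.h"

      class MyChecksumGen : public ROCKSDB_NAMESPACE::FileChecksumGenerator {
       public:
        void Update(const char* data, size_t n) override {
          // feed (data, n) into the hash state
          (void)data; (void)n;
        }
        void Finalize() override { checksum_ = "final-digest"; }
        std::string GetChecksum() const override { return checksum_; }
        const char* Name() const override { return "MyChecksum"; }
       private:
        std::string checksum_;
      };

      class MyChecksumGenFactory
          : public ROCKSDB_NAMESPACE::FileChecksumGenFactory {
       public:
        std::unique_ptr<ROCKSDB_NAMESPACE::FileChecksumGenerator>
        CreateFileChecksumGenerator(
            const ROCKSDB_NAMESPACE::FileChecksumGenContext& /*ctx*/) override {
          return std::unique_ptr<ROCKSDB_NAMESPACE::FileChecksumGenerator>(
              new MyChecksumGen());
        }
        const char* Name() const override { return "MyChecksumGenFactory"; }
      };
      // Registered via options.file_checksum_gen_factory.
      ```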
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600
      
      Test Plan: tested with make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D20717670
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
  15. 28 Mar 2020 (1 commit)
    • Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) · 42468881
      Committed by Zhichao Cao
      Summary:
      In the current code base, we use Status to get and store the status returned from a call. For IO related functions, Status cannot reflect IO error details such as the error scope, whether the error is retryable, and so on. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus was being converted back to Status at the lower levels of the write path.

      The first job of this PR is to pass the IOStatus through the write path (flush, WAL write, and compaction). The second job is to classify a retryable IO error as a HardError and set bg_error_ accordingly. In this case, the DB instance becomes read-only; the user is informed of the Status and needs to take action (e.g., call db->Resume()).
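      A sketch of the attribute this PR keys off of (SetRetryable/GetRetryable are
      part of IOStatus):
      ```cpp
      #include "rocksdb/io_status.h"

      ROCKSDB_NAMESPACE::IOStatus ios =
          ROCKSDB_NAMESPACE::IOStatus::IOError("simulated write failure");
      ios.SetRetryable(true);
      if (!ios.ok() && ios.GetRetryable()) {
        // Under this PR's scheme the error handler treats this as a
        // HardError: the DB goes read-only until db->Resume() succeeds.
      }
      ```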
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
      
      Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D20685017
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
  16. 21 Feb 2020 (1 commit)
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      Committed by sdong
      Summary:
      When dynamically linking two binaries together, two different builds of RocksDB from different sources might cause symbol conflicts. To give users a tool to solve the problem, the RocksDB namespace is changed to a macro that can be overridden at build time.
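      A sketch of the mechanism (this mirrors how the namespace macro is defined
      and used):
      ```cpp
      // By default the macro expands to `rocksdb`; a build can override it,
      // e.g. with -DROCKSDB_NAMESPACE=my_rocksdb on the compiler command line.
      #ifndef ROCKSDB_NAMESPACE
      #define ROCKSDB_NAMESPACE rocksdb
      #endif

      namespace ROCKSDB_NAMESPACE {
      // ... all RocksDB symbols live in the overridable namespace ...
      }  // namespace ROCKSDB_NAMESPACE
      ```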
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try building with ROCKSDB_NAMESPACE set to another value.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  17. 11 Feb 2020 (1 commit)
    • Checksum for each SST file and stores in MANIFEST (#6216) · 4369f2c7
      Committed by Zhichao Cao
      Summary:
      In the current code base, RocksDB generates a checksum for each block and verifies it at usage. This PR enables whole-SST-file checksums. After an SST file is generated by flush or compaction, RocksDB generates the file checksum and stores the checksum value and checksum method name in the vs_info and MANIFEST as part of the FileMetadata.

      Added enable_sst_file_checksum to Options to enable or disable file checksums. Added sst_file_checksum to Options so that users can plug in their own SST file checksum method by overriding the SstFileChecksum class. The checksum information includes a uint32_t checksum value and a checksum name (string). A new LDB tool is added so that users can dump the list of file checksum information from the MANIFEST. If the user enables file checksums but does not provide an sst_file_checksum instance, RocksDB uses the default crc32c checksum implemented in table/sst_file_checksum_crc32c.h
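      A sketch using the option names as they appear in this commit's description
      (note that #6600 above later reworked this interface into
      FileChecksumGenFactory):
      ```cpp
      ROCKSDB_NAMESPACE::Options options;
      options.enable_sst_file_checksum = true;  // per this commit's naming
      // options.sst_file_checksum = my_checksum;  // optional custom method;
      // when unset, the default crc32c implementation is used.
      ```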
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
      
      Test Plan: Added test cases in table_test and ldb_cmd_test to verify the checksum is correct at different levels. Pass make asan_check.
      
      Differential Revision: D19171461
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
  18. 14 Dec 2019 (1 commit)
    • Introduce a new storage specific Env API (#5761) · afa2420c
      Committed by anand76
      Summary:
      The current Env API encompasses both storage/file operations and OS related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
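      A sketch of plugging in a storage backend under this split; `MyFileSystem` is
      a hypothetical FileSystem subclass, and NewCompositeEnv is the helper that
      wraps a FileSystem into an Env, matching the CompositeEnvWrapper mechanism
      described above:
      ```cpp
      #include <memory>
      #include "rocksdb/env.h"
      #include "rocksdb/file_system.h"
      #include "rocksdb/options.h"

      std::shared_ptr<ROCKSDB_NAMESPACE::FileSystem> fs =
          std::make_shared<MyFileSystem>();  // hypothetical subclass
      std::unique_ptr<ROCKSDB_NAMESPACE::Env> env =
          ROCKSDB_NAMESPACE::NewCompositeEnv(fs);

      ROCKSDB_NAMESPACE::Options options;
      options.env = env.get();  // env must outlive the DB
      ```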
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
  19. 15 Oct 2019 (2 commits)
    • BlobDB GC: add SST <-> oldest blob file referenced mapping (#5903) · 5f025ea8
      Committed by Levi Tamasi
      Summary:
      This is groundwork for adding garbage collection support to BlobDB. The
      patch adds logic that keeps track of the oldest blob file referred to by
      each SST file. The oldest blob file is identified during flush/
      compaction (similarly to how the range of keys covered by the SST is
      identified), and persisted in the manifest as a custom field of the new
      file edit record. Blob indexes with TTL are ignored for the purposes of
      identifying the oldest blob file (since such blob files are cleaned up by the
      TTL logic in BlobDB).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5903
      
      Test Plan:
      Added new unit tests; also ran db_bench in BlobDB mode, inspected the
      manifest using ldb, and confirmed (by scanning the SST files using
      sst_dump) that the value of the oldest blob file number field matches
      the contents of the file for each SST.
      
      Differential Revision: D17859997
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 21662c137c6259a6af70446faaf3a9912c550e90
    • OnTableFileCreationCompleted use "(nil)" for empty file during flush (#5905) · 6febfd84
      Committed by Yanqin Jin
      Summary:
      Compaction can call OnTableFileCreationCompleted(). If the file is empty, "(nil)"
      is used as the file name.
      Do the same for flush.
      
      Test plan (dev server):
      ```
      make all
      make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5905
      
      Differential Revision: D17883285
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6565884adbb00e8023d88b17dfb3b6eb92220b59
  20. 17 Sep 2019 (1 commit)
    • Divide file_reader_writer.h and .cc (#5803) · b931f84e
      Committed by sdong
      Summary:
      file_reader_writer.h and .cc contain several classes and helper functions and are hard to navigate. Split them into multiple files and put them under file/.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
      
      Test Plan: Build whole project using make and cmake.
      
      Differential Revision: D17374550
      
      fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
  21. 21 Jun 2019 (1 commit)
    • Add more callers for table reader. (#5454) · 705b8eec
      Committed by haoyuhuang
      Summary:
      This PR adds more callers for table readers. This information is only used for block cache analysis so that we can know which caller accesses a block.
      1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name.
      2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable.
      
      This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454
      
      Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32.
      
      Differential Revision: D15819451
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d
  22. 01 Jun 2019 (1 commit)
  23. 31 May 2019 (3 commits)
  24. 30 May 2019 (1 commit)
  25. 11 Apr 2019 (1 commit)
    • Periodic Compactions (#5166) · d3d20dcd
      Committed by Sagar Vemuri
      Summary:
      Introducing Periodic Compactions.
      
      This feature allows all the files in a CF to be periodically compacted. It can help proactively catch any corruptions that creep into the DB, since every file is constantly getting re-compacted. And, of course, it helps clean up data older than a certain threshold.
      
      - Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF.
      - This works across all levels.
      - The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used).
      - Compaction filters, if any, are invoked as usual.
      - A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS).
      
      This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on levels 0 through the next-to-last level, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time`, all files in the upper levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`.
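      A sketch of that 1-day/7-day configuration (the commit text calls the option
      `periodic_compaction_time`; released versions expose it as
      `periodic_compaction_seconds`):
      ```cpp
      ROCKSDB_NAMESPACE::Options options;
      options.ttl = 1 * 24 * 60 * 60;  // 1 day, applies to upper levels
      options.periodic_compaction_seconds = 7 * 24 * 60 * 60;  // 7 days
      ```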
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166
      
      Differential Revision: D14884441
      
      Pulled By: sagar0
      
      fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47
  26. 19 Mar 2019 (1 commit)
    • Feature for sampling and reporting compressibility (#4842) · b45b1cde
      Committed by Shobhit Dayal
      Summary:
      This is a feature to sample data-block compressibility and report it as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
      1. lz4 or snappy for fast compression
      2. zstd or zlib for slow but higher compression.
      
      The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.
      
      The db_bench_tool now has a command line option for specifying the sampling rate. Its default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool while varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
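      A sketch of enabling the sampling (assuming the CompressionOptions field this
      commit introduces; 0, the default, disables it):
      ```cpp
      ROCKSDB_NAMESPACE::Options options;
      // Sample roughly 1 in 10 data blocks for compressibility; raw vs.
      // compressed byte counts are then reported through statistics.
      options.compression_opts.sample_for_compression = 10;
      ```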
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842
      
      Differential Revision: D13629011
      
      Pulled By: shobhitdayal
      
      fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
  27. 15 Feb 2019 (1 commit)
    • Dictionary compression for files written by SstFileWriter (#4978) · c8c8104d
      Committed by Andrew Kryczka
      Summary:
      If `CompressionOptions::max_dict_bytes` and/or `CompressionOptions::zstd_max_train_bytes` are set, `SstFileWriter` will now generate files respecting those options.
      
      I refactored the logic a bit for deciding when to use dictionary compression. Previously we plumbed `is_bottommost_level` down to the table builder and used that. However it was kind of confusing in `SstFileWriter`'s context since we don't know what level the file will be ingested to. Instead, now the higher-level callers (e.g., flush, compaction, file writer) are responsible for building the right `CompressionOptions` to give the table builder.
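      A sketch of an SstFileWriter picking up dictionary compression under this
      change (assumes a ZSTD-enabled build):
      ```cpp
      #include "rocksdb/sst_file_writer.h"

      ROCKSDB_NAMESPACE::Options options;
      options.compression = ROCKSDB_NAMESPACE::kZSTD;
      options.compression_opts.max_dict_bytes = 16 * 1024;         // 16 KB dict
      options.compression_opts.zstd_max_train_bytes = 160 * 1024;  // train budget

      // Files produced by this writer now respect the dictionary options too.
      ROCKSDB_NAMESPACE::SstFileWriter writer(
          ROCKSDB_NAMESPACE::EnvOptions(), options);
      ```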
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4978
      
      Differential Revision: D14060763
      
      Pulled By: ajkr
      
      fbshipit-source-id: dc802c327896df2b319dc162d6acc82b9cdb452a
  28. 12 Feb 2019 (1 commit)
    • Reduce scope of compression dictionary to single SST (#4952) · 62f70f6d
      Committed by Andrew Kryczka
      Summary:
      Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
      
      So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include (a small state-machine sketch follows the list):
      
      - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
      - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
      - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
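      A hypothetical distillation of the two-state flow described above (the real
      logic lives in BlockBasedTableBuilder):
      ```cpp
      enum class BuilderState { kBuffered, kUnbuffered };

      struct DictBuilderSketch {
        BuilderState state = BuilderState::kBuffered;

        void Add(/* uncompressed block */) {
          if (state == BuilderState::kBuffered) {
            // Accumulate the uncompressed block in memory; once the buffer
            // reaches the target file size, call EnterUnbuffered().
          } else {
            // Compress and write the block out as soon as it fills up.
          }
        }

        void EnterUnbuffered() {
          // Sample whole buffered blocks, train the dictionary, then
          // compress and flush everything buffered so far.
          state = BuilderState::kUnbuffered;
        }
      };
      ```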
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
      
      Differential Revision: D13967980
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
  29. 30 Jan 2019 (1 commit)
  30. 18 Dec 2018 (2 commits)
  31. 10 Nov 2018 (1 commit)
    • Update all unique/shared_ptr instances to be qualified with namespace std (#4638) · dc352807
      Committed by Sagar Vemuri
      Summary:
      Ran the following commands to recursively change all the files under RocksDB:
      ```
      find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
      ```
      Running `make format` updated some formatting on the files touched.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
      
      Differential Revision: D12934992
      
      Pulled By: sagar0
      
      fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
  32. 13 Oct 2018 (1 commit)
    • Add listener to sample file io (#3933) · 729a617b
      Committed by Yanqin Jin
      Summary:
      We would like to collect file-system-level statistics including file name, offset, length, return code, latency, etc., which requires adding callbacks to intercept file IO function calls while RocksDB is running.
      To collect file-system-level statistics, users can inherit the class `EventListener`, as in `TestFileOperationListener`. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true.
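      A sketch modeled on the TestFileOperationListener mentioned above (the
      FileOperationInfo argument carries path, offset, length, status, and timing):
      ```cpp
      #include <cstdint>
      #include "rocksdb/listener.h"

      class FileIOListener : public ROCKSDB_NAMESPACE::EventListener {
       public:
        // Opt in to file IO notifications (the default is to skip them).
        bool ShouldBeNotifiedOnFileIO() override { return true; }

        void OnFileReadFinish(
            const ROCKSDB_NAMESPACE::FileOperationInfo& /*info*/) override {
          ++reads_;
        }
        void OnFileWriteFinish(
            const ROCKSDB_NAMESPACE::FileOperationInfo& /*info*/) override {
          ++writes_;
        }

       private:
        uint64_t reads_ = 0;
        uint64_t writes_ = 0;
      };
      ```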
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933
      
      Differential Revision: D10219571
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15
  33. 24 Aug 2018 (1 commit)
  34. 12 Aug 2018 (1 commit)
    • Revert changes in PR #4003 (#4263) · 4ea56b1b
      Committed by Anand Ananthabhotla
      Summary:
      Revert this change. Not generating the OnTableFileCreated() notification for a 0 byte SST on flush breaks the assumption that every OnTableFileCreationStarted() notification is followed by a corresponding OnTableFileCreated().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4263
      
      Differential Revision: D9285623
      
      Pulled By: anand1976
      
      fbshipit-source-id: 808c3dcd498b4b4f4ed4be947a29a24b2296aa8d
  35. 14 Jul 2018 (1 commit)
    • Relax VersionStorageInfo::GetOverlappingInputs check (#4050) · 90fc4069
      Committed by Peter Mattis
      Summary:
      Do not consider the range tombstone sentinel key as causing 2 adjacent
      sstables in a level to overlap. When a range tombstone's end key is the
      largest key in an sstable, the sstable's end key is set to a "sentinel"
      value that is the smallest key in the next sstable with a sequence
      number of kMaxSequenceNumber. This "sentinel" is guaranteed to not
      overlap in internal-key space with the next sstable. Unfortunately,
      GetOverlappingInputs uses user-keys to determine overlap and was thus
      considering 2 adjacent sstables in a level to overlap if they were
      separated by this sentinel key. This in turn would cause compactions to
      be larger than necessary.

      Note that this conflicts with
      https://github.com/facebook/rocksdb/pull/2769 and causes
      `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to
      fail.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050
      
      Differential Revision: D8844423
      
      Pulled By: ajkr
      
      fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
  36. 13 Jul 2018 (1 commit)
    • Range deletion performance improvements + cleanup (#4014) · 5f3088d5
      Committed by Nikhil Benesch
      Summary:
      This fixes the same performance issue that #3992 fixes but with much more invasive cleanup.
      
      I'm more excited about this PR because it paves the way for fixing another problem we uncovered at Cockroach where range deletion tombstones can cause massive compactions. For example, suppose L4 contains deletions from [a, c) and [x, z) and no other keys, and L5 is entirely empty. L6, however, is full of data. When compacting L4 -> L5, we'll end up with one file that spans, massively, from [a, z). When we go to compact L5 -> L6, we'll have to rewrite all of L6! If, instead of range deletions in L4, we had keys a, b, x, y, and z, RocksDB would have been smart enough to create two files in L5: one for a and b and another for x, y, and z.
      
      With the changes in this PR, it will be possible to adjust the compaction logic to split tombstones/start new output files when they would span too many files in the grandparent level.
      
      ajkr please take a look when you have a minute!
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4014
      
      Differential Revision: D8773253
      
      Pulled By: ajkr
      
      fbshipit-source-id: ec62fa85f648fdebe1380b83ed997f9baec35677