1. 23 Mar 2023, 1 commit
  2. 28 Jan 2023, 1 commit
    • Remove RocksDB LITE (#11147) · 4720ba43
      Committed by sdong
      Summary:
      We haven't been actively maintaining RocksDB LITE recently, and its size has likely gone up significantly. We are removing support for it.
      
      Most of the changes were made with the following command:
      
      unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`
      
      The command was run by Peter Dillinger. Other changes were applied manually to build scripts, CircleCI manifests, places where ROCKSDB_LITE is used in an expression, and the file db_stress_test_base.cc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147
      
      Test Plan: See CI
      
      Reviewed By: pdillinger
      
      Differential Revision: D42796341
      
      fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
  3. 26 Oct 2022, 1 commit
  4. 01 Oct 2022, 1 commit
    • User-defined timestamp support for `DeleteRange()` (#10661) · 9f2363f4
      Committed by Changyu Bi
      Summary:
      Add user-defined timestamp support for range deletion. The new API is `DeleteRange(opt, cf, begin_key, end_key, ts)`. Most of the change updates comparator usage to compare keys without timestamps. Other than that, the major changes are:
      - Internal range tombstone data structures (`FragmentedRangeTombstoneList`, `RangeTombstone`, etc.) now store timestamps.
      - Garbage collection of range tombstones and range-tombstone-covered keys during compaction.
      - Get()/MultiGet() return the timestamp of a range tombstone when needed.
      - Get/Iterator apply range tombstones only up to readoptions.timestamp.
      - The timestamp crash test now issues DeleteRange by default.
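      To make the read-side rule concrete, here is a minimal C++ sketch of how a reader at `read_ts` might decide whether a range tombstone hides a point key. It assumes a simplified model in which timestamps are plain `uint64_t` values; `RangeTombstoneTs` and `IsCoveredForRead` are hypothetical stand-ins for RocksDB's fragmented tombstone structures, and the sequence-number tie-breaking that real RocksDB also performs is ignored.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Hypothetical, simplified tombstone: timestamps are plain integers.
      struct RangeTombstoneTs {
        std::string begin;  // inclusive start key
        std::string end;    // exclusive end key
        uint64_t ts;        // timestamp passed to DeleteRange()
      };

      // True if a point key written at key_ts is hidden from a reader at read_ts.
      // A tombstone only counts if the reader can see it (ts <= read_ts) and it
      // is newer than the key (ts > key_ts). Seqno tie-breaking is ignored.
      bool IsCoveredForRead(const std::vector<RangeTombstoneTs>& tombstones,
                            const std::string& key, uint64_t key_ts,
                            uint64_t read_ts) {
        for (const auto& t : tombstones) {
          const bool in_range = t.begin <= key && key < t.end;
          if (in_range && t.ts <= read_ts && t.ts > key_ts) return true;
        }
        return false;
      }
      ```

      A reader whose timestamp predates the tombstone still sees the key, which is what bounding range tombstones by `readoptions.timestamp` means in this simplified model.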
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10661
      
      Test Plan:
      - Added unit test: `make check`
      - Stress test: `python3 tools/db_crashtest.py --enable_ts whitebox --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4`
      - Ran `db_bench` to measure regression when timestamps are not enabled. The tests cover writes (with some range deletions) and iteration, with the DB fitting in memory: `./db_bench --benchmarks=fillrandom,seekrandom --writes_per_range_tombstone=200 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=500000 --seek_nexts=10 --disable_auto_compactions -disable_wal=true --max_num_range_tombstones=1000`. No consistent regression was observed in the no-timestamp case.
      
      | micros/op | fillrandom | seekrandom |
      | --- | --- | --- |
      |main| 2.58 |10.96|
      |PR 10661| 2.68 |10.63|
      
      Reviewed By: riversand963
      
      Differential Revision: D39441192
      
      Pulled By: cbi42
      
      fbshipit-source-id: f05aca3c41605caf110daf0ff405919f300ddec2
  5. 15 Jul 2022, 2 commits
    • Add seqno to time mapping (#10338) · a3acf2ef
      Committed by Jay Zhuang
      Summary:
      Add a seqno-to-time mapping, which will be used by tiered storage to
      keep hot data from being compacted into the cold tier (the last level).
      Internally, a periodic_task is scheduled to record
      current_seqno -> current_time at a regular cadence. On memtable flush,
      the mapping information is stored in an SST table property.
      During compaction, the mapping information is merged and used to get
      the approximate time of a sequence number, which determines whether a
      key was recently inserted; recently inserted keys (within
      `preclude_last_level_data_seconds`) are kept out of the last level.
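      A minimal C++ sketch of the mapping idea, assuming a sorted list of `(seqno, time)` samples; `SeqnoTimeMap` and `ShouldPrecludeFromLastLevel` are hypothetical names, not RocksDB's API. A key's seqno is matched against the last sample with a strictly smaller seqno, which gives a lower bound on when the key was written.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <iterator>
      #include <utility>
      #include <vector>

      // Hypothetical seqno -> time map built from periodic samples.
      class SeqnoTimeMap {
       public:
        // Called periodically with the current seqno and time; samples are
        // expected in increasing seqno order, older ones are ignored.
        void Record(uint64_t seqno, uint64_t unix_time) {
          if (!pairs_.empty() && seqno <= pairs_.back().first) return;
          pairs_.emplace_back(seqno, unix_time);
        }

        // Time of the last sample whose seqno is strictly smaller than `seqno`:
        // the key was written no earlier than this. Returns 0 if none exists.
        uint64_t ApproximateTime(uint64_t seqno) const {
          auto it = std::lower_bound(
              pairs_.begin(), pairs_.end(), seqno,
              [](const std::pair<uint64_t, uint64_t>& p, uint64_t s) {
                return p.first < s;
              });
          if (it == pairs_.begin()) return 0;
          return std::prev(it)->second;
        }

       private:
        std::vector<std::pair<uint64_t, uint64_t>> pairs_;
      };

      // A key is "recent" if it was written within the preclude window.
      bool ShouldPrecludeFromLastLevel(const SeqnoTimeMap& m, uint64_t seqno,
                                       uint64_t now, uint64_t preclude_seconds) {
        const uint64_t t = m.ApproximateTime(seqno);
        return t != 0 && now >= t && now - t < preclude_seconds;
      }
      ```

      Using a lower bound on the write time means a key is only treated as recent when it is certainly recent; coarser sampling makes the estimate less precise, not unsafe.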
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10338
      
      Test Plan: CI
      
      Reviewed By: siying
      
      Differential Revision: D37810187
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6953be7a18a99de8b1cb3b162d712f79c2b4899f
    • Make InternalKeyComparator not configurable (#10342) · c8b20d46
      Committed by sdong
      Summary:
      InternalKeyComparator is an internal class that is a simple wrapper around Comparator. https://github.com/facebook/rocksdb/pull/8336 made Comparator customizable. As a side effect, InternalKeyComparator was made configurable too. This adds overhead to this simple wrapper. For example, every InternalKeyComparator has a std::vector attached to it, which costs memory and possibly allocation overhead as well.
      We make InternalKeyComparator non-customizable by no longer making it a subclass of Comparator.
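      For illustration, a simplified C++ sketch of an internal key comparator as a plain wrapper that holds a pointer to the user comparator instead of deriving from it. It assumes the classic LevelDB/RocksDB internal key layout (user key followed by an 8-byte little-endian trailer packing `seqno << 8 | type`, newer entries first); all class and function names are hypothetical stand-ins.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <type_traits>

      // Hypothetical user comparator interface.
      class UserComparator {
       public:
        virtual ~UserComparator() = default;
        virtual int Compare(const std::string& a, const std::string& b) const = 0;
      };

      class BytewiseUserComparator : public UserComparator {
       public:
        int Compare(const std::string& a, const std::string& b) const override {
          return a.compare(b);
        }
      };

      // Internal key = user key + 8-byte little-endian (seqno << 8 | type).
      std::string MakeInternalKey(const std::string& user_key, uint64_t seqno,
                                  uint8_t type) {
        std::string k = user_key;
        const uint64_t packed = (seqno << 8) | type;
        for (int i = 0; i < 8; ++i)
          k.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
        return k;
      }

      // Plain wrapper: no base class, so no Customizable/Configurable state.
      class InternalKeyComparatorSketch {
       public:
        explicit InternalKeyComparatorSketch(const UserComparator* user)
            : user_(user) {}
        // Both keys must be at least 8 bytes long.
        int Compare(const std::string& a, const std::string& b) const {
          const int r = user_->Compare(a.substr(0, a.size() - 8),
                                       b.substr(0, b.size() - 8));
          if (r != 0) return r;
          const uint64_t ta = Trailer(a), tb = Trailer(b);
          if (ta > tb) return -1;  // higher seqno (newer) sorts first
          if (ta < tb) return 1;
          return 0;
        }

       private:
        static uint64_t Trailer(const std::string& k) {
          uint64_t v = 0;
          for (int i = 7; i >= 0; --i)
            v = (v << 8) | static_cast<uint8_t>(k[k.size() - 8 + i]);
          return v;
        }
        const UserComparator* user_;
      };

      static_assert(
          !std::is_base_of<UserComparator, InternalKeyComparatorSketch>::value,
          "the wrapper is not a subclass of the user comparator");
      ```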
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10342
      
      Test Plan: Run the existing CI tests and make sure they don't fail
      
      Reviewed By: riversand963
      
      Differential Revision: D37771351
      
      fbshipit-source-id: 917256ee04b2796ed82974549c734fb6c4d8ccee
  6. 08 Mar 2022, 1 commit
  7. 10 Sep 2021, 1 commit
    • Support timestamps in SstFileWriter (#8899) · 7e78d7c5
      Committed by Levi Tamasi
      Summary:
      As a first step of supporting user-defined timestamps with ingestion, the
      patch adds timestamp support to `SstFileWriter`; namely, it adds new
      versions of the `Put` and `Delete` APIs that take timestamps. (`Merge`
      and `DeleteRange` are currently not supported with user-defined timestamps
      in general but once those features are implemented, we can handle them
      in `SstFileWriter` in a similar fashion.) The new APIs validate the size of
      the timestamp provided by the client. Similarly, calls to the pre-existing
      timestamp-less APIs are now disallowed when user-defined timestamps are
      in use according to the comparator.
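      A minimal sketch of these validation rules, assuming the comparator's timestamp size is known up front (0 meaning timestamps are disabled). `TsAwareWriterSketch` and its `Code` enum are hypothetical; the real `SstFileWriter` returns `rocksdb::Status` and actually writes the entries.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Hypothetical result codes standing in for rocksdb::Status.
      enum class Code { kOk, kInvalidArgument };

      class TsAwareWriterSketch {
       public:
        // ts_size comes from the comparator; 0 means no user-defined timestamps.
        explicit TsAwareWriterSketch(size_t ts_size) : ts_size_(ts_size) {}

        // Timestamp-less Put: disallowed when the comparator expects timestamps.
        Code Put(const std::string& key, const std::string& value) {
          if (ts_size_ != 0) return Code::kInvalidArgument;
          return Add(key, value);
        }

        // Put with a timestamp: timestamps must be enabled and the size must match.
        Code Put(const std::string& key, const std::string& ts,
                 const std::string& value) {
          if (ts_size_ == 0 || ts.size() != ts_size_) return Code::kInvalidArgument;
          return Add(key + ts, value);  // timestamp appended to the user key
        }

        size_t num_entries() const { return num_entries_; }

       private:
        Code Add(const std::string&, const std::string&) {
          ++num_entries_;  // a real writer would append to the table here
          return Code::kOk;
        }
        size_t ts_size_;
        size_t num_entries_ = 0;
      };
      ```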
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8899
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D30850699
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 779154373618f19b8f0797976bb7286783c57b67
  8. 21 Aug 2021, 1 commit
    • Embed original file number in SST table properties (#8686) · 04db7648
      Committed by Peter Dillinger
      Summary:
      I very recently realized that with https://github.com/facebook/rocksdb/issues/8669 we cannot later add
      file numbers to external SST files (so that more can share db session
      ids for better uniqueness properties), because of forward compatibility.
      We would have a version of RocksDB that assumes session IDs are unique
      on external SST files and therefore can't really break that invariant in
      future files.
      
      This change adds a table property for "orig_file_number" which is
      populated by normal SST files and also external SST files generated by
      SstFileWriter. SstFileWriter now keeps a db_session_id for life of the
      object and increments its own file numbers for embedding in table
      properties. (They are arguably "fake" file numbers because these numbers
      are not embedded in the file name.)
      
      While updating block_based_table_builder, I removed several unnecessary
      fields from Rep, because following the pattern would have created
      another unnecessary field.
      
      This change also updates block_based_table_reader to use this new
      property when available, which means that for newer SST files, we can
      determine the stable/original <db_session_id,file_number> unique
      identifier using just the file contents, not the file name. (It's a bit
      complicated; detailed comments in block_based_table_reader.)
      
      Also added DB host id to properties listing by sst_dump, which could be
      useful in debugging.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8686
      
      Test Plan: majorly overhauled StableCacheKeys test for this change
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30457742
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2e5ae7dddeb94fb9d8eac8a928486aed8b8cd445
  9. 19 Aug 2021, 1 commit
  10. 25 Jun 2021, 1 commit
    • Using existing crc32c checksum in checksum handoff for Manifest and WAL (#8412) · a904c62d
      Committed by Zhichao Cao
      Summary:
      In PR https://github.com/facebook/rocksdb/issues/7523, checksum handoff was introduced in RocksDB for WAL, Manifest, and SST files. When a user enables checksum handoff for a certain file type, we calculate the checksum (crc32c) of each piece of data before it is written to the lower-layer storage system and pass the checksum down with the data, so that the lower-layer storage system can verify the data if it has that capability. However, this does not cover the whole lifetime of the data in memory, and it may introduce extra checksum calculation overhead.
      
      In this PR, we introduce a new interface in WritableFileWriter::Append that allows the caller to pass the data and its checksum (crc32c) together. In this way, WritableFileWriter can directly use the passed-in checksum (crc32c) to generate the checksum of the data being passed down to the storage system. This saves calculation overhead and achieves higher protection coverage. When a new checksum is added with the data, we use Crc32cCombine (https://github.com/facebook/rocksdb/issues/8305) to combine the existing checksum and the new checksum. To keep the rate limiter from splitting the data into segments before it is stored, the rate limiter is called enough times to accumulate sufficient credits for a given write. At this stage, the design only supports Manifest and WAL, which use log_writer.
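      The combine step can be illustrated with a self-contained C++ sketch. It uses a bitwise CRC-32C and the well-known zlib-style GF(2) matrix method to compute `crc(A || B)` from `crc(A)`, `crc(B)` and `len(B)` without rereading the bytes of `A`. This is not RocksDB's `Crc32cCombine` implementation, only an instance of the same technique; `Crc32c` and `Crc32cCombineSketch` are hypothetical names.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
      uint32_t Crc32c(const std::string& data) {
        uint32_t crc = 0xFFFFFFFFu;
        for (unsigned char byte : data) {
          crc ^= byte;
          for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
        return crc ^ 0xFFFFFFFFu;
      }

      // Multiply a 32x32 GF(2) matrix (one column per entry) by a vector.
      static uint32_t Gf2MatrixTimes(const uint32_t* mat, uint32_t vec) {
        uint32_t sum = 0;
        while (vec) {
          if (vec & 1u) sum ^= *mat;
          vec >>= 1;
          ++mat;
        }
        return sum;
      }

      static void Gf2MatrixSquare(uint32_t* square, const uint32_t* mat) {
        for (int n = 0; n < 32; ++n) square[n] = Gf2MatrixTimes(mat, mat[n]);
      }

      // crc(A || B) from crc(A), crc(B) and len(B) (zlib-style algorithm).
      uint32_t Crc32cCombineSketch(uint32_t crc1, uint32_t crc2, size_t len2) {
        if (len2 == 0) return crc1;
        uint32_t even[32], odd[32];
        odd[0] = 0x82F63B78u;  // operator that appends one zero bit
        uint32_t row = 1;
        for (int n = 1; n < 32; ++n) {
          odd[n] = row;
          row <<= 1;
        }
        Gf2MatrixSquare(even, odd);  // two zero bits
        Gf2MatrixSquare(odd, even);  // four zero bits
        do {
          Gf2MatrixSquare(even, odd);  // first pass: one zero byte
          if (len2 & 1) crc1 = Gf2MatrixTimes(even, crc1);
          len2 >>= 1;
          if (len2 == 0) break;
          Gf2MatrixSquare(odd, even);
          if (len2 & 1) crc1 = Gf2MatrixTimes(odd, crc1);
          len2 >>= 1;
        } while (len2 != 0);
        return crc1 ^ crc2;
      }
      ```

      The combine cost grows with the logarithm of `len(B)` rather than with the size of `A`, which is why reusing a checksum that arrives with the data is cheaper than recomputing it.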
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8412
      
      Test Plan: make check; added new test cases.
      
      Reviewed By: anand1976
      
      Differential Revision: D29151545
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 75e2278c5126cfd58393c67b1efd18dcc7a30772
  11. 11 Jun 2021, 1 commit
    • Use DbSessionId as cache key prefix when secondary cache is enabled (#8360) · f44e69c6
      Committed by Zhichao Cao
      Summary:
      Currently, we use either the file system inode or a monotonically incrementing runtime ID as the block cache key prefix. However, a monotonically incrementing runtime ID (used when the file system does not support inode ID generation) cannot guarantee uniqueness in some cases (e.g., when a secondary cache is migrated from host to host). When the secondary cache is enabled, we now use DbSessionID (20 bytes) + current file number (at most 10 bytes) as the new cache block key prefix. This accommodates scenarios such as transferring cache state across hosts.
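      A minimal C++ sketch of building such a prefix, assuming the file number is appended as a LEB128-style varint (which matches the stated 10-byte maximum for a 64-bit value); `MakeCacheKeyPrefix` and `AppendVarint64` are hypothetical helpers, not RocksDB functions.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // Varint encoding: 7 payload bits per byte, high bit = "more bytes follow".
      // A 64-bit value needs at most 10 bytes.
      void AppendVarint64(std::string* dst, uint64_t v) {
        while (v >= 0x80) {
          dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
          v >>= 7;
        }
        dst->push_back(static_cast<char>(v));
      }

      // Hypothetical prefix: 20-byte DB session ID followed by the file number.
      std::string MakeCacheKeyPrefix(const std::string& db_session_id,
                                     uint64_t file_number) {
        std::string prefix = db_session_id;
        AppendVarint64(&prefix, file_number);
        return prefix;
      }
      ```

      Because the session ID is generated per DB session rather than derived from the local file system, the same prefix stays meaningful when cache contents move to another host.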
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8360
      
      Test Plan: add the test to lru_cache_test
      
      Reviewed By: pdillinger
      
      Differential Revision: D29006215
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 6cff686b38d83904667a2bd39923cd030df16814
  12. 18 May 2021, 1 commit
    • Make it possible to apply only a subrange of table property collectors (#8298) · d83542ca
      Committed by Levi Tamasi
      Summary:
      This patch does two things:
      1) Introduces some aliases in order to eliminate/prevent long-winded type names
      w/r/t the internal table property collectors (see e.g.
      `std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`).
      2) Makes it possible to apply only a subrange of table property collectors during
      table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories`
      from a pointer to a `vector` into a range (i.e. a pair of iterators).
      
      Rationale: I plan to introduce a BlobDB related table property collector, which
      should only be applied during table creation if blob storage is enabled at the moment
      (which can be changed dynamically). This change will make it possible to include/
      exclude the BlobDB related collector as needed without having to introduce
      a second `vector` of collectors in `ColumnFamilyData` with pretty much the same
      contents.
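      A small C++ sketch of the "range instead of pointer-to-vector" idea, under the assumption that the BlobDB-related collector is kept at the end of the vector so it can be excluded by shortening the range. `CollectorFactory`, `CollectorFactoryRange` and `NamesApplied` are hypothetical names.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for an internal table property collector factory.
      struct CollectorFactory {
        std::string name;
      };
      using CollectorFactories = std::vector<std::unique_ptr<CollectorFactory>>;

      // A [begin, end) view over the factories, replacing a pointer to the vector.
      struct CollectorFactoryRange {
        CollectorFactories::const_iterator begin;
        CollectorFactories::const_iterator end;
      };

      // Table building applies only the collectors inside the range.
      std::vector<std::string> NamesApplied(CollectorFactoryRange range) {
        std::vector<std::string> names;
        for (auto it = range.begin; it != range.end; ++it)
          names.push_back((*it)->name);
        return names;
      }
      ```

      With one vector and two possible end iterators, the same storage serves both the blob-enabled and blob-disabled cases.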
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28430910
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e
  13. 06 May 2021, 1 commit
    • Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) · 8948dc85
      Committed by mrambacher
      Summary:
      The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions.  This change cleans that up by introducing an ImmutableOptions struct.  Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).
      
      Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR.  All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.
      
      Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).
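      The resulting shape can be sketched in C++ as a struct that inherits from both immutable option structs, so a single parameter exposes DB-wide and per-column-family fields. The struct and field names below are illustrative assumptions, not the real members.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>

      // Hypothetical, heavily trimmed option structs.
      struct FileSystemSketch {
        std::string name;
      };

      struct ImmutableDBOptionsSketch {
        std::shared_ptr<FileSystemSketch> fs;  // held as a shared_ptr
        int max_open_files = -1;
      };

      struct ImmutableCFOptionsSketch {
        std::string compaction_style = "level";
      };

      // Mirrors the Options pattern: one object, both sets of immutable fields.
      struct ImmutableOptionsSketch : public ImmutableDBOptionsSketch,
                                      public ImmutableCFOptionsSketch {
        ImmutableOptionsSketch(const ImmutableDBOptionsSketch& db,
                               const ImmutableCFOptionsSketch& cf)
            : ImmutableDBOptionsSketch(db), ImmutableCFOptionsSketch(cf) {}
      };

      // A callee that used to take both structs can take a single parameter.
      int MaxOpenFiles(const ImmutableOptionsSketch& opts) {
        return opts.max_open_files;
      }
      ```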
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262
      
      Reviewed By: pdillinger
      
      Differential Revision: D28226540
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
  14. 01 May 2021, 1 commit
    • Add more LSM info to FilterBuildingContext (#8246) · d2ca04e3
      Committed by Peter Dillinger
      Summary:
      Add `num_levels`, `is_bottommost`, and table file creation
      `reason` to `FilterBuildingContext`, in anticipation of more powerful
      Bloom-like filter support.
      
      To support this, added `is_bottommost` and `reason` to
      `TableBuilderOptions`, which allowed removing `reason` parameter from
      `rocksdb::BuildTable`.
      
      I attempted to remove `skip_filters` from `TableBuilderOptions`, because
      filter construction decisions should arise from options, not one-off
      parameters. I could not completely remove it because the public API for
      SstFileWriter takes a `skip_filters` parameter, and translating this
      into an option change would mean awkwardly replacing the table_factory
      if it is BlockBasedTableFactory with new filter_policy=nullptr option.
      I marked this public skip_filters option as deprecated because of this
      oddity. (skip_filters on the read side probably makes sense.)
      
      At least `skip_filters` is now largely hidden for users of
      `TableBuilderOptions` and is no longer used for implementing the
      optimize_filters_for_hits option. Bringing the logic for that option
      closer to handling of FilterBuildingContext makes it more obvious that
      these two are using the same notion of "bottommost." (Planned:
      configuration options for Bloom-like filters that generalize
      `optimize_filters_for_hits`)
      
      Recommended follow-up: Try to get away from "bottommost level" naming of
      things, which is inaccurate (see
      VersionStorageInfo::RangeMightExistAfterSortedRun), and move to
      "bottommost run" or just "bottommost."
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8246
      
      Test Plan:
      extended an existing unit test to exercise and check various
      filter building contexts. Also, existing tests for
      optimize_filters_for_hits validate some of the "bottommost" handling,
      which is now closely connected to FilterBuildingContext::is_bottommost
      through TableBuilderOptions::is_bottommost
      
      Reviewed By: mrambacher
      
      Differential Revision: D28099346
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2c1072e29c24d4ac404c761a7b7663292372600a
  15. 29 Apr 2021, 1 commit
    • Refactor: use TableBuilderOptions to reduce parameter lists (#8240) · 85becd94
      Committed by Peter Dillinger
      Summary:
      Greatly reduced the not-quite-copy-paste giant parameter lists
      of rocksdb::NewTableBuilder, rocksdb::BuildTable,
      BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor.
      
      Moved weird separate parameter `uint32_t column_family_id` of
      TableFactory::NewTableBuilder into TableBuilderOptions.
      
      Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t
      target_file_size` is not randomly placed between uint64_t timestamps
      (was easy to mix up).
      
      Replaced a couple of fields of BlockBasedTableBuilder::Rep with a
      FilterBuildingContext. The motivation for this change is making it
      easier to pass along more data into new fields in FilterBuildingContext
      (follow-up PR).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240
      
      Test Plan: ASAN make check
      
      Reviewed By: mrambacher
      
      Differential Revision: D28075891
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f
  16. 23 Apr 2021, 1 commit
    • Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176) · 01e460d5
      Committed by mrambacher
      Summary:
      This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code.  With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively.
      
      readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release).
      
      There are still fields in the ImmutableCFOptions that are not CF options but DB options.  Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions).  But that will be part of a future PR to minimize changes and disruptions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176
      
      Reviewed By: pdillinger
      
      Differential Revision: D27954339
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad
  17. 07 Apr 2021, 1 commit
  18. 26 Mar 2021, 1 commit
  19. 15 Mar 2021, 1 commit
    • Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) · 3dff28cf
      Committed by mrambacher
      Summary:
      For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>.  The shared ptr has some performance degradation on certain hardware classes.
      
      For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere.  For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it.  The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource.
      
      There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold.  In those cases, the shared pointer was preserved.
      
      Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17:
      
      6.17: readrandom   :      28.046 micros/op 854902 ops/sec;   61.3 MB/s (355999 of 355999 found)
      6.18: readrandom   :      32.615 micros/op 735306 ops/sec;   52.7 MB/s (290999 of 290999 found)
      PR: readrandom   :      27.500 micros/op 871909 ops/sec;   62.5 MB/s (367999 of 367999 found)
      
      (Note that the 6.18 times were measured prior to the revert of the SystemClock change.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033
      
      Reviewed By: pdillinger
      
      Differential Revision: D27014563
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
  20. 11 Feb 2021, 1 commit
    • Handoff checksum Implementation (#7523) · d1c510ba
      Committed by Zhichao Cao
      Summary:
      In PR https://github.com/facebook/rocksdb/issues/7419, we introduced the new Append and PositionedAppend APIs to WritableFile in the FileSystem, which enable RocksDB to pass data verification information (e.g., a checksum of the data) to the lower layer. In this PR, we use the new APIs in WritableFileWriter, so that files created via WritableFileWriter can pass the checksum to the storage layer. To control which file types apply checksum handoff, we add checksum_handoff_file_types to DBOptions. Users can use this option to control which file types (currently supported: kLogFile, kTableFile, kDescriptorFile) use the new Append and PositionedAppend APIs to hand off the verification information.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7523
      
      Test Plan: add new unit test, pass make check/ make asan_check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24313271
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: aafd69091ae85c3318e3e17cbb96fe7338da11d0
  21. 29 Jan 2021, 1 commit
    • Remove Legacy and Custom FileWrapper classes from header files (#7851) · 4a09d632
      Committed by mrambacher
      Summary:
      Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.
      
      Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use of the CustomEnv class.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851
      
      Reviewed By: anand1976
      
      Differential Revision: D26114816
      
      Pulled By: mrambacher
      
      fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
  22. 26 Jan 2021, 1 commit
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      Committed by mrambacher
      Summary:
      Introduces and uses a SystemClock class in RocksDB.  This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and can be tested similarly (some override Sleep, some use a MockSleep, etc.).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
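      A minimal C++ sketch of the split, assuming a clock interface with just `NowMicros()` and `SleepForMicroseconds()`; the `*Sketch` class names are hypothetical. The mock shows why a dedicated clock helps tests: "sleeping" advances virtual time instead of blocking.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <cstdint>
      #include <thread>

      // Hypothetical clock interface holding only time-related functions.
      class SystemClockSketch {
       public:
        virtual ~SystemClockSketch() = default;
        virtual uint64_t NowMicros() = 0;
        virtual void SleepForMicroseconds(int micros) = 0;
      };

      class RealClockSketch : public SystemClockSketch {
       public:
        uint64_t NowMicros() override {
          using namespace std::chrono;
          return static_cast<uint64_t>(
              duration_cast<microseconds>(system_clock::now().time_since_epoch())
                  .count());
        }
        void SleepForMicroseconds(int micros) override {
          std::this_thread::sleep_for(std::chrono::microseconds(micros));
        }
      };

      // Test clock: sleeping advances virtual time and returns immediately.
      class MockClockSketch : public SystemClockSketch {
       public:
        uint64_t NowMicros() override { return now_; }
        void SleepForMicroseconds(int micros) override {
          now_ += static_cast<uint64_t>(micros);
        }

       private:
        uint64_t now_ = 0;
      };

      // A component that only needs time depends on a clock, not a full Env.
      class StopwatchSketch {
       public:
        explicit StopwatchSketch(SystemClockSketch* clock)
            : clock_(clock), start_(clock->NowMicros()) {}
        uint64_t ElapsedMicros() const { return clock_->NowMicros() - start_; }

       private:
        SystemClockSketch* clock_;  // non-owning; caller keeps the clock alive
        uint64_t start_;
      };
      ```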
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
  23. 29 Sep 2020, 1 commit
  24. 09 Sep 2020, 1 commit
    • Store FSWritableFilePtr object in WritableFileWriter (#7193) · b175eceb
      Committed by Akanksha Mahajan
      Summary:
      Replace the FSWritableFile pointer with an FSWritableFilePtr object in
      WritableFileWriter. This new object wraps the FSWritableFile pointer.

      Objective: If tracing is enabled, FSWritableFilePtr returns an
      FSWritableFileTracingWrapper pointer that records all necessary
      information in an IORecord, calls the underlying FileSystem, and invokes
      IOTracer to dump that record to a binary file. If tracing is disabled,
      the underlying FileSystem pointer is returned directly. The
      FSWritableFilePtr wrapper class was added to bypass the
      FSWritableFileWrapper when tracing is disabled.
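      A simplified C++ sketch of that wrapper, assuming a one-method file interface; `WritableFilePtrSketch`, `TracingFileWrapper` and the string-based trace are hypothetical stand-ins for `FSWritableFilePtr`, `FSWritableFileTracingWrapper` and `IOTracer`.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical minimal file interface.
      class WritableFileSketch {
       public:
        virtual ~WritableFileSketch() = default;
        virtual void Append(const std::string& data) = 0;
      };

      class PlainFile : public WritableFileSketch {
       public:
        void Append(const std::string& data) override { contents += data; }
        std::string contents;
      };

      // Records one entry per operation, then forwards to the wrapped file.
      class TracingFileWrapper : public WritableFileSketch {
       public:
        TracingFileWrapper(WritableFileSketch* target,
                           std::vector<std::string>* trace)
            : target_(target), trace_(trace) {}
        void Append(const std::string& data) override {
          trace_->push_back("Append:" + std::to_string(data.size()));
          target_->Append(data);
        }

       private:
        WritableFileSketch* target_;
        std::vector<std::string>* trace_;
      };

      // Owns the file; hands out the tracing wrapper only when tracing is on,
      // so the untraced path has no extra indirection.
      class WritableFilePtrSketch {
       public:
        WritableFilePtrSketch(std::unique_ptr<WritableFileSketch> file,
                              std::vector<std::string>* trace)
            : file_(std::move(file)) {
          if (trace != nullptr)
            tracer_ = std::make_unique<TracingFileWrapper>(file_.get(), trace);
        }
        WritableFileSketch* operator->() const {
          if (tracer_) return tracer_.get();
          return file_.get();
        }

       private:
        std::unique_ptr<WritableFileSketch> file_;
        std::unique_ptr<TracingFileWrapper> tracer_;
      };
      ```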
      
      Test Plan: make check -j64
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193
      
      Reviewed By: anand1976
      
      Differential Revision: D23355915
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
  25. 18 Jun 2020, 1 commit
    • Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Committed by Zitan Chen
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
  26. 21 May 2020, 1 commit
    • Generate file checksum in SstFileWriter (#6859) · 545e14b5
      Committed by Zhichao Cao
      Summary:
      If options.file_checksum_gen_factory is set, RocksDB generates the file checksum during flush and compaction using the checksum generator created by the factory, and stores the checksum and function name in vstorage and the Manifest.
      
      This PR enables file checksum generation in SstFileWriter and stores the checksum and checksum function name in ExternalSstFileInfo, so that applications can use them for other purposes, for example, ingesting the file checksums together with the files in IngestExternalFile().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859
      
      Test Plan: add unit test and pass make asan_check.
      
      Reviewed By: ajkr
      
      Differential Revision: D21656247
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0
  27. 01 Apr 2020, 1 commit
    • Make options.bottommost_compression, compression_opts and bottommost_compression_opts dynamically changeable. (#6615) · 80979f81
      Committed by sdong
      
      Summary:
      These three options should be dynamically changeable. This change simply adds them to MutableCFOptions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6615
      
      Test Plan: Add a unit test to make sure that SetOptions() can change the options.
      
      Reviewed By: riversand963
      
      Differential Revision: D20755951
      
      fbshipit-source-id: 8165f4fd7a7a665cc7fb049698935022a5d2e7ff
  28. 29 Feb 2020, 1 commit
  29. 21 Feb 2020, 1 commit
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      Committed by sdong
      Summary:
      When two binaries are dynamically linked together, different builds of RocksDB from two sources might cause errors. To give users a way to solve this problem, the RocksDB namespace is now a flag (ROCKSDB_NAMESPACE) that can be overridden at build time.
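      The mechanism is a preprocessor macro with a default value. A minimal sketch of the pattern follows; `LibraryTag` and `NamespaceName` are hypothetical stand-ins for library symbols. Building with, for example, `-DROCKSDB_NAMESPACE=myrocks` would place every symbol in `myrocks` instead of `rocksdb`, so two differently configured builds no longer clash at link time.

      ```cpp
      #include <cassert>
      #include <string>

      // Default used when the build does not override the macro.
      #ifndef ROCKSDB_NAMESPACE
      #define ROCKSDB_NAMESPACE rocksdb
      #endif

      namespace ROCKSDB_NAMESPACE {
      // Any library symbol: its mangled name now includes the chosen namespace.
      inline std::string LibraryTag() { return "built-in"; }
      }  // namespace ROCKSDB_NAMESPACE

      #define STRINGIFY_IMPL(x) #x
      #define STRINGIFY(x) STRINGIFY_IMPL(x)

      // Name of the namespace this build actually uses.
      inline const char* NamespaceName() { return STRINGIFY(ROCKSDB_NAMESPACE); }
      ```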
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE set to another value.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  30. 07 Jan 2020, 1 commit
    • Improve error msg for SstFileWriter Merge (#6261) · 946c43a0
      Committed by Yanqin Jin
      Summary:
      Reword the error message when keys are not added in strict ascending order.
      Specifically, original error message is not clear when application tries to
      call SstFileWriter::Merge() with duplicate keys.
      
      Test plan (dev server)
      ```
      make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6261
      
      Differential Revision: D19290398
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4dc30a701414e6894db2eb024e3734470c22b371
  31. 14 Dec 2019, 1 commit
    • Introduce a new storage specific Env API (#5761) · afa2420c
      Committed by anand76
      Summary:
      The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable, the scope (i.e., fault domain) of the error, etc., that may be required to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
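      A minimal C++ sketch of that translation, using hypothetical trimmed-down status structs. Only the retryable-to-`kSoftError` rule comes from the description above; the sketch deliberately leaves the severity unspecified for other errors rather than guessing how RocksDB classifies them.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical, trimmed-down status types.
      enum class Severity { kUnspecified, kSoftError };

      struct IOStatusSketch {
        bool ok = true;
        bool retryable = false;
        std::string msg;
      };

      struct StatusSketch {
        bool ok = true;
        Severity severity = Severity::kUnspecified;
        std::string msg;
      };

      // Retryable IO errors become soft errors, which the error handler can try
      // to recover from automatically; other errors keep an unspecified severity.
      StatusSketch ToStatus(const IOStatusSketch& io) {
        StatusSketch s;
        if (io.ok) return s;
        s.ok = false;
        s.msg = io.msg;
        if (io.retryable) s.severity = Severity::kSoftError;
        return s;
      }
      ```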
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
  32. 17 Sep 2019, 1 commit
    • Divide file_reader_writer.h and .cc (#5803) · b931f84e
      Committed by sdong
      Summary:
      file_reader_writer.h and .cc contain several classes and helper functions, which makes them hard to navigate. Separate them into multiple files and put those under file/.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
      
      Test Plan: Build whole project using make and cmake.
      
      Differential Revision: D17374550
      
      fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
  33. 31 May, 2019 · 3 commits
  34. 19 Mar, 2019 · 1 commit
    • Feature for sampling and reporting compressibility (#4842) · b45b1cde
      Shobhit Dayal committed
      Summary:
      This is a feature to sample data-block compressibility and report it as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
      1. lz4 or snappy for fast compression
      2. zstd or zlib for slow but higher compression.
      
      The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.
      
      The db_bench_tool now has a command-line option for specifying the sampling rate. Its default value is 0 (no sampling). To measure the overhead of a given value, users can compare db_bench_tool performance while varying the sampling rate. A high value such as 20 (sampling 1 in 20 blocks) is unlikely to have a noticeable impact.
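A sketch of 1-in-N compressibility sampling, assuming pluggable compressors. `CompressibilitySampler`, `SampleStats`, and `Compressor` are hypothetical names, and the test uses toy truncating functions instead of real lz4/snappy/zstd/zlib calls. For determinism it samples every N-th block with a counter; the actual feature may pick blocks differently (for example, randomly).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch; compressors are injected so any algorithm can be used.
using Compressor = std::function<std::string(const std::string&)>;

struct SampleStats {
  std::size_t raw_bytes = 0;        // uncompressed bytes of sampled blocks
  std::size_t fast_compressed = 0;  // output bytes of the "fast" algorithm
  std::size_t slow_compressed = 0;  // output bytes of the "slow, stronger" one
};

class CompressibilitySampler {
 public:
  // sample_every == 0 disables sampling, matching the default of 0 above.
  CompressibilitySampler(std::size_t sample_every, Compressor fast,
                         Compressor slow)
      : n_(sample_every), fast_(std::move(fast)), slow_(std::move(slow)) {}

  // Called once per data block. Sampling only records sizes; it does not
  // change how the block itself is compressed for storage.
  void OnBlock(const std::string& block) {
    ++blocks_seen_;
    if (n_ == 0 || blocks_seen_ % n_ != 0) return;  // sample 1 in N blocks
    stats_.raw_bytes += block.size();
    stats_.fast_compressed += fast_(block).size();
    stats_.slow_compressed += slow_(block).size();
  }

  const SampleStats& stats() const { return stats_; }

 private:
  std::size_t n_;
  std::size_t blocks_seen_ = 0;
  Compressor fast_;
  Compressor slow_;
  SampleStats stats_;
};
```

Reporting raw and compressed byte totals separately lets the caller derive a compression ratio per algorithm without the sampler committing to any particular metric.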
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842
      
      Differential Revision: D13629011
      
      Pulled By: shobhitdayal
      
      fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
  35. 12 Feb, 2019 · 1 commit
    • Reduce scope of compression dictionary to single SST (#4952) · 62f70f6d
      Andrew Kryczka committed
      Summary:
      Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
      
      So, this PR changes the compression dictionary to be scoped per SST. It accepts the tradeoff of using more memory and CPU during table building. Important changes include:
      
      - The `BlockBasedTableBuilder` has a new state when dictionary compression is in use: `kBuffered`. In that state it accumulates uncompressed data in memory whenever `Add` is called.
      - After accumulating the target file size in bytes, or when `BlockBasedTableBuilder::Finish` is called, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before: blocks are compressed/written out as soon as they fill up.
      - Samples are now whole uncompressed data blocks, except that the final sample may be a partial data block so we don't exceed the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples, which was not realistic.
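The `kBuffered` to `kUnbuffered` transition can be sketched as a small state machine. Apart from the two state names and `EnterUnbuffered`, every name here is hypothetical; `sample_limit_bytes` stands in for `max_dict_bytes`/`zstd_max_train_bytes`, the concatenated samples are used directly instead of calling a real dictionary trainer, and `Compress` is a pass-through placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, NOT RocksDB's BlockBasedTableBuilder.
class DictTableBuilderSketch {
 public:
  enum class State { kBuffered, kUnbuffered };

  DictTableBuilderSketch(std::size_t target_file_bytes,
                         std::size_t sample_limit_bytes)
      : target_(target_file_bytes), sample_limit_(sample_limit_bytes) {}

  // Adds one uncompressed data block.
  void AddBlock(const std::string& block) {
    if (state_ == State::kBuffered) {
      buffered_.push_back(block);  // keep it in memory for dictionary sampling
      buffered_bytes_ += block.size();
      if (buffered_bytes_ >= target_) EnterUnbuffered();
    } else {
      written_.push_back(Compress(block));  // written out as soon as it fills
    }
  }

  // A file smaller than the target still gets a dictionary at Finish().
  void Finish() {
    if (state_ == State::kBuffered) EnterUnbuffered();
  }

  State state() const { return state_; }
  const std::string& dict() const { return dict_; }
  std::size_t blocks_written() const { return written_.size(); }

 private:
  void EnterUnbuffered() {
    // Samples are whole blocks; only the last one may be truncated so the
    // sample data never exceeds the configured limit.
    for (const std::string& b : buffered_) {
      std::size_t room = sample_limit_ - dict_.size();
      if (room == 0) break;
      dict_ += b.substr(0, std::min(room, b.size()));
    }
    // Placeholder "training": the samples themselves serve as the dictionary.
    for (const std::string& b : buffered_) written_.push_back(Compress(b));
    buffered_.clear();
    buffered_bytes_ = 0;
    state_ = State::kUnbuffered;
  }

  // Placeholder for "compress with the trained dictionary".
  std::string Compress(const std::string& block) const { return block; }

  std::size_t target_;
  std::size_t sample_limit_;
  State state_ = State::kBuffered;
  std::vector<std::string> buffered_;
  std::size_t buffered_bytes_ = 0;
  std::string dict_;
  std::vector<std::string> written_;
};
```

Buffering trades memory for a dictionary trained on the file's own data, which is the tradeoff the commit message accepts in exchange for better compression ratios.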
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
      
      Differential Revision: D13967980
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
  36. 30 Jan, 2019 · 1 commit
  37. 13 Oct, 2018 · 1 commit
    • Add listener to sample file io (#3933) · 729a617b
      Yanqin Jin committed
      Summary:
      We would like to collect file-system-level statistics, including file name, offset, length, return code, latency, etc., which requires adding callbacks that intercept file IO function calls while RocksDB is running.
      To collect file-system-level statistics, users can inherit from the class `EventListener`, as in `TestFileOperationListener`. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true.
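A self-contained sketch of the opt-in pattern: listeners report whether they want file IO callbacks, and the reader notifies only those that opted in. `ListenerSketch`, `FileOpInfoSketch`, `InstrumentedReaderSketch`, and `CountingListener` are hypothetical stand-ins, not RocksDB's `EventListener` or its callback signatures. Only the role of `ShouldBeNotifiedOnFileIO()` comes from the text; its default of `false` here is an assumption consistent with the note that the test listener overrides it to return true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT RocksDB's real EventListener/FileOperationInfo.
struct FileOpInfoSketch {
  std::string path;
  std::uint64_t offset;
  std::size_t length;
  bool ok;                  // stands in for the return code
  std::uint64_t latency_us;
};

struct ListenerSketch {
  virtual ~ListenerSketch() = default;
  // Opt-in flag playing the role of ShouldBeNotifiedOnFileIO() (assumed off).
  virtual bool ShouldBeNotifiedOnFileIO() { return false; }
  virtual void OnFileRead(const FileOpInfoSketch&) {}
};

// A file reader that reports only to listeners that opted in.
class InstrumentedReaderSketch {
 public:
  explicit InstrumentedReaderSketch(
      const std::vector<std::shared_ptr<ListenerSketch>>& listeners) {
    for (const auto& l : listeners) {
      if (l->ShouldBeNotifiedOnFileIO()) notify_.push_back(l);  // filter once
    }
  }

  void Read(const std::string& path, std::uint64_t offset, std::size_t len) {
    // A real reader would perform and time the I/O here.
    FileOpInfoSketch info{path, offset, len, true, 0};
    for (const auto& l : notify_) l->OnFileRead(info);
  }

 private:
  std::vector<std::shared_ptr<ListenerSketch>> notify_;
};

// Collects simple statistics, in the spirit of TestFileOperationListener.
struct CountingListener : ListenerSketch {
  bool ShouldBeNotifiedOnFileIO() override { return true; }
  void OnFileRead(const FileOpInfoSketch& info) override {
    ++reads;
    bytes += info.length;
  }
  int reads = 0;
  std::size_t bytes = 0;
};
```

Filtering listeners once at construction means listeners that did not opt in add no per-read overhead.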
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933
      
      Differential Revision: D10219571
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15