1. 23 May, 2020 1 commit
    • P
      Fix/expand ASSERT_STATUS_CHECKED build, add to Travis (#6870) · 35a25a3f
      Peter Dillinger committed
      Summary:
      Fixed some option handling code that recently broke the
      ASSERT_STATUS_CHECKED build for options_test.
      
      Added all other existing tests that pass under ASSERT_STATUS_CHECKED to
      the whitelist.
      
      Added a Travis configuration to run all whitelisted tests with
      ASSERT_STATUS_CHECKED. (Someday we might enable this check by default in
      debug builds.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6870
      
      Test Plan: ASSERT_STATUS_CHECKED=1 make check, Travis
      
      Reviewed By: ajkr
      
      Differential Revision: D21704374
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 15daef98136a19d7a6843fa0c9ec08738c2ac693
  2. 22 May, 2020 1 commit
    • M
      Add Struct Type to OptionsTypeInfo (#6425) · 38be6861
      mrambacher committed
      Summary:
      Added code for generically handling structs in OptionTypeInfo.  A struct is a collection of variables handled by their own map of OptionTypeInfos.  Examples of structs include Compaction and Cache options.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6425
      
      Reviewed By: siying
      
      Differential Revision: D21668789
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 064b110de39dadf82361ed4663f7ac1a535b0b07
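To make the "struct handled by its own map" idea concrete, here is a minimal sketch. It is not RocksDB's OptionTypeInfo code: `CacheOptionsSketch`, `kCacheFieldMap`, `ParseStructSketch`, and the `name=value;name=value` input format are all assumptions chosen for illustration.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical struct standing in for a group of related options.
struct CacheOptionsSketch {
  int capacity = 0;
  int num_shard_bits = 0;
};

using FieldSetter =
    std::function<void(CacheOptionsSketch*, const std::string&)>;

// The struct's own map: one entry per field, keyed by field name.
static const std::unordered_map<std::string, FieldSetter> kCacheFieldMap = {
    {"capacity",
     [](CacheOptionsSketch* o, const std::string& v) {
       o->capacity = std::stoi(v);
     }},
    {"num_shard_bits",
     [](CacheOptionsSketch* o, const std::string& v) {
       o->num_shard_bits = std::stoi(v);
     }},
};

// Parses "name=value;name=value" by looking each name up in the field map.
// Returns false on a malformed item or an unknown field name.
// Note: std::stoi throws on non-numeric values; a real parser would
// report that as an error status instead.
bool ParseStructSketch(const std::string& text, CacheOptionsSketch* out) {
  std::istringstream in(text);
  std::string item;
  while (std::getline(in, item, ';')) {
    if (item.empty()) continue;
    const std::string::size_type eq = item.find('=');
    if (eq == std::string::npos) return false;
    auto it = kCacheFieldMap.find(item.substr(0, eq));
    if (it == kCacheFieldMap.end()) return false;
    it->second(out, item.substr(eq + 1));
  }
  return true;
}
```

Adding a field then means adding one map entry rather than a new special case in the parser.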
  3. 09 May, 2020 1 commit
    • A
      prototype status check enforcement (#6798) · 1c846604
      Andrew Kryczka committed
      Summary:
      Tried making the Status object enforce that it is checked in some way. In cases where it is not checked, `PermitUncheckedError()` must be called explicitly.
      
      Added a way to run tests (`ASSERT_STATUS_CHECKED=1 make -j48 check`) on a
      whitelist. The effort appears significant to get each test to pass with
      this assertion, so I only fixed up enough to get one test (`options_test`)
      working and added it to the whitelist.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6798
      
      Reviewed By: pdillinger
      
      Differential Revision: D21377404
      
      Pulled By: ajkr
      
      fbshipit-source-id: 73236f9c8df38f01cf24ecac4a6d1661b72d077e
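The enforcement idea can be sketched in a few lines. This is an illustration only, not RocksDB's `Status`: `CheckedStatus` and `UncheckedCount()` are hypothetical names, and the destructor counts violations instead of asserting so that the sketch can be exercised without aborting.

```cpp
// Hypothetical stand-in for a must-check status type; not RocksDB's Status.
class CheckedStatus {
 public:
  explicit CheckedStatus(bool ok) : ok_(ok), checked_(false) {}

  // Moving transfers the obligation to check to the new object.
  CheckedStatus(CheckedStatus&& other) noexcept
      : ok_(other.ok_), checked_(other.checked_) {
    other.checked_ = true;
  }
  CheckedStatus(const CheckedStatus&) = delete;
  CheckedStatus& operator=(const CheckedStatus&) = delete;

  ~CheckedStatus() {
    // An enforcing build would assert here; this sketch only counts.
    if (!checked_) ++UncheckedCount();
  }

  // Inspecting the result marks the status as checked.
  bool ok() const {
    checked_ = true;
    return ok_;
  }

  // Explicit opt-out for callers that intentionally ignore the result.
  void PermitUncheckedError() const { checked_ = true; }

  // Number of statuses destroyed without being checked.
  static int& UncheckedCount() {
    static int count = 0;
    return count;
  }

 private:
  bool ok_;
  mutable bool checked_;
};
```

A status that goes out of scope without `ok()` or `PermitUncheckedError()` being called increments the counter, which is the condition an assertion-based build would treat as a failure.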
  4. 06 May, 2020 1 commit
    • M
      Add OptionTypeInfo::Enum and related methods (#6423) · 394f2bbd
      mrambacher committed
      Summary:
      Add methods and constructors for handling enums to the OptionTypeInfo.  This change allows enums to be converted/compared without adding a special "type" to the OptionType.
      
      This change addresses a couple of issues:
      - It allows new enumerated types to be added to the options without editing the OptionType base class (and related methods)
      - It standardizes the procedure for adding enumerated types to the options, reducing potential mistakes
      - It moves the enum maps to the location where they are used, allowing them to be static file members rather than global values
      - It reduces the number of types and cases that need to be handled in the various OptionType methods
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6423
      
      Reviewed By: siying
      
      Differential Revision: D21408713
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: fc492af285d011822578b95d186a0fce25d35626
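A rough sketch of map-driven enum handling follows. The names `CompressionSketch`, `kCompressionNames`, `ParseEnumSketch`, and `SerializeEnumSketch` are hypothetical and are not taken from the RocksDB sources; the point is only that one generic pair of helpers plus a file-local map can serve any enum.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical enum used only for this illustration.
enum class CompressionSketch { kNone, kSnappy, kZSTD };

// File-local name map, kept next to the code that uses it.
static const std::unordered_map<std::string, CompressionSketch>
    kCompressionNames = {
        {"kNone", CompressionSketch::kNone},
        {"kSnappy", CompressionSketch::kSnappy},
        {"kZSTD", CompressionSketch::kZSTD},
};

// Generic string -> enum lookup; works for any enum given its name map.
template <typename T>
bool ParseEnumSketch(const std::unordered_map<std::string, T>& names,
                     const std::string& text, T* out) {
  auto it = names.find(text);
  if (it == names.end()) return false;
  *out = it->second;
  return true;
}

// Generic enum -> string lookup (linear scan; the maps are small).
template <typename T>
bool SerializeEnumSketch(const std::unordered_map<std::string, T>& names,
                         const T& value, std::string* out) {
  for (const auto& kv : names) {
    if (kv.second == value) {
      *out = kv.first;
      return true;
    }
  }
  return false;
}
```

Supporting a new enumerated option in this scheme only requires a new map; the helpers stay unchanged.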
  5. 01 May, 2020 1 commit
    • S
      Remove the support of setting CompressionOptions.parallel_threads from string for now (#6782) · 6504ae0c
      sdong committed
      Summary:
      The current way of implementing CompressionOptions.parallel_threads introduces a format change. We plan to change the serialization format of CompressionOptions to a new JSON-like format, which would be another format change. We would like to consolidate the two format changes into one rather than making some users change twice. For now, hold off on supporting CompressionOptions.parallel_threads in the option string; it will be added back after the general CompressionOptions format change.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6782
      
      Test Plan: Run all existing tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21338614
      
      fbshipit-source-id: bca2dac3cb37d4e6e64b52cbbe8ea749cd848685
  6. 30 April, 2020 1 commit
  7. 29 April, 2020 1 commit
    • M
      Add Functions to OptionTypeInfo (#6422) · 618bf638
      mrambacher committed
      Summary:
      Added functions for parsing, serializing, and comparing elements to OptionTypeInfo.  These functions allow all of the special cases that could not be handled directly in the map of OptionTypeInfo to be moved into the map.  Using these functions, every type can be handled via the map rather than special cased.
      
      By adding these functions, the code for handling options can become more standardized (fewer special cases) and (eventually) handled completely by common classes.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6422
      
      Test Plan: pass make check
      
      Reviewed By: siying
      
      Differential Revision: D21269005
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 9ba71c721a38ebf9ee88259d60bd81b3282b9077
  8. 28 April, 2020 1 commit
  9. 25 April, 2020 1 commit
    • C
      Reduce memory copies when fetching and uncompressing blocks from SST files (#6689) · 40497a87
      Cheng Chang committed
      Summary:
      In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode.
      This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode.
      
      Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read`, depending on the block size. Then, during uncompression, it will copy the uncompressed block into `BlockFetcher::heap_buf_`.
      
      In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689
      
      Test Plan: A new unit test `block_fetcher_test` is added.
      
      Reviewed By: anand1976
      
      Differential Revision: D21006729
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980
  10. 22 April, 2020 2 commits
    • M
      Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389) · 4cbc19d2
      mrambacher committed
      Summary:
      The methods in convenience.h are used to compare/convert objects to/from strings.  There is a mishmash of parameters in use here with more needed in the future.  This PR replaces those parameters with a single structure.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389
      
      Reviewed By: siying
      
      Differential Revision: D21163707
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
    • A
      Implement deadline support for MultiGet (#6710) · c1ccd6b6
      anand76 committed
      Summary:
      Initial implementation of ReadOptions.deadline for MultiGet. If the request takes longer than the deadline, the keys not yet found will be returned with Status::TimedOut(). This
      implementation enforces the deadline in DBImpl, which is fairly high
      level. It is best effort and may not check the deadline after every key
      lookup, but may do so after a batch of keys.
      
      In subsequent stages, we will extend this to passing a timeout down to the FileSystem.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6710
      
      Test Plan: Add new unit tests
      
      Reviewed By: riversand963
      
      Differential Revision: D21149158
      
      Pulled By: anand1976
      
      fbshipit-source-id: 9f44eecffeb40873f5034ed59a66d21f9f88879e
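The best-effort, per-batch deadline check described above might look roughly like this. The sketch is not the DBImpl code: `MultiGetWithDeadlineSketch`, `LookupStatus`, the injected `now_micros` clock, and the `batch_size` parameter are assumptions made so the behavior can be shown in isolation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

enum class LookupStatus { kOk, kNotFound, kTimedOut };

// Looks keys up in batches and checks the deadline once per batch, so a
// batch that has started always finishes. Keys that were never looked up
// keep the kTimedOut status. The clock is injected to keep the sketch
// deterministic; a real caller would pass a steady-clock reader.
std::vector<LookupStatus> MultiGetWithDeadlineSketch(
    const std::map<std::string, std::string>& table,
    const std::vector<std::string>& keys, long long deadline_micros,
    const std::function<long long()>& now_micros, std::size_t batch_size) {
  if (batch_size == 0) batch_size = 1;  // avoid an endless loop
  std::vector<LookupStatus> result(keys.size(), LookupStatus::kTimedOut);
  std::size_t i = 0;
  while (i < keys.size()) {
    if (now_micros() > deadline_micros) break;  // checked between batches
    const std::size_t end = std::min(keys.size(), i + batch_size);
    for (; i < end; ++i) {
      result[i] = table.count(keys[i]) ? LookupStatus::kOk
                                       : LookupStatus::kNotFound;
    }
  }
  return result;
}
```

Checking once per batch trades deadline precision for fewer clock reads, which matches the "best effort" wording in the summary.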
  11. 21 April, 2020 1 commit
    • A
      Set max_background_flushes dynamically (#6701) · 03a1d95d
      Akanksha Mahajan committed
      Summary:
      1. Add changes so that max_background_flushes can be set dynamically.
      2. Add a test case, DBOptionsTest.SetBackgroundFlushThreads, which sets
         max_background_flushes dynamically using SetDBOptions.

      Test Plan:
      1. make -j64 check
      2. Run the new test case DBOptionsTest.SetBackgroundFlushThreads
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701
      
      Reviewed By: ajkr
      
      Differential Revision: D21028010
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1
  12. 14 April, 2020 1 commit
    • A
      Log CompactOnDeletionCollectorFactory parameters on DB open (#6686) · 3d6d7bcf
      anand76 committed
      Summary:
      Log it in the info log to help in troubleshooting. It is logged as follows -
      ```
      2020/04/10-10:51:39.886662 7ffff7fef340                   Options.table_properties_collectors: CompactOnDeletionCollector (Sliding window size = 100 Deletion trigger = 90);
      ```
      
      Tests:
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6686
      
      Reviewed By: ltamasi
      
      Differential Revision: D21002442
      
      Pulled By: anand1976
      
      fbshipit-source-id: 7adf0dbae7f1febcb00ce61fea5097118ede5c6a
  13. 11 April, 2020 1 commit
    • H
      make iterator return versions between timestamp bounds (#6544) · 9e89ffb7
      Huisheng Liu committed
      Summary:
      (Based on Yanqin's idea) Add a new field in ReadOptions as the lower timestamp bound for the iterator. When the parameter is not supplied (nullptr), the iterator returns the latest visible version of a record. When it is supplied, the existing timestamp field is the upper bound. Together the two serve as a bounded time window. The iterator returns all versions of a record falling in the window.

      SeekRandom perf test (10 minutes) on the same development machine RAM drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
      base line (commit e860f884):
      seekrandom   : 7.836 micros/op 4082449 ops/sec; (0 of 73481999 found)
      This PR:
      seekrandom   : 7.764 micros/op 4120935 ops/sec; (0 of 71303999 found)
      
      db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=seekrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6544
      
      Reviewed By: ltamasi
      
      Differential Revision: D20844069
      
      Pulled By: riversand963
      
      fbshipit-source-id: d97f2bf38a323c8c6a68db213b2d3c694b1c1f74
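The window semantics can be sketched for a single record as follows. `VersionSketch` and `VisibleVersionsSketch` are hypothetical names, timestamps are simplified to integers, and both bounds are treated as inclusive, which is an assumption the summary does not state.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One stored version of a record (hypothetical layout for illustration).
struct VersionSketch {
  uint64_t ts;
  std::string value;
};

// `newest_first` holds the versions of one record, newest timestamp first.
// With no lower bound, only the latest version visible at `upper` is
// returned. With a lower bound, every version whose timestamp lies in
// [*lower, upper] is returned.
std::vector<VersionSketch> VisibleVersionsSketch(
    const std::vector<VersionSketch>& newest_first, uint64_t upper,
    const uint64_t* lower) {
  std::vector<VersionSketch> out;
  for (const VersionSketch& v : newest_first) {
    if (v.ts > upper) continue;  // too new to be visible
    if (lower == nullptr) {
      out.push_back(v);  // latest visible version only
      break;
    }
    if (v.ts < *lower) break;  // older than the window; stop scanning
    out.push_back(v);
  }
  return out;
}
```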
  14. 10 April, 2020 1 commit
  15. 09 April, 2020 1 commit
  16. 04 April, 2020 1 commit
    • M
      Move the OptionTypeMap code closer to home (#6198) · 259b6ec8
      mrambacher committed
      Summary:
      This is a predecessor to the Configurable PR.  This change moves the OptionTypeInfo maps closer to where they will be used.
      
      When the Configurable changes are adopted, these values will become static and not associated with the OptionsHelper.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6198
      
      Reviewed By: siying
      
      Differential Revision: D20778108
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a9f85fc73bc53503656e1958ecc1e764052fd1aa
  17. 02 April, 2020 1 commit
    • Z
      Add pipelined & parallel compression optimization (#6262) · 03a781a9
      Ziyue Yang committed
      Summary:
      This PR adds support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6262
      
      Reviewed By: ajkr
      
      Differential Revision: D20651306
      
      fbshipit-source-id: 62125590a9c15b6d9071def9dc72589c1696a4cb
  18. 01 April, 2020 1 commit
    • S
      Make options.bottommost_compression, compression_opts and... · 80979f81
      sdong committed
      Make options.bottommost_compression, compression_opts and bottommost_compression_opts dynamically changeable. (#6615)
      
      Summary:
      These three options should be made dynamically changeable. Simply add them to MutableCFOptions and make the change.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6615
      
      Test Plan: Add a unit test to make sure that SetOptions() can change the options.
      
      Reviewed By: riversand963
      
      Differential Revision: D20755951
      
      fbshipit-source-id: 8165f4fd7a7a665cc7fb049698935022a5d2e7ff
  19. 30 March, 2020 1 commit
    • Z
      Use FileChecksumGenFactory for SST file checksum (#6600) · e8d332d9
      Zhichao Cao committed
      Summary:
      In the current implementation, the SST file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply. In this implementation, each SST file has its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in their checksum calculation method.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600
      
      Test Plan: tested with make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D20717670
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
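The factory-per-file pattern can be illustrated with a small sketch. The class names below are hypothetical and deliberately differ from RocksDB's real interfaces, and a trivial byte sum stands in for a real checksum. The point is that each generator carries its own running state, which is what a stateful function such as SHA1 needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical generator interface: one instance per file, fed incrementally.
class ChecksumGeneratorSketch {
 public:
  virtual ~ChecksumGeneratorSketch() = default;
  virtual void Update(const char* data, std::size_t n) = 0;
  virtual uint32_t Finalize() = 0;
};

// Toy implementation: sums the bytes. A real one would wrap CRC32C, SHA1, etc.
class ByteSumGeneratorSketch : public ChecksumGeneratorSketch {
 public:
  void Update(const char* data, std::size_t n) override {
    for (std::size_t i = 0; i < n; ++i) {
      sum_ += static_cast<unsigned char>(data[i]);
    }
  }
  uint32_t Finalize() override { return sum_; }

 private:
  uint32_t sum_ = 0;  // per-file running state
};

// Hypothetical factory interface: hands out a fresh generator for each file.
class ChecksumGenFactorySketch {
 public:
  virtual ~ChecksumGenFactorySketch() = default;
  virtual std::unique_ptr<ChecksumGeneratorSketch> Create() = 0;
};

class ByteSumFactorySketch : public ChecksumGenFactorySketch {
 public:
  std::unique_ptr<ChecksumGeneratorSketch> Create() override {
    return std::unique_ptr<ChecksumGeneratorSketch>(
        new ByteSumGeneratorSketch());
  }
};
```

Because every file gets its own generator, two files being written at the same time cannot corrupt each other's running state.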
  20. 24 March, 2020 1 commit
    • A
      Simplify migration to FileSystem API (#6552) · a9d168cf
      anand76 committed
      Summary:
      The current Env/FileSystem API separation has a couple of issues -
      1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
      2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
      
      This PR solves the above issues and simplifies the migration in the following ways -
      1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
      1a. - This also makes it more robust by ensuring that even if RocksDB
        has some stray calls to Env APIs rather than FileSystem, they will go
        through the same object and thus there is no risk of getting out of
        sync.
      2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
      PosixEnv with a custom FileSystem implementation. This eliminates an
      indirection to call Env APIs, and relieves the FileSystem developer of
      the burden of having to implement wrappers for the Env APIs.
      3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
      ```NewLogger()```
      
      Tests:
      1. New unit tests
      2. make check and make asan_check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
      
      Reviewed By: riversand963
      
      Differential Revision: D20592038
      
      Pulled By: anand1976
      
      fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
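The embedding described in this summary can be sketched as below. All class and function names carry a `Sketch` suffix because they are simplified stand-ins, not the RocksDB `Env`, `FileSystem`, or `NewCompositeEnv()` declarations; the single `Name()` method replaces the full file API.

```cpp
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a storage backend interface.
class FileSystemSketch {
 public:
  virtual ~FileSystemSketch() = default;
  virtual std::string Name() const = 0;
};

// Stand-in for the wrapper used by a default-constructed Env.
class LegacyFileSystemSketch : public FileSystemSketch {
 public:
  std::string Name() const override { return "legacy"; }
};

// Stand-in for a user-provided backend.
class InMemoryFileSystemSketch : public FileSystemSketch {
 public:
  std::string Name() const override { return "in-memory"; }
};

// The Env owns a shared_ptr to its FileSystem, so Env-style calls and
// FileSystem-style calls always reach the same object.
class EnvSketch {
 public:
  EnvSketch() : fs_(std::make_shared<LegacyFileSystemSketch>()) {}
  explicit EnvSketch(std::shared_ptr<FileSystemSketch> fs)
      : fs_(std::move(fs)) {}

  const std::shared_ptr<FileSystemSketch>& GetFileSystem() const {
    return fs_;
  }

  // A "stray" Env-level call still routes through the embedded FileSystem.
  std::string FileBackendName() const { return fs_->Name(); }

 private:
  std::shared_ptr<FileSystemSketch> fs_;
};

// Builds an Env around a custom FileSystem, in the spirit of the
// composite-Env helper mentioned in the summary.
std::unique_ptr<EnvSketch> NewCompositeEnvSketch(
    std::shared_ptr<FileSystemSketch> fs) {
  return std::unique_ptr<EnvSketch>(new EnvSketch(std::move(fs)));
}
```

With a single owner of the FileSystem there is only one option to configure, and the two access paths cannot drift apart.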
  21. 21 March, 2020 1 commit
    • Y
      Attempt to recover from db with missing table files (#6334) · fb09ef05
      Yanqin Jin committed
      Summary:
      There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, RocksDB previously just failed the recovery and returned a non-ok status.
      This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files and tries to recover to the most recent state without missing table files. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version.
      `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
      To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush.
      
      Test plan (on devserver):
      ```
      $make check
      $COMPILE_WITH_ASAN=1 make all && make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334
      
      Reviewed By: anand1976
      
      Differential Revision: D19778960
      
      Pulled By: riversand963
      
      fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
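The incremental "materialize only when complete" rule can be sketched as follows. This is not `VersionSet::Recover()`: `VersionEditSketch` and `RecoverLatestCompleteVersionSketch` are hypothetical, a version is reduced to a set of file names, and the set of files on disk is passed in directly.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified version edit: files added and files deleted by one change.
struct VersionEditSketch {
  std::vector<std::string> added;
  std::vector<std::string> deleted;
};

// Applies the edits in order and remembers the most recent state in which
// every referenced file exists on disk. Returns an empty set if no state
// was ever complete.
std::set<std::string> RecoverLatestCompleteVersionSketch(
    const std::vector<VersionEditSketch>& edits,
    const std::set<std::string>& files_on_disk) {
  std::set<std::string> current;
  std::set<std::string> last_complete;
  for (const VersionEditSketch& edit : edits) {
    for (const std::string& f : edit.deleted) current.erase(f);
    for (const std::string& f : edit.added) current.insert(f);

    bool complete = true;
    for (const std::string& f : current) {
      if (files_on_disk.count(f) == 0) {
        complete = false;
        break;
      }
    }
    if (complete) last_complete = current;  // "materialize" this version
  }
  return last_complete;
}
```

For example, with edits that add `1.sst`, add `2.sst`, then replace `1.sst` with `3.sst`, a disk holding only `1.sst` and `2.sst` recovers to the state after the second edit.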
  22. 12 March, 2020 1 commit
  23. 21 February, 2020 1 commit
    • S
      Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To give users a tool to solve the problem, the RocksDB namespace is changed to a flag that can be overridden at build time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
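The general technique of a build-time overridable namespace looks roughly like this. `SKETCH_NAMESPACE` and `sketchdb` are made-up names for the illustration and are not the macros used in the RocksDB headers.

```cpp
#include <string>

// Default namespace name; a build can override it with
// -DSKETCH_NAMESPACE=some_other_name on the compiler command line.
#ifndef SKETCH_NAMESPACE
#define SKETCH_NAMESPACE sketchdb
#endif

// Two-level stringification so the macro is expanded before being quoted.
#define SKETCH_STRINGIFY_IMPL(x) #x
#define SKETCH_STRINGIFY(x) SKETCH_STRINGIFY_IMPL(x)

namespace SKETCH_NAMESPACE {

// Reports the namespace this copy of the library was compiled into.
inline std::string NamespaceName() {
  return SKETCH_STRINGIFY(SKETCH_NAMESPACE);
}

}  // namespace SKETCH_NAMESPACE
```

Two copies of the library built with different values of the macro end up with distinct symbol names, so they can coexist in one process.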
  24. 11 February, 2020 3 commits
    • Z
      Checksum for each SST file and stores in MANIFEST (#6216) · 4369f2c7
      Zhichao Cao committed
      Summary:
      In the current code base, RocksDB generates a checksum for each block and verifies it on use. This PR enables SST file checksums. After an SST file is generated by Flush or Compaction, RocksDB generates the SST file checksum and stores the checksum value and checksum method name in the vs_info and MANIFEST as part of the FileMetadata.

      Added enable_sst_file_checksum to Options to enable or disable file checksums. Added sst_file_checksum to Options so that users can plug in their own SST file checksum calculation method by overriding the SstFileChecksum class. The checksum information includes a uint32_t checksum value and a checksum name (string).  A new tool is added to LDB so that users can dump a list of file checksum information from the MANIFEST. If the user enables the file checksum but does not provide an sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
      
      Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check.
      
      Differential Revision: D19171461
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
    • S
      Make clang analyze happy with options_test (#6398) · 594e815e
      sdong committed
      Summary:
      clang analysis shows following warning:
      
      options/options_test.cc:1554:24: warning: The left operand of '-' is a garbage value
                  (file_size - 1) / readahead_size + 1);
                   ~~~~~~~~~ ^
      
      Explicitly initialize file_size and add an assertion to make clang analysis happy.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6398
      
      Test Plan: Run "make analysis" and confirm that the warning goes away.
      
      Differential Revision: D19819662
      
      fbshipit-source-id: 1589ea91c0c8f78242538f01448e4ad0e5fbc219
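The flagged expression has the shape of a ceiling division, presumably the number of reads of size `readahead_size` needed to cover `file_size` bytes. A minimal sketch of that arithmetic, with the kind of assertion the fix describes, is shown below; `ExpectedReadCountSketch` is a hypothetical helper, not code from options_test.

```cpp
#include <cassert>
#include <cstddef>

// Ceiling division: how many chunks of readahead_size cover file_size bytes.
// The assertions document the preconditions: with an unsigned file_size of 0
// the subtraction would wrap around, and a zero chunk size would divide by 0.
std::size_t ExpectedReadCountSketch(std::size_t file_size,
                                    std::size_t readahead_size) {
  assert(file_size > 0);
  assert(readahead_size > 0);
  return (file_size - 1) / readahead_size + 1;
}
```

Initializing `file_size` explicitly and asserting that it is positive gives the analyzer a defined value to reason about.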
    • S
      Try to fix some analysis failures · b2bc1da5
      sdong committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6384
      
      Test Plan: Wait and see the analysis result.
      
      Differential Revision: D19781072
      
      fbshipit-source-id: 75e7cb6ee619ebd289841eaabea03dd075c09d3b
  25. 08 February, 2020 1 commit
    • S
      Allow readahead when reading option files. (#6372) · 876c2dbf
      sdong committed
      Summary:
      Right now, when reading from option files, no readahead is used and an 8KB buffer is used. This might introduce high latency if the file system has high latency and doesn't do readahead. Instead, introduce readahead for the file. When called inside the DB, infer the value from options.log_readahead_size. Otherwise, a default 512KB readahead size is used.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6372
      
      Test Plan: Add --log_readahead_size in db_bench. Run it with several options and observe read size from option files using strace.
      
      Differential Revision: D19727739
      
      fbshipit-source-id: e6d8053b0a64259abc087f1f388b9cd66fa8a583
  26. 04 February, 2020 1 commit
    • M
      Add an option to prevent DB::Open() from querying sizes of all sst files (#6353) · 637e64b9
      Mike Kolupaev committed
      Summary:
      When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.
      
      If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.
      
      We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353
      
      Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.
      
      Differential Revision: D19656425
      
      Pulled By: al13n321
      
      fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
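The cheaper consistency check described here, one directory listing instead of one size query per file, can be sketched as follows. `MissingFilesSketch` is a hypothetical helper; the listing is passed in as a plain vector rather than obtained from `GetChildren()`.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Given the SST files the DB expects and one listing of the parent
// directory, returns the expected files that are not present. File sizes
// are not inspected at all.
std::vector<std::string> MissingFilesSketch(
    const std::vector<std::string>& expected,
    const std::vector<std::string>& children) {
  const std::unordered_set<std::string> present(children.begin(),
                                                children.end());
  std::vector<std::string> missing;
  for (const std::string& name : expected) {
    if (present.count(name) == 0) missing.push_back(name);
  }
  return missing;
}
```

With a remote storage backend this replaces thousands of per-file metadata requests with a single listing call per directory.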
  27. 29 January, 2020 1 commit
    • S
      Add ReadOptions.auto_prefix_mode (#6314) · 8f2bee67
      sdong committed
      Summary:
      Add a new option ReadOptions.auto_prefix_mode. When set to true, the iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on iterator upper bounds. Also fix two previous bugs when handling prefix extractor changes: (1) the reverse iterator should not rely on the upper bound to determine the prefix. Fix it by skipping the prefix check. (2) the block-based filter is not handled properly.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314
      
      Test Plan: (1) add a unit test; (2) add the check to stress test and run see whether it can pass at least one run.
      
      Differential Revision: D19458717
      
      fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce
  28. 14 December, 2019 1 commit
    • A
      Introduce a new storage specific Env API (#5761) · afa2420c
      anand76 committed
      Summary:
      The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether it's retryable or not, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
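The translation step in the last paragraph of the summary can be sketched as below. The types are hypothetical stand-ins for `IOStatus` and `Status`, reduced to the fields needed here. What happens to non-retryable errors is not stated in the summary, so leaving their severity at the default is an assumption of this sketch.

```cpp
// Hypothetical severity levels for the illustration.
enum class SeveritySketch { kNoError, kSoftError, kHardError };

// Stand-in for an IO-level status that carries retry metadata.
struct IOStatusSketch {
  bool ok;
  bool retryable;
};

// Stand-in for the general-purpose status seen by the rest of the code.
struct StatusSketch {
  bool ok;
  SeveritySketch severity;
};

// Translates an IO status into a general status. Retryable failures are
// marked as soft errors so that error handling can attempt recovery.
// Assumption: non-retryable failures keep the default severity.
StatusSketch TranslateIOStatusSketch(const IOStatusSketch& io) {
  StatusSketch s;
  s.ok = io.ok;
  s.severity = SeveritySketch::kNoError;
  if (!io.ok && io.retryable) {
    s.severity = SeveritySketch::kSoftError;
  }
  return s;
}
```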
  29. 27 November, 2019 2 commits
    • P
      Allow fractional bits/key in BloomFilterPolicy (#6092) · 57f30322
      Peter Dillinger committed
      Summary:
      There's no technological impediment to allowing the Bloom
      filter bits/key to be non-integer (fractional/decimal) values, and it
      provides finer control over the memory vs. accuracy trade-off. This is
      especially handy in using the format_version=5 Bloom filter in place
      of the old one, because bits_per_key=9.55 provides the same accuracy as
      the old bits_per_key=10.
      
      This change not only requires refining the logic for choosing the best
      num_probes for a given bits/key setting, it revealed a flaw in that logic.
      As bits/key gets higher, the best num_probes for a cache-local Bloom
      filter is closer to bpk / 2 than to bpk * 0.69, the best choice for a
      standard Bloom filter. For example, at 16 bits per key, the best
      num_probes is 9 (FP rate = 0.0843%) not 11 (FP rate = 0.0884%).
      This change fixes and refines that logic (for the format_version=5
      Bloom filter only, just in case) based on empirical tests to find
      accuracy inflection points between each num_probes.
      
      Although bits_per_key is now specified as a double, the new Bloom
      filter converts/rounds this to "millibits / key" for predictable/precise
      internal computations. Just in case of unforeseen compatibility
      issues, we round to the nearest whole number bits / key for the
      legacy Bloom filter, so as not to unlock new behaviors for it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6092
      
      Test Plan: unit tests included
      
      Differential Revision: D18711313
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1aa73295f152a995328cb846ef9157ae8a05522a
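The rounding rules in the last paragraph of the summary, together with the textbook false-positive estimate for a standard Bloom filter, can be sketched as follows. The function names are hypothetical. The formula is the standard approximation (1 - e^(-k/b))^k for k probes and b bits per key; it does not model the cache-local filter whose measured rates are quoted in the summary.

```cpp
#include <cmath>

// Fractional bits/key rounded to millibits/key for exact integer math.
int MillibitsPerKeySketch(double bits_per_key) {
  return static_cast<int>(std::lround(bits_per_key * 1000.0));
}

// Legacy behavior per the summary: round to the nearest whole bits/key.
int LegacyWholeBitsPerKeySketch(double bits_per_key) {
  return static_cast<int>(std::lround(bits_per_key));
}

// Textbook false-positive estimate for a standard (not cache-local)
// Bloom filter with the given bits per key and number of probes.
double StandardBloomFpRateSketch(double bits_per_key, int num_probes) {
  return std::pow(1.0 - std::exp(-num_probes / bits_per_key), num_probes);
}
```

Under these definitions a setting of 9.55 bits/key becomes 9550 millibits/key for the new filter and 10 bits/key for the legacy one.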
    • P
      Remove unused/undefined ImmutableCFOptions() (#6086) · 4f17d33d
      Peter Dillinger committed
      Summary:
      default constructor not used or even defined
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6086
      
      Differential Revision: D18695669
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6b6ac46029f4fb6edf1c11ee6ce1d9f172b2eaf2
  30. 21 September, 2019 1 commit
  31. 20 September, 2019 1 commit
  32. 19 September, 2019 1 commit
  33. 17 September, 2019 1 commit
    • S
      Divide file_reader_writer.h and .cc (#5803) · b931f84e
      sdong committed
      Summary:
      file_reader_writer.h and .cc contain several file classes and helper functions, which makes them hard to navigate. Split them into multiple files and put them under file/
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
      
      Test Plan: Build whole project using make and cmake.
      
      Differential Revision: D17374550
      
      fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
  34. 12 September, 2019 1 commit
  35. 10 September, 2019 1 commit
  36. 03 September, 2019 1 commit
    • V
      Persistent globally unique DB ID in manifest (#5725) · 979fbdc6
      Vijay Nadimpalli committed
      Summary:
      Each DB has a globally unique ID. A DB can be physically copied around, or backed up and restored, and users should be able to identify the same DB. This unique ID is currently stored as plain text in the IDENTITY file under the DB directory. This approach introduces at least two problems: (1) the file is not checksummed; (2) the source of truth of a DB is the manifest file, which can be copied separately from the IDENTITY file, causing the DB ID to be wrong.
      The goal of this PR is to solve this problem by moving the DB ID to the manifest. To begin with, we will write to both the IDENTITY file and the manifest. Writing to the manifest is controlled via the flag write_dbid_to_manifest in Options, and the default is false.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725
      
      Test Plan: Added unit tests.
      
      Differential Revision: D16963840
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9