- 15 6月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: - Added CreateFromString method to Env and FilesSystem to replace LoadEnv/Load. This method/signature is a precursor to making these classes extend Customizable. - Added CreateFromSystem to Env. This method standardizes creating an Env from the environment variables. Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI. Now the code is more command/standardized. - Added CreateFromFlags to Env. These method allows Env to be create from string options (such as GFLAGS options) in a more standard way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174 Reviewed By: zhichao-cao Differential Revision: D28999603 Pulled By: mrambacher fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb
-
- 21 5月, 2021 1 次提交
-
-
由 sdong 提交于
Summary: When a memtable is flushed, it will validate number of entries it reads, and compare the number with how many entries inserted into memtable. This serves as one sanity c\ heck against memory corruption. This change will also allow more counters to be added in the future for better validation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288 Test Plan: Pass all existing tests Reviewed By: ajkr Differential Revision: D28369194 fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad
-
- 20 5月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8300 Reviewed By: ajkr Differential Revision: D28464726 Pulled By: jay-zhuang fbshipit-source-id: 49e9f4fb791808a6cbf39a7b1a331373f645fc5e
-
- 14 5月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: And replace `reinterpret_cast` with `static_cast` or no cast. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8295 Test Plan: `make check` Reviewed By: mrambacher Differential Revision: D28420303 Pulled By: jay-zhuang fbshipit-source-id: 645be123a0df624dc2bea37cd54a35403fc494fa
-
- 12 5月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: Added ParseType, SerializeType, and TypesAreEqual methods to OptionTypeInfo. These methods can be used for serialization and deserialization of basic types. Change the MutableCF/DB Options to use this format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8249 Reviewed By: jay-zhuang Differential Revision: D28351190 Pulled By: mrambacher fbshipit-source-id: 72a78643b804f2f0bf59c32ffefa63346672ad16
-
- 06 5月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form). Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes. Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262 Reviewed By: pdillinger Differential Revision: D28226540 Pulled By: mrambacher fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
-
- 27 4月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: Renaming ImmutableCFOptions::info_log and statistics to logger and stats. This is stage 2 in creating an ImmutableOptions class. It is necessary because the names match those in ImmutableOptions and have different types. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227 Reviewed By: jay-zhuang Differential Revision: D28000967 Pulled By: mrambacher fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b
-
- 26 4月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: As previously coded, a Configurable extension would need access to code not in the public API. This change moves RegisterOptions into the Configurable class and therefore available to public extensions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8223 Reviewed By: anand1976 Differential Revision: D27960188 Pulled By: mrambacher fbshipit-source-id: ac88b19397183df633902def5b5701b9b65fbf40
-
- 23 4月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code. With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively. readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release). There are still fields in the ImmutableCFOptions that are not CF options but DB options. Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions). But that will be part of a future PR to minimize changes and disruptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176 Reviewed By: pdillinger Differential Revision: D27954339 Pulled By: mrambacher fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad
-
- 15 3月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>. The shared ptr has some performance degradation on certain hardware classes. For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere. For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it. The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource. There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold. In those cases, the shared pointer was preserved. Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17: 6.17: readrandom : 28.046 micros/op 854902 ops/sec; 61.3 MB/s (355999 of 355999 found) 6.18: readrandom : 32.615 micros/op 735306 ops/sec; 52.7 MB/s (290999 of 290999 found) PR: readrandom : 27.500 micros/op 871909 ops/sec; 62.5 MB/s (367999 of 367999 found) (Note that the times for 6.18 are prior to revert of the SystemClock). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033 Reviewed By: pdillinger Differential Revision: D27014563 Pulled By: mrambacher fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
-
- 20 2月, 2021 1 次提交
-
-
由 mrambacher 提交于
Fix handling of Mutable options; Allow DB::SetOptions to update mutable TableFactory Options (#7936) Summary: Added a "only_mutable_options" flag to the ConfigOptions. When set, the Configurable methods will only look at/update options that are marked as kMutable. Fixed DB::SetOptions to allow for the update of any mutable TableFactory options. Fixes https://github.com/facebook/rocksdb/issues/7385. Added tests for the new flag. Updated HISTORY.md Pull Request resolved: https://github.com/facebook/rocksdb/pull/7936 Reviewed By: akankshamahajan15 Differential Revision: D26389646 Pulled By: mrambacher fbshipit-source-id: 6dc247f6e999fa2814059ebbd0af8face109fea0
-
- 11 2月, 2021 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: in PR https://github.com/facebook/rocksdb/issues/7419 , we introduce the new Append and PositionedAppend APIs to WritableFile at File System, which enable RocksDB to pass the data verification information (e.g., checksum of the data) to the lower layer. In this PR, we use the new API in WritableFileWriter, such that the file created via WritableFileWrite can pass the checksum to the storage layer. To control which types file should apply the checksum handoff, we add checksum_handoff_file_types to DBOptions. User can use this option to control which file types (Currently supported file tyes: kLogFile, kTableFile, kDescriptorFile.) should use the new Append and PositionedAppend APIs to handoff the verification information. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7523 Test Plan: add new unit test, pass make check/ make asan_check Reviewed By: pdillinger Differential Revision: D24313271 Pulled By: zhichao-cao fbshipit-source-id: aafd69091ae85c3318e3e17cbb96fe7338da11d0
-
- 20 10月, 2020 1 次提交
-
-
由 anand76 提交于
Summary: This PR adds support for writing a location identifier of the DB host to SST files as a table property. By default, the hostname is used, but can be overridden by the user. There have been some recent corruptions in files written by ```SstFileWriter``` before checksumming, so this property can be used to trace it back to the writing host and checking the host for hardware isues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7479 Test Plan: Add new unit tests Reviewed By: pdillinger Differential Revision: D24340671 Pulled By: anand1976 fbshipit-source-id: 2038949fd8d160c0633ccb4f9da77740f19fa2a2
-
- 10 10月, 2020 1 次提交
-
-
由 Cheng Chang 提交于
Summary: This option determines whether WALs will be tracked in MANIFEST and verified on recovery. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7275 Test Plan: db_options_test options_test Reviewed By: pdillinger Differential Revision: D23181418 Pulled By: cheng-chang fbshipit-source-id: 5dd1cdc166f3dfc1c93c094df4a2f7734e3b4547
-
- 30 9月, 2020 1 次提交
-
-
由 Akanksha Mahajan 提交于
Summary: Add a new Option "allow_data_in_errors". When it's set by users, it allows them to opt-in to get error messages containing corrupted keys/values. Corrupt keys, values will be logged in the messages, logs, status etc. that will help users with the useful information regarding affected data. By default value is set false to prevent users data to be exposed in the messages. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7420 Test Plan: 1. make check -j64 2. Add a new test case Reviewed By: ajkr Differential Revision: D23835028 Pulled By: akankshamahajan15 fbshipit-source-id: 8d2eba8fb898e79fcf1fccc07295065a75eb59b1
-
- 15 9月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753 Reviewed By: ajkr Differential Revision: D23385030 Pulled By: zhichao-cao fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
-
- 23 7月, 2020 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: Make `max-subcompactions` dynamically changeable by passing the `DBOption` to Compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7159 Reviewed By: siying Differential Revision: D22671238 Pulled By: jay-zhuang fbshipit-source-id: 311ca9f6bb606965544d8708616d358cfed5be42
-
- 16 7月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In current codebase, in write path, if Retryable IO Error happens, SetBGError is called. The retryable IO Error is converted to hard error and DB is in read only mode. User or application needs to resume it. In this PR, if Retryable IO Error happens in one DB, SetBGError will create a new thread to call Resume (auto resume). otpions.max_bgerror_resume_count controls if auto resume is enabled or not (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval to call Resume again if the previous resume fails due to the Retryable IO Error. If non-retryable error happens during resume, auto resume will terminate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765 Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check Reviewed By: anand1976 Differential Revision: D21916789 Pulled By: zhichao-cao fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b
-
- 08 6月, 2020 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: In db_options.c, we should avoid including header files in the `db` directory to avoid introducing unnecessary dependency. The reason why `version_edit.h` has been included in `db_options.cc` is because we need two constants, `kUnknownChecksum` and `kUnknownChecksumFuncName`. We can put these two constants as `constexpr` in the public header `file_checksum.h`. Test plan (devserver): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6952 Reviewed By: zhichao-cao Differential Revision: D21925341 Pulled By: riversand963 fbshipit-source-id: 2902f3b74c97f0cf16c58ad24c095c787c3a40e2
-
- 06 5月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: Add methods and constructors for handling enums to the OptionTypeInfo. This change allows enums to be converted/compared without adding a special "type" to the OptionType. This change addresses a couple of issues: - It allows new enumerated types to be added to the options without editing the OptionType base class (and related methods) - It standardizes the procedure for adding enumerated types to the options, reducing potential mistakes - It moves the enum maps to the location where they are used, allowing them to be static file members rather than global values - It reduces the number of types and cases that need to be handled in the various OptionType methods Pull Request resolved: https://github.com/facebook/rocksdb/pull/6423 Reviewed By: siying Differential Revision: D21408713 Pulled By: zhichao-cao fbshipit-source-id: fc492af285d011822578b95d186a0fce25d35626
-
- 29 4月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: Added functions for parsing, serializing, and comparing elements to OptionTypeInfo. These functions allow all of the special cases that could not be handled directly in the map of OptionTypeInfo to be moved into the map. Using these functions, every type can be handled via the map rather than special cased. By adding these functions, the code for handling options can become more standardized (fewer special cases) and (eventually) handled completely by common classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6422 Test Plan: pass make check Reviewed By: siying Differential Revision: D21269005 Pulled By: zhichao-cao fbshipit-source-id: 9ba71c721a38ebf9ee88259d60bd81b3282b9077
-
- 21 4月, 2020 1 次提交
-
-
由 Akanksha Mahajan 提交于
Summary: 1. Add changes so that max_background_flushes can be set dynamically. 2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which set the max_background_flushes dynamically using SetDBOptions. TestPlan: 1. make -j64 check 2. Using new testcase DBOptionsTest.SetBackgroundFlushThreads Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701 Reviewed By: ajkr Differential Revision: D21028010 Pulled By: akankshamahajan15 fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1
-
- 04 4月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: This is a predecessor to the Configurable PR. This change moves the OptionTypeInfo maps closer to where they will be used. When the Configurable changes are adopted, these values will become static and not associated with the OptionsHelper. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6198 Reviewed By: siying Differential Revision: D20778108 Pulled By: zhichao-cao fbshipit-source-id: a9f85fc73bc53503656e1958ecc1e764052fd1aa
-
- 30 3月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In the current implementation, sst file checksum is calculated by a shared checksum function object, which may make some checksum function hard to be applied here such as SHA1. In this implementation, each sst file will have its own checksum generator obejct, created by FileChecksumGenFactory. User needs to implement its own FilechecksumGenerator and Factory to plugin the in checksum calculation method. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600 Test Plan: tested with make asan_check Reviewed By: riversand963 Differential Revision: D20717670 Pulled By: zhichao-cao fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
-
- 24 3月, 2020 1 次提交
-
-
由 anand76 提交于
Summary: The current Env/FileSystem API separation has a couple of issues - 1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation. 2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes. This PR solves the above issues and simplifies the migration in the following ways - 1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```. 1a. - This also makes it more robust by ensuring that even if RocksDB has some stray calls to Env APIs rather than FileSystem, they will go through the same object and thus there is no risk of getting out of sync. 2. Provide a ```NewCompositeEnv()``` API that can be used to construct a PosixEnv with a custom FileSystem implementation. This eliminates an indirection to call Env APIs, and relieves the FileSystem developer of the burden of having to implement wrappers for the Env APIs. 3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and ```NewLogger()``` Tests: 1. New unit tests 2. make check and make asan_check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552 Reviewed By: riversand963 Differential Revision: D20592038 Pulled By: anand1976 fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
-
- 21 3月, 2020 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, previous RocksDB will just fail the recovery and return a non-ok status. This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files, and try to recover to the most recent state without missing table file. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version. `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed. To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush. Test plan (on devserver): ``` $make check $COMPILE_WITH_ASAN=1 make all && make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334 Reviewed By: anand1976 Differential Revision: D19778960 Pulled By: riversand963 fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
-
- 21 2月, 2020 1 次提交
-
-
由 sdong 提交于
Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
-
- 11 2月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata. Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string). A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216 Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check. Differential Revision: D19171461 Pulled By: zhichao-cao fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
-
- 04 2月, 2020 1 次提交
-
-
由 Mike Kolupaev 提交于
Summary: When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open. If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes. We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353 Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble. Differential Revision: D19656425 Pulled By: al13n321 fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
-
- 14 12月, 2019 1 次提交
-
-
由 anand76 提交于
Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
-
- 12 9月, 2019 1 次提交
-
-
由 Ronak Sisodia 提交于
Summary: The max batch size that we can write to the WAL is controlled by a static manner. So if the leader write is less than 128 KB we will have the batch size as leader write size + 128 KB else the limit will be 1 MB. Both of them are statically defined. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759 Differential Revision: D17329298 fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5
-
- 03 9月, 2019 1 次提交
-
-
由 Vijay Nadimpalli 提交于
Summary: Each DB has a globally unique ID. A DB can be physically copied around, or backed-up and restored, and the users should be identify the same DB. This unique ID right now is stored as plain text in file IDENTITY under the DB directory. This approach introduces at least two problems: (1) the file is not checksumed; (2) the source of truth of a DB is the manifest file, which can be copied separately from IDENTITY file, causing the DB ID to be wrong. The goal of this PR is solve this problem by moving the DB ID to manifest. To begin with we will write to both identity file and manifest. Write to Manifest is controlled via the flag write_dbid_to_manifest in Options and default is false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725 Test Plan: Added unit tests. Differential Revision: D16963840 Pulled By: vjnadimpalli fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9
-
- 20 7月, 2019 1 次提交
-
-
由 Eli Pozniansky 提交于
Summary: Added log_readahead_size option to control prefetching for Log::Reader. This is mostly useful for reading a remotely located log, as it can save the number of round-trips when reading it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5592 Differential Revision: D16362989 Pulled By: elipoz fbshipit-source-id: c5d4d5245a44008cd59879640efff70c091ad3e8
-
- 18 6月, 2019 1 次提交
-
-
由 Zhongyi Xie 提交于
Summary: This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046 Differential Revision: D15863138 Pulled By: miasantreble fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
-
- 07 6月, 2019 1 次提交
-
-
由 Zhongyi Xie 提交于
Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03
-
- 01 6月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: Many logging related source files are under util/. It will be more structured if they are together. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387 Differential Revision: D15579036 Pulled By: siying fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
-
- 14 5月, 2019 1 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees. The patch is prepared based on an original by siying Existing unit tests are extended to include unordered_write option. Benchmark Results: ``` TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions --unordered_write=1 ``` With WAL - Vanilla RocksDB: 78.6 MB/s - WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x) - unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees) Without WAL - Vanilla RocksDB: 111.3 MB/s - WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x) - unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees) - WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x) Limitations: - The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218 Differential Revision: D15219029 Pulled By: maysamyabandeh fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
-
- 23 4月, 2019 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways. My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes. Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it. There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically). The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183 Differential Revision: D14953553 Pulled By: siying fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
-
- 05 4月, 2019 1 次提交
-
-
由 Adam Simpkins 提交于
Summary: Annotate all of the logging functions to inform the compiler that these use printf-style formatting arguments. This allows the compiler to emit warnings if the format arguments are incorrect. This also fixes many problems reported now that format string checking is enabled. Many of these are simply mix-ups in the argument type (e.g, int vs uint64_t), but in several cases the wrong number of arguments were being passed in which can cause the code to crash. The primary motivation for this was to fix the log message in `DBImpl::SwitchMemtable()` which caused a segfault due to an extra %s format parameter with no argument supplied. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5089 Differential Revision: D14574795 Pulled By: simpkins fbshipit-source-id: 0921b03f0743652bf4ae21e414ff54b3bb65422a
-
- 02 4月, 2019 1 次提交
-
-
由 Mike Kolupaev 提交于
Summary: Just like ReadOptions::background_purge_on_iterator_cleanup but for ColumnFamilyHandle instead of Iterator. In our use case we sometimes call ColumnFamilyHandle's destructor from low-latency threads, and sometimes it blocks the thread for a few seconds deleting the files. To avoid that, we can either offload ColumnFamilyHandle's destruction to a background thread on our side, or add this option on rocksdb side. This PR does the latter, to be consistent with how we solve exactly the same problem for iterators using background_purge_on_iterator_cleanup option. (EDIT: It's avoid_unnecessary_blocking_io now, and affects both CF drops and iterator destructors.) I'm not quite comfortable with having two separate options (background_purge_on_iterator_cleanup and background_purge_on_cf_cleanup) for such a rarely used thing. Maybe we should merge them? Rename background_purge_on_cf_cleanup to something like delete_files_on_background_threads_only or avoid_blocking_io_in_unexpected_places, and make iterators use it instead of the one in ReadOptions? I can do that here if you guys think it's better. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5043 Differential Revision: D14339233 Pulled By: al13n321 fbshipit-source-id: ccf7efa11c85c9a5b91d969bb55627d0fb01e7b8
-