- 20 9月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: Fix the flaky test failure in error_handler_fs_test. Add the sync point, solve the dependency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7415 Test Plan: make asan_check, ~/gtest-parallel/gtest-parallel -r 100 ./error_handler_fs_test Reviewed By: siying Differential Revision: D23804330 Pulled By: zhichao-cao fbshipit-source-id: 5175108651f7652e47e15978f2a9c1669ef59d80
-
- 17 9月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: db_options_test options_file_test auto_roll_logger_test options_util_test persistent_cache_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7367 Reviewed By: jay-zhuang Differential Revision: D23712520 Pulled By: zhichao-cao fbshipit-source-id: 99b331e357f5d6a6aabee89d1bd933002cbb3908
-
- 15 9月, 2020 1 次提交
-
-
由 Levi Tamasi 提交于
Summary: The patch adds support for writing blob files during flush by integrating `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and `CompactionIterator`. If `enable_blob_files` is set, large values are extracted to blob files and replaced with references. The resulting blob files are then logged to the MANIFEST as part of the flush job's `VersionEdit` and added to the `Version`, similarly to table files. Errors related to writing blob files fail the flush, and any blob files written by such jobs are immediately deleted (again, similarly to how SST files are handled). In addition, the patch extends the logging and statistics around flushes to account for the presence of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is used for calculating write amplification, now considers the blob files as well). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345 Test Plan: Tested using `make check` and `db_bench`. Reviewed By: riversand963 Differential Revision: D23506369 Pulled By: ltamasi fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
-
- 09 9月, 2020 1 次提交
-
-
由 Akanksha Mahajan 提交于
Summary: Replace FSWritableFile pointer with FSWritableFilePtr object in WritableFileWriter. This new object wraps FSWritableFile pointer. Objective: If tracing is enabled, FSWritableFile Ptr returns FSWritableFileTracingWrapper pointer that includes all necessary information in IORecord and calls underlying FileSystem and invokes IOTracer to dump that record in a binary file. If tracing is disabled then, underlying FileSystem pointer is returned directly. FSWritableFilePtr wrapper class is added to bypass the FSWritableFileWrapper when tracing is disabled. Test Plan: make check -j64 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193 Reviewed By: anand1976 Differential Revision: D23355915 Pulled By: akankshamahajan15 fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
-
- 26 8月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: When SST file is created, application is able to know the file information through OnTableFileCreated callback in LogAndNotifyTableFileCreationFinished. Since file checksum information can be useful for application when the SST file is created, we add file_checksum and file_checksum_func_name information to TableFileCreationInfo, which will be passed through OnTableFileCreated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7108 Test Plan: make check, listener_test. Reviewed By: ajkr Differential Revision: D22470240 Pulled By: zhichao-cao fbshipit-source-id: 92c20344d9b986eadfe3480f3769bf4add0dbaae
-
- 07 8月, 2020 1 次提交
-
-
由 jsteemann 提交于
Summary: Rename "paraniod" to "paranoid" in a few places. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7163 Reviewed By: ajkr Differential Revision: D22678242 fbshipit-source-id: 28b1011a736d0a95612676f7e1b9500a70c324b4
-
- 04 8月, 2020 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator` and every `LevelIterator`. This redundancy consumes extra memory, resulting in the `Arena` making more allocations, and iteration observing worse cache performance. This PR migrates callers of `NewInternalIterator()` and `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to outlive the returned iterator. When the iterator's lifetime will be managed by the user, this lifetime guarantee is achieved by storing the `ReadOptions` value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and `MakeInputIterator()` can hold a reference-to-const `ReadOptions`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210 Test Plan: - `make check` under ASAN and valgrind - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes. Reviewed By: anand1976 Differential Revision: D22861323 Pulled By: ajkr fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
-
- 23 7月, 2020 1 次提交
-
-
由 mrambacher 提交于
Summary: When paraoid_files_checks=true, a rolling key-value hash is generated and compared to what is written to the file. If the values do not match, the SST file is rejected. Code put in place for the check for both flush and compaction jobs. Corresponding test added to corruption_test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7134 Reviewed By: cheng-chang Differential Revision: D22646149 fbshipit-source-id: 8fde1984a1a11edd3bd82a413acffc5ea7aa683f
-
- 16 7月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In current codebase, in write path, if Retryable IO Error happens, SetBGError is called. The retryable IO Error is converted to hard error and DB is in read only mode. User or application needs to resume it. In this PR, if Retryable IO Error happens in one DB, SetBGError will create a new thread to call Resume (auto resume). otpions.max_bgerror_resume_count controls if auto resume is enabled or not (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval to call Resume again if the previous resume fails due to the Retryable IO Error. If non-retryable error happens during resume, auto resume will terminate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765 Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check Reviewed By: anand1976 Differential Revision: D21916789 Pulled By: zhichao-cao fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b
-
- 18 6月, 2020 1 次提交
-
-
由 Zitan Chen 提交于
Summary: `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file. The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`. In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`. A table property test is added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983 Test Plan: make check and some manual tests. Reviewed By: zhichao-cao Differential Revision: D22048826 Pulled By: gg814 fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
-
- 10 6月, 2020 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: Memory pinned by `pin_l0_filter_and_index_blocks_in_cache` needs to be predictable based on user config. This PR makes sure we do not pin extra memory for large files generated by intra-L0 (see https://github.com/facebook/rocksdb/issues/6889). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6911 Test Plan: unit test Reviewed By: siying Differential Revision: D21835818 Pulled By: ajkr fbshipit-source-id: a11a088549d06bed8aacc2548d266e5983f0ead4
-
- 16 4月, 2020 1 次提交
-
-
由 Mike Kolupaev 提交于
Summary: Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype. Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling. It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas. Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621 Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats. Reviewed By: siying Differential Revision: D20786930 Pulled By: al13n321 fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
-
- 03 4月, 2020 1 次提交
-
-
由 sdong 提交于
Summary: With https://github.com/facebook/rocksdb/issues/6262, TSAN complains about data race of some variables. Those variables are used to estimate file size and are accessed in writer and background threads. Since file size estimation doesn't have to be 100% accurate, we make some variables atomic and use relaxed memory order. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6636 Test Plan: Run all tests with TSAN. Reviewed By: anand1976 Differential Revision: D20820635 fbshipit-source-id: 1ea45ff38be15e33674ffe06b7d42fc9fe161ea5
-
- 30 3月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In the current implementation, sst file checksum is calculated by a shared checksum function object, which may make some checksum function hard to be applied here such as SHA1. In this implementation, each sst file will have its own checksum generator obejct, created by FileChecksumGenFactory. User needs to implement its own FilechecksumGenerator and Factory to plugin the in checksum calculation method. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600 Test Plan: tested with make asan_check Reviewed By: riversand963 Differential Revision: D20717670 Pulled By: zhichao-cao fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
-
- 28 3月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO related functions, the current Status cannot reflect the IO Error details such as error scope, error retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new Wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of write path and transferred to Status. The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and Compaction). The second job is to identify the Retryable IO Error as HardError, and set the bg_error_ as HardError. In this case, the DB Instance becomes read only. User is informed of the Status and need to take actions to deal with it (e.g., call db->Resume()). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487 Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check Reviewed By: anand1976 Differential Revision: D20685017 Pulled By: zhichao-cao fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
-
- 21 2月, 2020 1 次提交
-
-
由 sdong 提交于
Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
-
- 11 2月, 2020 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata. Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string). A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216 Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check. Differential Revision: D19171461 Pulled By: zhichao-cao fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
-
- 14 12月, 2019 1 次提交
-
-
由 anand76 提交于
Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
-
- 15 10月, 2019 2 次提交
-
-
由 Levi Tamasi 提交于
Summary: This is groundwork for adding garbage collection support to BlobDB. The patch adds logic that keeps track of the oldest blob file referred to by each SST file. The oldest blob file is identified during flush/ compaction (similarly to how the range of keys covered by the SST is identified), and persisted in the manifest as a custom field of the new file edit record. Blob indexes with TTL are ignored for the purposes of identifying the oldest blob file (since such blob files are cleaned up by the TTL logic in BlobDB). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5903 Test Plan: Added new unit tests; also ran db_bench in BlobDB mode, inspected the manifest using ldb, and confirmed (by scanning the SST files using sst_dump) that the value of the oldest blob file number field matches the contents of the file for each SST. Differential Revision: D17859997 Pulled By: ltamasi fbshipit-source-id: 21662c137c6259a6af70446faaf3a9912c550e90
-
由 Yanqin Jin 提交于
Summary: Compaction can call OnTableFileCreationCompleted(). If file is empty, "(nil)" is used as the file name. Do the same for flush. Test plan (dev server): ``` make all make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5905 Differential Revision: D17883285 Pulled By: riversand963 fbshipit-source-id: 6565884adbb00e8023d88b17dfb3b6eb92220b59
-
- 17 9月, 2019 1 次提交
-
-
由 sdong 提交于
Summary: file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803 Test Plan: Build whole project using make and cmake. Differential Revision: D17374550 fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
-
- 21 6月, 2019 1 次提交
-
-
由 haoyuhuang 提交于
Summary: This PR adds more callers for table readers. These information are only used for block cache analysis so that we can know which caller accesses a block. 1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name. 2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable. This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15819451 Pulled By: HaoyuHuang fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d
-
- 01 6月, 2019 1 次提交
-
-
由 Vijay Nadimpalli 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366
-
- 31 5月, 2019 3 次提交
-
-
由 Siying Dong 提交于
Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
-
由 Vijay Nadimpalli 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373 Differential Revision: D15559425 Pulled By: vjnadimpalli fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
-
由 Siying Dong 提交于
Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
-
- 30 5月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375 Differential Revision: D15550935 Pulled By: siying fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
-
- 11 4月, 2019 1 次提交
-
-
由 Sagar Vemuri 提交于
Summary: Introducing Periodic Compactions. This feature allows all the files in a CF to be periodically compacted. It could help in catching any corruptions that could creep into the DB proactively as every file is constantly getting re-compacted. And also, of course, it helps to cleanup data older than certain threshold. - Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF. - This works across all levels. - The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used). - Compaction filters, if any, are invoked as usual. - A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS). This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on 0 to last but one levels, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time` all files in last but one levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166 Differential Revision: D14884441 Pulled By: sagar0 fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47
-
- 19 3月, 2019 1 次提交
-
-
由 Shobhit Dayal 提交于
Summary: This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
-
- 15 2月, 2019 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: If `CompressionOptions::max_dict_bytes` and/or `CompressionOptions::zstd_max_train_bytes` are set, `SstFileWriter` will now generate files respecting those options. I refactored the logic a bit for deciding when to use dictionary compression. Previously we plumbed `is_bottommost_level` down to the table builder and used that. However it was kind of confusing in `SstFileWriter`'s context since we don't know what level the file will be ingested to. Instead, now the higher-level callers (e.g., flush, compaction, file writer) are responsible for building the right `CompressionOptions` to give the table builder. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4978 Differential Revision: D14060763 Pulled By: ajkr fbshipit-source-id: dc802c327896df2b319dc162d6acc82b9cdb452a
-
- 12 2月, 2019 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio. So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include: - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called. - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up. - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952 Differential Revision: D13967980 Pulled By: ajkr fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
-
- 30 1月, 2019 1 次提交
-
-
由 Alexander Zinoviev 提交于
Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055
-
- 18 12月, 2018 2 次提交
-
-
由 Abhishek Madan 提交于
Summary: Now that v2 is fully functional, the v1 aggregator is removed. The v2 aggregator has been renamed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4778 Differential Revision: D13495930 Pulled By: abhimadan fbshipit-source-id: 9d69500a60a283e79b6c4fa938fc68a8aa4d40d6
-
由 Abhishek Madan 提交于
Summary: RangeDelAggregatorV2 now supports ShouldDelete calls on snapshot stripes and creation of range tombstone compaction iterators. RangeDelAggregator is no longer used on any non-test code path, and will be removed in a future commit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4758 Differential Revision: D13439254 Pulled By: abhimadan fbshipit-source-id: fe105bcf8e3d4a2df37a622d5510843cd71b0401
-
- 10 11月, 2018 1 次提交
-
-
由 Sagar Vemuri 提交于
Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
-
- 13 10月, 2018 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: We would like to collect file-system-level statistics including file name, offset, length, return code, latency, etc., which requires to add callbacks to intercept file IO function calls when RocksDB is running. To collect file-system-level statistics, users can inherit the class `EventListener`, as in `TestFileOperationListener `. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933 Differential Revision: D10219571 Pulled By: riversand963 fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15
-
- 24 8月, 2018 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039 Differential Revision: D8670178 Pulled By: riversand963 fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
-
- 12 8月, 2018 1 次提交
-
-
由 Anand Ananthabhotla 提交于
Summary: Revert this change. Not generating the OnTableFileCreated() notification for a 0 byte SST on flush breaks the assumption that every OnTableFileCreationStarted() notification is followed by a corresponding OnTableFileCreated(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4263 Differential Revision: D9285623 Pulled By: anand1976 fbshipit-source-id: 808c3dcd498b4b4f4ed4be947a29a24b2296aa8d
-
- 14 7月, 2018 1 次提交
-
-
由 Peter Mattis 提交于
Summary: Do not consider the range tombstone sentinel key as causing 2 adjacent sstables in a level to overlap. When a range tombstone's end key is the largest key in an sstable, the sstable's end key is so to a "sentinel" value that is the smallest key in the next sstable with a sequence number of kMaxSequenceNumber. This "sentinel" is guaranteed to not overlap in internal-key space with the next sstable. Unfortunately, GetOverlappingFiles uses user-keys to determine overlap and was thus considering 2 adjacent sstables in a level to overlap if they were separated by this sentinel key. This in turn would cause compactions to be larger than necessary. Note that this conflicts with https://github.com/facebook/rocksdb/pull/2769 and cases `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050 Differential Revision: D8844423 Pulled By: ajkr fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
-
- 13 7月, 2018 1 次提交
-
-
由 Nikhil Benesch 提交于
Summary: This fixes the same performance issue that #3992 fixes but with much more invasive cleanup. I'm more excited about this PR because it paves the way for fixing another problem we uncovered at Cockroach where range deletion tombstones can cause massive compactions. For example, suppose L4 contains deletions from [a, c) and [x, z) and no other keys, and L5 is entirely empty. L6, however, is full of data. When compacting L4 -> L5, we'll end up with one file that spans, massively, from [a, z). When we go to compact L5 -> L6, we'll have to rewrite all of L6! If, instead of range deletions in L4, we had keys a, b, x, y, and z, RocksDB would have been smart enough to create two files in L5: one for a and b and another for x, y, and z. With the changes in this PR, it will be possible to adjust the compaction logic to split tombstones/start new output files when they would span too many files in the grandparent level. ajkr please take a look when you have a minute! Pull Request resolved: https://github.com/facebook/rocksdb/pull/4014 Differential Revision: D8773253 Pulled By: ajkr fbshipit-source-id: ec62fa85f648fdebe1380b83ed997f9baec35677
-