- 13 4月, 2019 7 次提交
-
-
由 anand76 提交于
Summary: Add bounds check when looping through empty levels in FilePickerMultiGet Pull Request resolved: https://github.com/facebook/rocksdb/pull/5189 Differential Revision: D14925334 Pulled By: anand1976 fbshipit-source-id: 65d53247cf443153e28ce2b8b753fa51c6ae4566
-
由 yiwu-arbug 提交于
Summary: Before using prefix extractor `InDomain()` should be check. All uses in memtable.cc didn't check `InDomain()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5190 Differential Revision: D14923773 Pulled By: miasantreble fbshipit-source-id: b3ad60bcca5f3a1a2b929a6eb34b0b7ba6326f04
-
由 Manuel Ung 提交于
Summary: In `PessimisticTransaction::TryLock`, we were calling `TrackKey` even when assume_tracked=true, which defeats the purpose of assume_tracked. Remove this. For keys that are already tracked, TrackKey will actually bump some counters (num_reads/num_writes) which are consumed in `TransactionBaseImpl::GetTrackedKeysSinceSavePoint`, and this is used to determine which keys were tracked since the last savepoint. I believe this functionality should still work, since I think the user should not call GetForUpdate/Put(assume_tracked=true) across savepoints, and if they do, they should not expect the Put(assume_tracked=true) to show up as a tracked key in the second savepoint. This is another 2-3% cpu improvement. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5173 Differential Revision: D14883809 Pulled By: lth fbshipit-source-id: 7d09f0772da422384af0519773e310c22b0cbca3
-
由 Maysam Yabandeh 提交于
Summary: When ReadOption doesn't specify a snapshot, WritePrepared::Get used kMaxSequenceNumber to avoid the cost of creating a new snapshot object (that requires sync over db_mutex). This creates a race condition if it is reading from the writes of a transaction that had duplicate keys: each instance of duplicate key is inserted with a different sequence number and depending on the ordering the ::Get might skip the newer one and read the older one that is obsolete. The patch fixes that by using last published seq as the snapshot sequence number. It also adds a check after the read is done to ensure that the max_evicted_seq has not advanced the aforementioned seq, which is a very unlikely event. If it did, then the read is not valid since the seq is not backed by an actually snapshot to let IsInSnapshot handle that properly when an overlapping commit is evicted from commit cache. A unit test is added to reproduce the race condition with duplicate keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5147 Differential Revision: D14758815 Pulled By: maysamyabandeh fbshipit-source-id: a56915657132cf6ba5e3f5ea1b5d78c803407719
-
由 ableegoldman 提交于
Summary: I would like to be able to read out the current Filter that has been set (or not) for a BlockBasedTableConfig. Added one public method to BlockBasedTableConfig: public Filter filterPolicy() { return filterPolicy; } Pull Request resolved: https://github.com/facebook/rocksdb/pull/5186 Differential Revision: D14921415 Pulled By: siying fbshipit-source-id: 2a63c8685480197862b49fc48916c757cd6daf95
-
由 Siying Dong 提交于
Summary: Since Statistics::measureTime() is deprecated, StatisticsImpl::measureTime() is not implemented. We realized that users might have a wrapped Statistics implementation in which measureTime() is implemented as forwarded to StatisticsImpl, and causes assert failure. In order to make the change less intrusive, we implement StatisticsImpl::measureTime(). We will revisit whether we need to remove it after several releases. Also, add a test to make sure that a Statistics implementation using the old interface still works. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5181 Differential Revision: D14907089 Pulled By: siying fbshipit-source-id: 29b6202fd04e30ed6f6adcaeb1000e87f10d1e1a
-
由 Yanqin Jin 提交于
Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3
-
- 12 4月, 2019 3 次提交
-
-
由 vijaynadimpalli 提交于
Summary: When a new SST file is created via flush or compaction, we dump out the table properties, however only a few table properties are logged. The change here is to log all the table properties Pull Request resolved: https://github.com/facebook/rocksdb/pull/5168 Differential Revision: D14876928 Pulled By: vjnadimpalli fbshipit-source-id: 1aca42ad00f9f650761d39e187f8beeb8700149b
-
由 anand76 提交于
Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 | 2 | 4 | 8 | 16 | 32 Random pattern (Stride length 0) 4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074 - Get 4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching) 4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753 4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781 4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135 Good locality (Stride length 256) 4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232 4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268 4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62 Medium locality (Stride length 4096) 4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555 4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465 4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
-
由 Siying Dong 提交于
Summary: Change the behavior of OptimizeForSmallDb() so that it is less likely to go out of memory. Change the behavior of OptimizeForPointLookup() to take advantage of the new memtable whole key filter, and move away from prefix extractor as well as hash-based indexing, as they are prone to misuse. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5165 Differential Revision: D14880709 Pulled By: siying fbshipit-source-id: 9af30e3c9e151eceea6d6b38701a58f1f9fb692d
-
- 11 4月, 2019 2 次提交
-
-
由 Sagar Vemuri 提交于
Summary: Introducing Periodic Compactions. This feature allows all the files in a CF to be periodically compacted. It could help in catching any corruptions that could creep into the DB proactively as every file is constantly getting re-compacted. And also, of course, it helps to cleanup data older than certain threshold. - Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF. - This works across all levels. - The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used). - Compaction filters, if any, are invoked as usual. - A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS). This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on 0 to last but one levels, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time` all files in last but one levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166 Differential Revision: D14884441 Pulled By: sagar0 fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47
-
由 Manuel Ung 提交于
Summary: The LockInfo struct is not easy to copy because it contains std::vector. Reduce copies by using move constructor and `unordered_map::emplace`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5172 Differential Revision: D14882053 Pulled By: lth fbshipit-source-id: 93999ec6ab1a5841fb5115abb764b6c1831a6de1
-
- 09 4月, 2019 3 次提交
-
-
由 jsteemann 提交于
Summary: This fix should help reading from encrypted files if the file-to-be-read is smaller than expected. For example, when using the encrypted env and making it read a journal file of exactly 0 bytes size, the encrypted env code crashes with SIGSEGV in its Decrypt function, as there is no check if the read attempts to read over the file's boundaries (as specified originally by the `dataSize` parameter). The most important problem this patch addresses is however that there is no size underlow check in `CTREncryptionProvider::CreateCipherStream`: The stream to be read will be initialized to a size of always `prefix.size() - (2 * blockSize)`. If the prefix however is smaller than twice the block size, this will obviously assume a _very_ large stream and read over the bounds. The patch adds a check here as follows: // If the prefix is smaller than twice the block size, we would below read a // very large chunk of the file (and very likely read over the bounds) assert(prefix.size() >= 2 * blockSize); if (prefix.size() < 2 * blockSize) { return Status::Corruption("Unable to read from file " + fname + ": read attempt would read beyond file bounds"); } so embedders can catch the error in their release builds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5160 Differential Revision: D14834633 Pulled By: sagar0 fbshipit-source-id: 47aa39a6db8977252cede054c7eb9a663b9a3484
-
由 Siying Dong 提交于
Summary: Create new function NPHash64() and GetSliceNPHash64(), which are currently implemented using murmurhash. Replace the current direct call of murmurhash() to use the new functions if the hash results are not used in on-disk format. This will make it easier to try out or switch to alternative functions in the uses where data format compatibility doesn't need to be considered. This part shouldn't have any performance impact. Also, the sharded cache hash function is changed to the new format, because it falls into this categoery. It doesn't show visible performance impact in db_bench results. CPU showed by perf is increased from about 0.2% to 0.4% in an extreme benchmark setting (4KB blocks, no-compression, everything cached in block cache). We've known that the current hash function used, our own Hash() has serious hash quality problem. It can generate a lots of conflicts with similar input. In this use case, it means extra lock contention for reads from the same file. This slight CPU regression is worthy to me to counter the potential bad performance with hot keys. And hopefully this will get further improved in the future with a better hash function. cache_test's condition is relaxed a little bit to. The new hash is slightly more skewed in this use case, but I manually checked the data and see the hash results are still in a reasonable range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5155 Differential Revision: D14834821 Pulled By: siying fbshipit-source-id: ec9a2c0a2f8ae4b54d08b13a5c2e9cc97aa80cb5
-
由 Yanqin Jin 提交于
Summary: remove an unnecessary function `GenerateAndAddFileIngestBehind` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5129 Differential Revision: D14686710 Pulled By: riversand963 fbshipit-source-id: 5698ae63e10f8ef76c2da753bbb07a36024ac065
-
- 06 4月, 2019 3 次提交
-
-
由 Sergei Glushchenko 提交于
Summary: Expose DB methods to lock and unlock the WAL. These methods are intended to use by MyRocks in order to obtain WAL coordinates in consistent way. Usage scenario is following: MySQL has performance_schema.log_status which provides information that enables a backup tool to copy the required log files without locking for the duration of copy. To populate this table MySQL does following: 1. Lock the binary log. Transactions are not allowed to commit now 2. Save the binary log coordinates 3. Walk through the storage engines and lock writes on each engine. For InnoDB, redo log is locked. For MyRocks, WAL should be locked. 4. Ask storage engines for their coordinates. InnoDB reports its current LSN and checkpoint LSN. MyRocks should report active WAL files names and sizes. 5. Release storage engine's locks 6. Unlock binary log Backup tool will then use this information to copy InnoDB, RocksDB and MySQL binary logs up to specified positions to end up with consistent DB state after restore. Currently, RocksDB allows to obtain the list of WAL files. Only missing bit is the method to lock the writes to WAL files. LockWAL method must flush the WAL in order for the reported size to be accurate (GetSortedWALFiles is using file system stat call to return the file size), also, since backup tool is going to copy the WAL, it is better to be flushed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5146 Differential Revision: D14815447 Pulled By: maysamyabandeh fbshipit-source-id: eec9535a6025229ed471119f19fe7b3d8ae888a3
-
由 Siying Dong 提交于
Summary: cache functions heavily use virtual functions. Add some "final" annotations to give compilers more information to optimize. The compiler doesn't seem to take advantage of it though. But it doesn't hurt. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5156 Differential Revision: D14814837 Pulled By: siying fbshipit-source-id: 4423f58eafc93f7dd3c5f04b02b5c993dba2ea94
-
由 Harry Wong 提交于
Summary: This fixed the compile error in Clang-8: ``` error: explicitly defaulted copy assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted] ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5095 Differential Revision: D14811961 Pulled By: riversand963 fbshipit-source-id: d935d1f85a4e8694dca10033fb5af92d8777eca0
-
- 05 4月, 2019 4 次提交
-
-
由 Levi Tamasi 提交于
Summary: The uncompression dictionary object has a Statistics pointer that might dangle if the database closed. This patch evicts the dictionary from the block cache when a table is closed, similarly to how index and filter readers are handled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5150 Differential Revision: D14782422 Pulled By: ltamasi fbshipit-source-id: 0cec9336c742c479aa92206e04521767f1aa9622
-
由 Mike Kolupaev 提交于
Summary: - Some newer methods of Env weren't wrapped in EnvWrapper. Fixed. - Added more wrapper classes similar to WritableFileWrapper: SequentialFileWrapper, RandomAccessFileWrapper, RandomRWFileWrapper, DirectoryWrapper, LoggerWrapper. - Moved the code around a bit, removed some unused friendships, added some comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5131 Differential Revision: D14738932 Pulled By: al13n321 fbshipit-source-id: 99a9b1af28f2c629e7b7501389fa920b5ce30218
-
由 Adam Simpkins 提交于
Summary: Annotate all of the logging functions to inform the compiler that these use printf-style formatting arguments. This allows the compiler to emit warnings if the format arguments are incorrect. This also fixes many problems reported now that format string checking is enabled. Many of these are simply mix-ups in the argument type (e.g, int vs uint64_t), but in several cases the wrong number of arguments were being passed in which can cause the code to crash. The primary motivation for this was to fix the log message in `DBImpl::SwitchMemtable()` which caused a segfault due to an extra %s format parameter with no argument supplied. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5089 Differential Revision: D14574795 Pulled By: simpkins fbshipit-source-id: 0921b03f0743652bf4ae21e414ff54b3bb65422a
-
由 datonli 提交于
#5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) Summary: mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152 Differential Revision: D14779409 Pulled By: siying fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
-
- 04 4月, 2019 3 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: The ReadCallback was shared between all threads in IteratorWithLocalStatistics. A race condition was hence introduced with recent changes that changes the content of ReadCallback. The patch fixes that by using a separate callback per thread. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5149 Differential Revision: D14761612 Pulled By: maysamyabandeh fbshipit-source-id: 814a316aed046c318cb90e22379a6e32ac528949
-
由 Maysam Yabandeh 提交于
Summary: Ubsna complains that in initialization of WriteUnpreparedTxnReadCallback the method of the child class is used before the parent class is constructed. The patch fixes that by making the aforementioned method static. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5148 Differential Revision: D14760098 Pulled By: maysamyabandeh fbshipit-source-id: cf19b7c1fdb5de0a54e62c1deebe09a0fa048ded
-
由 Zhichao Cao 提交于
Summary: Although user should first call StartTrace to begin the RocksDB tracing function and call EndTrace to stop the tracing process, user can accidentally call EndTrace first. It will cause segment fault and crash the DB instance. The issue is fixed by checking the pointer first. Test case added in db_test2. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5130 Differential Revision: D14691420 Pulled By: zhichao-cao fbshipit-source-id: 3be13d2f944bc453728ef8eef67b68d7ad0939c8
-
- 03 4月, 2019 6 次提交
-
-
由 Zhongyi Xie 提交于
Summary: This PR address two open issues: 1. clang analyzer is paranoid about db_ being nullptr after DB::Open calls in the test. See https://github.com/facebook/rocksdb/pull/5043#discussion_r271394579 Add an assert to keep clang happy 2. PR https://github.com/facebook/rocksdb/pull/5049 introduced a variable shadowing: ``` db/db_iterator_test.cc: In constructor ‘rocksdb::DBIteratorWithReadCallbackTest_ReadCallback_Test::TestBody()::TestReadCallback::TestReadCallback(rocksdb::SequenceNumber)’: db/db_iterator_test.cc:2484:9: error: declaration of ‘max_visible_seq’ shadows a member of 'this' [-Werror=shadow] : ReadCallback(max_visible_seq) {} ^ ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5140 Differential Revision: D14735497 Pulled By: miasantreble fbshipit-source-id: 3219ea75cf4ae04f64d889323f6779e84be98144
-
由 Maysam Yabandeh 提交于
Summary: In prepare phase of 2PC, the db promises to remember the prepared data, for possible future commits. To fulfill the promise the prepared data must be persisted in the WAL so that they could be recovered after a crash. The log that contains a prepare batch that is not committed yet, is marked so that it is not garbage collected before the transaction commits/rollbacks. The bug was that the write to the log file and the mark of the file was not atomic, and WAL gc could have happened before the WAL log is actually marked. This patch moves the marking logic to PreReleaseCallback so that the WAL gc logic that joins both write threads would see the WAL write and WAL mark atomically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5121 Differential Revision: D14665210 Pulled By: maysamyabandeh fbshipit-source-id: 1d66aeb1c66a296cb4899a5a20c4d40c59e4b534
-
由 Zhongyi Xie 提交于
Summary: Since we are planning to use dictionary compression and to use different compression level, it is quite useful to add compression options to TableProperties. For example, in MyRocks, if the feature is available, we can query from information_schema.rocksdb_sst_props to see if all sst files are converted to ZSTD dictionary compressions. Resolves https://github.com/facebook/rocksdb/issues/4992 With this PR, user can query table properties through `GetPropertiesOfAllTables` API and get compression options as std::string: `window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0;` or table_properties->ToString() will also contain it `# data blocks=1; # entries=13; # deletions=0; # merge operands=0; # range deletions=0; raw key size=143; raw average key size=11.000000; raw value size=39; raw average value size=3.000000; data block size=120; index block size (user-key? 0, delta-value? 0)=27; filter block size=0; (estimated) table size=147; filter policy name=N/A; prefix extractor name=nullptr; column family ID=0; column family name=default; comparator name=leveldb.BytewiseComparator; merge operator name=nullptr; property collectors names=[]; SST file compression algo=Snappy; SST file compression options=window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ; creation time=1552946632; time stamp of earliest key=1552946632;` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5081 Differential Revision: D14716692 Pulled By: miasantreble fbshipit-source-id: 7d2f2cf84e052bff876e71b4212cfdebf5be32dd
-
由 Maysam Yabandeh 提交于
Summary: WriteUnPrepared adds a virtual function, MaxUnpreparedSequenceNumber, to ReadCallback, which returns 0 unless WriteUnPrepared is enabled and the transaction has uncommitted data written to the DB. Together with snapshot sequence number, this determines the last sequence that is visible to reads. The patch clarifies the guarantees of the GetIterator API in WriteUnPrepared transactions and make use of that to statically initialize the read callback and thus avoid the virtual call. Furthermore it increases the minimum value for min_uncommitted from 0 to 1 as seq 0 is used only for last level keys that are committed in all snapshots. The following benchmark shows +0.26% higher throughput in seekrandom benchmark. Benchmark: ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100 seekrandom [AVG 10 runs] : 20355 ops/sec; 225.2 MB/sec seekrandom [MEDIAN 10 runs] : 20425 ops/sec; 225.9 MB/sec ./db_bench_lessvirtual3 --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100 seekrandom [AVG 10 runs] : 20409 ops/sec; 225.8 MB/sec seekrandom [MEDIAN 10 runs] : 20487 ops/sec; 226.6 MB/sec Pull Request resolved: https://github.com/facebook/rocksdb/pull/5049 Differential Revision: D14366459 Pulled By: maysamyabandeh fbshipit-source-id: ebaff8908332a5ae9af7defeadabcb624be660ef
-
由 Simon Grätzer 提交于
Summary: I think when PR https://github.com/facebook/rocksdb/pull/4889 added the `IOSTATS_CPU_TIMER_GUARD` define to this header file, the noop version in the `#else` branch was forgotten. Not sure if this is common, but on my MacOS machine it breaks my build Pull Request resolved: https://github.com/facebook/rocksdb/pull/5136 Differential Revision: D14727727 Pulled By: siying fbshipit-source-id: 1076e56bdbe6ecda01d461b371dabf7f1593a149
-
由 Siying Dong 提交于
Summary: This reverts commit f29dc1b9. In BlockBasedTableIterator, index_iter_->key() is sometimes a user key, so it is wrong to call ExtractUserKey() against it. This is a bug introduced by #5101. Temporarily revert the diff to keep the branch clean. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5132 Differential Revision: D14718584 Pulled By: siying fbshipit-source-id: 0ac55dc9b5dbc18c7809092146bdf7eb9364b9ad
-
- 02 4月, 2019 2 次提交
-
-
由 xinbenlv 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5134 Differential Revision: D14719068 Pulled By: siying fbshipit-source-id: c09a544f06ff414dbe2f90792aaf2bb5b8550bee
-
由 Mike Kolupaev 提交于
Summary: Just like ReadOptions::background_purge_on_iterator_cleanup but for ColumnFamilyHandle instead of Iterator. In our use case we sometimes call ColumnFamilyHandle's destructor from low-latency threads, and sometimes it blocks the thread for a few seconds deleting the files. To avoid that, we can either offload ColumnFamilyHandle's destruction to a background thread on our side, or add this option on rocksdb side. This PR does the latter, to be consistent with how we solve exactly the same problem for iterators using background_purge_on_iterator_cleanup option. (EDIT: It's avoid_unnecessary_blocking_io now, and affects both CF drops and iterator destructors.) I'm not quite comfortable with having two separate options (background_purge_on_iterator_cleanup and background_purge_on_cf_cleanup) for such a rarely used thing. Maybe we should merge them? Rename background_purge_on_cf_cleanup to something like delete_files_on_background_threads_only or avoid_blocking_io_in_unexpected_places, and make iterators use it instead of the one in ReadOptions? I can do that here if you guys think it's better. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5043 Differential Revision: D14339233 Pulled By: al13n321 fbshipit-source-id: ccf7efa11c85c9a5b91d969bb55627d0fb01e7b8
-
- 30 3月, 2019 4 次提交
-
-
由 Remington Brasga 提交于
Summary: NewEmptyInternalIterator with arena mistakenly used EmptyIterator to allocate the size from area but then initialized it to a totally different object: EmptyInternalIterator. The patch fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4905 Differential Revision: D14689840 Pulled By: maysamyabandeh fbshipit-source-id: af64fd8ee93d5a4ad54691c792e5ecc5efabc887
-
由 Maysam Yabandeh 提交于
Summary: Compaction would depend on max_evicted_seq_ value. The ::Initialize method should do that after max_evicted_seq_ is properly initialized. The patch also back ports #4853 from WritePrepared txn to WriteUnPrepared. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5128 Differential Revision: D14686562 Pulled By: maysamyabandeh fbshipit-source-id: b2355025712a72676ac3b20a95258adcf4774490
-
由 Yi Wu 提交于
Summary: `BlockBasedTableIterator` avoid reading next block on `Next()` if it detects the iterator will be out of bound, by checking against index key. The optimization was added in #2239, and by the time it only check the bound per block. It seems later change make it a per-key check, which introduce unnecessary key comparisons. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5101 Differential Revision: D14678707 Pulled By: siying fbshipit-source-id: 2372446116753c7892ea4cec7b4b49ef87ba463e
-
由 Yanqin Jin 提交于
Summary: **This PR updates RepeatableThread::wait, breaking some tests on OS X. The rest of the PR fixes the tests on OS X.** `RepeatableThreadTest.MockEnvTest` uses `MockTimeEnv` and `RepeatableThread`. If `RepeatableThread::wait` calls `TimedWait` with a time smaller than or equal to the current (real) time, `TimedWait` returns immediately on certain platforms, e.g. OS X. #4560 addresses this issue by replacing `TimedWait` with `Wait` in test. This fixes the test but makes test/production code diverge, which is not optimal for test coverage. This PR proposes an alternative fix which unifies test and production code path for `RepeatableThread::wait`. We obtain the current (real) time in seconds and add 10 extra seconds to ensure that `RepeatableThread::wait` invokes `TimedWait` with a time greater than (real) current time. This is to prevent the `TimedWait` function from returning immediately without sleeping and releasing the mutex. If `TimedWait` returns immediately, the mutex will not be released, and `RepeatableThread::TEST_WaitForRun` never has a chance to execute the callback which, in this case, updates the result returned by `mock_env->NowMicros()`. Consequently, `RepeatableThread::wait` cannot break out of the loop, causing test to hang. The extra 10 seconds is a best-effort approach because there seems no reliable and deterministic way to provide the aforementioned guarantee. By the time `RepeatableThread::wait` is called, there is no guarantee that the `delay + mock_env->NowMicros()` will be greater than the current real time. However, 10 seconds should be sufficient in most cases. We will keep an eye for possible flakiness of this test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5107 Differential Revision: D14680885 Pulled By: riversand963 fbshipit-source-id: d1ecbe10e1dacd110bd464cd01e188bfee72b89e
-
- 29 3月, 2019 3 次提交
-
-
由 Yanqin Jin 提交于
Summary: Fix some hdfs-related code so that it can compile and run 'db_stress' Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122 Differential Revision: D14675495 Pulled By: riversand963 fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
-
由 anand76 提交于
Summary: WAL files are currently not subject to deletion rate limiting by DeleteScheduler. If the size of the WAL files is significant, this can cause a high delete rate on SSDs that may affect other operations. To fix it, force WAL file deletions to go through the SstFileManager. Original PR for this is #2768 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5116 Differential Revision: D14669437 Pulled By: anand1976 fbshipit-source-id: c5f62d0640cebaa1574de841a1d01e4ce2faadf0
-
由 Siying Dong 提交于
Summary: Allow customized merge operator to be loaded from option file/map/string by allowing users to pre-regiester merge operators to object registry. Also update HISTORY.md and header files for the same feature for comparator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5123 Differential Revision: D14658488 Pulled By: siying fbshipit-source-id: 86ea2fbd2a0a04632d8ea9fceaffefd041f6ae61
-