- 19 10月, 2016 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: This diff introduces RangeDelAggregator, which takes ownership of iterators provided to it via AddTombstones(). The tombstones are organized in a two-level map (snapshot stripe -> begin key -> tombstone). Tombstone creation avoids data copy by holding Slices returned by the iterator, which remain valid thanks to pinning. For compaction, we create a hierarchical range tombstone iterator with structure matching the iterator over compaction input data. An aggregator based on that iterator is used by CompactionIterator to determine which keys are covered by range tombstones. In case of merge operand, the same aggregator is used by MergeHelper. Upon finishing each file in the compaction, relevant range tombstones are added to the output file's range tombstone metablock and file boundaries are updated accordingly. To check whether a key is covered by range tombstone, RangeDelAggregator::ShouldDelete() considers tombstones in the key's snapshot stripe. When this function is used outside of compaction, it also checks newer stripes, which can contain covering tombstones. Currently the intra-stripe check involves a linear scan; however, in the future we plan to collapse ranges within a stripe such that binary search can be used. RangeDelAggregator::AddToBuilder() adds all range tombstones in the table's key-range to a new table's range tombstone meta-block. Since range tombstones may fall in the gap between files, we may need to extend some files' key-ranges. The strategy is (1) first file extends as far left as possible and other files do not extend left, (2) all files extend right until either the start of the next file or the end of the last range tombstone in the gap, whichever comes first. One other notable change is adding release/move semantics to ScopedArenaIterator such that it can be used to transfer ownership of an arena-allocated iterator, similar to how unique_ptr is used for malloc'd data. Depends on D61473 Test Plan: compaction_iterator_test, mock_table, end-to-end tests in D63927 Reviewers: sdong, IslamAbdelRahman, wanning, yhchiang, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D62205
-
- 18 10月, 2016 3 次提交
-
-
由 Siying Dong 提交于
Summary: Junit and our code generate lots of warning if "-Xcheck:jni" is on and force Travis to fail as the logs are too long. Test Plan: "make jtest" and see the warnings go away.
-
由 Aaron Gao 提交于
Summary: update new feature in history and avoid breaking mongorocks Test Plan: make check Reviewers: sdong, yiwu, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64611
-
由 dhruba borthakur 提交于
column family for compaction.
-
- 15 10月, 2016 4 次提交
-
-
由 Siying Dong 提交于
-
由 Dmitri Smirnov 提交于
-
由 Andrew Kryczka 提交于
Summary: Previously the WAL files that were avoided during recovery would never be considered for deletion. That was because alive_log_files_ was only populated when log files are created. This diff further populates alive_log_files_ with existing log files that aren't flushed during recovery, such that FindObsoleteFiles() can find them later. Depends on D64053. Test Plan: new unit test, verifies it fails before this change and passes after Reviewers: sdong, IslamAbdelRahman, yiwu Reviewed By: yiwu Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64059
-
由 Yi Wu 提交于
Summary: Add DB::SetDBOptions to dynamic change max_background_compactions and base_background_compactions. I'll add more dynamic changeable options soon. Test Plan: unit test. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64749
-
- 14 10月, 2016 7 次提交
-
-
由 Aaron Gao 提交于
Summary: fix assertion failure in db_stress. It happens because of prefix seek key is larger than merge iterator key when they have the same user key Test Plan: ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --log2_keys_per_lock=10 --block_size=16384 --allow_concurrent_memtable_write=0 --test_batches_snapshots=0 --max_bytes_for_level_base=67108864 --progress_reports=0 --mmap_read=0 --writepercent=35 --disable_data_sync=0 --readpercent=50 --subcompactions=4 --ops_per_thread=20000000 --memtablerep=skip_list --prefix_size=0 --target_file_size_multiplier=1 --column_families=1 --threads=32 --disable_wal=0 --open_files=500000 --destroy_db_initially=0 --target_file_size_base=16777216 --nooverwritepercent=1 --iterpercent=10 --max_key=100000000 --prefixpercent=0 --use_clock_cache=false --kill_random_test=888887 --cache_size=1048576 --verify_checksum=1 Reviewers: sdong, andrewkr, yiwu, yhchiang Reviewed By: yhchiang Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65025
-
由 Dmitri Smirnov 提交于
-
由 Siying Dong 提交于
Summary: Some older tags don't run GCC 4.8 with FB internal setting. Fixed them and created branches. Change the format compatible script accordingly. Also add more releases to check format compatibility.
-
由 Yueh-Hsuan Chiang 提交于
Summary: Previously we have an assertion which triggers when we issue Merges after a single delete. However, merges after a single delete are unrelated to that single delete. Thus this behavior should be allowed. This will address a flakyness of db_stress. Test Plan: db_stress Reviewers: IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64923
-
由 Yueh-Hsuan Chiang 提交于
Summary: In the current implementation of RateLimiter, the difference between the configured rate and the actual rate might be more than 20%, while our test only allows 15% difference. This diff relaxes the acceptable bias RateLimiterTest::Rate test be 25% to make the test less flaky. Test Plan: rate_limiter_test Reviewers: IslamAbdelRahman, andrewkr, yiwu, lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64941
-
由 Islam AbdelRahman 提交于
Summary: Log successful AddFile Test Plan: visually check LOG file Reviewers: yiwu, andrewkr, lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D65019
-
由 Islam AbdelRahman 提交于
Summary: Issue scenario: (1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2 (2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decide to move this file to L1 and then move it again to L2 (this file don't overlap with any other files) (3) compaction (1) finishes and install the file it generated in L2, but this file overlap with the file we generated in (2) so we break the LSM consistency Looks like this issue can be triggered by using non-exclusive manual compaction or AddFile() Test Plan: unit tests Reviewers: sdong Reviewed By: sdong Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D64947
-
- 13 10月, 2016 8 次提交
-
-
由 Andrew Kryczka 提交于
Summary: I accidentally left out these changes from my commit of D64053 due to messing up the merge conflict resolution. Test Plan: ./db_wal_test Reviewers: Subscribers: Tasks: Blame Revision: D64053
-
由 Andrew Kryczka 提交于
Summary: This reverts commit https://github.com/facebook/rocksdb/commit/9e4aa798c3d47c6be64324bd9d38f0813c8ead7b, which doesn't handle all cases (see inline comment). I reimplemented the logic as suggested in the initial PR: https://github.com/facebook/rocksdb/pull/1313. This approach has two benefits: - All the parsing/filtering of full_scan_candidate_files is kept together in PurgeObsoleteFiles. - We only need to check whether log file is recycled in one place where we've already determined it's a log file Test Plan: new unit test, verified fails before the original fix, still passes now. Reviewers: IslamAbdelRahman, yiwu, sdong Reviewed By: yiwu, sdong Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64053
-
由 Joel Marcey 提交于
-
由 Joel Marcey 提交于
-
由 Islam AbdelRahman 提交于
Summary: Set no_proxy to fix arcanist Test Plan: will check if tests are triggered Reviewers: arahut, yiwu, lightmark, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65001
-
由 Adam Retter 提交于
-
-
由 Dmitri Smirnov 提交于
-
- 12 10月, 2016 5 次提交
-
-
由 Peter (Stig) Edwards 提交于
Hello and thanks for RocksDB, When log_write_bench is run with the -bytes_per_sync option, the option does not influence any *sync* behaviour. > strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)' write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 I suspect that this is because the bytes_per_sync option now needs to be using a `WritableFileWriter` and not a `WritableFile`. With the diff below applied, it changes to: > strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)' write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 sync_file_range(0x3, 0, 0x200000, 0x2) = 0 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 sync_file_range(0x3, 0x200000, 0x200000, 0x2) = 0 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 sync_file_range(0x3, 0x400000, 0x200000, 0x2) = 0 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 sync_file_range(0x3, 0x600000, 0x200000, 0x2) = 0 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576 sync_file_range(0x3, 0x800000, 0x200000, 0x2) = 0 ( Note that the first 1MB is not synced as mentioned in util/file_reader_writer.cc::WritableFileWriter::Flush() ) This diff also includes the fix from https://github.com/facebook/rocksdb/pull/1373 > diff -du util/log_write_bench.cc.orig util/log_write_bench.cc --- util/log_write_bench.cc.orig 2016-10-04 12:06:29.115122580 -0400 +++ util/log_write_bench.cc 2016-10-05 07:24:09.677037576 -0400 @@ -14,6 +14,7 @@ #include <gflags/gflags.h> #include "rocksdb/env.h" +#include "util/file_reader_writer.h" #include "util/histogram.h" #include "util/testharness.h" #include "util/testutil.h" @@ -38,19 +39,21 @@ env_options.bytes_per_sync = FLAGS_bytes_per_sync; unique_ptr<WritableFile> file; env->NewWritableFile(file_name, &file, env_options); + unique_ptr<WritableFileWriter> writer; + writer.reset(new WritableFileWriter(std::move(file), env_options)); std::string record; - record.assign('X', FLAGS_record_size); + record.assign(FLAGS_record_size, 'X'); HistogramImpl hist; uint64_t start_time = env->NowMicros(); for (int i = 0; i < FLAGS_num_records; i++) { uint64_t start_nanos = env->NowNanos(); - file->Append(record); - file->Flush(); + writer->Append(record); + writer->Flush(); if (FLAGS_enable_sync) { - file->Sync(); + writer->Sync(false); } hist.Add(env->NowNanos() - start_nanos);
-
由 Reid Horuff 提交于
Summary: makes Transaction::GetState() a const function. Test Plan: compiles. Reviewers: mung Reviewed By: mung Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64929
-
由 Aaron Gao 提交于
Summary: 1) The previous solution for Prev() prefix support is not clean. Since I add api SeekForPrev(), now the Prev() can be symmetric to Next(). and we do not need SeekToLast() to be called in Prev() any more. Also, Next() will Seek(prefix_seek_key_) to solve the problem of possible inconsistency between db_iter and merge_iter when there is merge_operator. And prefix_seek_key is only refreshed when change direction to forward. 2) This diff also solves the bug of Iterator::SeekToLast() with iterate_upper_bound_ with prefix extractor. add test cases for the above two cases. There are some tests for the SeekToLast() in Prev(), I will clean them later. Test Plan: make all check Reviewers: IslamAbdelRahman, andrewkr, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63933
-
由 Yi Wu 提交于
Summary: Adding several missing block cache tickers. Test Plan: make all check Reviewers: IslamAbdelRahman, yhchiang, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64881
-
由 Yi Wu 提交于
Summary: A convience method to atomically get and reset ticker count. I'm wanting to use it to have a thin wrapper to the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303). Test Plan: test in LogDevice shadow cluster. https://fburl.com/461868822 Reviewers: andrewkr, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64869
-
- 11 10月, 2016 1 次提交
-
-
由 Bassam Tabbara 提交于
Signed-off-by: NBassam Tabbara <bassam.tabbara@quantum.com>
-
- 08 10月, 2016 8 次提交
-
-
由 Islam AbdelRahman 提交于
Summary: We always run consistency checks when compiling in debug mode allow users to set Options::force_consistency_checks to true to be able to run such checks even when compiling in release mode Test Plan: make check -j64 make release Reviewers: lightmark, sdong, yiwu Reviewed By: yiwu Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D64701
-
由 Islam AbdelRahman 提交于
Summary: I saw this exception thrown because sometimes we may resize with -ve value if we have empty max_bytes_for_level_multiplier_additional vector Test Plan: run the tests Reviewers: yiwu Reviewed By: yiwu Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64791
-
由 Joel Marcey 提交于
-
由 Reid Horuff 提交于
Summary: Modifies the lock info export test to test multiple column families after I was experiencing a bug while developing the MyRocks front-end for this. Test Plan: is test. Reviewers: mung Reviewed By: mung Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64725
-
由 Islam AbdelRahman 提交于
This reverts commit ab01da54.
-
由 Adam Retter 提交于
Summary: - Deprecated RateLimiterConfig and GenericRateLimiterConfig - Introduced RateLimiter It is now possible to use all C++ related methods also in RocksJava. A noteable method is setBytesPerSecond which can change the allowed number of bytes per second at runtime. Test Plan: make rocksdbjava make jtest Reviewers: adamretter, yhchiang, ankgup87 Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35715
-
由 Reid Horuff 提交于
Summary: This exposes a transactions state through a public api rather than through a public member variable. I also do some name refactoring. ExecutionStatus => TransactionState exec_status_ => trx_state_ Test Plan: It compiles and transaction_test passes. Reviewers: IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, mung, dhruba, sdong Differential Revision: https://reviews.facebook.net/D64689
-
由 Reid Horuff 提交于
Summary: When constructing a write batch a client may now call MarkWalTerminationPoint() on that batch. No batch operations after this call will be added written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement. ``` RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628 RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054 ``` Test Plan: Two unit tests added. Reviewers: sdong, yiwu, IslamAbdelRahman Reviewed By: yiwu Subscribers: hermanlee4, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64599
-
- 07 10月, 2016 3 次提交
-
-
由 Peter (Stig) Edwards 提交于
Hello and thank you for RocksDB, I noticed when using log_write_bench that writes were always 88 bytes: > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2 write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88 write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88 > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88 I think this should be: << record.assign('X', FLAGS_record_size); >> record.assign(FLAGS_record_size, 'X'); So fill and not buffer. Otherwise I always see writes of size 88 (the decimal value for chr "X"). string& assign (const char* s, size_t n); buffer - Copies the first n characters from the array of characters pointed by s. string& assign (size_t n, char c); fill - Replaces the current value by n consecutive copies of character c. perl -le 'print ord "X"' 88 With the change: > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096 > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249 Thanks. https://github.com/facebook/rocksdb/commit/01c27be5fb42524c5052b4b4a23e05501e1d1421 https://reviews.facebook.net/D16239
-
由 Sage Weil 提交于
* env_mirror: fix leak from LockFile Signed-off-by: NSage Weil <sage@redhat.com> * env_mirror: instruct EnvMirror whether mirrored Envs should be destroyed The lifecycle rules for Env are frustrating and undocumented. Notably, Env::Default() should *not* be freed, but any Env instances we created should be. Explicitly instruct EnvMirror whether to clean up child Env instances. Default to false so that we do not affect existing callers. Signed-off-by: NSage Weil <sage@redhat.com>
-
由 Igor Mihalik 提交于
Added rocksdb_options_set_memtable_prefix_bloom_size_ratio function implemented in c.cc but not exported via c.h
-