- 11 6月, 2019 2 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: To avoid deadlock mutex_ should never be acquired before log_write_mutex_. The patch documents that and also fixes one case in ::FlushWAL that acquires mutex_ through ::WriteStatusCheck when it already holds lock on log_write_mutex_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5437 Differential Revision: D15749722 Pulled By: maysamyabandeh fbshipit-source-id: f57b69c44b4b80cc6d7ddf3d3fdf4a9eb5a5a45a
-
由 haoyuhuang 提交于
Summary: BlockCacheLookupContext only contains the caller for now. We will trace block accesses at five places: 1. BlockBasedTable::GetFilter. 2. BlockBasedTable::GetUncompressedDict. 3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.) 4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.) 5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.) We create the context at: 1. BlockBasedTable::Get. (kUserGet) 2. BlockBasedTable::MultiGet. (kUserMGet) 3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.) 4. BlockBasedTable::Open. (kPrefetch) 5. Index/Filter::CacheDependencies. (kPrefetch) 6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize). I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable. Throughput of this PR: 231334 ops/s. Throughput of the master branch: 238428 ops/s. Experiment setup: RocksDB: version 6.2 Date: Mon Jun 10 10:42:51 2019 CPU: 24 * Intel Core Processor (Skylake) CPUCache: 16384 KB Keys: 20 bytes each Values: 100 bytes each (100 bytes after compression) Entries: 1000000 Prefix: 20 bytes Keys per prefix: 0 RawSize: 114.4 MB (estimated) FileSize: 114.4 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1 Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120 TODOs: 1. Create a caller for external SST file ingestion and differentiate the callers for iterator. 2. Integrate tracer to trace block cache accesses. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5421 Differential Revision: D15704258 Pulled By: HaoyuHuang fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
-
- 07 6月, 2019 1 次提交
-
-
由 Zhongyi Xie 提交于
Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03
-
- 06 6月, 2019 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of facebook#4942. A suite of new unit tests exist in DBBasicTestWithTimestampWithParam. Support for timestamp requires the user to provide timestamp as a slice in `ReadOptions` and `WriteOptions`. All timestamps of the same database must share the same length, format, etc. The format of the timestamp is the same throughout the same database, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once created, the format and length of the timestamp cannot change (at least for now). Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` All tests must pass. We also run the following db_bench tests to verify whether there is regression on Get/Put while timestamp is not enabled. ``` $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 ``` Repeat for 6 times for both versions. Results are as follows: ``` | | readrandom | fillrandom | | master | 16.77 MB/s | 47.05 MB/s | | PR5079 | 16.44 MB/s | 47.03 MB/s | ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5079 Differential Revision: D15132946 Pulled By: riversand963 fbshipit-source-id: 833a0d657eac21182f0f206c910a6438154c742c
-
- 04 6月, 2019 1 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: Currently we validate options in DB::Open. However the validation step is missing when options are dynamically updated in ::SetOptions. The patch fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5368 Differential Revision: D15540101 Pulled By: maysamyabandeh fbshipit-source-id: d27bbffd8f0252d1b50bcf59e0a70a278ed937f4
-
- 01 6月, 2019 2 次提交
-
-
由 Siying Dong 提交于
Summary: Many logging related source files are under util/. It will be more structured if they are together. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387 Differential Revision: D15579036 Pulled By: siying fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
-
由 Vijay Nadimpalli 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366
-
- 31 5月, 2019 3 次提交
-
-
由 Siying Dong 提交于
Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
-
由 Vijay Nadimpalli 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373 Differential Revision: D15559425 Pulled By: vjnadimpalli fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
-
由 Siying Dong 提交于
Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
-
- 30 5月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375 Differential Revision: D15550935 Pulled By: siying fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
-
- 02 5月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: Sometimes, users might make mistake of not releasing snapshots before closing the DB. This is undocumented use of RocksDB and the behavior is unknown. We return DB::Close() to provide a way to check it for the users. Aborted() will be returned to users when they call DB::Close(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5272 Differential Revision: D15159713 Pulled By: siying fbshipit-source-id: 39369def612398d9f239d83d396b5a28e5af65cd
-
- 26 4月, 2019 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: Currently one thread in RocksDB keeps a WAL file open while another thread deletes it. Although the first thread never writes to the WAL again, it still tries to close it in the end. This is fine on POSIX, but can be problematic on other platforms, e.g. HDFS, etc.. It will either cause a lot of warning messages or throw exceptions. The solution is to let the second thread close the WAL before deleting it. RocksDB keeps the writers of the logs to delete in `logs_to_free_`, which is passed to `job_context` during `FindObsoleteFiles` (holding mutex). Then in `PurgeObsoleteFiles` (without mutex), these writers should close the logs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5233 Differential Revision: D15032670 Pulled By: riversand963 fbshipit-source-id: c55e8a612db8cc2306644001a5e6d53842a8f754
-
- 25 4月, 2019 1 次提交
-
-
由 Zhongyi Xie 提交于
Summary: PR https://github.com/facebook/rocksdb/pull/4899 implemented the general framework for RocksDB secondary instances. This PR adds the support for WAL tailing in `OpenAsSecondary`, which means after the `OpenAsSecondary` call, the secondary is now able to see primary's writes that are yet to be flushed. The secondary can see primary's writes in the WAL up to the moment of `OpenAsSecondary` call starts. Differential Revision: D15059905 Pulled By: miasantreble fbshipit-source-id: 44f71f548a30b38179a7940165e138f622de1f10
-
- 16 4月, 2019 1 次提交
-
-
由 Yi Zhang 提交于
Summary: Found this when test driving the new MultiGet. If you pass unsorted result with sorted_result = false you'll trigger the ASSERT incorrect even though we'll sort down below. I've also added simple test cover sorted_result=true/false scenario copied from MultiGetSimple. anand1976 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5195 Differential Revision: D14935475 Pulled By: yizhang82 fbshipit-source-id: 1d2af5e3a003847d965066a16e3b19da68acf170
-
- 13 4月, 2019 1 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: When ReadOption doesn't specify a snapshot, WritePrepared::Get used kMaxSequenceNumber to avoid the cost of creating a new snapshot object (that requires sync over db_mutex). This creates a race condition if it is reading from the writes of a transaction that had duplicate keys: each instance of duplicate key is inserted with a different sequence number and depending on the ordering the ::Get might skip the newer one and read the older one that is obsolete. The patch fixes that by using last published seq as the snapshot sequence number. It also adds a check after the read is done to ensure that the max_evicted_seq has not advanced the aforementioned seq, which is a very unlikely event. If it did, then the read is not valid since the seq is not backed by an actually snapshot to let IsInSnapshot handle that properly when an overlapping commit is evicted from commit cache. A unit test is added to reproduce the race condition with duplicate keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5147 Differential Revision: D14758815 Pulled By: maysamyabandeh fbshipit-source-id: a56915657132cf6ba5e3f5ea1b5d78c803407719
-
- 12 4月, 2019 1 次提交
-
-
由 anand76 提交于
Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 | 2 | 4 | 8 | 16 | 32 Random pattern (Stride length 0) 4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074 - Get 4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching) 4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753 4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781 4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135 Good locality (Stride length 256) 4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232 4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268 4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62 Medium locality (Stride length 4096) 4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555 4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465 4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
-
- 06 4月, 2019 1 次提交
-
-
由 Sergei Glushchenko 提交于
Summary: Expose DB methods to lock and unlock the WAL. These methods are intended to use by MyRocks in order to obtain WAL coordinates in consistent way. Usage scenario is following: MySQL has performance_schema.log_status which provides information that enables a backup tool to copy the required log files without locking for the duration of copy. To populate this table MySQL does following: 1. Lock the binary log. Transactions are not allowed to commit now 2. Save the binary log coordinates 3. Walk through the storage engines and lock writes on each engine. For InnoDB, redo log is locked. For MyRocks, WAL should be locked. 4. Ask storage engines for their coordinates. InnoDB reports its current LSN and checkpoint LSN. MyRocks should report active WAL files names and sizes. 5. Release storage engine's locks 6. Unlock binary log Backup tool will then use this information to copy InnoDB, RocksDB and MySQL binary logs up to specified positions to end up with consistent DB state after restore. Currently, RocksDB allows to obtain the list of WAL files. Only missing bit is the method to lock the writes to WAL files. LockWAL method must flush the WAL in order for the reported size to be accurate (GetSortedWALFiles is using file system stat call to return the file size), also, since backup tool is going to copy the WAL, it is better to be flushed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5146 Differential Revision: D14815447 Pulled By: maysamyabandeh fbshipit-source-id: eec9535a6025229ed471119f19fe7b3d8ae888a3
-
- 04 4月, 2019 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: Although user should first call StartTrace to begin the RocksDB tracing function and call EndTrace to stop the tracing process, user can accidentally call EndTrace first. It will cause segment fault and crash the DB instance. The issue is fixed by checking the pointer first. Test case added in db_test2. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5130 Differential Revision: D14691420 Pulled By: zhichao-cao fbshipit-source-id: 3be13d2f944bc453728ef8eef67b68d7ad0939c8
-
- 03 4月, 2019 1 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: WriteUnPrepared adds a virtual function, MaxUnpreparedSequenceNumber, to ReadCallback, which returns 0 unless WriteUnPrepared is enabled and the transaction has uncommitted data written to the DB. Together with snapshot sequence number, this determines the last sequence that is visible to reads. The patch clarifies the guarantees of the GetIterator API in WriteUnPrepared transactions and make use of that to statically initialize the read callback and thus avoid the virtual call. Furthermore it increases the minimum value for min_uncommitted from 0 to 1 as seq 0 is used only for last level keys that are committed in all snapshots. The following benchmark shows +0.26% higher throughput in seekrandom benchmark. Benchmark: ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100 seekrandom [AVG 10 runs] : 20355 ops/sec; 225.2 MB/sec seekrandom [MEDIAN 10 runs] : 20425 ops/sec; 225.9 MB/sec ./db_bench_lessvirtual3 --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100 seekrandom [AVG 10 runs] : 20409 ops/sec; 225.8 MB/sec seekrandom [MEDIAN 10 runs] : 20487 ops/sec; 226.6 MB/sec Pull Request resolved: https://github.com/facebook/rocksdb/pull/5049 Differential Revision: D14366459 Pulled By: maysamyabandeh fbshipit-source-id: ebaff8908332a5ae9af7defeadabcb624be660ef
-
- 02 4月, 2019 1 次提交
-
-
由 Mike Kolupaev 提交于
Summary: Just like ReadOptions::background_purge_on_iterator_cleanup but for ColumnFamilyHandle instead of Iterator. In our use case we sometimes call ColumnFamilyHandle's destructor from low-latency threads, and sometimes it blocks the thread for a few seconds deleting the files. To avoid that, we can either offload ColumnFamilyHandle's destruction to a background thread on our side, or add this option on rocksdb side. This PR does the latter, to be consistent with how we solve exactly the same problem for iterators using background_purge_on_iterator_cleanup option. (EDIT: It's avoid_unnecessary_blocking_io now, and affects both CF drops and iterator destructors.) I'm not quite comfortable with having two separate options (background_purge_on_iterator_cleanup and background_purge_on_cf_cleanup) for such a rarely used thing. Maybe we should merge them? Rename background_purge_on_cf_cleanup to something like delete_files_on_background_threads_only or avoid_blocking_io_in_unexpected_places, and make iterators use it instead of the one in ReadOptions? I can do that here if you guys think it's better. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5043 Differential Revision: D14339233 Pulled By: al13n321 fbshipit-source-id: ccf7efa11c85c9a5b91d969bb55627d0fb01e7b8
-
- 29 3月, 2019 1 次提交
-
-
由 anand76 提交于
Summary: WAL files are currently not subject to deletion rate limiting by DeleteScheduler. If the size of the WAL files is significant, this can cause a high delete rate on SSDs that may affect other operations. To fix it, force WAL file deletions to go through the SstFileManager. Original PR for this is #2768 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5116 Differential Revision: D14669437 Pulled By: anand1976 fbshipit-source-id: c5f62d0640cebaa1574de841a1d01e4ce2faadf0
-
- 28 3月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: Following files were run through automatic formatter: db/db_impl.cc db/db_impl.h db/db_impl_compaction_flush.cc db/db_impl_debug.cc db/db_impl_files.cc db/db_impl_readonly.h db/db_impl_write.cc db/dbformat.cc db/dbformat.h table/block.cc table/block.h table/block_based_filter_block.cc table/block_based_filter_block.h table/block_based_filter_block_test.cc table/block_based_table_builder.cc table/block_based_table_reader.cc table/block_based_table_reader.h table/block_builder.cc table/block_builder.h table/block_fetcher.cc table/block_prefix_index.cc table/block_prefix_index.h table/block_test.cc table/format.cc table/format.h I could easily run all the files, but I don't want people to feel that I'm doing it for lines of code changes :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5114 Differential Revision: D14633040 Pulled By: siying fbshipit-source-id: 3f346cb53bf21e8c10704400da548dfce1e89a52
-
- 27 3月, 2019 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: This PR allows RocksDB to run in single-primary, multi-secondary process mode. The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary. Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`. This PR has several components: 1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary. 2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue. 3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`. 3.1 Tail the primary's MANIFEST during recovery. 3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`. 3.3 Tailing WAL will be in a future PR. 4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899 Differential Revision: D14510945 Pulled By: riversand963 fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
-
- 26 3月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: With https://github.com/facebook/rocksdb/pull/3009 we go through every CF to check whether a bottommost compaction is needed to be triggered. This is done within DB mutex. What we do within DB mutex may heavily influece the write throughput we can achieve, so we always want to minimize work there. Here we try to avoid this for-loop by first check a global threshold. In most of the time, the CF loop can be avoided. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5090 Differential Revision: D14582684 Pulled By: siying fbshipit-source-id: 968f6d9bb6affe1a5ebc4910b418300b076f166f
-
- 01 3月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: Statistics cost too much CPU for some use cases. Add two stats levels so that people can choose to skip two types of expensive stats, timers and histograms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027 Differential Revision: D14252765 Pulled By: siying fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
-
- 21 2月, 2019 1 次提交
-
-
由 Zhongyi Xie 提交于
Summary: This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0. (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748 Differential Revision: D13961471 Pulled By: miasantreble fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04
-
- 14 2月, 2019 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: if an operation just involves a single column family, then we do not have to set the kInAtomicGroup tag when writing to MANIFEST. This change can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize kInAtomicGroup tag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981 Differential Revision: D14072687 Pulled By: riversand963 fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4
-
- 13 2月, 2019 2 次提交
-
-
由 Yanqin Jin 提交于
Summary: Make file ingestion atomic. as title. Ingesting external SST files into multiple column families should be atomic. If a crash occurs and db reopens, either all column families have successfully ingested the files before the crash, or non of the ingestions have any effect on the state of the db. Also add unit tests for atomic ingestion. Note that the unit test here does not cover the case of incomplete atomic group in the MANIFEST, which is covered in VersionSetTest already. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895 Differential Revision: D13718245 Pulled By: riversand963 fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198
-
由 Siying Dong 提交于
Summary: Previously, stats were logged in warning level. This was done in that way because people reported that it wasn't logged in MyRocks. However, later we learned that it turns out to be due to a bug in MyRocks, which is fixed in https://github.com/facebook/mysql-5.6/commit/79bb705e74b239d7030b724ea6bbd635eceec531 Now we revert the stats logging to INFO level, so that it doesn't pollute the warning level logging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4977 Differential Revision: D14058485 Pulled By: siying fbshipit-source-id: 19fab323c19d9bc88184287f209551f9a77ca0e6
-
- 06 2月, 2019 1 次提交
-
-
由 Siying Dong 提交于
Summary: In NotFound cases, stats BYTES_READ and perf_context.get_read_bytes is still be increased. The amount increased will be whatever size of the string or PinnableSlice that users passed in as the output data structure. This is wrong. Fix this by not increasing these two counters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4938 Differential Revision: D13908963 Pulled By: siying fbshipit-source-id: 60bce42e4fbb9862bba3da36dbc27b2963ea6162
-
- 16 1月, 2019 1 次提交
-
-
由 Yi Wu 提交于
Summary: With WritePrepared transaction, flush/compaction can contain uncommitted keys, and those keys can get committed during compaction. If a snapshot is taken before the key is committed, it should not see the key. On the other hand, compaction grab the list of snapshots at its beginning, and only consider those snapshots to dedup keys. Consider the case: ``` seq = 1: put "foo" = "bar" seq = 2: transaction T: delete "foo", prepare seq = 3: compaction start seq = 4: take snapshot S seq = 5: transaction T: commit. ... seq = N: compaction iterator reached key "foo". ``` When compaction start, the list of snapshot is empty. Compaction doesn't take snapshot S into account. When it reached "foo", transaction T is committed. Compaction may think the value "foo=bar" is not visible by any snapshot (which is wrong), and compact the value out. The fix is to explicitly take a snapshot before compaction grabbing the list of snapshots. Compaction will then has to keep keys visible to this snapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4883 Differential Revision: D13668775 Pulled By: maysamyabandeh fbshipit-source-id: 1cab9615f94b7d3e8522cc3d44c3a14c7d4720e4
-
- 04 1月, 2019 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: as titled. Since different bg flush threads can flush different sets of column families (due to column family creation and drop), we decide not to let one thread perform atomic flush result installation for other threads. Bg flush threads will install their atomic flush results sequentially to MANIFEST, using a conditional variable, i.e. atomic_flush_install_cv_ to coordinate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4791 Differential Revision: D13498930 Pulled By: riversand963 fbshipit-source-id: dd7482fc41f4bd22dad1e1ef7d4764ef424688d7
-
- 03 1月, 2019 3 次提交
-
-
由 Anand Ananthabhotla 提交于
Summary: Avoid locking the DB mutex in order to reference SuperVersions. Instead, we get the thread local cached SuperVersion for each column family in the list. It depends on finding a sequence number that overlaps with all the open memtables. We start with the latest published sequence number, and if any of the memtables is sealed before we can get all the SuperVersions, the process is repeated. After a few times, give up and lock the DB mutex. Tests: 1. Unit tests 2. make check 3. db_bench - TEST_TMPDIR=/dev/shm ./db_bench -use_existing_db=true -benchmarks=readrandom -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=5000000 -reads=1000000 -threads=32 -compression_type=none -cache_size=1048576000 -batch_size=1 -bloom_bits=1 readrandom : 0.167 micros/op 5983920 ops/sec; 426.2 MB/s (1000000 of 1000000 found) Multireadrandom with batch size 1: multireadrandom : 0.176 micros/op 5684033 ops/sec; (1000000 of 1000000 found) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4754 Differential Revision: D13363550 Pulled By: anand1976 fbshipit-source-id: 6243e8de7dbd9c8bb490a8eca385da0c855b1dd4
-
由 Faustin Lammler 提交于
Summary: Hi, Lintian, the Debian package checker complains about spelling error (spelling-error-in-binary). See https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/98380 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4827 Differential Revision: D13566362 Pulled By: riversand963 fbshipit-source-id: cd4e9212133c73b0591030de6cdedaa47575968d
-
由 Yanqin Jin 提交于
Summary: The `flush_reason` parameter in `DBImpl::InstallSuperVersionAndScheduleWork` is not used. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4816 Differential Revision: D13543218 Pulled By: riversand963 fbshipit-source-id: 8fc75d49462ce092e85aef0fe0c50936140db153
-
- 21 12月, 2018 1 次提交
-
-
由 Siying Dong 提交于
Summary: Introduce the first CPU timing counter, perf_context.get_cpu_nanos. This opens a door to more CPU counters in the future. Only Posix Env has it implemented using clock_gettime() with CLOCK_THREAD_CPUTIME_ID. How accurate the counter is depends on the platform. Make PerfStepTimer to take an Env as an argument, and sometimes pass it in. The direct reason is to make the unit tests to use SpecialEnv where we can ingest logic there. But in long term, this is a good change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4741 Differential Revision: D13287798 Pulled By: siying fbshipit-source-id: 090361049d9d5095d1d1a369fe1338d2e2e1c73f
-
- 20 12月, 2018 1 次提交
-
-
由 Jakub Tomanik 提交于
Summary: This PR contains the following fixes: 1. Fixing Makefile to support non-default locations of developer tools 2. Fixing compile error using a patch from https://github.com/facebook/rocksdb/pull/4007 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4687 Differential Revision: D13287263 Pulled By: riversand963 fbshipit-source-id: 4525eb42ba7b6f82af5f9bfb8e52fa4024e27ccc
-
- 18 12月, 2018 1 次提交
-
-
由 Abhishek Madan 提交于
Summary: Now that v2 is fully functional, the v1 aggregator is removed. The v2 aggregator has been renamed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4778 Differential Revision: D13495930 Pulled By: abhimadan fbshipit-source-id: 9d69500a60a283e79b6c4fa938fc68a8aa4d40d6
-
- 14 12月, 2018 1 次提交
-
-
由 Maysam Yabandeh 提交于
Summary: options_file_number_ must be written under db::mutex_ sine its read is protected by mutex_ in ::GetLiveFiles(). However currently it is written in ::RenameTempFileToOptionsFile() which according to its contract must be called without holding db::mutex_. The patch fixes the race condition by also acquitting the mutex_ before writing options_file_number_. Also it does that only if the rename of option file is successful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4780 Differential Revision: D13461411 Pulled By: maysamyabandeh fbshipit-source-id: 2d5bae96a1f3e969ef2505b737cf2d7ae749787b
-