1. 16 Jul 2022, 6 commits
  2. 15 Jul 2022, 4 commits
    • DB::PutEntity() shouldn't be defined as =0 (#10364) · 00e68e7a
      Committed by sdong
      Summary:
      DB::PutEntity() is declared as pure virtual (`= 0`), but it is actually implemented in db/db_impl/db_impl_write.cc. This is incorrect, and might cause problems when users implement class DB themselves.
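      A minimal C++ sketch (mock types, not RocksDB's actual API) of why the declaration was problematic: a method declared pure virtual forces every user subclass to override it, even if a body exists in a .cc file. A plain virtual with a default body avoids that:

      ```cpp
      #include <cassert>
      #include <string>

      struct MockDB {
        // Before: virtual std::string PutEntity(const std::string&) = 0;
        // (pure virtual, yet a body existed elsewhere, so user subclasses
        // were still forced to override it).
        // After: a regular virtual with a default implementation.
        virtual std::string PutEntity(const std::string& key) {
          return "NotSupported: " + key;  // illustrative default behavior
        }
        virtual ~MockDB() = default;
      };

      // A user subclass now compiles without overriding PutEntity.
      struct UserDB : public MockDB {};
      ```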
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10364
      
      Test Plan: See existing tests pass
      
      Reviewed By: riversand963
      
      Differential Revision: D37874886
      
      fbshipit-source-id: b81713ddb707720b52d57a15de56a59414c24f66
    • Add seqno to time mapping (#10338) · a3acf2ef
      Committed by Jay Zhuang
      Summary:
      This will be used for tiered storage to preclude hot data from
      compacting to the cold tier (the last level).
      Internally, this adds a seqno-to-time mapping. A periodic_task is scheduled
      to record the current_seqno -> current_time pair at a certain cadence. When a
      memtable is flushed, the mapping information is stored in an SSTable property.
      During compaction, the mapping information is merged to obtain the
      approximate time of a sequence number, which is used to determine whether a key
      was recently inserted, and to preclude it from the last level if it was
      (i.e., within the `preclude_last_level_data_seconds` window).
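      A hypothetical sketch of the mapping idea (simplified names, not RocksDB's actual SeqnoToTimeMapping class): a periodic task records (seqno, time) pairs, and lookups return the time of the largest recorded seqno not exceeding the target:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <iterator>
      #include <map>

      class SeqnoTimeSketch {
       public:
        // Called at a fixed cadence by a periodic task.
        void Record(uint64_t seqno, int64_t unix_time) { map_[seqno] = unix_time; }

        // Approximate write time of `seqno`: the time recorded for the largest
        // tracked seqno that does not exceed it (0 if nothing is known).
        int64_t GetApproxTime(uint64_t seqno) const {
          auto it = map_.upper_bound(seqno);
          if (it == map_.begin()) return 0;
          return std::prev(it)->second;
        }

        // A key is "recent" if its approximate write time falls within the
        // window, mirroring preclude_last_level_data_seconds.
        bool IsRecent(uint64_t seqno, int64_t now, int64_t window) const {
          return now - GetApproxTime(seqno) <= window;
        }

       private:
        std::map<uint64_t, int64_t> map_;
      };
      ```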
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10338
      
      Test Plan: CI
      
      Reviewed By: siying
      
      Differential Revision: D37810187
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6953be7a18a99de8b1cb3b162d712f79c2b4899f
    • Fix HISTORY.md for misplaced items (#10362) · 66685d6a
      Committed by Siying Dong
      Summary:
      Some items were misplaced under 7.4, but they are unreleased.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10362
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37859426
      
      fbshipit-source-id: e2ad099227309ed2e0f3ca450a9a43986d681c7c
    • Make InternalKeyComparator not configurable (#10342) · c8b20d46
      Committed by sdong
      Summary:
      InternalKeyComparator is an internal class that is a simple wrapper around Comparator. https://github.com/facebook/rocksdb/pull/8336 made Comparator customizable. As a side effect, the internal key comparator was made configurable too. This introduces overhead to this simple wrapper. For example, every InternalKeyComparator has an std::vector attached to it, which consumes memory and may incur allocation overhead as well.
      We stop InternalKeyComparator from being customizable by making it no longer a subclass of Comparator.
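      The change can be illustrated with a minimal sketch (types simplified, not the actual RocksDB classes): the wrapper holds a Comparator pointer instead of inheriting from Comparator, so it carries none of the Customizable machinery:

      ```cpp
      #include <cassert>
      #include <cstring>

      struct Comparator {
        virtual int Compare(const char* a, const char* b) const = 0;
        virtual ~Comparator() = default;
      };

      struct BytewiseComparator : Comparator {
        int Compare(const char* a, const char* b) const override {
          return std::strcmp(a, b);
        }
      };

      // Not a Comparator subclass: a plain class that forwards to the user
      // comparator, with no name string or option vector attached.
      class InternalKeyComparatorSketch {
       public:
        explicit InternalKeyComparatorSketch(const Comparator* user) : user_(user) {}
        int Compare(const char* a, const char* b) const {
          return user_->Compare(a, b);
        }
       private:
        const Comparator* user_;
      };
      ```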
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10342
      
      Test Plan: Run existing CI tests and make sure they don't fail
      
      Reviewed By: riversand963
      
      Differential Revision: D37771351
      
      fbshipit-source-id: 917256ee04b2796ed82974549c734fb6c4d8ccee
  3. 14 Jul 2022, 4 commits
  4. 13 Jul 2022, 4 commits
    • Temporarily return a LRUCache from NewClockCache (#10351) · 9645e66f
      Committed by Guido Tagliavini Ponce
      Summary:
      ClockCache is still in experimental stage, and currently fails some pre-release fbcode tests. See https://www.internalfb.com/diff/D37772011. API calls to construct ClockCache are done via the function NewClockCache. For now, NewClockCache calls will return an LRUCache (with appropriate arguments), which is stable.
      
      The idea of having NewClockCache return nullptr was also floated, but this would be interpreted as an unsupported cache, and a default LRUCache would be constructed instead, potentially causing a performance regression that is harder to identify.
      
      A new version of the NewClockCache function was created for our internal tests.
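      A hedged sketch of the fallback pattern (names are illustrative, not the actual RocksDB cache API): the public factory keeps its signature but returns the stable implementation, while an internal-only factory still builds the experimental one:

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>

      struct CacheSketch {
        virtual std::string Name() const = 0;
        virtual ~CacheSketch() = default;
      };
      struct LRUCacheSketch : CacheSketch {
        std::string Name() const override { return "LRUCache"; }
      };
      struct ClockCacheSketch : CacheSketch {
        std::string Name() const override { return "ClockCache"; }
      };

      // Public factory: returns the stable LRUCache for now instead of the
      // experimental ClockCache, rather than nullptr (which callers would
      // treat as "unsupported" and silently replace with a default cache).
      std::shared_ptr<CacheSketch> NewClockCacheSketch() {
        return std::make_shared<LRUCacheSketch>();
      }

      // Internal-test-only factory still builds the experimental cache.
      std::shared_ptr<CacheSketch> ExperimentalNewClockCacheSketch() {
        return std::make_shared<ClockCacheSketch>();
      }
      ```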
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10351
      
      Test Plan: ``make -j24 check`` and re-run the pre-release tests.
      
      Reviewed By: pdillinger
      
      Differential Revision: D37802685
      
      Pulled By: guidotag
      
      fbshipit-source-id: 0a8d10612ff21e576f7360cb13e20bc36e244972
    • Stop tracking syncing live WAL for performance (#10330) · b283f041
      Committed by Yanqin Jin
      Summary:
      With https://github.com/facebook/rocksdb/issues/10087, applications calling `SyncWAL()` or writing with `WriteOptions::sync=true` can suffer
      a performance regression. This PR reverts to the original behavior of tracking only the syncing of closed WALs.
      With the old behavior restored, recovery, whether kPointInTime or kAbsoluteConsistency, may fail to
      detect corruption in synced WALs if the corruption is in the live WAL.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10330
      
      Test Plan:
      make check
      
      Before https://github.com/facebook/rocksdb/issues/10087
      ```bash
      fillsync     :     750.269 micros/op 1332 ops/sec 75.027 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync     :     776.492 micros/op 1287 ops/sec 77.649 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 2 runs] : 1310 (± 44) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync     :     805.625 micros/op 1241 ops/sec 80.563 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 3 runs] : 1287 (± 51) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [AVG    3 runs] : 1287 (± 51) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [MEDIAN 3 runs] : 1287 ops/sec;    0.1 MB/sec
      ```
      
      Before this PR and after https://github.com/facebook/rocksdb/issues/10087
      ```bash
      fillsync     :    1479.601 micros/op 675 ops/sec 147.960 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync     :    1626.080 micros/op 614 ops/sec 162.608 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 2 runs] : 645 (± 59) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync     :    1588.402 micros/op 629 ops/sec 158.840 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 3 runs] : 640 (± 35) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [AVG    3 runs] : 640 (± 35) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [MEDIAN 3 runs] : 629 ops/sec;    0.1 MB/sec
      ```
      
      After this PR
      ```bash
      fillsync     :     749.621 micros/op 1334 ops/sec 74.962 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync     :     865.577 micros/op 1155 ops/sec 86.558 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 2 runs] : 1244 (± 175) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync     :     845.837 micros/op 1182 ops/sec 84.584 seconds 100000 operations;    0.1 MB/s (100 ops)
      fillsync [AVG 3 runs] : 1223 (± 109) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [AVG    3 runs] : 1223 (± 109) ops/sec;    0.1 (± 0.0) MB/sec
      fillsync [MEDIAN 3 runs] : 1182 ops/sec;    0.1 MB/sec
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D37725212
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8fa7d13b3c7662be5d56351c42caf3266af937ae
    • Remove customized naming from InternalKeyComparator (#10343) · 769b156e
      Committed by sdong
      Summary:
      InternalKeyComparator is a thin wrapper around the user comparator. Storing a string for the name is relatively expensive for this small wrapper in both CPU and memory usage, so we remove it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10343
      
      Test Plan: Run existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D37772469
      
      fbshipit-source-id: d2d106a8d022193058fd7f6b220108e3d94aca34
    • Add coverage for the combination of write-prepared and WAL recycling (#10350) · 7679f22a
      Committed by Yanqin Jin
      Summary:
      As title.
      Test plan:
      - make check
      - CI on PR
      - TEST_TMPDIR=/dev/shm make crash_test_with_multiops_wp_txn (tested with a successful run)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10350
      
      Reviewed By: ajkr
      
      Differential Revision: D37792872
      
      Pulled By: riversand963
      
      fbshipit-source-id: ff064093b7f715d0acf387af2e3ae87b1278b52b
  5. 12 Jul 2022, 2 commits
  6. 09 Jul 2022, 2 commits
  7. 08 Jul 2022, 1 commit
  8. 07 Jul 2022, 8 commits
    • Eliminate the copying of blobs when serving reads from the cache (#10297) · c987eb47
      Committed by Gang Liao
      Summary:
      The blob cache enables an optimization on the read path: when a blob is found in the cache, we can avoid copying it into the buffer provided by the application. Instead, we can simply transfer ownership of the cache handle to the target `PinnableSlice`. (Note: this relies on the `Cleanable` interface, which is implemented by `PinnableSlice`.)
      
      This has the potential to save a lot of CPU, especially with large blob values.
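      The ownership-transfer idea can be sketched as follows (illustrative types, not the actual Cleanable/PinnableSlice API): the slice points directly at the cached bytes, and runs a cleanup callback, which would release the cache handle, when it is destroyed:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <string>

      class PinnableSliceSketch {
       public:
        // Point at external bytes without copying; `cleanup` releases the
        // resource (e.g., a cache handle) when the slice dies.
        void PinData(const char* data, std::size_t size,
                     std::function<void()> cleanup) {
          data_ = data;
          size_ = size;
          cleanup_ = std::move(cleanup);
        }
        ~PinnableSliceSketch() {
          if (cleanup_) cleanup_();
        }
        std::string ToString() const { return std::string(data_, size_); }

       private:
        const char* data_ = nullptr;
        std::size_t size_ = 0;
        std::function<void()> cleanup_;
      };
      ```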
      
      This task is a part of https://github.com/facebook/rocksdb/issues/10156
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10297
      
      Reviewed By: riversand963
      
      Differential Revision: D37640311
      
      Pulled By: gangliao
      
      fbshipit-source-id: 92de0e35cc703d06c87c5c1861cc2899ec52234a
    • Midpoint insertions in ClockCache (#10305) · c277aeb4
      Committed by Guido Tagliavini Ponce
      Summary:
      When an element is first inserted into the ClockCache, it is now assigned either medium or high clock priority, depending on whether its cache priority is low or high, respectively. This is a variant of LRUCache's midpoint insertions. The main difference is that LRUCache can specify the capacity allocated to high-priority elements via the ``high_pri_pool_ratio`` parameter. In ClockCache, by contrast, low- and high-priority elements compete for all cache slots, and one group can take over the other (of course, it takes more low-priority insertions to push out high-priority elements). However, just like LRUCache, ClockCache provides the following guarantee: a high-priority element will not be evicted before a low-priority element that was inserted earlier in time.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10305
      
      Test Plan: ``make -j24 check``
      
      Reviewed By: pdillinger
      
      Differential Revision: D37607787
      
      Pulled By: guidotag
      
      fbshipit-source-id: 24d9f2523d2f4e6415e7f0029cc061fa275c2040
    • Replace the output split key with its pointer in subcompaction (#10316) · 8debfe2b
      Committed by zczhu
      Summary:
      An earlier implementation of cutting the output files with a compact cursor under Round-Robin priority uses `Valid()` to determine whether the `output_split_key` is valid in `ShouldStopBefore`. This contributes to excessive CPU computation, as pointed out by [this issue](https://github.com/facebook/rocksdb/issues/10315). In this PR, we change the type of `output_split_key` to `InternalKey*` and set it to `nullptr` if it is not going to be used in `ShouldStopBefore`; the `Valid()` condition check can then be avoided by testing the pointer.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10316
      
      Reviewed By: ajkr
      
      Differential Revision: D37661492
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 66ff1105f3378e5573d3a126fdaff9bb23b5498f
    • Have Cache use Status::MemoryLimit (#10262) · e6c5e0ab
      Committed by Peter Dillinger
      Summary:
      I noticed it would clean up some things to have Cache::Insert()
      return our MemoryLimit Status instead of Incomplete for the case in
      which the capacity limit is reached. I suspect this fixes some existing but
      unknown bugs where this Incomplete could be confused with other uses
      of Incomplete, especially no_io cases. This is the most suspicious case I
      noticed, but was not able to reproduce a bug, in part because the existing
      code is not covered by unit tests (FIXME added): https://github.com/facebook/rocksdb/blob/57adbf0e9187331cb39bf5cdb5f5d67faeee5f63/table/get_context.cc#L397
      
      I audited all the existing uses of IsIncomplete and updated those that
      seemed relevant.
      
      HISTORY updated with a clear warning to users of strict_capacity_limit=true
      to update uses of `IsIncomplete()`
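      A minimal sketch of why a dedicated status code helps (enum values and the insert logic are illustrative, not RocksDB's Status class): with a distinct MemoryLimit code, a full cache can no longer be confused with the Incomplete that read paths use for no_io cases:

      ```cpp
      #include <cassert>
      #include <cstddef>

      enum class StatusCode { kOk, kIncomplete, kMemoryLimit };

      // Toy stand-in for Cache::Insert's capacity check.
      StatusCode CacheInsert(std::size_t usage, std::size_t charge,
                             std::size_t capacity, bool strict_capacity_limit) {
        if (strict_capacity_limit && usage + charge > capacity) {
          // Was kIncomplete before this change; now unambiguous.
          return StatusCode::kMemoryLimit;
        }
        return StatusCode::kOk;  // non-strict mode may exceed capacity
      }
      ```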
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10262
      
      Test Plan: updated unit tests
      
      Reviewed By: hx235
      
      Differential Revision: D37473155
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 4bd9d9353ccddfe286b03ebd0652df8ce20f99cb
    • Allow user to pass git command to makefile (#10318) · 071fe39c
      Committed by Manuel Ung
      Summary:
      This allows users to pass their git command with extra options if necessary.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10318
      
      Reviewed By: ajkr
      
      Differential Revision: D37661175
      
      Pulled By: lth
      
      fbshipit-source-id: 2a7cf27626c74f167471e6ec57e3870630a582b0
    • Provide support for direct_reads with async_io (#10197) · 2acbf386
      Committed by Akanksha Mahajan
      Summary:
      Provide support for use_direct_reads with async_io.
      
      Test Plan:
      -  Updated unit tests
      -  db_bench: Results in https://github.com/facebook/rocksdb/pull/10197#issuecomment-1159239420
      - db_stress
      ```
      export CRASH_TEST_EXT_ARGS=" --async_io=1 --use_direct_reads=1"
      make crash_test -j
      ```
      - Ran db_bench on a previous RocksDB version, before any async_io implementation (as there have been many changes in different PRs in this area): https://github.com/facebook/rocksdb/pull/10197#issuecomment-1160781563.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10197
      
      Reviewed By: anand1976
      
      Differential Revision: D37255646
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: fec61ae15bf4d625f79dea56e4f86e0e307ba920
    • Set the value for --version, add --build_info (#10275) · 177b2fa3
      Committed by Mark Callaghan
      Summary:
      ./db_bench --version
      db_bench version 7.5.0
      
      ./db_bench --build_info
       (RocksDB) 7.5.0
          rocksdb_build_date: 2022-06-29 09:58:04
          rocksdb_build_git_sha: d96febee
          rocksdb_build_git_tag: print_version_githash
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10275
      
      Test Plan: run it
      
      Reviewed By: ajkr
      
      Differential Revision: D37524720
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 0f6c819dbadf7b033a4a3ba2941992bb76b4ff99
    • Updated NewDataBlockIterator to not fetch compression dict for non-data blocks (#10310) · f9cfc6a8
      Committed by Changyu Bi
      Summary:
      
      During MyShadow testing, ajkr helped me find out that with partitioned index and dictionary compression enabled, `PartitionedIndexIterator::InitPartitionedIndexBlock()` spent a considerable amount of time (1-2% CPU) fetching the uncompression dictionary. Fetching the uncompression dictionary was not needed, since the index blocks were not compressed (and even if they were, they would use an empty dictionary). This should only affect use cases with partitioned index and dictionary compression, without the uncompression dictionary pinned. This PR updates NewDataBlockIterator to not fetch the uncompression dictionary when the iterator is not for data blocks.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10310
      
      Test Plan:
      1. `make check`
      2. Perf benchmark: 1.5% (143950 -> 146176) improvement in op/sec for partitioned index + dict compression benchmark.
      For default config without partitioned index and without dict compression, there is no regression in readrandom perf from multiple runs of db_bench.
      
      ```
      # Set up for partitioned index with dictionary compression
      TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -partition_index=true  -compression_max_dict_bytes=16384 -compression_zstd_max_train_bytes=1638400
      
      # Pre PR
      TEST_TMPDIR=/dev/shm ./db_bench_main -use_existing_db=true -benchmarks=readrandom[-X50] -partition_index=true
      readrandom [AVG    50 runs] : 143950 (± 1108) ops/sec;   15.9 (± 0.1) MB/sec
      readrandom [MEDIAN 50 runs] : 144406 ops/sec;   16.0 MB/sec
      
      # Post PR
      TEST_TMPDIR=/dev/shm ./db_bench_opt -use_existing_db=true -benchmarks=readrandom[-X50] -partition_index=true
      readrandom [AVG    50 runs] : 146176 (± 1121) ops/sec;   16.2 (± 0.1) MB/sec
      readrandom [MEDIAN 50 runs] : 146014 ops/sec;   16.2 MB/sec
      
      # Set up for no partitioned index and no dictionary compression
      TEST_TMPDIR=/dev/shm/baseline ./db_bench_main -benchmarks=filluniquerandom,compact -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false
      # Pre PR
      TEST_TMPDIR=/dev/shm/baseline/ ./db_bench_main --use_existing_db=true "--benchmarks=readrandom[-X50]"
      readrandom [AVG    50 runs] : 158546 (± 1000) ops/sec;   17.5 (± 0.1) MB/sec
      readrandom [MEDIAN 50 runs] : 158280 ops/sec;   17.5 MB/sec
      
      # Post PR
      TEST_TMPDIR=/dev/shm/baseline/ ./db_bench_opt --use_existing_db=true "--benchmarks=readrandom[-X50]"
      readrandom [AVG    50 runs] : 161061 (± 1520) ops/sec;   17.8 (± 0.2) MB/sec
      readrandom [MEDIAN 50 runs] : 161596 ops/sec;   17.9 MB/sec
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D37631358
      
      Pulled By: cbi42
      
      fbshipit-source-id: 6ca2665e270e63871968e061ba4a99d3136785d9
  9. 06 Jul 2022, 5 commits
    • Handoff checksum during WAL replay (#10212) · 0ff77131
      Committed by Changyu Bi
      Summary:
      Added checksum protection for write batch content from the point it is read from the WAL to the point the per key-value checksum is computed on the write batch. This gives full coverage of write batch integrity from WAL replay to memtable.
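      The handoff idea can be sketched with a toy checksum (illustrative only; RocksDB uses CRC32c and its own record format): a checksum computed when the record is read from the WAL travels with the batch and is verified before per key-value checksums are derived, so no unprotected window exists in between:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // FNV-1a hash, standing in for CRC32c in this sketch.
      uint32_t ToyChecksum(const std::string& data) {
        uint32_t h = 2166136261u;
        for (unsigned char c : data) {
          h = (h ^ c) * 16777619u;
        }
        return h;
      }

      struct WalRecord {
        std::string contents;
        uint32_t checksum;  // computed at WAL read time, handed off downstream
      };

      WalRecord ReadFromWal(const std::string& bytes) {
        return WalRecord{bytes, ToyChecksum(bytes)};
      }

      // Verify the handed-off checksum before trusting the batch contents
      // (i.e., right before per key-value checksums would be computed).
      bool VerifyBeforePerKvChecksum(const WalRecord& rec) {
        return ToyChecksum(rec.contents) == rec.checksum;
      }
      ```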
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10212
      
      Test Plan:
      - Added unit test and the existing tests (replay code path covers the change in this PR): `make -j32 check`
      - Stress test: ran `db_stress` for 30min.
      - Perf regression:
      ```
      # setup
      TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576000
      # benchmark db open time
      TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=overwrite -write_buffer_size=1048576000 -writes=1 -report_open_timing=true
      
      For 20 runs, pre-PR avg: 3734.31ms, post-PR avg: 3790.06 ms (~1.5% regression).
      
      Pre-PR
      OpenDb:     3714.36 milliseconds
      OpenDb:     3622.71 milliseconds
      OpenDb:     3591.17 milliseconds
      OpenDb:     3674.7 milliseconds
      OpenDb:     3615.79 milliseconds
      OpenDb:     3982.83 milliseconds
      OpenDb:     3650.6 milliseconds
      OpenDb:     3809.26 milliseconds
      OpenDb:     3576.44 milliseconds
      OpenDb:     3638.12 milliseconds
      OpenDb:     3845.68 milliseconds
      OpenDb:     3677.32 milliseconds
      OpenDb:     3659.64 milliseconds
      OpenDb:     3837.55 milliseconds
      OpenDb:     3899.64 milliseconds
      OpenDb:     3840.72 milliseconds
      OpenDb:     3802.71 milliseconds
      OpenDb:     3573.27 milliseconds
      OpenDb:     3895.76 milliseconds
      OpenDb:     3778.02 milliseconds
      
      Post-PR:
      OpenDb:     3880.46 milliseconds
      OpenDb:     3709.02 milliseconds
      OpenDb:     3954.67 milliseconds
      OpenDb:     3955.64 milliseconds
      OpenDb:     3958.64 milliseconds
      OpenDb:     3631.28 milliseconds
      OpenDb:     3721 milliseconds
      OpenDb:     3729.89 milliseconds
      OpenDb:     3730.55 milliseconds
      OpenDb:     3966.32 milliseconds
      OpenDb:     3685.54 milliseconds
      OpenDb:     3573.17 milliseconds
      OpenDb:     3703.75 milliseconds
      OpenDb:     3873.62 milliseconds
      OpenDb:     3704.4 milliseconds
      OpenDb:     3820.98 milliseconds
      OpenDb:     3721.62 milliseconds
      OpenDb:     3770.86 milliseconds
      OpenDb:     3949.78 milliseconds
      OpenDb:     3760.07 milliseconds
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D37302092
      
      Pulled By: cbi42
      
      fbshipit-source-id: 7346e625f453ce4c0e5d708776cd1fb2af6b068b
    • Expand stress test coverage for user-defined timestamp (#10280) · caced09e
      Committed by Yanqin Jin
      Summary:
      Before this PR, we call `now()` to get the wall time before performing point-lookup and range
      scans when user-defined timestamp is enabled.
      
      With this PR, we expand the coverage to:
      - read with an older timestamp which is larger than the wall time when the process starts but potentially smaller than now()
      - add coverage for `ReadOptions::iter_start_ts != nullptr`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10280
      
      Test Plan:
      ```bash
      make check
      ```
      
      Also,
      ```bash
      TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_ts
      ```
      
      So far, we have had four successful runs of the above
      
      In addition,
      ```bash
      TEST_TMPDIR=/dev/shm/rocksdb make crash_test
      ```
      Succeeded twice showing no regression.
      
      Reviewed By: ltamasi
      
      Differential Revision: D37539805
      
      Pulled By: riversand963
      
      fbshipit-source-id: f2d9887ad95245945ce17a014d55bb93f00e1cb5
    • Add the git hash and full RocksDB version to report.tsv (#10277) · 9eced1a3
      Committed by Mark Callaghan
      Summary:
      Previously the version was displayed as $major.$minor.
      This changes it to $major.$minor.$patch.
      
      This also adds the git hash of the commit from which RocksDB was built to the end of report.tsv. I confirmed that benchmark_log_tool.py still parses it and that the people
      who consume/graph these results are OK with it.
      
      Example output:
      ops_sec	mb_sec	lsm_sz	blob_sz	c_wgb	w_amp	c_mbps	c_wsecs	c_csecs	b_rgb	b_wgb	usec_op	p50	p99	p99.9	p99.99	pmax	uptime	stall%	Nstall	u_cpu	s_cpu	rss	test	date	version	job_id	githash
      609488	244.1	1GB	0.0GB,	1.4	0.7	93.3	39	38	0	0	1.6	1.0	4	15	26	5365	15	0.0	0	0.1	0.0	0.5	fillseq.wal_disabled.v400	2022-06-29T13:36:05	7.5.0		61152544
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10277
      
      Test Plan: Run it
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37532418
      
      Pulled By: mdcallag
      
      fbshipit-source-id: 55e472640d51265819b228d3373c9fa9b62b660d
    • Try to trivial move more than one files (#10190) · a9565ccb
      Committed by sdong
      Summary:
      In leveled compaction, try to trivially move more than one file if possible, up to 4 files or max_compaction_bytes. This allows higher write throughput for some use cases where data is loaded in sequential order and applying compaction results is the bottleneck.
      
      When picking a file to compact, if it doesn't have overlapping files in the next level, try to expand to the next file if there is still no overlap.
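      A toy sketch of the expansion rule described above (inputs and limits are illustrative, not the actual compaction picker): starting from one file with no overlap in the next level, keep appending consecutive non-overlapping files, up to 4 files or max_compaction_bytes:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      std::size_t PickTrivialMoveRun(const std::vector<uint64_t>& file_sizes,
                                     const std::vector<bool>& overlaps_next_level,
                                     uint64_t max_compaction_bytes) {
        std::size_t picked = 0;
        uint64_t total = 0;
        // Expand while the next file exists, the 4-file cap is not hit, the
        // file has no overlap in the next level, and the byte budget holds.
        while (picked < file_sizes.size() && picked < 4 &&
               !overlaps_next_level[picked] &&
               total + file_sizes[picked] <= max_compaction_bytes) {
          total += file_sizes[picked];
          ++picked;
        }
        return picked;  // number of files that can be trivially moved together
      }
      ```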
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10190
      
      Test Plan:
      Add some unit tests.
      For performance, try running
      ./db_bench_multi_move --benchmarks=fillseq --compression_type=lz4 --write_buffer_size=5000000 --num=100000000 --value_size=1000 -level_compaction_dynamic_level_bytes
      Together with https://github.com/facebook/rocksdb/pull/10188 , stalling will be eliminated in this benchmark.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37230647
      
      fbshipit-source-id: 42b260f545c46abc5d90335ac2bbfcd09602b549
    • Update code comment and logging for secondary instance (#10260) · d6b9c4ae
      Committed by Yanqin Jin
      Summary:
      Before this PR, applications were required to open a RocksDB secondary
      instance with `max_open_files = -1`. This is a hacky workaround that
      prevents IOErrors on the secondary instance during point-lookups or range
      scans caused by the primary instance deleting the table files. This is not
      necessary if the application can coordinate the primary and secondaries
      so that the primary does not delete files that are still being used by the
      secondaries. Alternatively, users can provide a custom Env/FS implementation that
      deletes the files only after all primary and secondary instances
      indicate the files are obsolete and deleted.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10260
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37462633
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9c2fc939f49663efa61e3d60c8f1e01d64b9d72c
  10. 04 Jul 2022, 1 commit
  11. 02 Jul 2022, 1 commit
    • Fix CalcHashBits (#10295) · 54f678cd
      Committed by Guido Tagliavini Ponce
      Summary:
      We fix two bugs in CalcHashBits. The first one is an off-by-one error: the desired number of table slots is the real number ``capacity / (kLoadFactor * handle_charge)``, which should not be rounded down. The second one is that we should disallow inputs that set the element charge to 0, namely ``estimated_value_size == 0 && metadata_charge_policy == kDontChargeCacheMetadata``.
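      The rounding fix can be sketched as follows (a simplified stand-in, not the actual CalcHashBits): the slot count ``capacity / (load_factor * charge)`` must be rounded up, and the hash bits must cover at least that many slots:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Sketch of the off-by-one fix; charge == 0 must be rejected by the
      // caller, mirroring the new input validation.
      int CalcHashBitsSketch(uint64_t capacity, uint64_t charge,
                             double load_factor) {
        double slots = static_cast<double>(capacity) / (load_factor * charge);
        uint64_t num_slots = static_cast<uint64_t>(slots);
        if (slots > static_cast<double>(num_slots)) {
          ++num_slots;  // round up, not down (the off-by-one bug)
        }
        int bits = 0;
        while ((uint64_t{1} << bits) < num_slots) ++bits;
        return bits;  // smallest b with 2^b >= ceil(capacity / (lf * charge))
      }
      ```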
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10295
      
      Test Plan: CalcHashBits is tested by CalcHashBitsTest (in lru_cache_test.cc). The test now iterates over many more inputs; it covers, in particular, the rounding error edge case. Overall, the test is now more robust. Run ``make -j24 check``.
      
      Reviewed By: pdillinger
      
      Differential Revision: D37573797
      
      Pulled By: guidotag
      
      fbshipit-source-id: ea4f4439f7196ab1c1afb88f566fe92850537262
  12. 01 Jul 2022, 2 commits