1. 14 Jun 2019 (2 commits)
  2. 13 Jun 2019 (1 commit)
  3. 12 Jun 2019 (3 commits)
  4. 11 Jun 2019 (7 commits)
    • Fix DBTest.DynamicMiscOptions so it passes even with Snappy disabled (#5438) · a94aef65
      Committed by Levi Tamasi
      Summary:
      This affects our "no compression" automated tests. Since PR #5368, DBTest.DynamicMiscOptions has been failing with:
      
      db/db_test.cc:4889: Failure
      dbfull()->SetOptions({{"compression", "kSnappyCompression"}})
      Invalid argument: Compression type Snappy is not linked with the binary.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5438
      
      Differential Revision: D15752100
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 3f19eff7cafc03b333965be0203c5853d2a9cb71
      a94aef65
    • Avoid deadlock between mutex_ and log_write_mutex_ (#5437) · c8c1a549
      Committed by Maysam Yabandeh
      Summary:
      To avoid deadlock, mutex_ should never be acquired before log_write_mutex_. The patch documents this and also fixes one case in ::FlushWAL that acquires mutex_ through ::WriteStatusCheck while it already holds a lock on log_write_mutex_.
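      The general rule the patch documents can be sketched with plain C++ mutexes. This is a hypothetical illustration, not RocksDB's actual code; the member names only mirror the commit message:

      ```cpp
      #include <cassert>
      #include <mutex>

      // Hypothetical sketch: when two mutexes can be held together, every
      // code path must acquire them in one fixed order (or take both at
      // once via std::lock, as below, which is deadlock-free).
      struct DBSketch {
        std::mutex log_write_mutex_;
        std::mutex mutex_;
        int wal_flushes = 0;

        void FlushWAL() {
          std::lock(log_write_mutex_, mutex_);  // acquires both atomically
          std::lock_guard<std::mutex> g1(log_write_mutex_, std::adopt_lock);
          std::lock_guard<std::mutex> g2(mutex_, std::adopt_lock);
          ++wal_flushes;  // stands in for the real WAL sync + status check
        }
      };
      ```

      With `std::lock`, two threads calling `FlushWAL` concurrently cannot deadlock even if other paths acquire the mutexes individually in a fixed order.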
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5437
      
      Differential Revision: D15749722
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f57b69c44b4b80cc6d7ddf3d3fdf4a9eb5a5a45a
      c8c1a549
    • Remove global locks from FlushScheduler (#5372) · b2584577
      Committed by Maysam Yabandeh
      Summary:
      FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since https://github.com/facebook/rocksdb/pull/2286, the scope of these locks has been widened to the methods' entire bodies. The result is that the concurrency exercised during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode).
      The patch reverts this change to narrow the scope of these locks.
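      A hypothetical sketch of the narrowed scope (the names are illustrative, not the real FlushScheduler): the debug-only state check takes its mutex just around the container access instead of wrapping the whole method, so debug builds no longer serialize more than release builds do.

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <set>

      struct FlushSchedulerSketch {
        std::set<int> checking_set_;  // debug-time shadow of the real queue
        std::mutex checking_mutex_;

        void ScheduleWork(int cfd_id) {
      #ifndef NDEBUG
          {
            // Narrow scope: lock only for the consistency check itself.
            std::lock_guard<std::mutex> lock(checking_mutex_);
            checking_set_.insert(cfd_id);
          }
      #endif
          // ... push cfd_id onto the real lock-free structure ...
        }
      };
      ```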
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5372
      
      Differential Revision: D15545831
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
      b2584577
    • Use CreateLoggerFromOptions function (#5427) · 641cc8d5
      Committed by Yanqin Jin
      Summary:
      Use `CreateLoggerFromOptions` function to reduce code duplication.
      
      Test plan (on my machine)
      ```
      $make clean && make -j32 db_secondary_test
      $KEEP_DB=1 ./db_secondary_test
      ```
      Verify all info logs of the secondary instance are properly logged.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5427
      
      Differential Revision: D15748922
      
      Pulled By: riversand963
      
      fbshipit-source-id: bad7261df1b8373efc504f141efc7871e375a311
      641cc8d5
    • Create a BlockCacheLookupContext to enable fine-grained block cache tracing. (#5421) · 5efa0d6b
      Committed by haoyuhuang
      Summary:
      BlockCacheLookupContext only contains the caller for now.
      We will trace block accesses at five places:
      1. BlockBasedTable::GetFilter.
      2. BlockBasedTable::GetUncompressedDict.
      3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.)
      4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      
      We create the context at:
      1. BlockBasedTable::Get. (kUserGet)
      2. BlockBasedTable::MultiGet. (kUserMGet)
      3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.)
      4. BlockBasedTable::Open. (kPrefetch)
      5. Index/Filter::CacheDependencies. (kPrefetch)
      6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).
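      The context described above could look roughly like the following. This is a hedged sketch based only on the summary; the exact type names in the PR may differ:

      ```cpp
      #include <cassert>

      // Hypothetical sketch: for now the context only records which caller
      // initiated the block cache lookup (values follow the list above).
      enum class BlockCacheLookupCaller {
        kUserGet,
        kUserMGet,
        kUserIterator,
        kPrefetch,
        kCompaction,
        kUserApproximateSize,
      };

      struct BlockCacheLookupContext {
        explicit BlockCacheLookupContext(BlockCacheLookupCaller c) : caller(c) {}
        BlockCacheLookupCaller caller;
      };
      ```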
      
      I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.
      Throughput of this PR: 231334 ops/s.
      Throughput of the master branch: 238428 ops/s.
      
      Experiment setup:
      RocksDB:    version 6.2
      Date:       Mon Jun 10 10:42:51 2019
      CPU:        24 * Intel Core Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       20 bytes each
      Values:     100 bytes each (100 bytes after compression)
      Entries:    1000000
      Prefix:    20 bytes
      Keys per prefix:    0
      RawSize:    114.4 MB (estimated)
      FileSize:   114.4 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: NoCompression
      Compression sampling rate: 0
      Memtablerep: skip_list
      Perf Level: 1
      
      Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000
      
      Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120
      
      TODOs:
      1. Create a caller for external SST file ingestion and differentiate the callers for iterator.
      2. Integrate tracer to trace block cache accesses.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5421
      
      Differential Revision: D15704258
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
      5efa0d6b
    • Improve memtable earliest seqno assignment for secondary instance (#5413) · 6ce55808
      Committed by Yanqin Jin
      Summary:
      In a regular RocksDB instance, `MemTable::earliest_seqno_` is the "db sequence number at the time of creation". However, we cannot use the db sequence number to set the value of `MemTable::earliest_seqno_` for a secondary instance, i.e. `DBImplSecondary`, due to the logic of MANIFEST and WAL replay.
      When replaying the log files of the primary, the secondary instance first replays the MANIFEST and updates its db sequence number if necessary. Next, the secondary replays the WAL files, creating new memtables if necessary and inserting key-value pairs into them. The following can occur when the db has two or more column families.
      Assume the db has column families "default" and "cf1". At a certain point in time, both "default" and "cf1" have data in memtables.
      1. Primary triggers a flush and flushes "cf1". "default" is **not** flushed.
      2. The secondary replays the MANIFEST and updates its db sequence number to the latest value learned from the MANIFEST.
      3. The secondary starts to replay the WAL that contains the writes to "default". It is possible that the write batches' sequence numbers are smaller than the db sequence number. In this case, these write batches will be skipped, and their updates will not be visible to readers until "default" is later flushed.
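      The problematic skip behavior in step 3 boils down to a single comparison; the function below is a hypothetical sketch of that condition, not the actual replay code:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Sketch: a write batch recovered from the WAL is skipped when its
      // sequence number is already below the db sequence number learned
      // from the MANIFEST. (Function name is hypothetical.)
      inline bool SkipsBatchDuringReplay(uint64_t batch_seq,
                                         uint64_t db_seq_from_manifest) {
        return batch_seq < db_seq_from_manifest;
      }
      ```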
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5413
      
      Differential Revision: D15637407
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3de3fe35cfc6f1b9f844f3f926f0df29717b6580
      6ce55808
    • WritePrepared: reduce prepared_mutex_ overhead (#5420) · c292dc85
      Committed by Maysam Yabandeh
      Summary:
      The patch reduces the contention over prepared_mutex_ using these techniques:
      1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
      2) Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared, and ::push_pop_mutex(), needed for ::AddPrepared(). Given that we call ::AddPrepared only from the first write queue and ::RemovePrepared mostly from the second, the two write queues no longer compete with each other over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
      3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.
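      Technique 2 can be sketched as follows; this is a simplified, hypothetical model of PreparedHeap, not the actual implementation:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <functional>
      #include <mutex>
      #include <queue>
      #include <set>
      #include <vector>

      // Sketch: AddPrepared (first write queue) and RemovePrepared (mostly
      // the second queue) synchronize on separate mutexes, so the two
      // queues rarely contend. Removed entries are only recorded here; a
      // pop under push_pop_mutex_ would skip them lazily.
      struct PreparedHeapSketch {
        std::mutex push_pop_mutex_;
        std::mutex prepared_mutex_;
        std::priority_queue<uint64_t, std::vector<uint64_t>,
                            std::greater<uint64_t>> heap_;
        std::set<uint64_t> erased_;

        void AddPrepared(uint64_t seq) {
          std::lock_guard<std::mutex> lock(push_pop_mutex_);
          heap_.push(seq);
        }

        void RemovePrepared(uint64_t seq) {
          std::lock_guard<std::mutex> lock(prepared_mutex_);
          erased_.insert(seq);
        }
      };
      ```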
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5420
      
      Differential Revision: D15741985
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735
      c292dc85
  5. 08 Jun 2019 (2 commits)
    • Potential fix for stress test failure due to "SST file ahead of WAL" error (#5412) · b703a56e
      Committed by anand76
      Summary:
      I'm not able to prove it, but the stress test failure may be caused by the following sequence of events -
      
      1. Crash db_stress while writing the log file. This should result in a corrupted WAL.
      2. Run db_stress with recycle_log_file_num=1. Crash during recovery immediately after writing the manifest and updating the current file. The old log from the previous run is left behind, but the memtable would have been flushed during recovery, and the CF log number will point to the newer log.
      3. Run db_stress with recycle_log_file_num=0. During recovery, the old log file will be processed and the corruption will be detected. Since the CF has moved ahead, we get the "SST file is ahead of WAL" error.
      
      Test -
      1. stress_crash
      2. make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5412
      
      Differential Revision: D15699120
      
      Pulled By: anand1976
      
      fbshipit-source-id: 9092ce81e7c4a0b4b4e66560c23ea4812a4d9cbe
      b703a56e
    • Revert to checking the upper bound on a per-key basis in BlockBasedTableIterator (#5428) · 0f48e56f
      Committed by Levi Tamasi
      Summary:
      PR #5111 reduced the number of key comparisons when iterating with
      upper/lower bounds; however, this caused a regression for MyRocks.
      Reverting to the previous behavior in BlockBasedTableIterator as a hotfix.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5428
      
      Differential Revision: D15721038
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c
      0f48e56f
  6. 07 Jun 2019 (1 commit)
  7. 06 Jun 2019 (1 commit)
    • Add support for timestamp in Get/Put (#5079) · 340ed4fa
      Committed by Yanqin Jin
      Summary:
      It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of facebook#4942. A suite of new unit tests exists in DBBasicTestWithTimestampWithParam. Timestamp support requires the user to provide the timestamp as a slice in `ReadOptions` and `WriteOptions`. All timestamps in the same database must share the same length and format, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once created, the format and length of the timestamp cannot change (at least for now).
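      A sketch of how such a comparator might order <key, timestamp> tuples, assuming fixed-width, big-endian-encoded timestamps so that byte-wise comparison matches numeric order. This is an illustration of the ordering idea, not the PR's API:

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical sketch: keys ascending; for equal keys, timestamps
      // descending so the newest version of a key sorts first.
      int CompareKeyWithTs(const std::string& k1, const std::string& ts1,
                           const std::string& k2, const std::string& ts2) {
        int r = k1.compare(k2);
        if (r != 0) return r;
        return ts2.compare(ts1);  // reversed: larger (newer) timestamp first
      }
      ```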
      
      Test plan (on devserver):
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/*
      $make check
      ```
      All tests must pass.
      
      We also run the following db_bench tests to verify whether there is regression on Get/Put while timestamp is not enabled.
      ```
      $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000
      $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000
      ```
      Repeat 6 times for each version.
      
      Results are as follows:
      ```
      |        | readrandom | fillrandom |
      | master | 16.77 MB/s | 47.05 MB/s |
      | PR5079 | 16.44 MB/s | 47.03 MB/s |
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5079
      
      Differential Revision: D15132946
      
      Pulled By: riversand963
      
      fbshipit-source-id: 833a0d657eac21182f0f206c910a6438154c742c
      340ed4fa
  8. 05 Jun 2019 (2 commits)
  9. 04 Jun 2019 (3 commits)
    • Ignore shutdown error during compaction (#5400) · 5d6e8df1
      Committed by anand76
      Summary:
      PR #5275 separated the column-dropped and shutdown status codes. However, there were a couple of places in compaction where this change ended up treating a ShutdownInProgress() error as a real error and setting bg_error. This caused a MyRocks unit test to fail because WAL writes during shutdown returned this error. Fix it by ignoring the shutdown status during compaction.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5400
      
      Differential Revision: D15611680
      
      Pulled By: anand1976
      
      fbshipit-source-id: c602e97840e3ae24eb420d61e0ce95d3e6258632
      5d6e8df1
    • Call ValidateOptions from SetOptions (#5368) · ae05a83e
      Committed by Maysam Yabandeh
      Summary:
      Currently we validate options in DB::Open. However, the validation step is missing when options are dynamically updated via ::SetOptions. The patch fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5368
      
      Differential Revision: D15540101
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d27bbffd8f0252d1b50bcf59e0a70a278ed937f4
      ae05a83e
    • Move util/trace_replay.* to trace_replay/ (#5376) · 5851cb7f
      Committed by Siying Dong
      Summary:
      util/ is meant for lower-level libraries, but trace_replay is highly integrated with the DB and sometimes calls into it. Move it out to a separate directory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5376
      
      Differential Revision: D15550938
      
      Pulled By: siying
      
      fbshipit-source-id: f46dce5ceffdc05a73f26379c7bb1b79ebe6c207
      5851cb7f
  10. 01 Jun 2019 (3 commits)
  11. 31 May 2019 (12 commits)
    • Move LevelCompactionPicker to a separate file (#5369) · ab8f6c01
      Committed by Zhongyi Xie
      Summary:
      In order to improve code readability, this PR moves LevelCompactionBuilder and LevelCompactionPicker to compaction_picker_level.h and .cc
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5369
      
      Differential Revision: D15540172
      
      Pulled By: miasantreble
      
      fbshipit-source-id: c1a578b93f127cd63661b53f32b356e6edd349af
      ab8f6c01
    • Reorder DBImpl's private section (#5385) · ff9d2868
      Committed by Sagar Vemuri
      Summary:
      The methods and fields in the private section of DBImpl were all intermingled, making it hard to figure out where the fields/methods start and where they end. I cleaned up the code a little so that all the type declarations are at the beginning, followed by the methods, with all the data fields at the end. This follows
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5385
      
      Differential Revision: D15566978
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4618a7d819ad4e2d7cc9ae1af2c59f400140bb1b
      ff9d2868
    • Fix WAL replay by skipping old write batches (#5170) · b9f59006
      Committed by Yanqin Jin
      Summary:
      1. Fix a bug in WAL replay in which write batches with old sequence numbers are mistakenly inserted into memtables.
      2. Add support for benchmarking secondary instance to db_bench_tool.
      With changes made in this PR, we can start benchmarking secondary instance
      using two processes. It is also possible to vary the frequency at which the
      secondary instance tries to catch up with the primary. The info log of the
      secondary can be found in a directory whose path can be specified with
      '-secondary_path'.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5170
      
      Differential Revision: D15564608
      
      Pulled By: riversand963
      
      fbshipit-source-id: ce97688ed3d33f69d3a0b9266ebbbbf887aa0ec8
      b9f59006
    • Move some memory related files from util/ to memory/ (#5382) · 8843129e
      Committed by Siying Dong
      Summary:
      Move arena, allocator, and memory tools under util to a separate memory/ directory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382
      
      Differential Revision: D15564655
      
      Pulled By: siying
      
      fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
      8843129e
    • Add class-level comments to version-related classes (#5348) · f1302eba
      Committed by Yanqin Jin
      Summary:
      As title.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5348
      
      Differential Revision: D15564595
      
      Pulled By: riversand963
      
      fbshipit-source-id: dd45aa86a70e0343c2e9ef702fad165163f548e6
      f1302eba
    • Fix flaky DBTest2.PresetCompressionDict test (#5378) · 1b59a490
      Committed by Sagar Vemuri
      Summary:
      Fix flaky DBTest2.PresetCompressionDict test.
      
      This PR fixes two issues with the test:
      1. Replaces `GetSstFiles` with `TotalSize`, which is based on `DB::GetColumnFamilyMetaData`, so that only the size of the live SST files is taken into consideration when computing the total size of all SST files. Earlier, with `GetSstFiles`, even obsolete files were getting picked up.
      2. In ZSTD compression, it is sometimes possible that using a trained dictionary is not better than using an untrained one. Using a trained dictionary performs well in ~99% of cases, but in the remaining ~1% of cases (out of 10000 runs) an untrained dictionary gets better compression results.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5378
      
      Differential Revision: D15559100
      
      Pulled By: sagar0
      
      fbshipit-source-id: c35adbf13871f520a2cec48f8bad9ff27ff7a0b4
      1b59a490
    • Organizing rocksdb/table directory by format · 50e47079
      Committed by Vijay Nadimpalli
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373
      
      Differential Revision: D15559425
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
      50e47079
    • Fix env_options_for_read spelling in CompactionJob · e6298626
      Committed by Sagar Vemuri
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5380
      
      Differential Revision: D15563386
      
      Pulled By: sagar0
      
      fbshipit-source-id: 8b26aef47cfc40ff8016daf815582f21cdd40df2
      e6298626
    • Move the index readers out of the block cache (#5298) · 1e355842
      Committed by Levi Tamasi
      Summary:
      Currently, when the block cache is used for index blocks as well, it is
      not really the index block that is stored in the cache but an
      IndexReader object. Since this object is not pure data (it has, for
      instance, pointers that might dangle), it's not really sharable. To
      avoid the issues around this, the current code uses a dummy unique cache
      key for each TableReader to store the IndexReader, and erases the
      IndexReader entry when the TableReader is closed. Instead of doing this,
      the new code moves the IndexReader out of the cache altogether. In
      particular, instead of the TableReader owning, or caching/pinning the
      IndexReader based on the customer's settings, the TableReader
      unconditionally owns the IndexReader, which in turn owns/caches/pins
      the index block (which is itself sharable and thus can be safely put in
      the cache without any hacks).
      
      Note: the change has two side effects:
      1) Partitions of partitioned indexes no longer affect the read
      amplification statistics.
      2) Eviction statistics for index blocks are temporarily broken. We plan to fix
      this in a separate phase.
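      The resulting ownership chain can be sketched as plain structs; the names are hypothetical and the real classes carry much more state:

      ```cpp
      #include <cassert>
      #include <memory>

      // Sketch: the table reader unconditionally owns its index reader, and
      // only the index block itself (pure data) is ever a candidate for the
      // shared block cache.
      struct IndexBlockSketch {};  // sharable block contents

      struct IndexReaderSketch {
        std::shared_ptr<IndexBlockSketch> index_block;  // cached/pinned block
      };

      struct TableReaderSketch {
        std::unique_ptr<IndexReaderSketch> index_reader;  // owned outright
      };
      ```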
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5298
      
      Differential Revision: D15303203
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 935a69ba59d87d5e44f42e2310619b790c366e47
      1e355842
    • Move test related files under util/ to test_util/ (#5377) · e9e0101c
      Committed by Siying Dong
      Summary:
      There are too many types of files under util/. Some test-related files don't belong there or are only loosely related. Move them to a new directory, test_util/, so that util/ is cleaner.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
      
      Differential Revision: D15551366
      
      Pulled By: siying
      
      fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
      e9e0101c
    • Increase Trash/DB size ratio in DBSSTTest.RateLimitedWALDelete (#5366) · a984040f
      Committed by anand76
      Summary:
      By increasing the ratio, we ensure that all files go through background deletion and eliminate flakiness due to timing of deletions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5366
      
      Differential Revision: D15549992
      
      Pulled By: anand1976
      
      fbshipit-source-id: d137375cd791fc1a802841412755d6e2b8fd7688
      a984040f
    • Fix FIFO dynamic options sanitization (#5367) · 87fe4bca
      Committed by Zhongyi Xie
      Summary:
      When dynamically setting options, we check the option type info and skip options that are marked deprecated. However, this check is only done at the top level, which results in bugs where SetOptions will corrupt option values and cause unexpected system behavior if a deprecated second-level option is set dynamically.
      For example, the following call:
      ```
      dbfull()->SetOptions(
          {{"compaction_options_fifo",
              "{allow_compaction=true;max_table_files_size=1024;ttl=731;}"}});
      ```
      was valid before the 6.0 release, when `ttl` was part of `compaction_options_fifo`. Now that `ttl` has moved out of `compaction_options_fifo`, this call will incorrectly set `compaction_options_fifo.max_table_files_size` to 731 (as `max_table_files_size` is the first entry in the `OptionsHelper::fifo_compaction_options_type_info` struct) and cause files to get evicted much faster than expected.
      
      This PR adds verification to second-level options such as `compaction_options_fifo.ttl` or `compaction_options_fifo.max_table_files_size` when they are set dynamically, filtering out those marked as deprecated.
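      The second-level check can be sketched as a lookup that rejects deprecated entries; the names below are hypothetical, not the PR's actual helpers:

      ```cpp
      #include <cassert>
      #include <map>
      #include <string>

      // Sketch: each nested option name is looked up in its own type-info
      // map, and deprecated (or unknown) entries are rejected instead of
      // being matched positionally against the struct's fields.
      struct OptionTypeInfoSketch {
        bool deprecated;
      };

      bool ShouldApplyNestedOption(
          const std::map<std::string, OptionTypeInfoSketch>& type_info,
          const std::string& name) {
        auto it = type_info.find(name);
        return it != type_info.end() && !it->second.deprecated;
      }
      ```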
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5367
      
      Differential Revision: D15530998
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 818258be5c3abe09cd82d62f3c083572d70fecdd
      87fe4bca
  12. 30 May 2019 (1 commit)
  13. 29 May 2019 (1 commit)
  14. 25 May 2019 (1 commit)