1. 15 Jun, 2019 5 commits
    • Integrate block cache tracer in block based table reader. (#5441) · 7a8d7358
      haoyuhuang committed
      Summary:
      This PR integrates the block cache tracer into block based table reader. The tracer will write the block cache accesses using the trace_writer. The tracer is null in this PR so that nothing will be logged.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5441
      
      Differential Revision: D15772029
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: a64adb92642cd23222e0ba8b10d86bf522b42f9b
    • Validate CF Options when creating a new column family (#5453) · f1219644
      Sagar Vemuri committed
      Summary:
      It seems that CF Options are not properly validated when creating a new column family with the `CreateColumnFamily` API; only a select few checks are done. Calling `ColumnFamilyData::ValidateOptions`, which is the single source for all CFOptions validations, will help fix this. (`ColumnFamilyData::ValidateOptions` is already called at the time of `DB::Open`.)
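
      As a rough illustration of the idea (one validation routine shared by the open path and the create path), here is a minimal sketch. Every type, function name and the specific rule checked are hypothetical stand-ins, not RocksDB's actual code.

      ```cpp
      #include <string>

      // Hypothetical stand-in for column family options (not the real type).
      struct CfOptionsSketch {
        int max_write_buffer_number = 2;
      };

      // Hypothetical stand-in for a status object.
      struct StatusSketch {
        bool ok;
        std::string message;
      };

      // Single place that holds every option check. The rule below is
      // illustrative only; it is an assumption, not RocksDB's rule set.
      StatusSketch ValidateOptionsSketch(const CfOptionsSketch& options) {
        if (options.max_write_buffer_number < 1) {
          return {false, "max_write_buffer_number must be at least 1"};
        }
        return {true, ""};
      }

      // The open path already validates.
      StatusSketch OpenSketch(const CfOptionsSketch& options) {
        return ValidateOptionsSketch(options);
      }

      // The create path calls the same routine instead of a partial copy.
      StatusSketch CreateColumnFamilySketch(const CfOptionsSketch& options) {
        StatusSketch s = ValidateOptionsSketch(options);
        if (!s.ok) {
          return s;  // reject before creating anything
        }
        // ... the column family would be created here ...
        return s;
      }
      ```

      Routing both paths through one function means a newly added check cannot be forgotten on one of them.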
      
      **Test Plan:**
      Added a new test: `DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions`
      ```
      TEST_TMPDIR=/dev/shm ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions
      ```
      Also ran gtest-parallel to make sure the new test is not flaky.
      ```
      TEST_TMPDIR=/dev/shm ~/gtest-parallel/gtest-parallel ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions --repeat=10000
      [10000/10000] DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions (15 ms)
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5453
      
      Differential Revision: D15816851
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9e702b9850f5c4a7e0ef8d39e1e6f9b81e7fe1e5
    • fix compilation error on MSVC (#5458) · b47cfec5
      Huisheng Liu committed
      Summary:
      "__attribute__((__weak__))" was introduced in port\jemalloc_helper.h. It is not supported by Microsoft VS 2015, resulting in a compile error. This fix adds a #if branch to work around the compile issue.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5458
      
      Differential Revision: D15827285
      
      fbshipit-source-id: 8c5f7ad31de1ac677bd96f16c4450767de834beb
    • Set executeLocal on child lego jobs (#5456) · 58c78358
      Maysam Yabandeh committed
      Summary:
      This property is needed to run the child jobs on the same host and thus propagate the child job status back to the parent.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5456
      
      Reviewed By: yancouto
      
      Differential Revision: D15824382
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 42f2efbedaa3a8b399281105f0ce793c1c9a6191
    • Remove unused variable (#5457) · 89695bfb
      haoyuhuang committed
      Summary:
      This PR removes the unused variable that causes the Clang build to fail.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5457
      
      Differential Revision: D15825027
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 72c847c39ca310560efcbc5938cffa6f31164068
  2. 14 Jun, 2019 5 commits
  3. 13 Jun, 2019 4 commits
  4. 12 Jun, 2019 6 commits
  5. 11 Jun, 2019 8 commits
    • Fix DBTest.DynamicMiscOptions so it passes even with Snappy disabled (#5438) · a94aef65
      Levi Tamasi committed
      Summary:
      This affects our "no compression" automated tests. Since PR #5368, DBTest.DynamicMiscOptions has been failing with:
      
      db/db_test.cc:4889: Failure
      dbfull()->SetOptions({{"compression", "kSnappyCompression"}})
      Invalid argument: Compression type Snappy is not linked with the binary.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5438
      
      Differential Revision: D15752100
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 3f19eff7cafc03b333965be0203c5853d2a9cb71
    • Avoid deadlock between mutex_ and log_write_mutex_ (#5437) · c8c1a549
      Maysam Yabandeh committed
      Summary:
      To avoid deadlock, mutex_ should never be acquired after log_write_mutex_, that is, while log_write_mutex_ is held. The patch documents that and also fixes one case in ::FlushWAL that acquires mutex_ through ::WriteStatusCheck when it already holds the lock on log_write_mutex_.
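
      The FlushWAL case comes down to not taking mutex_ while log_write_mutex_ is still held. The sketch below shows that pattern with `std::mutex`. The class, its counters and the simulated I/O error flag are all hypothetical; only the two mutex names come from the commit message.

      ```cpp
      #include <mutex>

      class DbSketch {
       public:
        // Flushes the log under log_write_mutex_, then releases it before
        // running the status check, which needs mutex_.
        void FlushWalSketch() {
          bool io_failed = false;
          {
            std::lock_guard<std::mutex> log_lock(log_write_mutex_);
            ++wal_flushes_;
            io_failed = simulate_io_error_;
          }  // log_write_mutex_ is released here
          if (io_failed) {
            WriteStatusCheckSketch();  // takes mutex_ with no log lock held
          }
        }

        // A path that needs both locks takes mutex_ first.
        void SwitchWalSketch() {
          std::lock_guard<std::mutex> db_lock(mutex_);
          std::lock_guard<std::mutex> log_lock(log_write_mutex_);
          ++wal_switches_;
        }

        void set_simulate_io_error(bool value) {
          std::lock_guard<std::mutex> log_lock(log_write_mutex_);
          simulate_io_error_ = value;
        }

        int background_errors() const {
          std::lock_guard<std::mutex> db_lock(mutex_);
          return background_errors_;
        }

        int wal_switches() const {
          std::lock_guard<std::mutex> log_lock(log_write_mutex_);
          return wal_switches_;
        }

       private:
        void WriteStatusCheckSketch() {
          std::lock_guard<std::mutex> db_lock(mutex_);
          ++background_errors_;
        }

        mutable std::mutex mutex_;
        mutable std::mutex log_write_mutex_;
        int background_errors_ = 0;     // guarded by mutex_
        int wal_flushes_ = 0;           // guarded by log_write_mutex_
        int wal_switches_ = 0;          // guarded by log_write_mutex_
        bool simulate_io_error_ = false;  // guarded by log_write_mutex_
      };
      ```

      Copying the error flag into a local and ending the inner scope keeps the two critical sections disjoint, so no thread ever waits on mutex_ while holding log_write_mutex_.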
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5437
      
      Differential Revision: D15749722
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f57b69c44b4b80cc6d7ddf3d3fdf4a9eb5a5a45a
    • Remove global locks from FlushScheduler (#5372) · b2584577
      Maysam Yabandeh committed
      Summary:
      FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since https://github.com/facebook/rocksdb/pull/2286 the scope of these locks has been widened to the entire body of each method. The result is that the concurrency exercised during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode).
      The patch reverts this change to reduce the scope of such locks.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5372
      
      Differential Revision: D15545831
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
    • Use CreateLoggerFromOptions function (#5427) · 641cc8d5
      Yanqin Jin committed
      Summary:
      Use `CreateLoggerFromOptions` function to reduce code duplication.
      
      Test plan (on my machine)
      ```
      $make clean && make -j32 db_secondary_test
      $KEEP_DB=1 ./db_secondary_test
      ```
      Verify all info logs of the secondary instance are properly logged.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5427
      
      Differential Revision: D15748922
      
      Pulled By: riversand963
      
      fbshipit-source-id: bad7261df1b8373efc504f141efc7871e375a311
    • Create a BlockCacheLookupContext to enable fine-grained block cache tracing. (#5421) · 5efa0d6b
      haoyuhuang committed
      Summary:
      BlockCacheLookupContext only contains the caller for now.
      We will trace block accesses at five places:
      1. BlockBasedTable::GetFilter.
      2. BlockBasedTable::GetUncompressedDict.
      3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.)
      4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
      
      We create the context at:
      1. BlockBasedTable::Get. (kUserGet)
      2. BlockBasedTable::MultiGet. (kUserMGet)
      3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.)
      4. BlockBasedTable::Open. (kPrefetch)
      5. Index/Filter::CacheDependencies. (kPrefetch)
      6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).
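
      A context that "only contains the caller" could look roughly like the sketch below. The caller values are the ones named in the list above; the enum name, the struct layout and the helper functions are assumptions made for illustration.

      ```cpp
      #include <string>

      // Caller kinds taken from the commit message; the enum name is hypothetical.
      enum class LookupCallerSketch {
        kUserGet,
        kUserMGet,
        kUserIterator,
        kCompaction,
        kPrefetch,
        kUserApproximateSize,
      };

      // The context carries only the caller for now.
      struct BlockCacheLookupContextSketch {
        explicit BlockCacheLookupContextSketch(LookupCallerSketch c) : caller(c) {}
        const LookupCallerSketch caller;
      };

      const char* CallerNameSketch(LookupCallerSketch caller) {
        switch (caller) {
          case LookupCallerSketch::kUserGet: return "kUserGet";
          case LookupCallerSketch::kUserMGet: return "kUserMGet";
          case LookupCallerSketch::kUserIterator: return "kUserIterator";
          case LookupCallerSketch::kCompaction: return "kCompaction";
          case LookupCallerSketch::kPrefetch: return "kPrefetch";
          case LookupCallerSketch::kUserApproximateSize:
            return "kUserApproximateSize";
        }
        return "unknown";
      }

      // A lookup routine receives the context created at the entry point, so a
      // tracer attached later can tell which code path touched the block.
      std::string DescribeLookupSketch(const std::string& block_type,
                                       const BlockCacheLookupContextSketch& ctx) {
        return std::string(CallerNameSketch(ctx.caller)) + " -> " + block_type;
      }
      ```

      Creating the context at the top-level entry points and passing it down by reference keeps the lookup functions unaware of who called them.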
      
      I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.
      Throughput of this PR: 231334 ops/s.
      Throughput of the master branch: 238428 ops/s.
      
      Experiment setup:
      RocksDB:    version 6.2
      Date:       Mon Jun 10 10:42:51 2019
      CPU:        24 * Intel Core Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       20 bytes each
      Values:     100 bytes each (100 bytes after compression)
      Entries:    1000000
      Prefix:    20 bytes
      Keys per prefix:    0
      RawSize:    114.4 MB (estimated)
      FileSize:   114.4 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: NoCompression
      Compression sampling rate: 0
      Memtablerep: skip_list
      Perf Level: 1
      
      Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000
      
      Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120
      
      TODOs:
      1. Create a caller for external SST file ingestion and differentiate the callers for iterator.
      2. Integrate tracer to trace block cache accesses.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5421
      
      Differential Revision: D15704258
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
    • Reuse data block iterator in BlockBasedTableReader::MultiGet() (#5314) · 63ace8ef
      anand76 committed
      Summary:
      Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small (1-2%) CPU improvement.
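
      The reuse idea can be sketched on a toy table in which each block is a non-empty sorted vector of keys and the batch is sorted. All names are hypothetical, and the sketch counts iterator creations only to make the saving visible; it is not the actual reader code.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <memory>
      #include <string>
      #include <vector>

      // Hypothetical iterator over one block (a sorted vector of keys).
      class DataBlockIterSketch {
       public:
        explicit DataBlockIterSketch(const std::vector<std::string>* keys)
            : keys_(keys) {}
        bool SeekExact(const std::string& target) const {
          auto it = std::lower_bound(keys_->begin(), keys_->end(), target);
          return it != keys_->end() && *it == target;
        }

       private:
        const std::vector<std::string>* keys_;
      };

      struct MultiGetResultSketch {
        size_t found = 0;
        size_t iterators_created = 0;
      };

      // Assumes every block is non-empty and sorted_keys is sorted.
      MultiGetResultSketch MultiGetSketch(
          const std::vector<std::vector<std::string>>& blocks,
          const std::vector<std::string>& sorted_keys) {
        MultiGetResultSketch result;
        size_t current_block = blocks.size();  // sentinel: no block open yet
        std::unique_ptr<DataBlockIterSketch> iter;
        for (const std::string& key : sorted_keys) {
          // Toy index lookup: first block whose last key is >= key.
          size_t b = 0;
          while (b < blocks.size() && blocks[b].back() < key) {
            ++b;
          }
          if (b == blocks.size()) {
            continue;  // key is past the last block
          }
          if (b != current_block) {
            // Only build a new iterator when the block changes.
            iter.reset(new DataBlockIterSketch(&blocks[b]));
            current_block = b;
            ++result.iterators_created;
          }
          if (iter->SeekExact(key)) {
            ++result.found;
          }
        }
        return result;
      }
      ```

      With a sorted batch, consecutive keys tend to land in the same block, so the number of iterators created approaches the number of distinct blocks touched rather than the number of keys.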
      
      TEST_TMPDIR=/dev/shm/multiget numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
      
      Without the change -
      multireadrandom :       3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)
      
      With the change -
      multireadrandom :       3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5314
      
      Differential Revision: D15742108
      
      Pulled By: anand1976
      
      fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771
    • Improve memtable earliest seqno assignment for secondary instance (#5413) · 6ce55808
      Yanqin Jin committed
      Summary:
      In a regular RocksDB instance, `MemTable::earliest_seqno_` is "db sequence number at the time of creation". However, we cannot use the db sequence number to set the value of `MemTable::earliest_seqno_` for a secondary instance, i.e. `DBImplSecondary`, due to the logic of MANIFEST and WAL replay.
      When replaying the log files of the primary, the secondary instance first replays the MANIFEST and updates the db sequence number if necessary. Next, the secondary replays WAL files, creates new memtables if necessary and inserts key-value pairs into memtables. The following can occur when the db has two or more column families.
      Assume the db has column families "default" and "cf1". At a certain point in time, both "default" and "cf1" have data in memtables.
      1. Primary triggers a flush and flushes "cf1". "default" is **not** flushed.
      2. Secondary replays the MANIFEST and updates its db sequence number to the latest value learned from the MANIFEST.
      3. Secondary starts to replay WAL that contains the writes to "default". It is possible that the write batches' sequence numbers are smaller than the db sequence number. In this case, these write batches will be skipped, and these updates will not be visible to readers until "default" is later flushed.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5413
      
      Differential Revision: D15637407
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3de3fe35cfc6f1b9f844f3f926f0df29717b6580
    • WritePrepared: reduce prepared_mutex_ overhead (#5420) · c292dc85
      Maysam Yabandeh committed
      Summary:
      The patch reduces the contention over prepared_mutex_ using these techniques:
      1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
      2) Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared, and ::push_pop_mutex(), needed for ::AddPrepared(). Given that we call ::AddPrepared only from the first write queue and ::RemovePrepared mostly from the second, the two write queues no longer compete with each other over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
      3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5420
      
      Differential Revision: D15741985
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735
  6. 08 Jun, 2019 3 commits
    • Fix build errors regarding const qualifier being ignored on cast result type (#5432) · a16d0cc4
      Levi Tamasi committed
      Summary:
      This affects some TSAN builds:
      
      env/env_test.cc: In member function ‘virtual void rocksdb::EnvPosixTestWithParam_MultiRead_Test::TestBody()’:
      env/env_test.cc:1126:76: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
             auto data = NewAligned(kSectorSize * 8, static_cast<const char>(i + 1));
                                                                                  ^
      env/env_test.cc:1154:77: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
             auto buf = NewAligned(kSectorSize * 8, static_cast<const char>(i*2 + 1));
                                                                                   ^
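
      The warning fires because a cast to a non-class, non-reference type yields a prvalue, so the `const` in `static_cast<const char>` has no effect. A minimal sketch of the before and after, with a hypothetical helper standing in for `NewAligned`:

      ```cpp
      #include <cstddef>
      #include <string>

      // Hypothetical stand-in for the test helper seen in the log above.
      std::string NewFilledSketch(size_t size, char ch) {
        return std::string(size, ch);
      }

      std::string BuildBufferSketch(int i) {
        // Before: static_cast<const char>(i + 1)
        //   GCC reports "type qualifiers ignored on cast result type".
        // After: cast to plain char; the value passed is identical.
        return NewFilledSketch(8, static_cast<char>(i + 1));
      }
      ```

      Dropping the qualifier changes nothing at runtime, since the argument is passed by value either way.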
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5432
      
      Differential Revision: D15727277
      
      Pulled By: ltamasi
      
      fbshipit-source-id: dc0e687b123e7c4d703ccc0c16b7167e07d1c9b0
    • Potential fix for stress test failure due to "SST file ahead of WAL" error (#5412) · b703a56e
      anand76 committed
      Summary:
      I'm not able to prove it, but the stress test failure may be caused by the following sequence of events -
      
      1. Crash db_stress while writing the log file. This should result in a corrupted WAL.
      2. Run db_stress with recycle_log_file_num=1. Crash during recovery immediately after writing manifest and updating the current file. The old log from the previous run is left behind, but the memtable would have been flushed during recovery and the CF log number will point to the newer log.
      3. Run db_stress with recycle_log_file_num=0. During recovery, the old log file will be processed and the corruption will be detected. Since the CF has moved ahead, we get the "SST file is ahead of WAL" error.
      
      Test -
      1. stress_crash
      2. make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5412
      
      Differential Revision: D15699120
      
      Pulled By: anand1976
      
      fbshipit-source-id: 9092ce81e7c4a0b4b4e66560c23ea4812a4d9cbe
    • Revert to checking the upper bound on a per-key basis in BlockBasedTableIterator (#5428) · 0f48e56f
      Levi Tamasi committed
      Summary:
      PR #5111 reduced the number of key comparisons when iterating with
      upper/lower bounds; however, this caused a regression for MyRocks.
      Reverting to the previous behavior in BlockBasedTableIterator as a hotfix.
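
      Checking the upper bound on a per-key basis means every key the iterator yields is compared against the bound before it is returned. A toy sketch over a sorted vector follows; the function name is hypothetical and the bound is treated as exclusive.

      ```cpp
      #include <string>
      #include <vector>

      // Returns the keys that sort strictly below upper_bound, comparing each
      // key against the bound as it is visited.
      std::vector<std::string> ScanWithUpperBoundSketch(
          const std::vector<std::string>& sorted_keys,
          const std::string& upper_bound) {
        std::vector<std::string> out;
        for (const std::string& key : sorted_keys) {
          if (key >= upper_bound) {
            break;  // per-key check: stop at the first key at or past the bound
          }
          out.push_back(key);
        }
        return out;
      }
      ```

      This costs one comparison per key, which is the overhead PR #5111 had tried to avoid, in exchange for simpler and previously proven behavior.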
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5428
      
      Differential Revision: D15721038
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c
  7. 07 Jun, 2019 5 commits
  8. 06 Jun, 2019 4 commits
    • Add support for timestamp in Get/Put (#5079) · 340ed4fa
      Yanqin Jin committed
      Summary:
      It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of facebook#4942. A suite of new unit tests exists in DBBasicTestWithTimestampWithParam. Support for timestamps requires the user to provide the timestamp as a slice in `ReadOptions` and `WriteOptions`. All timestamps in the same database must share the same length and format, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once the database is created, the format and length of the timestamp cannot change (at least for now).
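
      To make the <key, timestamp> ordering concrete, here is one possible comparator sketch. It assumes, purely for illustration, an 8-byte big-endian timestamp appended to the user key and an ordering that places newer timestamps first for equal keys. None of these choices is stated in the PR, and all names are hypothetical.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Assumed fixed timestamp length (hypothetical).
      constexpr size_t kTimestampSizeSketch = 8;

      // Appends ts as 8 big-endian bytes so that byte order matches numeric order.
      std::string AppendTimestampSketch(const std::string& user_key, uint64_t ts) {
        std::string out = user_key;
        for (int shift = 56; shift >= 0; shift -= 8) {
          out.push_back(static_cast<char>((ts >> shift) & 0xff));
        }
        return out;
      }

      // Orders by user key ascending, then by timestamp descending.
      // Assumes both inputs carry a timestamp of kTimestampSizeSketch bytes.
      int CompareKeyWithTimestampSketch(const std::string& a,
                                        const std::string& b) {
        const size_t a_len = a.size() - kTimestampSizeSketch;
        const size_t b_len = b.size() - kTimestampSizeSketch;
        const int key_cmp = a.compare(0, a_len, b, 0, b_len);
        if (key_cmp != 0) {
          return key_cmp < 0 ? -1 : 1;
        }
        const int ts_cmp = a.compare(a_len, kTimestampSizeSketch, b, b_len,
                                     kTimestampSizeSketch);
        if (ts_cmp == 0) {
          return 0;
        }
        return ts_cmp > 0 ? -1 : 1;  // larger (newer) timestamp sorts first
      }
      ```

      Because the timestamp length is fixed, the comparator can split each stored key without any length prefix, which is one reason a database-wide fixed length is convenient.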
      
      Test plan (on devserver):
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/*
      $make check
      ```
      All tests must pass.
      
      We also ran the following db_bench tests to check for regressions in Get/Put when timestamps are not enabled.
      ```
      $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000
      $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000
      ```
      Repeat 6 times for both versions.
      
      Results are as follows:
      ```
      |        | readrandom | fillrandom |
      | master | 16.77 MB/s | 47.05 MB/s |
      | PR5079 | 16.44 MB/s | 47.03 MB/s |
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5079
      
      Differential Revision: D15132946
      
      Pulled By: riversand963
      
      fbshipit-source-id: 833a0d657eac21182f0f206c910a6438154c742c
    • Fix tsan error (#5414) · cb1bf09b
      Yanqin Jin committed
      Summary:
      The previous code produces a warning when compiled with TSAN, which becomes an error because we build with -Werror.
      Compilation result:
      ```
      In file included from ./env/env_chroot.h:12,
                       from env/env_test.cc:40:
      ./include/rocksdb/env.h: In instantiation of ‘rocksdb::Status rocksdb::DynamicLibrary::LoadFunction(const string&, std::function<T>*) [with T = void*(void*, const char*); std::__cxx11::string = std::__cxx11::basic_string<char>]’:
      env/env_test.cc:260:5:   required from here
      ./include/rocksdb/env.h:1010:17: error: cast between incompatible function types from ‘rocksdb::DynamicLibrary::FunctionPtr’ {aka ‘void* (*)()’} to ‘void* (*)(void*, const char*)’ [-Werror=cast-function-type]
           *function = reinterpret_cast<T*>(ptr);
                       ^~~~~~~~~~~~~~~~~~~~~~~~~
      cc1plus: all warnings being treated as errors
      make: *** [env/env_test.o] Error 1
      ```
      Clang also reports the following warning:
      ```
      env/env_posix.cc:141:11: warning: Value stored to 'err' during its initialization is never read
          char* err = dlerror();  // Clear any old error
                ^~~   ~~~~~~~~~
      1 warning generated.
      ```
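
      For the -Wcast-function-type error in the first log, one common way to keep GCC quiet is to route the cast through `void (*)()`, which GCC documents as a type that matches every function type for this warning. The sketch below shows that technique only; it is not necessarily what this PR changed, and every name in it is hypothetical.

      ```cpp
      // Generic function pointer type used as the intermediate representation.
      using GenericFnSketch = void (*)();

      int AddSketch(int a, int b) { return a + b; }

      // Stand-in for a symbol lookup (for example via dlsym) that returns a
      // generic function pointer.
      GenericFnSketch LookupSketch() {
        return reinterpret_cast<GenericFnSketch>(&AddSketch);
      }

      // Converts the generic pointer back to the caller's function type. Calling
      // through the result is only valid if T matches the real function type.
      template <typename T>
      T* LoadFunctionSketch() {
        return reinterpret_cast<T*>(LookupSketch());
      }
      ```

      A caller would write `LoadFunctionSketch<int(int, int)>()` and invoke the returned pointer as usual.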
      
      Test plan (on my devserver).
      ```
      $make clean
      $OPT=-g ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1 COMPILE_WITH_TSAN=1 make -j32
      $
      $make clean
      $USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j1 analyze
      ```
      Both should pass.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5414
      
      Differential Revision: D15637315
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8e307483761019a4d5998cab92d49516d7edffbf
    • Disable dynamic extension support by default for CMake (#5419) · 267b9b10
      Yanqin Jin committed
      Summary:
      Users have reported linking errors while building RocksDB using CMake, and we do not enable the dynamic extension feature for them. The fix is to add `-DROCKSDB_NO_DYNAMIC_EXTENSION` to CMake by default.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5419
      
      Differential Revision: D15676792
      
      Pulled By: riversand963
      
      fbshipit-source-id: d45aaacfc64ea61646fd7329c352cd760145baf3
    • Add a MultiRead() method to Env (#5311) · 0153e145
      anand76 committed
      Summary:
      Define the Env::MultiRead() method to allow callers to request multiple block reads in one shot. The underlying Env implementation can choose to parallelize the reads to reduce the overall IO latency.
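
      A batched read interface of this kind might be shaped like the sketch below: the caller fills in a list of requests, and a default implementation serves them one by one while leaving room for an override that issues them in parallel. The request struct, the in-memory file class and all signatures are hypothetical, not the actual API added by this PR.

      ```cpp
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // One read in the batch (hypothetical layout).
      struct ReadRequestSketch {
        size_t offset = 0;
        size_t len = 0;
        std::string result;  // filled in by MultiRead
        bool ok = false;     // per-request status
      };

      // Hypothetical file backed by an in-memory string.
      class FileSketch {
       public:
        explicit FileSketch(std::string contents)
            : contents_(std::move(contents)) {}
        virtual ~FileSketch() = default;

        bool Read(size_t offset, size_t len, std::string* out) const {
          if (offset > contents_.size()) {
            return false;
          }
          *out = contents_.substr(offset, len);  // clamps at end of file
          return true;
        }

        // Default behavior: serve the requests serially. An implementation with
        // asynchronous I/O could override this to submit them together.
        virtual void MultiRead(std::vector<ReadRequestSketch>* requests) const {
          for (ReadRequestSketch& req : *requests) {
            req.ok = Read(req.offset, req.len, &req.result);
          }
        }

       private:
        std::string contents_;
      };
      ```

      Keeping a status per request lets a partially failed batch report exactly which reads succeeded.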
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5311
      
      Differential Revision: D15502172
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2b228269c2e11b5f54694d6b2bb3119c8a8ce2b9