1. 17 December 2019 (2 commits)
    • db_stress: generate the key based on Zipfian distribution (hot key) (#6163) · fbda25f5
      Authored by Zhichao Cao
      Summary:
      In the current db_stress, all keys are generated randomly and follow a uniform distribution. In order to test corner cases where some keys are always updated or read, we need to generate keys based on other distributions. In this PR, keys are generated based on a Zipfian distribution, and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed the distribution will be. Note that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, keys are generated following a uniform distribution (random keys).
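
      A minimal sketch of this kind of Zipfian sampling, assuming a bounded key space and illustrative names (this is not the actual db_stress implementation): keys are drawn with probability proportional to 1 / rank^hot_key_alpha, so the lowest ranks become the hot keys as hot_key_alpha grows, and alpha = 0 degenerates to a uniform choice.

      ```
      #include <algorithm>
      #include <cmath>
      #include <cstdint>
      #include <random>
      #include <vector>

      class ZipfianKeyGenerator {
       public:
        ZipfianKeyGenerator(uint64_t num_keys, double alpha, uint64_t seed = 0)
            : cdf_(num_keys), rng_(seed) {
          double sum = 0.0;
          for (uint64_t rank = 1; rank <= num_keys; ++rank) {
            sum += 1.0 / std::pow(static_cast<double>(rank), alpha);
            cdf_[rank - 1] = sum;
          }
          for (double& c : cdf_) {
            c /= sum;  // normalize the running sums into a proper CDF
          }
        }

        // Returns a key index in [0, num_keys); index 0 is the hottest key.
        uint64_t Next() {
          double u = std::uniform_real_distribution<double>(0.0, 1.0)(rng_);
          auto it = std::lower_bound(cdf_.begin(), cdf_.end(), u);
          return static_cast<uint64_t>(it - cdf_.begin());
        }

       private:
        std::vector<double> cdf_;
        std::mt19937_64 rng_;
      };
      ```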
      
      Testing plan: ran db_stress and printed the keys to make sure they follow the distribution.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163
      
      Differential Revision: D18978480
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680
    • Fix a data race related to memtable trimming (#6187) · db7c6875
      Authored by Levi Tamasi
      Summary:
      https://github.com/facebook/rocksdb/pull/6177 introduced a data race
      involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`.
      The patch fixes this by caching whether the current version has any
      memtable history (i.e. flushed memtables that are kept around for
      transaction conflict checking) in an `std::atomic<bool>` member called
      `current_has_history_`, similarly to how `current_memory_usage_excluding_last_`
      is handled.
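
      A simplified sketch of the pattern described above, with illustrative names rather than the actual MemTableList code: the value derived from mutex-protected state is cached in a std::atomic<bool> that writers refresh under the mutex and readers load without taking the lock.

      ```
      #include <atomic>
      #include <list>
      #include <mutex>
      #include <utility>

      class MemTableListSketch {
       public:
        // Writers mutate the history and refresh the cached flag under the mutex.
        void InstallNewVersion(std::list<int> new_history) {
          std::lock_guard<std::mutex> lock(mutex_);
          history_ = std::move(new_history);
          current_has_history_.store(!history_.empty(), std::memory_order_release);
        }

        // Readers only consult the cached flag, so they never race on history_.
        bool HasHistory() const {
          return current_has_history_.load(std::memory_order_acquire);
        }

       private:
        std::mutex mutex_;
        std::list<int> history_;  // stand-in for the flushed memtables kept around
        std::atomic<bool> current_has_history_{false};
      };
      ```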
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187
      
      Test Plan:
      ```
      make clean
      COMPILE_WITH_TSAN=1 make db_test -j24
      ./db_test
      ```
      
      Differential Revision: D19084059
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9
  2. 16 December 2019 (1 commit)
    • Optimize memory and CPU for building new Bloom filter (#6175) · a92bd0a1
      Authored by Peter Dillinger
      Summary:
      The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean up to 2x the necessary space allocated (and not freed) and up to ~2x write amplification in memory. Using std::deque keeps space close to minimal (for large filters, the only time it matters), has no write amplification, frees memory while building, and needs no large contiguous memory area. The only cost is more calls to the allocator, which does not appear to matter, at least in the benchmark test.
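
      A hedged, self-contained illustration of the space behavior being compared (not the actual filter bits builder): std::vector grows by reallocating and copying, so its capacity can approach 2x the final size, while std::deque appends into fixed-size chunks with no mass copies and only one partially filled chunk of overhead.

      ```
      #include <cstdint>
      #include <deque>
      #include <vector>

      void CollectHashesWithVector(const std::vector<uint64_t>& key_hashes) {
        std::vector<uint64_t> buffer;
        for (uint64_t h : key_hashes) {
          buffer.push_back(h);  // may reallocate and copy everything collected so far
        }
        // buffer.capacity() can be close to 2x buffer.size() at this point.
      }

      void CollectHashesWithDeque(const std::vector<uint64_t>& key_hashes) {
        std::deque<uint64_t> buffer;
        for (uint64_t h : key_hashes) {
          buffer.push_back(h);  // appends into chunked storage; no mass copy occurs
        }
        // Memory overhead is bounded by one partially filled chunk.
      }
      ```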
      
      For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream.
      
      Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of upgrade to 64-bit hash) and that should only matter for full filters. This change should largely mitigate that potential regression.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175
      
      Test Plan:
      Using filter_bench with the -new_builder option and 6M keys per filter approximates a large full filter (improvement). 10k keys and no -new_builder approximates partitioned filters (about the same). (Corresponding configurations were run simultaneously on a devserver.)
      
      std::vector impl (before)
      
          $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000
          Build avg ns/key: 52.2027
          Maximum resident set size (kbytes): 1105016
          $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000
          Build avg ns/key: 30.5694
          Maximum resident set size (kbytes): 1208152
      
      std::deque impl (after)
      
          $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000
          Build avg ns/key: 39.0697
          Maximum resident set size (kbytes): 1087196
          $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000
          Build avg ns/key: 30.9348
          Maximum resident set size (kbytes): 1207980
      
      Differential Revision: D19053431
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c
  3. 15 December 2019 (2 commits)
  4. 14 December 2019 (10 commits)
    • Do not schedule memtable trimming if there is no history (#6177) · bd8404fe
      Authored by Levi Tamasi
      Summary:
      We have observed an increase in CPU load caused by frequent calls to
      `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
      when using `max_write_buffer_size_to_maintain` to limit the amount of
      memtable history maintained for transaction conflict checking. Part of the issue
      is that trimming can potentially be scheduled even if there is no memtable
      history. The patch adds a check that fixes this.
      
      See also https://github.com/facebook/rocksdb/pull/6169.
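
      A minimal sketch of the guard being added, with an illustrative flag standing in for the cached history state (these are not the actual DBImpl members):

      ```
      #include <atomic>

      struct MemtableTrimmerSketch {
        std::atomic<bool> has_history{false};
        int trim_jobs_scheduled = 0;

        void MaybeScheduleTrim() {
          // With no flushed memtables retained there is nothing to trim, so skip
          // scheduling entirely instead of doing pointless work later.
          if (!has_history.load(std::memory_order_acquire)) {
            return;
          }
          ++trim_jobs_scheduled;  // stand-in for scheduling the actual trim work
        }
      };
      ```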
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177
      
      Test Plan:
      Compared `perf` output for
      
      ```
      ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
      ```
      
      before and after the change. There is a significant reduction for the call chain
      `rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` ->
      `rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169.
      
      Differential Revision: D19057445
      
      Pulled By: ltamasi
      
      fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79
    • CancelAllBackgroundWork before Close in db stress (#6174) · 349bd3ed
      Authored by Maysam Yabandeh
      Summary:
      Close asserts that there are no unreleased snapshots. For WritePrepared transactions, this means that any background work holding on to a snapshot must be canceled first. Update the stress tests to respect this sequence.
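
      A sketch of the shutdown order this implies, using the public CancelAllBackgroundWork() helper and DB::Close(); the function name is illustrative and error handling is trimmed for brevity.

      ```
      #include <rocksdb/convenience.h>
      #include <rocksdb/db.h>

      void ShutdownInOrder(rocksdb::DB* db) {
        // Wait for in-flight background jobs to finish and release their snapshots
        // before Close() asserts that no unreleased snapshots remain.
        rocksdb::CancelAllBackgroundWork(db, /*wait=*/true);
        rocksdb::Status s = db->Close();
        (void)s;  // expected to be OK once background work is done
        delete db;
      }
      ```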
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174
      
      Test Plan:
      ```
      make -j32 crash_test
      ```
      
      Differential Revision: D19057322
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362
    • Env should also load the native library (#6167) · edbf0e2d
      Authored by Adam Retter
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/6118
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6167
      
      Differential Revision: D19053577
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 86aca9a5bec0947a641649b515da17b3cb12bdde
    • Make it possible to enable periodic compactions for BlobDB (#6172) · 0d2172f1
      Authored by Levi Tamasi
      Summary:
      Periodic compactions ensure that even SSTs that do not get picked up
      otherwise eventually go through compaction; used in conjunction with
      BlobDB's garbage collection, they enable BlobDB to reclaim space when
      old blob files are used by such straggling SSTs.
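
      A small sketch of the LSM-side knob this relies on; the BlobDB-side options mentioned elsewhere in this log (enable_garbage_collection, garbage_collection_cutoff) live in BlobDBOptions and are omitted here, and the chosen period is only an example.

      ```
      #include <rocksdb/options.h>

      rocksdb::Options MakeOptionsWithPeriodicCompaction() {
        rocksdb::Options options;
        // Force every SST through compaction at least once every 30 days, even if
        // the compaction picker would otherwise never choose it.
        options.periodic_compaction_seconds = 30 * 24 * 60 * 60;
        return options;
      }
      ```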
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6172
      
      Test Plan: Ran `make check` and used the BlobDB mode of `db_bench`.
      
      Differential Revision: D19045045
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 04636ecc4b6cfe8d495bf656faa65d54a5eb1a93
    • Introduce a new storage specific Env API (#5761) · afa2420c
      Authored by anand76
      Summary:
      The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable or not, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
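
      A deliberately simplified sketch of the wrapper pattern described above, using made-up types rather than the actual rocksdb::Env/FileSystem/IOStatus classes: the storage layer returns a status carrying retryability metadata, and the composite wrapper maps a retryable IO error onto a soft error in the legacy status type.

      ```
      #include <memory>
      #include <string>
      #include <utility>

      struct StatusLike {
        bool ok = true;
        bool soft_error = false;  // triggers automatic recovery in this sketch
      };

      struct IOStatusLike {
        bool ok = true;
        bool retryable = false;  // extra metadata the storage layer can report
      };

      class FileSystemLike {
       public:
        virtual ~FileSystemLike() = default;
        virtual IOStatusLike NewWritableFile(const std::string& fname) = 0;
      };

      class CompositeEnvLike {
       public:
        explicit CompositeEnvLike(std::shared_ptr<FileSystemLike> fs)
            : fs_(std::move(fs)) {}

        // Legacy-style entry point: delegate to the FileSystem and translate the
        // richer IO status back into the old status type.
        StatusLike NewWritableFile(const std::string& fname) {
          IOStatusLike io = fs_->NewWritableFile(fname);
          StatusLike s;
          s.ok = io.ok;
          s.soft_error = !io.ok && io.retryable;
          return s;
        }

       private:
        std::shared_ptr<FileSystemLike> fs_;
      };
      ```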
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
    • Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154) · 58d46d19
      Authored by Peter Dillinger
      Summary:
      And clean up related code, especially in the stress test.
      
      (More clean up of db_stress_test_base.cc coming after this.)
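
      An illustrative re-statement of the two idioms named in the title (not the rocksdb::Random implementation): OneInOpt(n) is true with probability 1/n and treats n <= 0 as "never", which maps naturally onto `*_one_in`-style options, while PercentTrue(pct) is true pct percent of the time.

      ```
      #include <cstdint>
      #include <random>

      class RandomSketch {
       public:
        explicit RandomSketch(uint32_t seed) : rng_(seed) {}

        // True with probability 1/n; n <= 0 means the option is disabled.
        bool OneInOpt(int n) {
          if (n <= 0) {
            return false;
          }
          return std::uniform_int_distribution<int>(0, n - 1)(rng_) == 0;
        }

        // True pct percent of the time (pct in [0, 100]).
        bool PercentTrue(int pct) {
          return std::uniform_int_distribution<int>(0, 99)(rng_) < pct;
        }

       private:
        std::mt19937 rng_;
      };
      ```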
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154
      
      Test Plan: make check, make blackbox_crash_test for a bit
      
      Differential Revision: D18938180
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
    • Do not create/install new SuperVersion if nothing was deleted during memtable trim (#6169) · 6d54eb3d
      Authored by Levi Tamasi
      Summary:
      We have observed an increase in CPU load caused by frequent calls to
      `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
      when using `max_write_buffer_size_to_maintain` to limit the amount of
      memtable history maintained for transaction conflict checking. As it turns out,
      this is caused by the code creating and installing a new `SuperVersion` even if
      no memtables were actually trimmed. The patch adds a check to avoid this.
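
      A minimal sketch of the check, with illustrative types in place of the real MemTableList/SuperVersion machinery: the trim reports whether it removed anything, and the caller installs a new SuperVersion only in that case.

      ```
      #include <cstddef>
      #include <list>

      // Returns true only if at least one entry was actually trimmed.
      bool TrimHistory(std::list<int>* history, size_t max_entries) {
        bool trimmed = false;
        while (history->size() > max_entries) {
          history->pop_front();
          trimmed = true;
        }
        return trimmed;
      }

      // Caller-side pattern: skip the expensive install when nothing changed.
      // if (TrimHistory(&history, max_entries)) { InstallSuperVersion(); }
      ```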
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169
      
      Test Plan:
      Compared `perf` output for
      
      ```
      ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
      ```
      
      before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` ->
      `rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape`
      no longer registers in the `perf` report.
      
      Differential Revision: D19031509
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20
    • cmake: do not build tests for Release build and cleanups (#5916) · ac304adf
      Authored by Kefu Chai
      Summary:
      fixes https://github.com/facebook/rocksdb/issues/2445
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5916
      
      Differential Revision: D19031236
      
      fbshipit-source-id: bc3107b6b25a01958677d7cb411b1f381aae91c6
    • Enable unordered_write in stress tests (#6164) · fec7302a
      Authored by Maysam Yabandeh
      Summary:
      With WritePrepared transactions configured with two_write_queues, unordered_write offers the same guarantees as vanilla RocksDB and thus can be enabled in the stress tests.
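
      A sketch of the option combination being referred to; the option and enum names below are the public RocksDB ones, but treat this as an illustration rather than the stress-test change itself.

      ```
      #include <rocksdb/options.h>
      #include <rocksdb/utilities/transaction_db.h>

      void ConfigureUnorderedWritePrepared(rocksdb::Options* options,
                                           rocksdb::TransactionDBOptions* txn_opts) {
        options->unordered_write = true;   // relax ordering on the write path
        options->two_write_queues = true;  // needed for the guarantee to hold
        txn_opts->write_policy = rocksdb::TxnDBWritePolicy::WRITE_PREPARED;
      }
      ```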
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164
      
      Test Plan:
      ```
      make -j32 crash_test_with_txn
      ```
      
      Differential Revision: D18991899
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
    • Move out valid blobs from the oldest blob files during compaction (#6121) · 583c6953
      Authored by Levi Tamasi
      Summary:
      The patch adds logic that relocates live blobs from the oldest N non-TTL
      blob files as they are encountered during compaction (assuming the BlobDB
      configuration option `enable_garbage_collection` is `true`), where N is defined
      as the number of immutable non-TTL blob files multiplied by the value of
      a new BlobDB configuration option called `garbage_collection_cutoff`.
      (The default value of this parameter is 0.25, that is, by default the valid blobs
      residing in the oldest 25% of immutable non-TTL blob files are relocated.)
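
      A worked example of the cutoff arithmetic, as an illustrative helper rather than the actual BlobDB code: with 20 immutable non-TTL blob files and the default garbage_collection_cutoff of 0.25, blobs from the oldest 5 files are relocated.

      ```
      #include <cmath>
      #include <cstddef>

      size_t NumBlobFilesSubjectToGC(size_t num_immutable_non_ttl_files,
                                     double garbage_collection_cutoff) {
        // N = number of immutable non-TTL blob files * cutoff, rounded down here;
        // the exact rounding used by BlobDB may differ.
        return static_cast<size_t>(std::floor(
            static_cast<double>(num_immutable_non_ttl_files) *
            garbage_collection_cutoff));
      }
      ```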
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121
      
      Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`.
      
      Differential Revision: D18785357
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e
  5. 13 December 2019 (9 commits)
  6. 12 December 2019 (7 commits)
  7. 11 December 2019 (9 commits)
    • Add SyncWAL to db_stress (#6149) · 383f5071
      Authored by Yanqin Jin
      Summary:
      Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be
      called once every N operations on average.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149
      
      Test Plan:
      ```
      $make db_stress
      $./db_stress -sync_wal_one_in=100 -ops_per_thread=100000
      ```
      
      Differential Revision: D18922529
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56
    • db_stress: sometimes call CancelAllBackgroundWork() and Close() before closing DB (#6141) · 7a99162a
      Authored by sdong
      Summary:
      CancelAllBackgroundWork() and Close() are frequently used features, but we don't cover them in the stress test. Simply execute them before closing the DB with a 1/2 chance.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141
      
      Test Plan: Run "db_stress".
      
      Differential Revision: D18900861
      
      fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
    • Add Visual Studio 2015 to AppVeyor (#5446) · 984b6e71
      Authored by Adam Retter
      Summary:
      This is required to compile on Windows with Visual Studio 2015, which is used for creating the RocksJava releases.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5446
      
      Differential Revision: D18924811
      
      fbshipit-source-id: a183a62e79a2af5aaf59cd08235458a172fe7dcb
    • Add PauseBackgroundWork() to db_stress (#6148) · a6538571
      Authored by Peter Dillinger
      Summary:
      The worker thread will occasionally call PauseBackgroundWork(),
      briefly sleep (to avoid stalling itself), and then call
      ContinueBackgroundWork().
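
      A sketch of that pattern using the public DB::PauseBackgroundWork()/ContinueBackgroundWork() calls; the sleep duration and function name are illustrative.

      ```
      #include <chrono>
      #include <thread>

      #include <rocksdb/db.h>

      void BrieflyPauseBackgroundWork(rocksdb::DB* db) {
        if (db->PauseBackgroundWork().ok()) {
          // Keep the pause short so the worker thread does not stall itself.
          std::this_thread::sleep_for(std::chrono::milliseconds(10));
          db->ContinueBackgroundWork();
        }
      }
      ```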
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148
      
      Test Plan:
      Some runs of 'make blackbox_crash_test' with temporary
      printf output to confirm the code is occasionally reached.
      
      Differential Revision: D18913886
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be
    • Add an option to the CMake build to disable building shared libraries (#6122) · 2bb5fc12
      Authored by Adam Simpkins
      Summary:
      Add an option to explicitly disable building shared versions of the
      RocksDB libraries.  The shared libraries cannot be built in cases where
      some dependencies are only available as static libraries. This still
      allows RocksDB to be built in these situations.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122
      
      Differential Revision: D18920740
      
      fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
    • Use Env::GetChildren() instead of readdir (#6139) · 2b060c14
      Authored by Yanqin Jin
      Summary:
      For more portability, switch from readdir to Env::GetChildren() in ldb's
      manifest_dump subcommand.
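
      A small sketch of the portable directory listing this refers to, using the public Env::GetChildren() API; the helper name and error handling are illustrative.

      ```
      #include <string>
      #include <vector>

      #include <rocksdb/env.h>

      std::vector<std::string> ListDir(const std::string& dir) {
        std::vector<std::string> children;
        rocksdb::Status s = rocksdb::Env::Default()->GetChildren(dir, &children);
        if (!s.ok()) {
          children.clear();  // treat failures as an empty listing in this sketch
        }
        return children;
      }
      ```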
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6139
      
      Test Plan:
      ```
      $make check
      ```
      Manually check ldb command.
      
      Differential Revision: D18898197
      
      Pulled By: riversand963
      
      fbshipit-source-id: 92afca379e9fbe78ab70b2eb40d127daad8df5e2
    • db_stress: sometimes validate compact range data (#6140) · 14c38bac
      Authored by sdong
      Summary:
      Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simple validation which checks that the hash of all keys within the compact range, computed against the same snapshot, stays the same before and after the compaction.
      
      Also, randomly tune most knobs of CompactRangeOptions.
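
      A sketch of the validation idea (not the db_stress code): hash every key/value in the range under a fixed snapshot before and after CompactRange and require the two digests to match, since compaction must not change what that snapshot sees. The hash and helper name are illustrative.

      ```
      #include <cstdint>
      #include <memory>
      #include <string>

      #include <rocksdb/db.h>

      uint64_t HashRangeUnderSnapshot(rocksdb::DB* db, const rocksdb::Snapshot* snap,
                                      const rocksdb::Slice& begin,
                                      const rocksdb::Slice& end) {
        rocksdb::ReadOptions read_options;
        read_options.snapshot = snap;
        uint64_t digest = 0;
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
        for (it->Seek(begin); it->Valid() && it->key().compare(end) < 0; it->Next()) {
          // Simple multiplicative mix; any stable hash works for the comparison.
          for (char c : it->key().ToString() + it->value().ToString()) {
            digest = digest * 1099511628211ULL + static_cast<unsigned char>(c);
          }
        }
        return digest;
      }
      ```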
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140
      
      Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually inject some hacky code and observe the error path.
      
      Differential Revision: D18900230
      
      fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
    • Fix compile error "folly/xx.h file not found" on Mac OS (#6145) · 1dd3194f
      Authored by Jermy Li
      Summary:
      Error message when running `make` on Mac OS with master branch (v6.6.0):
      ```
      $ make
      $DEBUG_LEVEL is 1
      Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
      third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
      #include <folly/synchronization/WaitOptions.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
      #include <folly/synchronization/ParkingLot.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
      #include <folly/synchronization/DistributedMutex.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
      #include <folly/synchronization/AtomicNotification.h>
               ^
      1 error generated.
      third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
      #include <folly/detail/Futex.h>
               ^
      1 error generated.
        GEN      util/build_version.cc
      $DEBUG_LEVEL is 1
      Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
      third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
      #include <folly/synchronization/WaitOptions.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
      #include <folly/synchronization/ParkingLot.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
      #include <folly/synchronization/DistributedMutex.h>
               ^
      1 error generated.
      third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
      #include <folly/synchronization/AtomicNotification.h>
               ^
      1 error generated.
      third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
      #include <folly/detail/Futex.h>
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6145
      
      Differential Revision: D18910812
      
      fbshipit-source-id: 5a4475466c2d0601657831a0b48d34316b2f0816
    • Vary bloom_bits in db_crashtest (#6103) · 6380df5e
      Authored by Peter Dillinger
      Summary:
      Especially with non-integral bits/key now supported,
      db_crashtest should vary the bloom_bits configuration. The probabilities
      look like this:
      
      1/2 chance of a uniform int from 0 to 19. This includes an overall 1/40
      chance of 0, which disables the bloom filter.
      
      1/2 chance of a float from a lognormal distribution with a median of 10.
      This always produces positive values, but with a decent chance of < 1
      (overall ~1/40) or > 100 (overall ~1/40), which are the enforced/coerced
      implementation limits.
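
      An illustrative C++ re-statement of that sampling scheme (the actual change is in the Python db_crashtest script); the lognormal sigma below is an assumed value chosen only to give a visible spread around the median of 10.

      ```
      #include <cmath>
      #include <random>

      double SampleBloomBits(std::mt19937& rng) {
        if (std::uniform_int_distribution<int>(0, 1)(rng) == 0) {
          // Uniform int 0..19; 0 (overall 1/40 chance) disables the bloom filter.
          return std::uniform_int_distribution<int>(0, 19)(rng);
        }
        // exp(N(ln 10, sigma^2)) has median 10; values < 1 or > 100 are later
        // coerced to the implementation limits by the consumer of this setting.
        return std::exp(std::normal_distribution<double>(std::log(10.0), 1.5)(rng));
      }
      ```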
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103
      
      Test Plan:
      start 'make blackbox_crash_test' several times and look at
      configuration output
      
      Differential Revision: D18734877
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316