1. 29 March 2019, 4 commits
    • Fix db_stress for custom env (#5122) · d77476ef
      Committed by Yanqin Jin
      Summary:
      Fix some HDFS-related code so that it compiles and 'db_stress' can run with a custom Env.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122
      
      Differential Revision: D14675495
      
      Pulled By: riversand963
      
      fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
    • Smooth the deletion of WAL files (#5116) · dae3b554
      Committed by anand76
      Summary:
      WAL files are currently not subject to deletion rate limiting by DeleteScheduler. If the size of the WAL files is significant, this can cause a high delete rate on SSDs that may affect other operations. To fix it, force WAL file deletions to go through the SstFileManager. Original PR for this is #2768
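      As an illustration, a minimal sketch of routing deletions through a shared SstFileManager, which after this change also covers WAL files (the path and the 8 MB/s rate below are made-up values, not from the PR):

      ```cpp
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"
      #include "rocksdb/sst_file_manager.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Deletions handled by this SstFileManager are capped at 8 MB/s; with
        // this change, WAL file deletions are rate limited through it as well.
        options.sst_file_manager.reset(rocksdb::NewSstFileManager(
            rocksdb::Env::Default(), nullptr /* info_log */, "" /* trash_dir */,
            8 * 1024 * 1024 /* rate_bytes_per_sec */));

        rocksdb::DB* db = nullptr;
        rocksdb::Status s =
            rocksdb::DB::Open(options, "/tmp/wal_delete_example", &db);
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```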
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5116
      
      Differential Revision: D14669437
      
      Pulled By: anand1976
      
      fbshipit-source-id: c5f62d0640cebaa1574de841a1d01e4ce2faadf0
    • Option string/map can set merge operator from object registry (#5123) · a98317f5
      Committed by Siying Dong
      Summary:
      Allow a customized merge operator to be loaded from an option file/map/string
      by letting users pre-register merge operators with the object registry.
      
      Also update HISTORY.md and header files for the equivalent comparator feature.
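      As a rough usage sketch (assuming the `Registrar<T>` helper from `rocksdb/utilities/object_registry.h` of this era; the operator name and factory details are illustrative rather than taken from the PR):

      ```cpp
      #include <memory>
      #include <string>

      #include "rocksdb/convenience.h"
      #include "rocksdb/merge_operator.h"
      #include "rocksdb/options.h"
      #include "rocksdb/slice.h"
      #include "rocksdb/utilities/object_registry.h"

      // Toy merge operator that joins operands with commas.
      class ConcatMergeOperator : public rocksdb::AssociativeMergeOperator {
       public:
        bool Merge(const rocksdb::Slice& /*key*/, const rocksdb::Slice* existing,
                   const rocksdb::Slice& value, std::string* new_value,
                   rocksdb::Logger* /*logger*/) const override {
          *new_value = existing ? existing->ToString() + "," + value.ToString()
                                : value.ToString();
          return true;
        }
        const char* Name() const override { return "ConcatMergeOperator"; }
      };

      // Pre-register the operator so option strings/maps/files can refer to it
      // by name.
      static rocksdb::Registrar<rocksdb::MergeOperator> concat_reg(
          "ConcatMergeOperator",
          [](const std::string& /*uri*/,
             std::unique_ptr<rocksdb::MergeOperator>* /*guard*/)
              -> rocksdb::MergeOperator* { return new ConcatMergeOperator(); });

      int main() {
        rocksdb::ColumnFamilyOptions base, loaded;
        // The option string can now name the registered merge operator.
        rocksdb::Status s = rocksdb::GetColumnFamilyOptionsFromString(
            base, "merge_operator=ConcatMergeOperator", &loaded);
        return s.ok() ? 0 : 1;
      }
      ```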
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5123
      
      Differential Revision: D14658488
      
      Pulled By: siying
      
      fbshipit-source-id: 86ea2fbd2a0a04632d8ea9fceaffefd041f6ae61
    • Improve obsolete_files_test (#5125) · 106a94af
      Committed by Siying Dong
      Summary:
      We see a failure of obsolete_files_test but aren't able to identify
      the issue. Improve the test in the following ways and hope we can debug
      it better next time:
      1. Place the sync point before the automatic compaction runs so the race
         condition always triggers.
      2. Disable the sync point before the test finishes.
      3. Use ASSERT_OK() instead of ASSERT_TRUE(status.ok()).
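      For reference, a condensed sketch of the sync-point pattern used here (the sync point name is hypothetical; the real ones live in the test and in the DB implementation):

      ```cpp
      // Inside a test body; SyncPoint is part of RocksDB's test utilities
      // (util/sync_point.h at the time of this change).
      #include "util/sync_point.h"

      void ArmAndDisarmSyncPoint() {
        auto* sp = rocksdb::SyncPoint::GetInstance();
        // Arm the callback before anything can kick off the automatic
        // compaction, so the race the test probes is always triggered.
        sp->SetCallBack("Hypothetical:BeforeAutoCompaction", [](void* /*arg*/) {
          // e.g. record state or widen the race window
        });
        sp->EnableProcessing();

        // ... run the workload; check statuses with ASSERT_OK(s) ...

        // Disable before the test finishes so later background work is
        // unaffected.
        sp->DisableProcessing();
        sp->ClearAllCallBacks();
      }
      ```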
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5125
      
      Differential Revision: D14669456
      
      Pulled By: siying
      
      fbshipit-source-id: dccb7648e334501ad651eb212880096eef1f4ab2
  2. 28 March 2019, 7 commits
  3. 27 March 2019, 7 commits
    • Support for single-primary, multi-secondary instances (#4899) · 9358178e
      Committed by Yanqin Jin
      Summary:
      This PR allows RocksDB to run in single-primary, multi-secondary process mode.
      The writer is a regular RocksDB instance (e.g. a `DBImpl`) playing the role of the primary.
      Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST and WAL files with the primary. Secondaries tail the primary's MANIFEST and apply the updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.
      
      This PR has several components:
      1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
      
      2. (Similar to #4602). Add `FragmentBufferedReader` to handle a partially read trailing record at the end of a log, so that a future read can continue from where it stopped.
      
      3. (Originally in #4710 and #4820). Add the implementation of the secondary, i.e. `DBImplSecondary`.
      3.1 Tail the primary's MANIFEST during recovery.
      3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
      3.3 Tailing the WAL will come in a future PR.
      
      4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of a secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code; a condensed API sketch follows below.
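      A condensed sketch of the secondary-side API introduced here (paths are illustrative; see the example file above for the full multi-process setup):

      ```cpp
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.max_open_files = -1;  // the initial secondary mode expects this

        // Primary's DB directory, plus a private directory for the secondary's
        // own info log and OPTIONS files.
        const std::string kPrimaryPath = "/tmp/primary_db";
        const std::string kSecondaryPath = "/tmp/secondary_a";

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::OpenAsSecondary(options, kPrimaryPath,
                                                         kSecondaryPath, &db);
        if (!s.ok()) return 1;

        // Periodically replay the primary's MANIFEST updates, then serve reads.
        s = db->TryCatchUpWithPrimary();
        std::string value;
        db->Get(rocksdb::ReadOptions(), "some_key", &value);

        delete db;
        return s.ok() ? 0 : 1;
      }
      ```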
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
      
      Differential Revision: D14510945
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
    • remove bundled but unused fbson library (#5108) · 2a5463ae
      Committed by jsteemann
      Summary:
      The fbson library is still included in the `third-party` directory, but it is no longer needed by RocksDB.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5108
      
      Differential Revision: D14622272
      
      Pulled By: siying
      
      fbshipit-source-id: 52b24ed17d8d870a71364f85e5bac4eafb192df5
    • Introduce CPU timers for iterator seek and next (#5076) · 01e6badb
      Committed by Shi Feng
      Summary:
      Introduce CPU timers for iterator seek and next operations. The seek
      counter includes SeekToFirst, SeekToLast and SeekForPrev, with the
      caveat that the SeekToLast timer doesn't include some post-processing
      time if an upper bound is defined.
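      A rough sketch of reading such timers via the perf context; the perf-level value and counter names used below (`kEnableTimeAndCPUTimeExceptForMutex`, `iter_seek_cpu_nanos`, `iter_next_cpu_nanos`) are assumptions about the interface, not copied from this PR:

      ```cpp
      #include <cstdio>
      #include <memory>

      #include "rocksdb/db.h"
      #include "rocksdb/iterator.h"
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"

      // Assumes 'db' is an open rocksdb::DB*.
      void MeasureIteratorCpu(rocksdb::DB* db) {
        // CPU timers are only collected at a perf level that includes CPU time.
        rocksdb::SetPerfLevel(
            rocksdb::PerfLevel::kEnableTimeAndCPUTimeExceptForMutex);
        rocksdb::get_perf_context()->Reset();

        std::unique_ptr<rocksdb::Iterator> it(
            db->NewIterator(rocksdb::ReadOptions()));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
        }

        std::printf(
            "seek cpu ns: %llu, next cpu ns: %llu\n",
            (unsigned long long)rocksdb::get_perf_context()->iter_seek_cpu_nanos,
            (unsigned long long)rocksdb::get_perf_context()->iter_next_cpu_nanos);
        rocksdb::SetPerfLevel(rocksdb::PerfLevel::kDisable);
      }
      ```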
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5076
      
      Differential Revision: D14525218
      
      Pulled By: fredfsh
      
      fbshipit-source-id: 03ba25df3b22b06c072621e4de0eacfa1445f0d9
    • Allow option string to get comparator from object registry (#5106) · 4774a940
      Committed by Siying Dong
      Summary:
      Even a customized ldb may not be able to read data from some databases if the
      comparator is not standard. We modify the option helper to get the comparator
      from the object registry so that a customized ldb can read databases that use
      a non-standard comparator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5106
      
      Differential Revision: D14622107
      
      Pulled By: siying
      
      fbshipit-source-id: 151dcb295a35a4c7d54f919cd4e322a89dc601c9
    • BlobDB::Open() should put all existing trash files to delete scheduler (#5103) · fe2bd190
      Committed by Siying Dong
      Summary:
      Right now, BlobDB::Open() fails to hand all trash files to the delete scheduler,
      which leaves some trash files permanently untracked.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5103
      
      Differential Revision: D14606095
      
      Pulled By: siying
      
      fbshipit-source-id: 41a9437a2948abb235c0ed85f9a04612d0e50183
    • Fix SstFileReader not able to open ingested file (#5097) · 75133b1b
      Committed by Yi Wu
      Summary:
      Since `SstFileReader` doesn't know the largest seqno of a file, it fails this check when it opens a file with a global seqno: https://github.com/facebook/rocksdb/blob/ca89ac2ba997dfa0e135bd75d4ccf6f5774a7eff/table/block_based_table_reader.cc#L730
      Changes:
      * Pass largest_seqno=kMaxSequenceNumber from `SstFileReader` and allow it to bypass the above check.
      * `BlockBasedTable::VerifyChecksum` also double-checks whether the checksum matches when excluding the global seqno (this is to make the new test in sst_table_reader_test pass).
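      For reference, a small sketch of the `SstFileReader` usage this fixes (the file path is illustrative); after the change it also works on a file that was ingested and carries a global seqno:

      ```cpp
      #include <memory>

      #include "rocksdb/iterator.h"
      #include "rocksdb/options.h"
      #include "rocksdb/sst_file_reader.h"

      int main() {
        rocksdb::Options options;
        rocksdb::SstFileReader reader(options);

        // A file previously ingested into a DB, so it may carry a global seqno.
        rocksdb::Status s = reader.Open("/path/to/ingested_file.sst");
        if (!s.ok()) return 1;

        // After this change, checksum verification accounts for the global seqno.
        s = reader.VerifyChecksum();

        std::unique_ptr<rocksdb::Iterator> it(
            reader.NewIterator(rocksdb::ReadOptions()));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // inspect it->key() / it->value()
        }
        return s.ok() ? 0 : 1;
      }
      ```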
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5097
      
      Differential Revision: D14607434
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9008599227c5fccbf9b73fee46b3bf4a1523f023
    • Fix BlockBasedTableIterator construction missing index_key_is_full parameter · 7ca9eb75
      Committed by Yi Wu
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5104
      
      Differential Revision: D14619000
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: c2895794a3f31b826c149dcb698c1952dacc2332
  4. 26 March 2019, 3 commits
  5. 22 March 2019, 3 commits
    • Make it easier for users to load options from option file and set shared block cache. (#5063) · a4396f92
      Committed by Rashmi Sharma
      Summary:
      Right now, users need several dynamic casts to attach a shared block cache to the option structs loaded from an option file.
      If they don't do that, every CF of every DB will create its own 8 MB block cache, which is not a usable setting, so every user who loads options from a file is dragged into this mess.
      Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that the loaded option structs share the block cache.
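      A sketch of the intended usage; the exact position of the new cache parameter in LoadLatestOptions() is an assumption here, so treat the signature loosely:

      ```cpp
      #include <memory>
      #include <string>
      #include <vector>

      #include "rocksdb/cache.h"
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/options_util.h"

      int main() {
        const std::string kDbPath = "/tmp/shared_cache_example";  // illustrative
        std::shared_ptr<rocksdb::Cache> cache = rocksdb::NewLRUCache(1 << 30);

        rocksdb::DBOptions db_opts;
        std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
        // Passing the cache makes every loaded ColumnFamilyOptions share it
        // instead of each CF silently creating its own default 8 MB block cache.
        rocksdb::Status s = rocksdb::LoadLatestOptions(
            kDbPath, rocksdb::Env::Default(), &db_opts, &cf_descs,
            false /* ignore_unknown_options */, &cache);
        if (!s.ok()) return 1;

        std::vector<rocksdb::ColumnFamilyHandle*> handles;
        rocksdb::DB* db = nullptr;
        s = rocksdb::DB::Open(db_opts, kDbPath, cf_descs, &handles, &db);
        for (auto* h : handles) delete h;
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```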
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063
      
      Differential Revision: D14518584
      
      Pulled By: rashmishrm
      
      fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
    • fix NowNanos overflow (#5062) · 88d85b68
      Committed by Burton Li
      Summary:
      The original implementation of WinEnvIO::NowNanos() has a constant integer overflow in:
      li.QuadPart *= std::nano::den;
      As a result, the API returns an incorrect value. For example:
      li.QuadPart = 13477844301545
      std::nano::den = 1e9

      The fix uses a pre-computed nano_seconds_per_period_ to represent the nanoseconds per performance-counter period, in the case where std::nano::den is divisible by perf_counter_frequency_. Otherwise it falls back to using high_resolution_clock.
      siying ajkr
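      To illustrate the arithmetic with a standalone sketch (not the actual WinEnvIO code): 13477844301545 * 1e9 is about 1.3e22, far beyond the ~9.2e18 range of a signed 64-bit integer, so the multiplication has to be avoided or split:

      ```cpp
      #include <cstdint>
      #include <ratio>

      // 'counter' and 'frequency' stand in for QueryPerformanceCounter /
      // QueryPerformanceFrequency results; sketch only.
      uint64_t CounterToNanos(uint64_t counter, uint64_t frequency) {
        if (std::nano::den % frequency == 0) {
          // Pre-computable nanoseconds per counter tick (e.g. 100 for a 10 MHz
          // counter), so no large intermediate product is needed.
          const uint64_t nanos_per_tick = std::nano::den / frequency;
          return counter * nanos_per_tick;
        }
        // General case: split into whole seconds plus a remainder so no
        // intermediate value overflows for realistic inputs.
        return (counter / frequency) * std::nano::den +
               ((counter % frequency) * std::nano::den) / frequency;
      }
      ```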
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5062
      
      Differential Revision: D14426842
      
      Pulled By: anand1976
      
      fbshipit-source-id: 127f1daf423dd4b30edd0dcf8ea0466f468bec12
    • Reorder DBIter fields to reduce memory usage (#5078) · c84fad7a
      Committed by Maysam Yabandeh
      Summary:
      The patch reorders DBIter fields to put the 1-byte fields together, letting the compiler reduce memory usage by packing the bools and enums into fewer 64-bit words.

      This might have a negative side effect of putting variables that are accessed together into different cache lines and hence increasing cache misses. I'm not sure what benchmark would verify that, though. I ran simple, single-threaded seekrandom benchmarks, but the variance in the results is too high to be conclusive.
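      A toy illustration of the padding effect (not the actual DBIter members): on a typical LP64 ABI the grouped layout is 8 bytes smaller because the two 1-byte members end up sharing one aligned 8-byte slot.

      ```cpp
      #include <cstdint>

      struct Interleaved {
        bool valid;         // 1 byte + 7 bytes padding before the next field
        uint64_t sequence;
        bool pinned;        // another 1 byte + 7 bytes padding
        uint64_t bytes;
      };                    // typically 32 bytes

      struct Grouped {
        uint64_t sequence;
        uint64_t bytes;
        bool valid;         // 1-byte members packed together at the end
        bool pinned;
      };                    // typically 24 bytes

      static_assert(sizeof(Grouped) <= sizeof(Interleaved),
                    "grouping small members never increases the size here");
      ```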
      
      ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench
      ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5078
      
      Differential Revision: D14562676
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 2284655d46e079b6e9a860e94be5defb6f482167
  6. 21 March 2019, 3 commits
  7. 20 March 2019, 4 commits
  8. 19 March 2019, 2 commits
    • Feature for sampling and reporting compressibility (#4842) · b45b1cde
      Committed by Shobhit Dayal
      Summary:
      This is a feature to sample data-block compressibility and report it as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
      1. lz4 or snappy for fast compression
      2. zstd or zlib for slow but higher compression.

      The stats are reported to the caller as raw bytes and compressed bytes. The block continues to be compressed for storage using the specified CompressionType.

      db_bench_tool now has a command line option for specifying the sampling rate. Its default value is 0 (no sampling). To measure the overhead of a given value, users can compare the performance of db_bench_tool while varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
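      A minimal sketch of turning the sampling on from code; the option name `sample_for_compression` and the idea of a mirroring db_bench flag are assumptions about how the feature is exposed, not verbatim from the PR:

      ```cpp
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compression = rocksdb::kSnappyCompression;
        // Assumed option: sample roughly 1 in 20 data blocks for compressibility
        // and report raw vs. compressed bytes through statistics.
        options.sample_for_compression = 20;
        options.statistics = rocksdb::CreateDBStatistics();

        rocksdb::DB* db = nullptr;
        rocksdb::Status s =
            rocksdb::DB::Open(options, "/tmp/compression_sampling_example", &db);
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```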
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842
      
      Differential Revision: D13629011
      
      Pulled By: shobhitdayal
      
      fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
    • utilities: Fix build failure with -Werror=maybe-uninitialized (#5074) · 20d49da9
      Committed by He Zhe
      Summary:
      Initialize magic_number to zero to avoid the following failure:
      utilities/blob_db/blob_log_format.cc:91:3: error: 'magic_number' may be used
      uninitialized in this function [-Werror=maybe-uninitialized]
         if (magic_number != kMagicNumber) {
         ^~
      Signed-off-by: He Zhe <zhe.he@windriver.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5074
      
      Differential Revision: D14505514
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 4334462958c2b9c5a7c68c6ab24dadf94ad70902
  9. 16 March 2019, 2 commits
    • Update bg_error when log flush fails in SwitchMemtable() (#5072) · b4fa51df
      Committed by anand76
      Summary:
      There is a potential failure case in DBImpl::SwitchMemtable() that is not handled properly. The call to cur_log_writer->WriteBuffer() can fail due to an IO error. In that case, we need to call SetBGError() in order to set the background error, since the WriteBuffer() failure may result in data loss.

      Also, the asserts for !new_mem and !new_log are incorrect, as those would have been allocated by the time this failure is detected.
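      For context on what applications see: once the background error is set, subsequent writes fail until the DB recovers. A simplified, hypothetical sketch of reacting to that with `DB::Resume()`:

      ```cpp
      #include <string>

      #include "rocksdb/db.h"

      // Sketch only: 'db' is an already-open rocksdb::DB*.
      rocksdb::Status PutWithOneRecoveryAttempt(rocksdb::DB* db,
                                                const std::string& key,
                                                const std::string& value) {
        rocksdb::Status s = db->Put(rocksdb::WriteOptions(), key, value);
        if (!s.ok()) {
          // The failure may mean a background error was set, e.g. by a failed
          // WAL flush during SwitchMemtable(); try to resume and retry once.
          if (db->Resume().ok()) {
            s = db->Put(rocksdb::WriteOptions(), key, value);
          }
        }
        return s;
      }
      ```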
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5072
      
      Differential Revision: D14461384
      
      Pulled By: anand1976
      
      fbshipit-source-id: fb59bce9d61378f37d2dfcd28c0b704b0f43c3cf
    • exercise WAL recycling in crash test (#5070) · 2263f869
      Committed by Andrew Kryczka
      Summary:
      Since this feature affects WAL behavior, it seems important that our crash-recovery tests cover it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070
      
      Differential Revision: D14470085
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1
  10. 15 March 2019, 1 commit
    • Add the -try_process_corrupted_trace option to trace_analyzer (#5067) · dcde292c
      Committed by Zhichao Cao
      Summary:
      In the current trace_analyzer implementation, once the trace file contains corrupted content, which can be caused by unexpected tracing operations or other reasons, trace_analyzer prints the error and stops analyzing.

      By adding the -try_process_corrupted_trace option, users can try to process a corrupted trace file and get the analysis results for the trace records from the beginning up to the first corrupted point in the file. Analysis might still fail even when this option is enabled.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5067
      
      Differential Revision: D14433037
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d095233ba371726869af0def0cdee23b69896831
  11. 13 March 2019, 2 commits
  12. 09 March 2019, 2 commits